[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?

Yan P. Yuan yuan at embl.de
Tue Jul 24 15:08:18 BST 2018


Hi Carla,

I meant that I've used the $slice->get_all_Genes in my other API scripts in the 
context of a chromosome as the slice so that they'd run in parallel over all 
chromosomes. What you mentioned is surely correct that 
$genomic_align->get_Slice->get_all_Genes() is directly applied to the 
corresponding genomic slice, rather to the whole chromosome.

OK, as I've interested in the orthologs' aligments, I'll try your mentioned method:

$genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, 
$slice)

to retrieve their genomic aligments.

Thaks a lot & regards

Yan

On 24/07/18 15:23, Carla Cummins wrote:
>
> Hi Yan,
>
> Just to clarify, the method I mentioned fetches all genes for the range of the 
> genomic align object that you've called it on. So 
> $genomic_align->get_Slice->get_all_Genes() fetches all genes located on the 
> region covered by the $genomic_align, rather than the whole chromosome. A 
> slice simply refers to a region of genomic sequence.
>
> Regarding fetching genomic sequences for homologies and expanding to 
> surrounding sequence, it can be achieved like so (note: this a code snippet 
> and should be merged with the homolgy2.pl example):
>
> my $all_homologies = $homology_adaptor->fetch_all_by_MethodLinkSpeciesSet($this_mlss);
> foreach my $homology ( @$all_homologies ) {
>      my $gene_members = $homology->get_all_Members();
>      foreach my $gm ( @$gene_members ) {
> 	my $this_slice = $gm->get_Slice;
> 	# extend the slice by 100bp on 5' end and 200bp on 3' end
> 	my $expanded_slice = $this_slice->expand(100, 200);
>          print $gm->display_label . "\t" . $expanded_slice->seq . "\n";
>      }
> }
>
> Please note that this code will print the raw genomic sequence, not an aligned 
> sequence. There is also a method on the GenomicAlignBlockAdaptor object that 
> would allow you to fetch alignments for a given slice, meaning you could fetch 
> LASTZ genomic_align_blocks for each homology member slice above:
> $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, 
> $slice);
>
>
> Hope this helps,
>
> Carla
>
> On 24/07/2018 13:41, Yan P. Yuan wrote:
>> Hi Carla,
>>
>> thank you for your reply.
>>
>> Yes, I'm actually interested more in the orthologs' CDS, sometimes also 
>> beyond their borders. The method you've mentioned was used actually to fetch 
>> all genes on a chromosome. Then after this, I'd then used the mentioned way 
>> to retrieve their genomic alignments.
>>
>> The example "homology2.pl" you've shown can be used to retrieve all ortholog 
>> pairs (esp. one-to-one's).  I saw this example before, but couldn't get it to 
>> retrieve their genomic sequences. Is there an access method in the object 
>> "Homology" to retrieve their genomic alignments (like what I've shown 
>> before), surrounding a certain region of a CDS?
>>
>> Best regards
>>
>> Yan
>>
>>
>> On 24/07/18 13:27, Carla Cummins wrote:
>>>
>>> Hi Yan,
>>>
>>> Since LASTZ alignments are on the whole genome, it's important to note that 
>>> genes may not be present on every part of the sequence. To see which gene(s) 
>>> are on a given aligned sequence region, you can use:
>>>
>>> $genomic_align->get_Slice->get_all_Genes();
>>>
>>> However, at the end of the message, you mention homologies. Homologies and 
>>> whole genome alignments are really separate objects in our system. Inference 
>>> of homologies is performed on a gene tree level and are not directly related 
>>> to the whole genome alignments 
>>> (http://www.ensembl.org/info/genome/compara/homology_types.html). Can you 
>>> confirm whether it is truly homologies that you are interested in, or 
>>> whether you are simply using the term to refer to well-aligned regions 
>>> (regardless of genes)? Here is a basic example of how to retrieve 
>>> human/mouse homologies using our API: 
>>> https://www.ebi.ac.uk/~muffato/workshops/2016_01_EBI/solutions_compara/homology2.pl
>>>
>>> Finally, since it's clear that there is some relationship between homology 
>>> and alignment (biologically speaking), we have a QC method for our 
>>> homologies that tests the quality of the alignment over a pair of homologous 
>>> genes (WGA score). This may be useful to you: 
>>> http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html
>>>
>>> Hope this helps?
>>>
>>> Best,
>>>
>>> Carla
>>>
>>>
>>> On 24/07/2018 12:00, Yan P. Yuan wrote:
>>>> Hi,
>>>>
>>>> Currently, I'm interested in genomic aligments between 2 species: homo 
>>>> sapience and mus musculus.
>>>>
>>>> If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor to 
>>>> fetch the genomic alignments, the output misses the corresponding gene 
>>>> names (or involved transcript names).
>>>>
>>>> Can someone point out how to get the gene names via the used methods? Or a 
>>>> different way to do it?
>>>>
>>>> Thanks a lot.
>>>>
>>>> Yan
>>>>
>>>> PS:
>>>>
>>>> The folllowing code was used:
>>>>
>>>>     use strict;
>>>>     use warnings;
>>>>
>>>>     use Bio::EnsEMBL::Registry;
>>>>     use Bio::AlignIO;
>>>>
>>>>     my $reg = 'Bio::EnsEMBL::Registry';
>>>>
>>>>     $reg->load_registry_from_db(
>>>>         -host=>'ensembldb.ensembl.org',
>>>>         -port=>5306,
>>>>         -user=>'anonymous',
>>>>     );
>>>>
>>>>
>>>>     my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
>>>>     "MethodLinkSpeciesSet");
>>>>
>>>>     my $mlss         =
>>>>     $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
>>>>     ['human', 'mouse']);
>>>>
>>>>     my $species_slice_adaptor       = $reg->get_adaptor("mouse", "core",
>>>>     "Slice");
>>>>
>>>>     my $genomic_align_block_adaptor = $reg->get_adaptor("Multi", "compara",
>>>>     "GenomicAlignBlock");
>>>>
>>>>     # Define a region to fetch
>>>>     my ($species_start, $species_end) = (143702058, 143702091);
>>>>     my $species_slice =
>>>>     $species_slice_adaptor->fetch_by_region("chromosome", 1,
>>>>     $species_start, $species_end, -1);
>>>>
>>>>     my $all_genomic_align_blocks =
>>>>     $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
>>>>     $species_slice);
>>>>
>>>>     my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
>>>>                       -fh=>\*STDOUT,
>>>>                       -format=>'clustalw',
>>>>                       -idlength=>20);
>>>>
>>>>     # print the restricted alignments
>>>>     foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>>>>         my $restricted_gab =
>>>>     $genomic_align_block->restrict_between_reference_positions($species_start,
>>>>     $species_end);
>>>>
>>>>         foreach my $genomic_align ( @{
>>>>     $restricted_gab->get_all_GenomicAligns() } ) {
>>>>             printf("%s %s:%d-%d:%s:%s\n", $genomic_align->genome_db->name,
>>>>     $genomic_align->dnafrag->name, $genomic_align->dnafrag_start,
>>>>     $genomic_align->dnafrag_end, $genomic_align->dnafrag_strand,
>>>>     $genomic_align->aligned_sequence());
>>>>         }
>>>>         print "\n";
>>>>     }
>>>>
>>>> The output is then:
>>>>
>>>>     homo_sapiens 1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
>>>>     mus_musculus 1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
>>>>
>>>>
>>>> One way to get to the gene name/transcripts I've used is GeneMember-Adaptor 
>>>> & Homology-Adaptor, but then I'd miss the genomic alignments.
>>>>
>>>> This was taken from an ENSEMBL example:
>>>>
>>>>     use strict;
>>>>     use warnings;
>>>>
>>>>     # using this script to get the ortholog of a given gene of mus musculus
>>>>
>>>>     use Bio::EnsEMBL::Registry;
>>>>
>>>>     my $reg = 'Bio::EnsEMBL::Registry';
>>>>
>>>>     $reg->load_registry_from_db(
>>>>         -host=>'ensembldb.ensembl.org',
>>>>         -port=>5306,
>>>>         -user=>'anonymous',
>>>>     );
>>>>
>>>>
>>>>     # Getting the gene_member_adaptor
>>>>     my $gene_member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>>>>     'compara', 'GeneMember');
>>>>
>>>>     my $gene_member =
>>>>     $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
>>>>
>>>>     # Getting the homology adaptor
>>>>     my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>>>>     'compara', 'Homology');
>>>>
>>>>     my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);
>>>>     #homologies = hash array
>>>>
>>>>     foreach my $homology ( @{$homologies} ) {
>>>>         foreach my $member ( @{$homology->get_all_Members} ) {
>>>>             if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
>>>>                 print $member->description, "\n";
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>
>>>> Output:
>>>>
>>>> Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X Start:114017033 
>>>> End:114017102 Acc:RF00088
>>>> Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 Start:62853663 
>>>> End:62853732 Acc:RF00088
>>>>
>>>> Or is there a better way to get homology members only between 2 species?
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Dev mailing listDev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog:http://www.ensembl.info/
>>>
>>> -- 
>>> Carla Cummins, Ph.D.
>>> Ensembl Compara Developer
>>> European Bioinformatics Institute (EMBL-EBI),
>>> European Molecular Biology Laboratory,
>>> Wellcome Trust Genome Campus, Hinxton,
>>> Cambridge, CB10 1SD, United Kingdom
>>> Room A3-145
>>> Phone + 44 (0) 1223 49 4240
>>
>
> -- 
> Carla Cummins, Ph.D.
> Ensembl Compara Developer
> European Bioinformatics Institute (EMBL-EBI),
> European Molecular Biology Laboratory,
> Wellcome Trust Genome Campus, Hinxton,
> Cambridge, CB10 1SD, United Kingdom
> Room A3-145
> Phone + 44 (0) 1223 49 4240

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/ef159256/attachment.html>


More information about the Dev mailing list