[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?

Yan P. Yuan yuan at embl.de
Tue Jul 24 13:41:23 BST 2018


Hi Carla,

thank you for your reply.

Yes, I'm actually interested more in the orthologs' CDS, sometimes also beyond 
their borders. The method you've mentioned was used actually to fetch all genes 
on a chromosome. Then after this, I'd then used the mentioned way to retrieve 
their genomic alignments.

The example "homology2.pl" you've shown can be used to retrieve all ortholog 
pairs (esp. one-to-one's).  I saw this example before, but couldn't get it to 
retrieve their genomic sequences. Is there an access method in the object 
"Homology" to retrieve their genomic alignments (like what I've shown before), 
surrounding a certain region of a CDS?

Best regards

Yan


On 24/07/18 13:27, Carla Cummins wrote:
>
> Hi Yan,
>
> Since LASTZ alignments are on the whole genome, it's important to note that 
> genes may not be present on every part of the sequence. To see which gene(s) 
> are on a given aligned sequence region, you can use:
>
> $genomic_align->get_Slice->get_all_Genes();
>
> However, at the end of the message, you mention homologies. Homologies and 
> whole genome alignments are really separate objects in our system. Inference 
> of homologies is performed on a gene tree level and are not directly related 
> to the whole genome alignments 
> (http://www.ensembl.org/info/genome/compara/homology_types.html). Can you 
> confirm whether it is truly homologies that you are interested in, or whether 
> you are simply using the term to refer to well-aligned regions (regardless of 
> genes)? Here is a basic example of how to retrieve human/mouse homologies 
> using our API: 
> https://www.ebi.ac.uk/~muffato/workshops/2016_01_EBI/solutions_compara/homology2.pl
>
> Finally, since it's clear that there is some relationship between homology and 
> alignment (biologically speaking), we have a QC method for our homologies that 
> tests the quality of the alignment over a pair of homologous genes (WGA 
> score). This may be useful to you: 
> http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html
>
> Hope this helps?
>
> Best,
>
> Carla
>
>
> On 24/07/2018 12:00, Yan P. Yuan wrote:
>> Hi,
>>
>> Currently, I'm interested in genomic aligments between 2 species: homo 
>> sapience and mus musculus.
>>
>> If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor to fetch 
>> the genomic alignments, the output misses the corresponding gene names (or 
>> involved transcript names).
>>
>> Can someone point out how to get the gene names via the used methods? Or a 
>> different way to do it?
>>
>> Thanks a lot.
>>
>> Yan
>>
>> PS:
>>
>> The folllowing code was used:
>>
>>     use strict;
>>     use warnings;
>>
>>     use Bio::EnsEMBL::Registry;
>>     use Bio::AlignIO;
>>
>>     my $reg = 'Bio::EnsEMBL::Registry';
>>
>>     $reg->load_registry_from_db(
>>         -host=>'ensembldb.ensembl.org',
>>         -port=>5306,
>>         -user=>'anonymous',
>>     );
>>
>>
>>     my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
>>     "MethodLinkSpeciesSet");
>>
>>     my $mlss         =
>>     $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
>>     ['human', 'mouse']);
>>
>>     my $species_slice_adaptor       = $reg->get_adaptor("mouse", "core",
>>     "Slice");
>>
>>     my $genomic_align_block_adaptor = $reg->get_adaptor("Multi", "compara",
>>     "GenomicAlignBlock");
>>
>>     # Define a region to fetch
>>     my ($species_start, $species_end) = (143702058, 143702091);
>>     my $species_slice = $species_slice_adaptor->fetch_by_region("chromosome",
>>     1, $species_start, $species_end, -1);
>>
>>     my $all_genomic_align_blocks =
>>     $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
>>     $species_slice);
>>
>>     my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
>>                       -fh=>\*STDOUT,
>>                       -format=>'clustalw',
>>                       -idlength=>20);
>>
>>     # print the restricted alignments
>>     foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>>         my $restricted_gab =
>>     $genomic_align_block->restrict_between_reference_positions($species_start,
>>     $species_end);
>>
>>         foreach my $genomic_align ( @{
>>     $restricted_gab->get_all_GenomicAligns() } ) {
>>             printf("%s %s:%d-%d:%s:%s\n", $genomic_align->genome_db->name,
>>     $genomic_align->dnafrag->name, $genomic_align->dnafrag_start,
>>     $genomic_align->dnafrag_end, $genomic_align->dnafrag_strand,
>>     $genomic_align->aligned_sequence());
>>         }
>>         print "\n";
>>     }
>>
>> The output is then:
>>
>>     homo_sapiens 1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
>>     mus_musculus 1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
>>
>>
>> One way to get to the gene name/transcripts I've used is GeneMember-Adaptor & 
>> Homology-Adaptor, but then I'd miss the genomic alignments.
>>
>> This was taken from an ENSEMBL example:
>>
>>     use strict;
>>     use warnings;
>>
>>     # using this script to get the ortholog of a given gene of mus musculus
>>
>>     use Bio::EnsEMBL::Registry;
>>
>>     my $reg = 'Bio::EnsEMBL::Registry';
>>
>>     $reg->load_registry_from_db(
>>         -host=>'ensembldb.ensembl.org',
>>         -port=>5306,
>>         -user=>'anonymous',
>>     );
>>
>>
>>     # Getting the gene_member_adaptor
>>     my $gene_member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>>     'compara', 'GeneMember');
>>
>>     my $gene_member =
>>     $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
>>
>>     # Getting the homology adaptor
>>     my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
>>     'compara', 'Homology');
>>
>>     my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);
>>     #homologies = hash array
>>
>>     foreach my $homology ( @{$homologies} ) {
>>         foreach my $member ( @{$homology->get_all_Members} ) {
>>             if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
>>                 print $member->description, "\n";
>>             }
>>         }
>>     }
>>
>>
>> Output:
>>
>> Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X Start:114017033 
>> End:114017102 Acc:RF00088
>> Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 Start:62853663 
>> End:62853732 Acc:RF00088
>>
>> Or is there a better way to get homology members only between 2 species?
>>
>>
>>
>> _______________________________________________
>> Dev mailing listDev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog:http://www.ensembl.info/
>
> -- 
> Carla Cummins, Ph.D.
> Ensembl Compara Developer
> European Bioinformatics Institute (EMBL-EBI),
> European Molecular Biology Laboratory,
> Wellcome Trust Genome Campus, Hinxton,
> Cambridge, CB10 1SD, United Kingdom
> Room A3-145
> Phone + 44 (0) 1223 49 4240

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/49fbd499/attachment.html>


More information about the Dev mailing list