[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?

Carla Cummins carlac at ebi.ac.uk
Tue Jul 24 14:23:15 BST 2018


Hi Yan,

Just to clarify, the method I mentioned fetches all genes for the range 
of the genomic align object that you've called it on. So 
$genomic_align->get_Slice->get_all_Genes() fetches all genes located on 
the region covered by the $genomic_align, rather than the whole 
chromosome. A slice simply refers to a region of genomic sequence.

Regarding fetching genomic sequences for homologies and expanding to 
surrounding sequence, it can be achieved like so (note: this a code 
snippet and should be merged with the homolgy2.pl example):

my $all_homologies = $homology_adaptor->fetch_all_by_MethodLinkSpeciesSet($this_mlss);
foreach my $homology ( @$all_homologies ) {
     my $gene_members = $homology->get_all_Members();
     foreach my $gm ( @$gene_members ) {
	my $this_slice = $gm->get_Slice;
	# extend the slice by 100bp on 5' end and 200bp on 3' end
	my $expanded_slice = $this_slice->expand(100, 200);
         print $gm->display_label . "\t" . $expanded_slice->seq . "\n";
     }
}

Please note that this code will print the raw genomic sequence, not an 
aligned sequence. There is also a method on the GenomicAlignBlockAdaptor 
object that would allow you to fetch alignments for a given slice, 
meaning you could fetch LASTZ genomic_align_blocks for each homology 
member slice above:
$genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, 
$slice);


Hope this helps,

Carla

On 24/07/2018 13:41, Yan P. Yuan wrote:
> Hi Carla,
>
> thank you for your reply.
>
> Yes, I'm actually interested more in the orthologs' CDS, sometimes 
> also beyond their borders. The method you've mentioned was used 
> actually to fetch all genes on a chromosome. Then after this, I'd then 
> used the mentioned way to retrieve their genomic alignments.
>
> The example "homology2.pl" you've shown can be used to retrieve all 
> ortholog pairs (esp. one-to-one's).  I saw this example before, but 
> couldn't get it to retrieve their genomic sequences. Is there an 
> access method in the object "Homology" to retrieve their genomic 
> alignments (like what I've shown before), surrounding a certain region 
> of a CDS?
>
> Best regards
>
> Yan
>
>
> On 24/07/18 13:27, Carla Cummins wrote:
>>
>> Hi Yan,
>>
>> Since LASTZ alignments are on the whole genome, it's important to 
>> note that genes may not be present on every part of the sequence. To 
>> see which gene(s) are on a given aligned sequence region, you can use:
>>
>> $genomic_align->get_Slice->get_all_Genes();
>>
>> However, at the end of the message, you mention homologies. 
>> Homologies and whole genome alignments are really separate objects in 
>> our system. Inference of homologies is performed on a gene tree level 
>> and are not directly related to the whole genome alignments 
>> (http://www.ensembl.org/info/genome/compara/homology_types.html). Can 
>> you confirm whether it is truly homologies that you are interested 
>> in, or whether you are simply using the term to refer to well-aligned 
>> regions (regardless of genes)? Here is a basic example of how to 
>> retrieve human/mouse homologies using our API: 
>> https://www.ebi.ac.uk/~muffato/workshops/2016_01_EBI/solutions_compara/homology2.pl
>>
>> Finally, since it's clear that there is some relationship between 
>> homology and alignment (biologically speaking), we have a QC method 
>> for our homologies that tests the quality of the alignment over a 
>> pair of homologous genes (WGA score). This may be useful to you: 
>> http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html
>>
>> Hope this helps?
>>
>> Best,
>>
>> Carla
>>
>>
>> On 24/07/2018 12:00, Yan P. Yuan wrote:
>>> Hi,
>>>
>>> Currently, I'm interested in genomic aligments between 2 species: 
>>> homo sapience and mus musculus.
>>>
>>> If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor 
>>> to fetch the genomic alignments, the output misses the corresponding 
>>> gene names (or involved transcript names).
>>>
>>> Can someone point out how to get the gene names via the used 
>>> methods? Or a different way to do it?
>>>
>>> Thanks a lot.
>>>
>>> Yan
>>>
>>> PS:
>>>
>>> The folllowing code was used:
>>>
>>>     use strict;
>>>     use warnings;
>>>
>>>     use Bio::EnsEMBL::Registry;
>>>     use Bio::AlignIO;
>>>
>>>     my $reg = 'Bio::EnsEMBL::Registry';
>>>
>>>     $reg->load_registry_from_db(
>>>         -host=>'ensembldb.ensembl.org',
>>>         -port=>5306,
>>>         -user=>'anonymous',
>>>     );
>>>
>>>
>>>     my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
>>>     "MethodLinkSpeciesSet");
>>>
>>>     my $mlss         =
>>>     $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
>>>     ['human', 'mouse']);
>>>
>>>     my $species_slice_adaptor       = $reg->get_adaptor("mouse",
>>>     "core", "Slice");
>>>
>>>     my $genomic_align_block_adaptor = $reg->get_adaptor("Multi",
>>>     "compara", "GenomicAlignBlock");
>>>
>>>     # Define a region to fetch
>>>     my ($species_start, $species_end) = (143702058, 143702091);
>>>     my $species_slice =
>>>     $species_slice_adaptor->fetch_by_region("chromosome", 1,
>>>     $species_start, $species_end, -1);
>>>
>>>     my $all_genomic_align_blocks =
>>>     $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
>>>     $species_slice);
>>>
>>>     my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
>>>                       -fh=>\*STDOUT,
>>>                       -format=>'clustalw',
>>>                       -idlength=>20);
>>>
>>>     # print the restricted alignments
>>>     foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>>>         my $restricted_gab =
>>>     $genomic_align_block->restrict_between_reference_positions($species_start,
>>>     $species_end);
>>>
>>>         foreach my $genomic_align ( @{
>>>     $restricted_gab->get_all_GenomicAligns() } ) {
>>>             printf("%s %s:%d-%d:%s:%s\n",
>>>     $genomic_align->genome_db->name, $genomic_align->dnafrag->name,
>>>     $genomic_align->dnafrag_start, $genomic_align->dnafrag_end,
>>>     $genomic_align->dnafrag_strand, $genomic_align->aligned_sequence());
>>>         }
>>>         print "\n";
>>>     }
>>>
>>> The output is then:
>>>
>>>     homo_sapiens
>>>     1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
>>>     mus_musculus
>>>     1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
>>>
>>>
>>> One way to get to the gene name/transcripts I've used is 
>>> GeneMember-Adaptor & Homology-Adaptor, but then I'd miss the genomic 
>>> alignments.
>>>
>>> This was taken from an ENSEMBL example:
>>>
>>>     use strict;
>>>     use warnings;
>>>
>>>     # using this script to get the ortholog of a given gene of mus
>>>     musculus
>>>
>>>     use Bio::EnsEMBL::Registry;
>>>
>>>     my $reg = 'Bio::EnsEMBL::Registry';
>>>
>>>     $reg->load_registry_from_db(
>>>         -host=>'ensembldb.ensembl.org',
>>>         -port=>5306,
>>>         -user=>'anonymous',
>>>     );
>>>
>>>
>>>     # Getting the gene_member_adaptor
>>>     my $gene_member_adaptor =
>>>     Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara',
>>>     'GeneMember');
>>>
>>>     my $gene_member =
>>>     $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
>>>
>>>     # Getting the homology adaptor
>>>     my $homology_adaptor =
>>>     Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'Homology');
>>>
>>>     my $homologies =
>>>     $homology_adaptor->fetch_all_by_Member($gene_member);
>>>     #homologies = hash array
>>>
>>>     foreach my $homology ( @{$homologies} ) {
>>>         foreach my $member ( @{$homology->get_all_Members} ) {
>>>             if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
>>>                 print $member->description, "\n";
>>>             }
>>>         }
>>>     }
>>>
>>>
>>> Output:
>>>
>>> Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X 
>>> Start:114017033 End:114017102 Acc:RF00088
>>> Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 
>>> Start:62853663 End:62853732 Acc:RF00088
>>>
>>> Or is there a better way to get homology members only between 2 
>>> species?
>>>
>>>
>>>
>>> _______________________________________________
>>> Dev mailing listDev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog:http://www.ensembl.info/
>>
>> -- 
>> Carla Cummins, Ph.D.
>> Ensembl Compara Developer
>> European Bioinformatics Institute (EMBL-EBI),
>> European Molecular Biology Laboratory,
>> Wellcome Trust Genome Campus, Hinxton,
>> Cambridge, CB10 1SD, United Kingdom
>> Room A3-145
>> Phone + 44 (0) 1223 49 4240
>

-- 
Carla Cummins, Ph.D.
Ensembl Compara Developer
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SD, United Kingdom
Room A3-145
Phone + 44 (0) 1223 49 4240
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/20e58e59/attachment.html>


More information about the Dev mailing list