[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?

Carla Cummins carlac at ebi.ac.uk
Tue Jul 24 12:27:43 BST 2018


Hi Yan,

Since LASTZ alignments are on the whole genome, it's important to note 
that genes may not be present on every part of the sequence. To see 
which gene(s) are on a given aligned sequence region, you can use:

$genomic_align->get_Slice->get_all_Genes();

However, at the end of the message, you mention homologies. Homologies 
and whole genome alignments are really separate objects in our system. 
Inference of homologies is performed on a gene tree level and are not 
directly related to the whole genome alignments 
(http://www.ensembl.org/info/genome/compara/homology_types.html). Can 
you confirm whether it is truly homologies that you are interested in, 
or whether you are simply using the term to refer to well-aligned 
regions (regardless of genes)? Here is a basic example of how to 
retrieve human/mouse homologies using our API: 
https://www.ebi.ac.uk/~muffato/workshops/2016_01_EBI/solutions_compara/homology2.pl

Finally, since it's clear that there is some relationship between 
homology and alignment (biologically speaking), we have a QC method for 
our homologies that tests the quality of the alignment over a pair of 
homologous genes (WGA score). This may be useful to you: 
http://www.ensembl.org/info/genome/compara/Ortholog_qc_manual.html

Hope this helps?

Best,

Carla


On 24/07/2018 12:00, Yan P. Yuan wrote:
> Hi,
>
> Currently, I'm interested in genomic aligments between 2 species: homo 
> sapience and mus musculus.
>
> If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor to 
> fetch the genomic alignments, the output misses the corresponding gene 
> names (or involved transcript names).
>
> Can someone point out how to get the gene names via the used methods? 
> Or a different way to do it?
>
> Thanks a lot.
>
> Yan
>
> PS:
>
> The folllowing code was used:
>
>     use strict;
>     use warnings;
>
>     use Bio::EnsEMBL::Registry;
>     use Bio::AlignIO;
>
>     my $reg = 'Bio::EnsEMBL::Registry';
>
>     $reg->load_registry_from_db(
>         -host=>'ensembldb.ensembl.org',
>         -port=>5306,
>         -user=>'anonymous',
>     );
>
>
>     my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
>     "MethodLinkSpeciesSet");
>
>     my $mlss         =
>     $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
>     ['human', 'mouse']);
>
>     my $species_slice_adaptor       = $reg->get_adaptor("mouse",
>     "core", "Slice");
>
>     my $genomic_align_block_adaptor = $reg->get_adaptor("Multi",
>     "compara", "GenomicAlignBlock");
>
>     # Define a region to fetch
>     my ($species_start, $species_end) = (143702058, 143702091);
>     my $species_slice =
>     $species_slice_adaptor->fetch_by_region("chromosome", 1,
>     $species_start, $species_end, -1);
>
>     my $all_genomic_align_blocks =
>     $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
>     $species_slice);
>
>     my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
>                       -fh=>\*STDOUT,
>                       -format=>'clustalw',
>                       -idlength=>20);
>
>     # print the restricted alignments
>     foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
>         my $restricted_gab =
>     $genomic_align_block->restrict_between_reference_positions($species_start,
>     $species_end);
>
>         foreach my $genomic_align ( @{
>     $restricted_gab->get_all_GenomicAligns() } ) {
>             printf("%s %s:%d-%d:%s:%s\n",
>     $genomic_align->genome_db->name, $genomic_align->dnafrag->name,
>     $genomic_align->dnafrag_start, $genomic_align->dnafrag_end,
>     $genomic_align->dnafrag_strand, $genomic_align->aligned_sequence());
>         }
>         print "\n";
>     }
>
> The output is then:
>
>     homo_sapiens
>     1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
>     mus_musculus
>     1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC
>
>
> One way to get to the gene name/transcripts I've used is 
> GeneMember-Adaptor & Homology-Adaptor, but then I'd miss the genomic 
> alignments.
>
> This was taken from an ENSEMBL example:
>
>     use strict;
>     use warnings;
>
>     # using this script to get the ortholog of a given gene of mus
>     musculus
>
>     use Bio::EnsEMBL::Registry;
>
>     my $reg = 'Bio::EnsEMBL::Registry';
>
>     $reg->load_registry_from_db(
>         -host=>'ensembldb.ensembl.org',
>         -port=>5306,
>         -user=>'anonymous',
>     );
>
>
>     # Getting the gene_member_adaptor
>     my $gene_member_adaptor =
>     Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'GeneMember');
>
>     my $gene_member =
>     $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');
>
>     # Getting the homology adaptor
>     my $homology_adaptor =
>     Bio::EnsEMBL::Registry->get_adaptor('Multi', 'compara', 'Homology');
>
>     my $homologies =
>     $homology_adaptor->fetch_all_by_Member($gene_member); #homologies
>     = hash array
>
>     foreach my $homology ( @{$homologies} ) {
>         foreach my $member ( @{$homology->get_all_Members} ) {
>             if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
>                 print $member->description, "\n";
>             }
>         }
>     }
>
>
> Output:
>
> Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X Start:114017033 
> End:114017102 Acc:RF00088
> Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 Start:62853663 
> End:62853732 Acc:RF00088
>
> Or is there a better way to get homology members only between 2 species?
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-- 
Carla Cummins, Ph.D.
Ensembl Compara Developer
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SD, United Kingdom
Room A3-145
Phone + 44 (0) 1223 49 4240
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/48538c18/attachment.html>


More information about the Dev mailing list