[ensembl-dev] How to get the gene name and transcript name via MethodLinkSpeciesSet & GenomicAlignBlock?

Yan P. Yuan yuan at embl.de
Tue Jul 24 12:00:28 BST 2018


Hi,

Currently, I'm interested in genomic aligments between 2 species: homo sapience 
and mus musculus.

If I use MethodLinkSpeciesSet-adaptor and GenomicAlignBlock-adaptor to fetch the 
genomic alignments, the output misses the corresponding gene names (or involved 
transcript names).

Can someone point out how to get the gene names via the used methods? Or a 
different way to do it?

Thanks a lot.

Yan

PS:

The folllowing code was used:

    use strict;
    use warnings;

    use Bio::EnsEMBL::Registry;
    use Bio::AlignIO;

    my $reg = 'Bio::EnsEMBL::Registry';

    $reg->load_registry_from_db(
         -host=>'ensembldb.ensembl.org',
         -port=>5306,
         -user=>'anonymous',
    );


    my $mlss_adaptor = $reg->get_adaptor("Multi", "compara",
    "MethodLinkSpeciesSet");

    my $mlss         =
    $mlss_adaptor->fetch_by_method_link_type_registry_aliases('LASTZ_NET',
    ['human', 'mouse']);

    my $species_slice_adaptor       = $reg->get_adaptor("mouse", "core", "Slice");

    my $genomic_align_block_adaptor = $reg->get_adaptor("Multi", "compara",
    "GenomicAlignBlock");

    # Define a region to fetch
    my ($species_start, $species_end) = (143702058, 143702091);
    my $species_slice = $species_slice_adaptor->fetch_by_region("chromosome", 1,
    $species_start, $species_end, -1);

    my $all_genomic_align_blocks =
    $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss,
    $species_slice);

    my $alignIO = Bio::AlignIO->newFh(-interleaved=>0,
                       -fh=>\*STDOUT,
                       -format=>'clustalw',
                       -idlength=>20);

    # print the restricted alignments
    foreach my $genomic_align_block ( @{ $all_genomic_align_blocks } ) {
         my $restricted_gab =
    $genomic_align_block->restrict_between_reference_positions($species_start,
    $species_end);

         foreach my $genomic_align ( @{ $restricted_gab->get_all_GenomicAligns()
    } ) {
             printf("%s %s:%d-%d:%s:%s\n", $genomic_align->genome_db->name,
    $genomic_align->dnafrag->name, $genomic_align->dnafrag_start,
    $genomic_align->dnafrag_end, $genomic_align->dnafrag_strand,
    $genomic_align->aligned_sequence());
         }
         print "\n";
    }

The output is then:

    homo_sapiens 1:193122704-193122731:1:TGTTCTTTATTTTTTTTTTCTCTT------TTCC
    mus_musculus 1:143702058-143702091:-1:TGTTCTTAGGTTTTTTTTTTTTTTGTAATCTCCC


One way to get to the gene name/transcripts I've used is GeneMember-Adaptor & 
Homology-Adaptor, but then I'd miss the genomic alignments.

This was taken from an ENSEMBL example:

    use strict;
    use warnings;

    # using this script to get the ortholog of a given gene of mus musculus

    use Bio::EnsEMBL::Registry;

    my $reg = 'Bio::EnsEMBL::Registry';

    $reg->load_registry_from_db(
         -host=>'ensembldb.ensembl.org',
         -port=>5306,
         -user=>'anonymous',
    );


    # Getting the gene_member_adaptor
    my $gene_member_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
    'compara', 'GeneMember');

    my $gene_member =
    $gene_member_adaptor->fetch_by_stable_id('ENSMUSG00000065782');

    # Getting the homology adaptor
    my $homology_adaptor = Bio::EnsEMBL::Registry->get_adaptor('Multi',
    'compara', 'Homology');

    my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);
    #homologies = hash array

    foreach my $homology ( @{$homologies} ) {
         foreach my $member ( @{$homology->get_all_Members} ) {
             if ( $member->taxon->scientific_name eq 'Homo sapiens' ) {
                 print $member->description, "\n";
             }
         }
    }


Output:

Transcript:ENST00000391309 Gene:ENSG00000212611 Chr:X Start:114017033 
End:114017102 Acc:RF00088
Transcript:ENST00000384693 Gene:ENSG00000277846 Chr:11 Start:62853663 
End:62853732 Acc:RF00088

Or is there a better way to get homology members only between 2 species?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20180724/af252683/attachment.html>


More information about the Dev mailing list