[ensembl-dev] How to retrieve "Age of Base" using a Perl API?

Tang, Haiming ningzhithm at gmail.com
Mon Sep 29 18:12:55 BST 2014


Thank you very much Matthieu.
Very helpful.

Thanks
Haiming

On Mon, Sep 29, 2014 at 10:02 AM, Matthieu Muffato <muffato at ebi.ac.uk>
wrote:

> Hi Haiming,
>
> You need to access the multiple alignments through the GenomicAlignTree
> objects. They put together the history of the extant regions and their
> ancestral sequences.
>
> You can have a look at the code we're using to generate the AgeOfBase track:
> https://github.com/Ensembl/ensembl-compara/blob/release/77/modules/Bio/EnsEMBL/Compara/RunnableDB/BaseAge/BaseAge.pm#L121
> especially those lines:
>  - L140: get all the GenomicAlignTree objects
>  - L160: iterate over list of trees
>  - L190: iterate over the internal nodes of a given tree
>  - L195+203: get the ancestral sequence of this node
>
> Hope this helps,
> Matthieu
>
> On 29/09/14 17:45, Tang, Haiming wrote:
>
>> Hi, Stephen
>>
>> Thank you very much for your help.
>>
>> This solves my problem. So column 4, e.g. Ggor-Hsap-Hsap-Pabe[4], stands
>> for the ancestor of the listed species, back to which the base has been
>> preserved.
>>
>> May I also know the script you used to get the tree and alignment info
>> as seen in your email?
>>
>> I tried:
>>
>>   my $mlss = $mlss_adaptor->fetch_by_method_link_type_species_set_name(
>>       "EPO", "mammals");
>>   my $slice = $slice_adaptor->fetch_by_region('toplevel', $seq_region,
>>       $seq_region_start, $seq_region_end);
>>   my $genomic_align_blocks =
>>       $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice(
>>           $mlss, $slice);
>>
>> to fetch the ancestral sequences.
>>
>> But it doesn't seem to work.
>>
>> Thanks
>> Haiming
>>
>> On Mon, Sep 29, 2014 at 8:42 AM, Stephen Fitzgerald
>> <stephenf at ebi.ac.uk> wrote:
>>
>>     Hi Haiming, column 4 lists the set of species whose ancestor had the
>>     same base as human (we use a program called Ortheus to infer the
>>     sequence of the ancestral nodes in the tree connecting all the
>>     extant species).
>>
>>     For  example:
>>
>>     chr1    1031796 1031797 Mmul-Panu-Hsap-Ptro[4]  196     50,50,255
>>
>>     The ancestor of the primates present in the alignment (marked with
>>     a "*") is the oldest ancestor on the human lineage that shares the G
>>     base with human at this position (it is at the root of the 4
>>     primates in the alignment). The next deepest ancestor (between
>>     rodents and primates, marked with a "**") is predicted to have a T
>>     at this position. So, somewhere between these two ancestors the base
>>     changed T->G. Hence, this position would be marked as
>>     primate-specific.
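
The BED line quoted in the example above can be split into its parts with a few lines of Python. This is only a minimal sketch: the function name `parse_base_age_line` and the field names are hypothetical, treating columns 5 and 6 as a score and an RGB colour is an assumption based on how the two sample lines in this thread look, and names that lack the trailing `[N]` are rejected because the thread does not show any.

```python
import re
from typing import NamedTuple


class BaseAgeRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    species: list      # species codes from column 4, e.g. ["Mmul", "Panu", ...]
    n_species: int     # the number given in square brackets
    score: int         # assumption: column 5 is a BED-style score
    rgb: tuple         # assumption: column 6 is an R,G,B colour


# Column 4 looks like "Mmul-Panu-Hsap-Ptro[4]": dash-separated codes + count.
NAME_RE = re.compile(r"^(?P<codes>.+)\[(?P<count>\d+)\]$")


def parse_base_age_line(line):
    """Split one whitespace-separated line of the age-of-base BED file."""
    chrom, start, end, name, score, rgb = line.split()
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected name column: {name!r}")
    return BaseAgeRecord(
        chrom=chrom,
        start=int(start),
        end=int(end),
        species=match.group("codes").split("-"),
        n_species=int(match.group("count")),
        score=int(score),
        rgb=tuple(int(c) for c in rgb.split(",")),
    )


record = parse_base_age_line(
    "chr1    1031796 1031797 Mmul-Panu-Hsap-Ptro[4]  196     50,50,255"
)
print(record.species, record.n_species)
```

The species list names the extant descendants of the oldest ancestor that still carries the human base, so it identifies a branch of the tree rather than a date.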
>>
>>
>>     Human                chromosome:GRCh38:1:1031796:1031797:1
>>     Ancestral sequences  (homo_sapiens,pan_troglodytes);
>>     Chimpanzee           chromosome:CHIMP2.1.4:7:159477370:159477371:1
>>     Ancestral sequences  ((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)); *
>>     Macaque              chromosome:MMUL_1:1:4106934:4106935:1
>>     Ancestral sequences  (papio_anubis,macaca_mulatta);
>>     Olive baboon         scaffold:PapAnu2.0:JH684932.1:192067:192068:1
>>     Ancestral sequences  (((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)); **
>>     Mouse                chromosome:GRCm38:4:156188534:156188535:-1
>>     Ancestral sequences  (mus_musculus,rattus_norvegicus);
>>     Rat                  chromosome:Rnor_5.0:5:177087882:177087883:-1
>>     Ancestral sequences  ((((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)),((sus_scrofa,bos_taurus),canis_familiaris));
>>     Cow                  chromosome:UMD3.1:16:52694475:52694476:-1
>>     Ancestral sequences  (sus_scrofa,bos_taurus);
>>     Pig                  chromosome:Sscrofa10.2:6:57872690:57872691:-1
>>     Ancestral sequences  ((sus_scrofa,bos_taurus),canis_familiaris);
>>     Dog                  chromosome:CanFam3.1:5:56250642:56250643:1
>>
>>
>>     Human                G
>>     Ancestral sequences  G
>>     Chimpanzee           G
>>     Ancestral sequences  G *
>>     Macaque              G
>>     Ancestral sequences  G
>>     Olive baboon         G
>>     Ancestral sequences  T **
>>     Mouse                C
>>     Ancestral sequences  C
>>     Rat                  C
>>     Ancestral sequences  T
>>     Cow                  T
>>     Ancestral sequences  T
>>     Pig                  G
>>     Ancestral sequences  T
>>     Dog                  T
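
The walk described in this message (compare the human base with each successive ancestor until a difference is found) can be sketched in Python on the alignment column above. This is an illustration only, not the code behind the track: the lineage is typed in by hand from the listing, `oldest_matching_ancestor` is a hypothetical helper, and returning `None` when the immediate ancestor already differs is an assumption about a case the thread does not cover.

```python
PRIMATES = ("homo_sapiens", "pan_troglodytes", "papio_anubis", "macaca_mulatta")
RODENTS = ("mus_musculus", "rattus_norvegicus")
OTHERS = ("sus_scrofa", "bos_taurus", "canis_familiaris")

HUMAN_BASE = "G"

# Ancestors on the human lineage, from the most recent to the deepest,
# as (extant descendants, inferred base) taken from the column above.
HUMAN_LINEAGE = [
    (PRIMATES[:2], "G"),                 # human-chimp ancestor
    (PRIMATES, "G"),                     # primate ancestor, marked "*"
    (PRIMATES + RODENTS, "T"),           # primate-rodent ancestor, marked "**"
    (PRIMATES + RODENTS + OTHERS, "T"),  # root of the alignment
]


def oldest_matching_ancestor(base, lineage):
    """Return the descendants of the deepest ancestor that still has `base`.

    The walk stops at the first ancestor whose base differs. None is
    returned when even the immediate ancestor differs (an assumption).
    """
    oldest = None
    for species, ancestral_base in lineage:
        if ancestral_base != base:
            break
        oldest = species
    return oldest


species = oldest_matching_ancestor(HUMAN_BASE, HUMAN_LINEAGE)
print(len(species), sorted(species))
```

For this column the walk stops at the primate-rodent ancestor, so the result is the four primates, which matches the `[4]` in the BED line for this position.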
>>
>>
>>     We don't store speciation times for the age of base track.
>>     Information regarding speciation times can be obtained from sites
>>     such as Time Tree (http://www.timetree.org/).
>>
>>     HTH,
>>     Stephen.
>>
>>     On Fri, 26 Sep 2014, Tang, Haiming wrote:
>>
>>         Hi, Stephen
>>         I followed your instructions and got the bed file.
>>
>>         Column 4 appears to list the species for which that base is the
>>         same as in human, since it looks like Hsap is in every line.
>>         The number in square brackets [] is just the number of species
>>         listed.
>>
>>         But the file doesn’t seem to give the age of the base.
>>
>>         For example, how should I interpret Ggor-Hsap-Hsap-Pabe[4] in
>>
>>         "chrY 57107125 57107126 Ggor-Hsap-Hsap-Pabe[4] 120 30,30,255"?
>>
>>         Are Ggor and Hsap ancestral species?
>>
>>         Or is the age of base stored somewhere else?
>>
>>         Thanks
>>
>>         Haiming
>>
>>         On Fri, Sep 26, 2014 at 2:47 AM, Stephen Fitzgerald
>>         <stephenf at ebi.ac.uk> wrote:
>>                Hi Haiming,
>>                the compara API is used to retrieve information from the
>>                compara database. However, the "Age of Base" track is
>>                generated from a BigBed binary file, so it is not part of
>>                the compara database. The BigBed file is generated from a
>>                BED file. I have transferred this BED file (from release
>>                76) to our ftp site. You can retrieve this file using
>>                anonymous ftp from here:
>>
>>                ftp ftp.ebi.ac.uk
>>
>>                cd pub/software/ensembl/stephen/BaseAge/
>>
>>                get base_age_76.bed.gz
>>
>>                Hope this helps,
>>                Stephen.
>>
>>
>>                On Thu, 25 Sep 2014, Tang, Haiming wrote:
>>
>>
>>                      Dear group, my name is Haiming Tang. I'm in Dr Paul
>>                      Thomas's group at the University of Southern
>>                      California.
>>
>>                      I'm trying to retrieve "Age of Base" using the Perl API.
>>
>>                      As described in
>>                      http://www.ensembl.org/info/genome/compara/analyses.html#age_of_base
>>
>>                      "Age of Base
>>
>>                      From these ancestral sequences, we infer the age of
>>                      a base, i.e. the timing of the most recent mutation
>>                      for each base of the genome. Each position of the human
>>                      genome is compared to its immediate inferred ancestor,
>>                      then its ancestor, etc. until a difference is found. The
>>                      inferred substitution event therefore occurred on a
>>                      specific branch of the tree, which is identified by all
>>                      the extant species which eventually descended from that
>>                      branch, as illustrated below."
>>
>>                      "Age of base" is closely related to the EPO ancestral
>>                      alignment, but I could not find any related method in
>>                      the Compara Perl API Documentation or the Compara API
>>                      Tutorial.
>>
>>                      Can anyone show me how to retrieve "age of base"?
>>
>>                      Thank you in advance.
>>
>>                      Haiming
>>
>>
>>
>>
>>                _______________________________________________
>>                Dev mailing list    Dev at ensembl.org
>>                Posting guidelines and subscribe/unsubscribe info:
>>                http://lists.ensembl.org/mailman/listinfo/dev
>>                Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>>
> --
> Matthieu Muffato, Ph.D.
> Ensembl Compara Project Leader
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus, Hinxton
> Cambridge, CB10 1SD, United Kingdom
> Room  A3-145
> Phone + 44 (0) 1223 49 4631
> Fax   + 44 (0) 1223 49 4468
>
>