[ensembl-dev] How to retrieve "Age of Base" using a Perl API?

Matthieu Muffato muffato at ebi.ac.uk
Mon Sep 29 18:02:13 BST 2014


Hi Haiming,

You need to access the multiple alignments through the GenomicAlignTree
objects. They combine the extant regions and their inferred ancestral
sequences in a single tree.

You can have a look at the code we use to generate the AgeOfBase track,
https://github.com/Ensembl/ensembl-compara/blob/release/77/modules/Bio/EnsEMBL/Compara/RunnableDB/BaseAge/BaseAge.pm#L121
especially these lines:
  - L140: get all the GenomicAlignTree objects
  - L160: iterate over the list of trees
  - L190: iterate over the internal nodes of a given tree
  - L195 and L203: get the ancestral sequence of this node
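To make the traversal in these steps concrete without a database connection, here is a minimal plain-Perl sketch. It is not Compara API code: a nested hash stands in for one GenomicAlignTree, filled in by hand with the topology and bases of the chr1:1031796 example quoted later in this thread. The helpers `leaf`, `node`, `leaves_under` and `internal_nodes` are hypothetical. The sketch collects each internal (ancestral) node together with the extant species below it, which is the kind of species set that identifies a branch in the Age of Base description.

```perl
use strict;
use warnings;

# Stand-in for one alignment tree (NOT a Compara object): leaves carry the
# extant base, internal nodes the base inferred for the ancestral sequence.
sub leaf { my ($name, $base) = @_; return { name => $name, base => $base } }
sub node { my ($base, @kids) = @_; return { base => $base, children => \@kids } }

# Hand-built from the chr1:1031796 example in this thread.
my $tree = node('T',
    node('T',                                          # primates + rodents
        node('G',                                      # primates
            node('G', leaf('homo_sapiens', 'G'), leaf('pan_troglodytes', 'G')),
            node('G', leaf('papio_anubis', 'G'), leaf('macaca_mulatta', 'G'))),
        node('C', leaf('mus_musculus', 'C'), leaf('rattus_norvegicus', 'C'))),
    node('T',
        node('T', leaf('sus_scrofa', 'G'), leaf('bos_taurus', 'T')),
        leaf('canis_familiaris', 'T')));

# Extant species below a node, in left-to-right order.
sub leaves_under {
    my ($n) = @_;
    return $n->{name} unless $n->{children};
    return map { leaves_under($_) } @{ $n->{children} };
}

# Post-order walk returning every internal node with its base and leaf set.
sub internal_nodes {
    my ($n) = @_;
    return () unless $n->{children};                  # leaves are skipped
    my @below = map { internal_nodes($_) } @{ $n->{children} };
    return (@below, { base => $n->{base}, species => [ leaves_under($n) ] });
}

for my $anc (internal_nodes($tree)) {
    printf "%s  %s\n", $anc->{base}, join(',', @{ $anc->{species} });
}
```

In the real API the same loop would run over the nodes of each GenomicAlignTree returned for the region. Check the exact method names against the BaseAge.pm lines listed above rather than this sketch.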

Hope this helps,
Matthieu

On 29/09/14 17:45, Tang, Haiming wrote:
> Hi Stephen,
>
> Thank you very much for your help.
>
> This solves my problem. So column 4, e.g. Ggor-Hsap-Hsap-Pabe[4], stands
> for the common ancestor of the listed species, back to which the base
> has been preserved.
>
> Could you also share the script you used to get the tree and alignment
> information shown in your email?
>
> I tried the following to fetch the ancestral sequences:
>
>     my $mlss = $mlss_adaptor->fetch_by_method_link_type_species_set_name("EPO", "mammals");
>
>     my $slice = $slice_adaptor->fetch_by_region('toplevel', $seq_region,
>         $seq_region_start, $seq_region_end);
>
>     my $genomic_align_blocks = $genomic_align_block_adaptor
>         ->fetch_all_by_MethodLinkSpeciesSet_Slice($mlss, $slice);
>
> But it doesn't seem to work.
>
> Thanks
> Haiming
>
> On Mon, Sep 29, 2014 at 8:42 AM, Stephen Fitzgerald <stephenf at ebi.ac.uk> wrote:
>
>     Hi Haiming, column 4 lists the set of species whose ancestor had the
>     same base as human (we use a program called Ortheus to infer the
>     sequence of the ancestral nodes in the tree connecting all the
>     extant species).
>
>     For example:
>
>     chr1    1031796 1031797 Mmul-Panu-Hsap-Ptro[4]  196     50,50,255
>
>     The ancestral sequence at the root of the 4 primates in the alignment
>     (their most recent common ancestor, marked with a "*") is the oldest
>     ancestor that still shares a G base with human at this position. The
>     next deeper ancestor (between rodents and primates, marked with "**")
>     is predicted to have a T at this position. So, somewhere between these
>     two ancestors the base changed T->G. Hence, this position would be
>     marked as primate specific.
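The rule described here (walk up from human until the ancestral base differs) can be sketched as follows. This is a minimal illustration, not the Ortheus or Compara code. The ancestor chain is typed in by hand, nearest ancestor first, from the alignment column shown below in this message. `age_of_base` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: walk from the nearest ancestor towards the root and
# stop at the first ancestor whose inferred base differs from the extant one.
# Returns (oldest matching ancestor, first differing ancestor or undef).
sub age_of_base {
    my ($base, @ancestors) = @_;
    my ($last_match, $first_diff);
    for my $anc (@ancestors) {
        if ($anc->{base} eq $base) {
            $last_match = $anc;    # base unchanged back to this ancestor
        }
        else {
            $first_diff = $anc;    # substitution lies on the branch below it
            last;
        }
    }
    return ($last_match, $first_diff);
}

# Human's ancestors at chr1:1031796-1031797, nearest first, with the bases
# read off the alignment column in Stephen's example.
my @human_ancestors = (
    { species => [qw(homo_sapiens pan_troglodytes)], base => 'G' },
    { species => [qw(homo_sapiens pan_troglodytes papio_anubis macaca_mulatta)],
      base    => 'G' },                                    # node marked "*"
    { species => [qw(homo_sapiens pan_troglodytes papio_anubis macaca_mulatta
                     mus_musculus rattus_norvegicus)],
      base    => 'T' },                                    # node marked "**"
);

my ($match, $diff) = age_of_base('G', @human_ancestors);
printf "G shared back to (%s); older ancestor has %s, so %s->G\n",
    join(',', @{ $match->{species} }), $diff->{base}, $diff->{base};
```

The species set of the last matching ancestor plays the role of the column-4 label. Here it is the four primates, matching Stephen's reading of the position as primate specific.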
>
>
>     Human                chromosome:GRCh38:1:1031796:1031797:1
>     Ancestral sequences  (homo_sapiens,pan_troglodytes);
>     Chimpanzee           chromosome:CHIMP2.1.4:7:159477370:159477371:1
>     Ancestral sequences  ((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)); *
>     Macaque              chromosome:MMUL_1:1:4106934:4106935:1
>     Ancestral sequences  (papio_anubis,macaca_mulatta);
>     Olive baboon         scaffold:PapAnu2.0:JH684932.1:192067:192068:1
>     Ancestral sequences  (((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)); **
>     Mouse                chromosome:GRCm38:4:156188534:156188535:-1
>     Ancestral sequences  (mus_musculus,rattus_norvegicus);
>     Rat                  chromosome:Rnor_5.0:5:177087882:177087883:-1
>     Ancestral sequences  ((((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)),((sus_scrofa,bos_taurus),canis_familiaris));
>     Cow                  chromosome:UMD3.1:16:52694475:52694476:-1
>     Ancestral sequences  (sus_scrofa,bos_taurus);
>     Pig                  chromosome:Sscrofa10.2:6:57872690:57872691:-1
>     Ancestral sequences  ((sus_scrofa,bos_taurus),canis_familiaris);
>     Dog                  chromosome:CanFam3.1:5:56250642:56250643:1
>
>
>     Human                G
>     Ancestral sequences  G
>     Chimpanzee           G
>     Ancestral sequences  G *
>     Macaque              G
>     Ancestral sequences  G
>     Olive baboon         G
>     Ancestral sequences  T **
>     Mouse                C
>     Ancestral sequences  C
>     Rat                  C
>     Ancestral sequences  T
>     Cow                  T
>     Ancestral sequences  T
>     Pig                  G
>     Ancestral sequences  T
>     Dog                  T
>
>
>     We don't store speciation times for the age of base track.
>     Information regarding speciation times can be obtained from sites
>     such as TimeTree (http://www.timetree.org/).
>
>     HTH,
>     Stephen.
>
>     On Fri, 26 Sep 2014, Tang, Haiming wrote:
>
>         Hi Stephen,
>         I followed your instructions and got the bed file.
>
>         Column 4 appears to list the species for which that base is the
>         same as in human, since it looks like Hsap is in every line.
>         The number in square brackets [] is just the number of species
>         listed.
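A short parser makes the column-4 format discussed here concrete. It is a sketch that assumes whitespace-separated fields laid out like the example lines quoted in this thread. `parse_base_age_line` is a hypothetical name. The columns after the label (score and colour in the examples) are kept but not interpreted.

```perl
use strict;
use warnings;

# Hypothetical helper: split one base-age BED line into its first columns and
# decode the column-4 label "Sp1-Sp2-...[N]".
sub parse_base_age_line {
    my ($line) = @_;
    chomp $line;
    my ($chrom, $start, $end, $label, @rest) = split ' ', $line;
    my ($list, $n) = $label =~ /^(.*)\[(\d+)\]$/
        or die "unexpected label: $label\n";
    return {
        chrom   => $chrom,
        start   => $start,                 # 0-based start, per BED convention
        end     => $end,
        species => [ split /-/, $list ],   # species codes such as Hsap
        count   => $n,                     # number in square brackets
        rest    => \@rest,                 # remaining columns, not interpreted
    };
}

my $rec = parse_base_age_line(
    "chrY 57107125 57107126 Ggor-Hsap-Hsap-Pabe[4] 120 30,30,255\n");
printf "%s:%d-%d %s [%d]\n", $rec->{chrom}, $rec->{start}, $rec->{end},
    join('-', @{ $rec->{species} }), $rec->{count};
```

In both example labels in this thread, the bracketed number equals the number of dash-separated entries, including the repeated Hsap in Ggor-Hsap-Hsap-Pabe[4].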
>
>         But the file doesn’t seem to give the age of the base.
>
>         For example, how should I interpret Ggor-Hsap-Hsap-Pabe[4] in
>
>         "chrY 57107125 57107126 Ggor-Hsap-Hsap-Pabe[4] 120 30,30,255"?
>
>         Are Ggor and Hsap ancestral species?
>
>         Or is the age of base stored somewhere else?
>
>         Thanks
>
>         Haiming
>
>         On Fri, Sep 26, 2014 at 2:47 AM, Stephen Fitzgerald
>         <stephenf at ebi.ac.uk> wrote:
>                Hi Haiming,
>                The Compara API is used to retrieve information from the
>                Compara database. However, the "Age of Base" track is
>                generated from a bigBed binary file, so it is not part of
>                the Compara database. The bigBed file is generated from a
>                BED file. I have transferred this BED file (from release
>                76) to our FTP site. You can retrieve it using anonymous
>                FTP from here:
>
>                ftp ftp.ebi.ac.uk
>
>                cd pub/software/ensembl/stephen/BaseAge/
>
>                get base_age_76.bed.gz
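Once downloaded, the compressed file can be scanned without unpacking it to disk. The sketch below assumes the core Perl modules IO::Uncompress::Gunzip and IO::Compress::Gzip (the latter only for the in-memory demo). It also assumes the first three columns are chromosome, start and end, as in the example lines in this thread. `bed_lines_in_region` is a hypothetical helper.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: stream a gzipped BED file and keep the lines that
# overlap chrom:from-to (BED-style half-open coordinates). $input may be a
# file name such as "base_age_76.bed.gz" or a reference to gzip data.
sub bed_lines_in_region {
    my ($input, $chrom, $from, $to) = @_;
    my $z = IO::Uncompress::Gunzip->new($input)
        or die "gunzip failed: $GunzipError\n";
    my @hits;
    while (my $line = <$z>) {
        my ($c, $s, $e) = split ' ', $line;
        next unless defined $e && $c eq $chrom;    # skips header/short lines
        push @hits, $line if $s < $to && $e > $from;
    }
    close $z;
    return @hits;
}

# Self-contained demo on an in-memory copy of one line from this thread;
# on the real download you would pass 'base_age_76.bed.gz' instead of \$gz.
my $text = "chr1\t1031796\t1031797\tMmul-Panu-Hsap-Ptro[4]\t196\t50,50,255\n";
my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";
my @hits = bed_lines_in_region(\$gz, 'chr1', 1031790, 1031800);
print @hits;
```

Reading line by line keeps memory use flat, which matters for a genome-wide file.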
>
>                Hope this helps,
>                Stephen.
>
>
>                On Thu, 25 Sep 2014, Tang, Haiming wrote:
>
>
>                      Dear group, my name is Haiming Tang. I'm in Dr Paul
>                      Thomas's group at the University of Southern
>                      California.
>
>                      I'm trying to retrieve "Age of Base" using the Perl API.
>
>                      As described in
>                      http://www.ensembl.org/info/genome/compara/analyses.html#age_of_base
>
>                      "Age of Base
>
>                      From these ancestral sequences, we infer the age of a
>                      base, i.e. the timing of the most recent mutation for
>                      each base of the genome. Each position of the human
>                      genome is compared to its immediate inferred ancestor,
>                      then its ancestor, etc. until a difference is found.
>                      The inferred substitution event therefore occurred on
>                      a specific branch of the tree, which is identified by
>                      all the extant species which eventually descended from
>                      that branch, as illustrated below."
>
>                      "Age of base" is closely related to the EPO ancestral
>                      alignments, but I couldn't find any related method in
>                      the Compara Perl API documentation or the Compara API
>                      tutorial.
>
>                      Can anyone show me how to retrieve the "age of base"?
>
>                      Thank you in advance.
>
>                      Haiming
>
>
>
>
>                _______________________________________________
>                Dev mailing list    Dev at ensembl.org
>                Posting guidelines and subscribe/unsubscribe info:
>                http://lists.ensembl.org/mailman/listinfo/dev
>                Ensembl Blog: http://www.ensembl.info/
>
>
>
>

-- 
Matthieu Muffato, Ph.D.
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468



