[ensembl-dev] How to retrieve "Age of Base" using a Perl API?

Stephen Fitzgerald stephenf at ebi.ac.uk
Mon Sep 29 16:42:23 BST 2014


Hi Haiming, column 4 lists the extant species that descend from the oldest 
inferred ancestor sharing the human base (we use a program called Ortheus 
to infer the sequence of the ancestral nodes in the tree connecting all the 
extant species).

For example:

chr1    1031796 1031797 Mmul-Panu-Hsap-Ptro[4]  196     50,50,255

At this position the human base is G. The ancestor at the root of the 4 
primates in the alignment (marked with a "*") is the deepest ancestor on 
the human lineage that shares the G base with human. The next deepest 
ancestor (between rodents and primates, marked with a "**") is predicted 
to have a T at this position. So somewhere between these two ancestors the 
base changed T->G. Hence this position would be marked as primate specific.
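
As a rough illustration, the example line above can be split into its 
fields with a few lines of Python. The helper parse_base_age_line and the 
field names (species, count, score, rgb) are labels made up for this 
sketch; they are not defined by the file or by any Ensembl API. It assumes 
six whitespace-separated columns, with column 4 holding species codes 
joined by "-" followed by the number of codes in square brackets.

```python
import re
from typing import NamedTuple


class BaseAgeRecord(NamedTuple):
    """One line of the age-of-base BED file (field names are assumptions)."""
    chrom: str
    start: int
    end: int
    species: list   # species codes from column 4, e.g. ["Mmul", "Panu", ...]
    count: int      # number in square brackets at the end of column 4
    score: int      # column 5, kept as-is
    rgb: tuple      # column 6, the display colour as (r, g, b)


# Column 4 looks like "Mmul-Panu-Hsap-Ptro[4]".
_NAME_RE = re.compile(r"^(?P<codes>[^\[\]]+)\[(?P<count>\d+)\]$")


def parse_base_age_line(line):
    """Split one whitespace-separated line into a BaseAgeRecord."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    chrom, start, end, name, score, rgb = fields
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("unexpected column 4 format: %r" % name)
    return BaseAgeRecord(
        chrom=chrom,
        start=int(start),
        end=int(end),
        species=match.group("codes").split("-"),
        count=int(match.group("count")),
        score=int(score),
        rgb=tuple(int(part) for part in rgb.split(",")),
    )


record = parse_base_age_line(
    "chr1    1031796 1031797 Mmul-Panu-Hsap-Ptro[4]  196     50,50,255"
)
print(record.species, record.count)
```

For the full file, the same function could be applied to each line read 
via gzip.open(path, "rt") from the standard library.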


Human                chromosome:GRCh38:1:1031796:1031797:1
Ancestral sequences  (homo_sapiens,pan_troglodytes);
Chimpanzee           chromosome:CHIMP2.1.4:7:159477370:159477371:1
Ancestral sequences  ((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)); *
Macaque              chromosome:MMUL_1:1:4106934:4106935:1
Ancestral sequences  (papio_anubis,macaca_mulatta);
Olive baboon         scaffold:PapAnu2.0:JH684932.1:192067:192068:1
Ancestral sequences  (((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)); **
Mouse                chromosome:GRCm38:4:156188534:156188535:-1
Ancestral sequences  (mus_musculus,rattus_norvegicus);
Rat                  chromosome:Rnor_5.0:5:177087882:177087883:-1
Ancestral sequences  ((((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)),((sus_scrofa,bos_taurus),canis_familiaris));
Cow                  chromosome:UMD3.1:16:52694475:52694476:-1
Ancestral sequences  (sus_scrofa,bos_taurus);
Pig                  chromosome:Sscrofa10.2:6:57872690:57872691:-1
Ancestral sequences  ((sus_scrofa,bos_taurus),canis_familiaris);
Dog                  chromosome:CanFam3.1:5:56250642:56250643:1


Human                G
Ancestral sequences  G
Chimpanzee           G
Ancestral sequences  G *
Macaque              G
Ancestral sequences  G
Olive baboon         G
Ancestral sequences  T **
Mouse                C
Ancestral sequences  C
Rat                  C
Ancestral sequences  T
Cow                  T
Ancestral sequences  T
Pig                  G
Ancestral sequences  T
Dog                  T
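
To make the comparison concrete, here is a minimal Python sketch of the 
walk up the human lineage using the bases listed above. It only 
illustrates the idea and is not the code used to build the track. The 
function name and the hand-built human_lineage list (ancestors ordered from 
most recent to deepest) are assumptions made for the example.

```python
def find_substitution_branch(base, lineage):
    """Walk from the most recent ancestor towards the root.

    lineage is a list of (clade, inferred_base) pairs ordered from the
    most recent ancestor to the deepest one. Returns a pair:
    (deepest ancestor still sharing `base`, first ancestor that differs).
    Either element is None if no such ancestor exists.
    """
    last_same = None
    for clade, ancestral_base in lineage:
        if ancestral_base != base:
            return last_same, (clade, ancestral_base)
        last_same = (clade, ancestral_base)
    return last_same, None


PRIMATES = "((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta))"
RODENTS = "(mus_musculus,rattus_norvegicus)"
LAURASIATHERIA = "((sus_scrofa,bos_taurus),canis_familiaris)"

# Ancestors on the human lineage, copied by hand from the listing above.
human_lineage = [
    ("(homo_sapiens,pan_troglodytes)", "G"),
    (PRIMATES, "G"),                                           # marked *
    ("(%s,%s)" % (PRIMATES, RODENTS), "T"),                    # marked **
    ("((%s,%s),%s)" % (PRIMATES, RODENTS, LAURASIATHERIA), "T"),
]

same, differs = find_substitution_branch("G", human_lineage)
print("deepest ancestor sharing the human base:", same)
print("first ancestor with a different base:   ", differs)
```

The substitution is then placed on the branch between the two returned 
ancestors, and the species under the first one are what column 4 reports.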


We don't store speciation times for the age of base track. Information 
regarding speciation times can be obtained from sites such as Time Tree 
(http://www.timetree.org/).

HTH,
Stephen.

On Fri, 26 Sep 2014, Tang, Haiming wrote:

> Hi Stephen,
> I followed your instructions and got the bed file.
> 
> Column 4 appears to list the species for which that base is the same as in human, since it looks like Hsap is in every line. 
> The number in square brackets [] is just the number of species listed.
> 
> But the file doesn’t seem to give the age of the base. 
> 
> For example, how should I interpret Ggor-Hsap-Hsap-Pabe[4] in 
> 
> "chrY 57107125 57107126 Ggor-Hsap-Hsap-Pabe[4] 120 30,30,255"?
> 
> Are Ggor and Hsap ancestral species? 
> 
> Or is the age of base stored somewhere else?
> 
> Thanks
> 
> Haiming
> 
> On Fri, Sep 26, 2014 at 2:47 AM, Stephen Fitzgerald <stephenf at ebi.ac.uk> wrote:
>       Hi Haiming,
>       the Compara API is used to retrieve information from the Compara database. However, the "Age of Base" track is
>       generated from a BigBed binary file, so it is not part of the Compara database. The BigBed file is generated from a
>       BED file. I have transferred this BED file (from release 76) to our FTP site. You can retrieve this file using
>       anonymous FTP from here:
>
>       ftp ftp.ebi.ac.uk
>
>       cd pub/software/ensembl/stephen/BaseAge/
>
>       get base_age_76.bed.gz
>
>       Hope this helps,
>       Stephen.
> 
>
>       On Thu, 25 Sep 2014, Tang, Haiming wrote:
> 
>
>             Dear group, my name is Haiming Tang. I'm in Dr Paul Thomas's group at the University of Southern
>             California.
>
>             I'm trying to retrieve "Age of Base" using Perl API.
>
>             As described in "http://www.ensembl.org/info/genome/compara/analyses.html#age_of_base"
>
>             "Age of Base
>
>             From these ancestral sequences, we infer the age of a base, i.e. the timing of the most recent mutation for each
>             base of the genome. Each position of the human genome is compared to its immediate inferred ancestor, then its
>             ancestor, etc. until a difference is found. The inferred substitution event therefore occurred on a specific
>             branch of the tree, which is identified by all the extant species which eventually descended from that branch, as
>             illustrated below."
>
>             "Age of base" is closely related to the EPO ancestral alignment, but I couldn't find any related method in
>             the Compara Perl API Documentation or the Compara API Tutorial.
>
>             Can anyone show me how to retrieve "age of base"?
>
>             Thank you in advance.
>
>             Haiming
> 
> 
> 
>
>       _______________________________________________
>       Dev mailing list    Dev at ensembl.org
>       Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>       Ensembl Blog: http://www.ensembl.info/
> 
> 
> 
>

