[ensembl-dev] How to retrieve "Age of Base" using a Perl API?
Stephen Fitzgerald
stephenf at ebi.ac.uk
Mon Sep 29 16:42:23 BST 2014
Hi Haiming, column 4 lists the set of extant species that descend from the
deepest ancestor inferred to have the same base as human (we use a program
called Ortheus to infer the sequence of the ancestral nodes in the tree
connecting all the extant species).
For example:
chr1 1031796 1031797 Mmul-Panu-Hsap-Ptro[4] 196 50,50,255
At this position in human, the ancestral sequence of the primates present
in the alignment (marked with a "*") is the deepest ancestor that still
shares a G base with human (this is at the root of the 4 primates in the
alignment). The next deepest ancestor (between rodents and primates, marked
with a "**") is predicted to have a T at this position. So, somewhere
between these two ancestors the base changed T->G. Hence, this position
would be marked as primate specific.
Human › chromosome:GRCh38:1:1031796:1031797:1
Ancestral sequences › (homo_sapiens,pan_troglodytes);
Chimpanzee › chromosome:CHIMP2.1.4:7:159477370:159477371:1
Ancestral sequences › ((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)); *
Macaque › chromosome:MMUL_1:1:4106934:4106935:1
Ancestral sequences › (papio_anubis,macaca_mulatta);
Olive baboon › scaffold:PapAnu2.0:JH684932.1:192067:192068:1
Ancestral sequences › (((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)); **
Mouse › chromosome:GRCm38:4:156188534:156188535:-1
Ancestral sequences › (mus_musculus,rattus_norvegicus);
Rat › chromosome:Rnor_5.0:5:177087882:177087883:-1
Ancestral sequences › ((((homo_sapiens,pan_troglodytes),(papio_anubis,macaca_mulatta)),(mus_musculus,rattus_norvegicus)),((sus_scrofa,bos_taurus),canis_familiaris));
Cow › chromosome:UMD3.1:16:52694475:52694476:-1
Ancestral sequences › (sus_scrofa,bos_taurus);
Pig › chromosome:Sscrofa10.2:6:57872690:57872691:-1
Ancestral sequences › ((sus_scrofa,bos_taurus),canis_familiaris);
Dog › chromosome:CanFam3.1:5:56250642:56250643:1
Human G
Ancestral sequences G
Chimpanzee G
Ancestral sequences G *
Macaque G
Ancestral sequences G
Olive baboon G
Ancestral sequences T **
Mouse C
Ancestral sequences C
Rat C
Ancestral sequences T
Cow T
Ancestral sequences T
Pig G
Ancestral sequences T
Dog T
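
The walk described above (compare human to each successive ancestor until a
difference is found) can be sketched in a few lines of Python. This is only
an illustration, not the code used to build the track: the bases and species
sets are copied by hand from the alignment column above, and the names
LINEAGE and clade_sharing_base are made up for the example.

```python
# Human base at chr1:1031797 (GRCh38), taken from the alignment above.
HUMAN_BASE = "G"

# Ancestral nodes on the path from human to the root of the tree, nearest
# first. Each entry holds the extant species under that node and the base
# inferred for it. Values are hand-copied from the alignment above.
LINEAGE = [
    (["homo_sapiens", "pan_troglodytes"], "G"),
    (["homo_sapiens", "pan_troglodytes",
      "papio_anubis", "macaca_mulatta"], "G"),          # marked "*"
    (["homo_sapiens", "pan_troglodytes",
      "papio_anubis", "macaca_mulatta",
      "mus_musculus", "rattus_norvegicus"], "T"),       # marked "**"
    (["homo_sapiens", "pan_troglodytes",
      "papio_anubis", "macaca_mulatta",
      "mus_musculus", "rattus_norvegicus",
      "sus_scrofa", "bos_taurus", "canis_familiaris"], "T"),
]


def clade_sharing_base(leaf_base, lineage):
    """Return the species under the deepest ancestor that still has
    leaf_base, walking from the nearest ancestor towards the root and
    stopping at the first difference. Returns None if even the nearest
    ancestor differs."""
    clade = None
    for species, base in lineage:
        if base != leaf_base:
            break
        clade = species
    return clade


clade = clade_sharing_base(HUMAN_BASE, LINEAGE)
print(len(clade), clade)
```

For this column the walk stops at the primate/rodent ancestor, so the
returned clade is the four primates, matching the [4] in the BED line.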
We don't store speciation times for the age of base track. Information
regarding speciation times can be obtained from sites such as Time Tree
(http://www.timetree.org/).
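
Since the track holds no times, one way to relate a line to a divergence
time is to pull the species set out of column 4 and look that clade up
yourself. Below is a minimal Python sketch for the parsing step only. It
assumes whitespace-separated lines where column 4 looks like the examples in
this thread ("Mmul-Panu-Hsap-Ptro[4]"); the layout of the remaining columns
in the real file is not assumed, so they are kept as raw strings. The names
BaseAge and parse_base_age_line are made up for the example.

```python
import re
from typing import List, NamedTuple


class BaseAge(NamedTuple):
    chrom: str
    start: int          # BED start (0-based)
    end: int            # BED end (exclusive)
    species: List[str]  # abbreviations listed in column 4
    count: int          # number given in square brackets
    extra: List[str]    # any remaining columns, left unparsed


# Column 4 is assumed to be "<abbr>-<abbr>-...[<n>]".
NAME_RE = re.compile(r"(.+)\[(\d+)\]")


def parse_base_age_line(line: str) -> BaseAge:
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 columns: %r" % line)
    chrom, start, end, name = fields[:4]
    match = NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError("unexpected column 4 format: %r" % name)
    return BaseAge(chrom, int(start), int(end),
                   match.group(1).split("-"), int(match.group(2)),
                   fields[4:])


rec = parse_base_age_line(
    "chr1 1031796 1031797 Mmul-Panu-Hsap-Ptro[4] 196 50,50,255")
print(rec.chrom, rec.end, rec.species, rec.count)
```

The same function can be applied line by line to the decompressed bed file
(for example via gzip.open(path, "rt")).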
HTH,
Stephen.
On Fri, 26 Sep 2014, Tang, Haiming wrote:
> HI, Stephen
> I followed your instructions and got the bed file.
>
> Column 4 appears to list the species for which that base is the same as in human, since it looks like Hsap is in every line.
> The number in square brackets [] is just the number of species listed.
>
> But the file doesn’t seem to give the age of the base.
>
> For example: How to interpret Ggor-Hsap-Hsap-Pabe[4] in
>
> "chrY 57107125 57107126 Ggor-Hsap-Hsap-Pabe[4] 120 30,30,255"?
>
> Are Ggor and Hsap ancestral species?
>
> Or is the age of base stored somewhere else?
>
> Thanks
>
> Haiming
>
> On Fri, Sep 26, 2014 at 2:47 AM, Stephen Fitzgerald <stephenf at ebi.ac.uk> wrote:
> Hi Haiming,
> the Compara API is used to retrieve information from the Compara database. However, the "Age of Base" track is
> generated from a BigBed binary file, so it is not part of the Compara database. The BigBed file is generated from a
> BED file. I have transferred this BED file (from release 76) to our ftp site. You can retrieve this file using
> anonymous ftp from here:
>
> ftp ftp.ebi.ac.uk
>
> cd pub/software/ensembl/stephen/BaseAge/
>
> get base_age_76.bed.gz
>
> Hope this helps,
> Stephen.
>
>
> On Thu, 25 Sep 2014, Tang, Haiming wrote:
>
>
> Dear group, my name is Haiming Tang. I'm in Dr Paul Thomas's group at the University of Southern
> California.
>
> I'm trying to retrieve "Age of Base" using Perl API.
>
> As described in "http://www.ensembl.org/info/genome/compara/analyses.html#age_of_base"
>
> "Age of Base
>
> From these ancestral sequences, we infer the age of a base, i.e. the timing of the most recent mutation
> for each base of the genome. Each position of the human genome is compared to its immediate inferred
> ancestor, then its ancestor, etc. until a difference is found. The inferred substitution event therefore
> occurred on a specific branch of the tree, which is identified by all the extant species which eventually
> descended from that branch, as illustrated below."
>
> "Age of base" is closely related to the EPO ancestral alignment. But I could not find any related method
> in the Compara Perl API Documentation or the Compara API Tutorial.
>
> Can anyone show me how to retrieve the "age of base"?
>
> Thank you in advance.
>
> Haiming
>
>
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
>
>