[ensembl-dev] Metazoa GeneTrees as PhyloXML dumps

Dan Staines dstaines at ebi.ac.uk
Sat Apr 21 09:10:59 BST 2012


Hi Gus,

The tarball contains one PhyloXML file per gene tree, named after the 
gene tree stable identifier. That identifier is displayed on each gene 
tree in the interface and can be used to reference a given tree (e.g. 
http://metazoa.ensembl.org/Multi/GeneTree?gt=EMGT00050000011996). As you 
correctly guessed, EMGT stands for Ensembl Metazoa Gene Tree, and the 
number provides a stable identifier for a gene tree between releases of 
Ensembl Metazoa (likewise EFGT, EPrGT etc. in other divisions of EG). 
The directory structure you see is there purely to avoid dumping too 
many files into a single directory: each directory holds a maximum of 
1000 files. I'll update our README to include this information in 
future releases.
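
Since the subdirectories carry no meaning, one way to work with the 
extracted tarball is to ignore them and index the files by stable 
identifier. The following is only a rough sketch: the function name 
index_gene_trees and the regular expression are my own illustration, 
and it assumes each file name begins with the stable identifier 
(whatever the extension turns out to be).

```python
import re
from pathlib import Path

# Assumed pattern for stable IDs such as EMGT00050000011996
# (and EFGT..., EPrGT... in other divisions). Illustrative only.
STABLE_ID = re.compile(r"^(E[A-Za-z]+GT\d+)")


def index_gene_trees(root):
    """Map each gene tree stable ID to the path of its file under root.

    The directory layout is ignored: every file below root is visited,
    and files whose names do not start with a stable ID are skipped.
    """
    index = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        match = STABLE_ID.match(path.name)
        if match:
            index[match.group(1)] = path
    return index


# Hypothetical usage, with "extracted_dump" standing in for wherever
# the tarball was unpacked:
# trees = index_gene_trees("extracted_dump")
# print(trees["EMGT00050000011996"])
```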

Having said that, if you're just after homologies between proteins in 
different species, we also provide a tab-separated file (see the header 
line for details of the columns):
ftp://ftp.ensemblgenomes.org/pub/release-13/plants/tsv/compara/Compara.homologies.13.tsv.gz
You can also use BioMart to dump customised datasets including 
homologies if you need other properties not in this file.
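
If it helps, the tab-separated file can be read without unpacking it 
first. This is a minimal sketch using only the Python standard library; 
read_homologies is a name I made up, and it assumes the first line of 
the file is a plain header row, so the column names come from the file 
itself rather than from anything hard-coded here.

```python
import csv
import gzip


def read_homologies(path):
    """Yield one dict per row, keyed by the column names in the header line."""
    # "rt" opens the gzipped file as text; newline="" is what the csv
    # module expects from the file object it reads.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            yield row


# Hypothetical usage after downloading the file:
# first = next(read_homologies("Compara.homologies.13.tsv.gz"))
# print(list(first))  # the column names from the header line
```

Because rows are yielded one at a time, the whole file never has to be 
held in memory.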

Hope this helps - please let us know if there is anything else you need.

Dan.

-- 
Dan Staines, PhD               Ensembl Genomes Technical Coordinator
EMBL-EBI                       Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/



