[ensembl-dev] Metazoa GeneTrees as PhyloXML dumps
Dan Staines
dstaines at ebi.ac.uk
Sat Apr 21 09:10:59 BST 2012
Hi Gus,
The tarball contains one PhyloXML file per gene tree, named after the gene
tree stable identifier, which is displayed on each gene tree in the
interface and can be used to reference a given tree (e.g.
http://metazoa.ensembl.org/Multi/GeneTree?gt=EMGT00050000011996). As you
correctly guessed, EMGT does indeed stand for Ensembl Metazoa Gene Tree,
and the number provides a stable identifier for a gene tree between
releases of Ensembl Metazoa (likewise EFGT, EPrGT etc. in other
divisions of Ensembl Genomes). The directory structure you see exists
purely to avoid dumping too many files into a single directory (each
directory holds a maximum of 1000 files). I'll update our README to
contain this information in future releases.
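
Since the directory layout carries no meaning, one way to find a tree is to walk the extracted tarball and index the files by stable identifier. The following is only a minimal sketch: the function name `index_gene_trees`, the `prefix` parameter and the extraction directory are hypothetical, and it assumes each file name is the stable identifier plus some extension, which is not spelled out above.

```python
import os
from pathlib import Path


def index_gene_trees(root, prefix="EMGT"):
    """Map gene tree stable identifier -> file path for files under root.

    Assumption: each file is named '<stable id>.<extension>'. The nesting of
    directories is ignored, since it only exists to limit directory size.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            stable_id = Path(filename).stem  # drop the extension, if any
            # Skip anything that is not a gene tree dump (e.g. a README).
            if stable_id.startswith(prefix):
                index[stable_id] = Path(dirpath) / filename
    return index
```

For example, `index_gene_trees("extracted_dump")["EMGT00050000011996"]` would give the path of the file for the tree linked above, wherever it sits in the directory structure, provided that tree is part of the dump.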
Having said that, if you're just after homologies between proteins in
different species, we also provide a tab-separated file (see the header
line for details of the columns):
ftp://ftp.ensemblgenomes.org/pub/release-13/plants/tsv/compara/Compara.homologies.13.tsv.gz
You can also use BioMart to dump customised datasets including
homologies if you need other properties not in this file.
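
As a rough illustration of working with the tab-separated homology file once it has been downloaded, the sketch below reads the gzipped file and uses its header line for the column names. The function name `read_homologies` and the `limit` parameter are hypothetical, and no particular column names are assumed; whatever the header line contains becomes the keys of each row.

```python
import csv
import gzip
from itertools import islice


def read_homologies(path, limit=None):
    """Read a gzipped tab-separated file into a list of dicts.

    The first line is treated as the header and supplies the keys.
    Assumption: the file is UTF-8 text with one record per line.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # islice with limit=None reads every row; otherwise stop early.
        return list(islice(reader, limit))
```

Calling `read_homologies("Compara.homologies.13.tsv.gz", limit=5)` on a local copy is a cheap way to inspect the columns before processing the whole file.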
Hope this helps - please let us know if there is anything else you need.
Dan.
--
Dan Staines, PhD Ensembl Genomes Technical Coordinator
EMBL-EBI Tel: +44-(0)1223-492507
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/