[ensembl-dev] consistency between Ensembl and EnsemblGenomes FTP sites
Jacques van Helden
Jacques.van-Helden at univ-amu.fr
Sun Feb 26 13:13:39 GMT 2017
Dear Ensembl and EnsemblGenomes teams,
For several years I have been downloading genomes from Ensembl in order to install them in the Regulatory Sequence Analysis Tools (RSAT: http://rsat.eu/). I have used various access types (Perl API, REST Web services, FTP), and the most efficient way to download all the required information (basically, fasta sequences + gtf annotations) is via the FTP site.
I have, however, run into some consistency problems with the FTP downloads:
1) Missing organism table on ftp://ftp.ensembl.org/
On EnsemblGenomes, there is a table providing the parameters of the available genomes (name, TAXID, assembly, GCA identifier):
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt
I did not find any equivalent table for Ensembl:
ftp://ftp.ensembl.org/pub/release-87/
Could such a table be provided with future releases?
2) Inconsistent file naming on ftp://ftp.ensemblgenomes.org
For EnsemblGenomes, the file names are built differently depending on the species.
For example, for Rhodnius prolixus they use the assembly ID
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//rhodnius_prolixus/dna/
but for Bombyx mori they use the GCA ID
ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//bombyx_mori/dna
This makes it very tricky for people who want to write a script to download each genome based on the fields of the EnsemblGenomes summary table (ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt), since the file name is sometimes built from the 5th column and sometimes from the 6th column.
Would it be possible to use a homogeneous file naming rule?
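To illustrate the workaround a script currently needs, here is a minimal Python sketch. It assumes the summary table is tab-separated with a header line starting with "#", that the species name is in the 2nd column, and that file names start with the capitalized species name followed by the value of either the 5th or the 6th column. The function names and the sample row are hypothetical and serve only as an example; the row does not describe a real genome.

```python
import csv
import io

# Hypothetical sample in the assumed layout of species_EnsemblMetazoa.txt:
# tab-separated, header line starting with "#". The values are invented.
SAMPLE = "\n".join([
    "\t".join(["#name", "species", "division", "taxonomy_id",
               "assembly", "assembly_accession"]),
    "\t".join(["Species a", "species_a", "EnsemblMetazoa", "0",
               "AsmA1", "GCA_000000001.1"]),
])


def parse_species_table(text):
    """Return the data rows of the table, skipping empty and header lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row and not row[0].startswith("#")]


def candidate_prefixes(row, species_col=1, assembly_col=4, accession_col=5):
    """Return the possible file-name prefixes for one table row.

    Because the naming rule differs between species, the script has to
    try the 5th column (index 4) and then the 6th column (index 5).
    """
    species = row[species_col].capitalize()
    candidates = []
    for col in (assembly_col, accession_col):
        value = row[col].strip()
        if value:
            candidates.append(f"{species}.{value}")
    return candidates


if __name__ == "__main__":
    for row in parse_species_table(SAMPLE):
        print(row[1], candidate_prefixes(row))
```

A download script would then have to list the remote directory and keep whichever candidate prefix matches an existing file. With a homogeneous naming rule, this guessing step would not be needed.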
3) Single access point for all Metazoan genomes
I understand that, for historical reasons, Metazoan genomes are released either in the Ensembl or in the EnsemblGenomes database. Is there any hope of giving access to all Metazoan genomes on a single FTP site (ftp://ftp.ensemblgenomes.org/pub/metazoa)? This would not prevent you from keeping the Ensembl server, but a priori it would seem logical to have all the Metazoa on metazoa.ensemblgenomes.org.
Many thanks,
Jacques van Helden
Aix-Marseille Université (AMU).
Lab. Technological Advances for Genomics and Clinics (TAGC)
INSERM Unit U1090, 163, Avenue de Luminy, 13288 MARSEILLE cedex 09. France
Office: INSERM building, block 6
Tel: +33 4 91 82 87 49
Fax: +33 4 91 82 87 01
Web: http://jacques.van-helden.perso.luminy.univ-amu.fr/
Email: Jacques.van-Helden at univ-amu.fr