[ensembl-dev] consistency between Ensembl and EnsemblGenomes FTP sites

Jacques van Helden Jacques.van-Helden at univ-amu.fr
Sun Feb 26 13:13:39 GMT 2017


Dear Ensembl and EnsemblGenomes teams,

For several years I have been downloading genomes from Ensembl in order to install them in the Regulatory Sequence Analysis Tools (RSAT: http://rsat.eu/). I have used various access methods (Perl API, REST Web services, FTP), and the most efficient way to download all the required information (basically, fasta sequences + gtf annotations) is via the FTP site.

However, I have run into some consistency problems with the FTP downloads:

1) Missing organism table on ftp://ftp.ensembl.org/

On EnsemblGenomes, there is a table listing the parameters of the available genomes (name, TAXID, assembly, GCA identifier):
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt
I did not find any equivalent table for Ensembl. 
	ftp://ftp.ensembl.org/pub/release-87/

Would it be possible to include such a table in future releases?
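
To illustrate how a download script would use such a table, here is a minimal Python sketch that parses it into one record per genome. It assumes the table is tab-delimited with a single header line starting with "#". The column names in the sample (name, species, division, taxonomy_id, assembly, assembly_accession) and all the identifiers are placeholders chosen for illustration, not values copied from the real file.

```python
def parse_species_table(text):
    """Parse a tab-delimited species table into a list of dicts.

    Assumption: the first non-empty line is a header, optionally prefixed
    with '#', and every following line is one genome.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = [h.strip() for h in lines[0].lstrip("#").split("\t")]
    rows = []
    for line in lines[1:]:
        fields = [f.strip() for f in line.split("\t")]
        # zip() stops at the shorter sequence, so extra trailing
        # fields (e.g. from a trailing tab) are silently ignored.
        rows.append(dict(zip(header, fields)))
    return rows


# Placeholder table: column names and identifiers are hypothetical.
SAMPLE_TABLE = (
    "#name\tspecies\tdivision\ttaxonomy_id\tassembly\tassembly_accession\n"
    "Species one\tspecies_one\tEnsemblMetazoa\t11111\tASSEMBLY1\tGCA_000000001.1\n"
    "Species two\tspecies_two\tEnsemblMetazoa\t22222\tASSEMBLY2\tGCA_000000002.1\t\n"
)

if __name__ == "__main__":
    for row in parse_species_table(SAMPLE_TABLE):
        print(row["species"], row["taxonomy_id"], row["assembly"],
              row["assembly_accession"])
```

With an equivalent table on ftp.ensembl.org, the same few lines would be enough to enumerate the genomes of both sites.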

2) Inconsistent file naming on ftp://ftp.ensemblgenomes.org

On EnsemblGenomes, the file names are built differently depending on the species.
For example, for Rhodnius prolixus the file names use the assembly ID
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//rhodnius_prolixus/dna/

but for Bombyx mori they use the GCA ID
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//bombyx_mori/dna

This makes it very tricky to write a script that downloads each genome based on the fields of the EnsemblGenomes summary table (ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt), since the file name is sometimes built from the 5th column and sometimes from the 6th column.

Would it be possible to use a uniform file naming rule?
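
To make the problem concrete, here is a rough Python sketch of the workaround a script currently needs: since the file name cannot be predicted from the table alone, it has to list the directory and try both identifiers. The function name pick_genome_file, the ".dna.toplevel.fa.gz" suffix and the file names in the example are assumptions for illustration; the directory listing is passed in as a plain list so that the sketch runs without network access.

```python
def pick_genome_file(listing, assembly, accession,
                     suffix=".dna.toplevel.fa.gz"):
    """Return the first file name in `listing` that ends with `suffix`
    and contains either the assembly ID or the GCA accession as a
    dot-delimited component; return None if nothing matches.

    Assumption: file names look like <Species>.<ID>.<...><suffix>,
    where <ID> is either the assembly ID or the GCA accession.
    """
    for key in (assembly, accession):
        if not key:
            continue
        for name in listing:
            if name.endswith(suffix) and f".{key}." in name:
                return name
    return None


if __name__ == "__main__":
    # Hypothetical directory listings with placeholder identifiers.
    by_assembly = ["README", "Species_one.ASSEMBLY1.dna.toplevel.fa.gz"]
    by_accession = ["README", "Species_two.GCA_000000002.1.dna.toplevel.fa.gz"]
    print(pick_genome_file(by_assembly, "ASSEMBLY1", "GCA_000000001.1"))
    print(pick_genome_file(by_accession, "ASSEMBLY2", "GCA_000000002.1"))
```

In a real script the listing could come from ftplib.FTP.nlst() in the Python standard library. With a uniform naming rule, this trial-and-error lookup and the extra directory listing would not be needed.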

3) Single access point for all Metazoan genomes

I understand that, for historical reasons, Metazoan genomes are released either on Ensembl or on EnsemblGenomes. Is there any hope of providing access to all Metazoan genomes on a single FTP site (ftp://ftp.ensemblgenomes.org/pub/metazoa)? This would not preclude keeping the Ensembl server, but a priori it would seem logical to have all the Metazoa on metazoa.ensemblgenomes.org.

Many thanks,

Jacques van Helden

Aix-Marseille Université (AMU). 
Lab. Technological Advances for Genomics and Clinics (TAGC)
INSERM Unit U1090, 163, Avenue de Luminy, 13288 MARSEILLE cedex 09. France
Office: INSERM building, block 6
Tel: +33 4 91 82 87 49
Fax: +33 4 91 82 87 01
Web:  http://jacques.van-helden.perso.luminy.univ-amu.fr/
Email: Jacques.van-Helden at univ-amu.fr










