[ensembl-dev] consistency between Ensembl and EnsemblGenomes FTP sites
    Jacques van Helden 
    Jacques.van-Helden at univ-amu.fr
       
    Sun Feb 26 13:13:39 GMT 2017
    
    
  
Dear Ensembl and EnsemblGenomes teams,
For several years I have been downloading genomes from Ensembl in order to install them in the Regulatory Sequence Analysis Tools (RSAT: http://rsat.eu/). I have used various access methods (Perl API, REST web services, FTP), and the most efficient way to download all the required information (basically, FASTA sequences + GTF annotations) is the FTP site.
However, I have run into some consistency problems with the FTP downloads:
1) Missing organism table on ftp://ftp.ensembl.org/
On EnsemblGenomes, there is a table giving the parameters of the available genomes (name, TAXID, assembly, GCA identifier):
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt
I did not find any equivalent table for Ensembl. 
	ftp://ftp.ensembl.org/pub/release-87/
Could such a table be provided with future releases?
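For anyone scripting against the EnsemblGenomes summary table, a minimal Python sketch of reading it is shown here. It assumes, without confirmation, that the first line of species_EnsemblMetazoa.txt is a tab-separated header (possibly prefixed with `#`) and that each following non-empty line describes one genome; the function name `parse_species_table` is hypothetical. Keying each row on the header names rather than on column positions means the script keeps working if columns are added or reordered.

```python
def parse_species_table(text):
    """Parse a tab-separated species summary table into a list of dicts.

    Assumption (not confirmed by Ensembl): the first non-empty line is a
    header whose first column name may be prefixed with '#', and every
    following non-empty line is one genome.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    # Strip the leading '#' so the first column is keyed as e.g. 'name'.
    header = lines[0].lstrip("#").split("\t")
    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        # zip() pairs header names with values; extra trailing fields are dropped.
        rows.append(dict(zip(header, fields)))
    return rows
```

An equivalent table on ftp.ensembl.org, with the same header, would let the same function serve both sites.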
2) Inconsistent file naming on ftp://ftp.ensemblgenomes.org
For EnsemblGenomes, the file names are built differently depending on the species. 
For example, for Rhodnius prolixus the file names use the assembly ID:
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//rhodnius_prolixus/dna/
but for Bombyx mori they use the GCA ID:
	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//bombyx_mori/dna
This makes it very tricky to write a script that downloads each genome based on the fields of the EnsemblGenomes summary table (ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt), since the file name is sometimes built from the 5th column and sometimes from the 6th column.
Would it be possible to apply a consistent file naming rule?
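Until the naming is harmonised, one workaround is to build both candidate file names, one from the 5th column (assembly) and one from the 6th column (GCA accession), and keep whichever actually appears in the species' `dna/` directory listing. The Python sketch below assumes the file name pattern `<Species>.<id>.dna.toplevel.fa.gz`, which is an illustration rather than a documented Ensembl rule; the helper names `candidate_dna_files` and `pick_dna_file` are hypothetical.

```python
import posixpath


def candidate_dna_files(species, assembly, gca):
    """Build candidate FASTA names from the assembly name and the GCA accession.

    The pattern '<Species>.<id>.dna.toplevel.fa.gz' is an assumed example,
    not a documented naming rule.
    """
    # FTP directories are lower-case (e.g. 'bombyx_mori'); the assumed file
    # prefix capitalises the first letter ('Bombyx_mori').
    prefix = species[:1].upper() + species[1:]
    return [f"{prefix}.{ident}.dna.toplevel.fa.gz" for ident in (assembly, gca) if ident]


def pick_dna_file(species, row, listing):
    """Return the first candidate name present in a directory listing, else None.

    `row` is one data line of the summary table split on tabs; indices 4 and 5
    are the 5th (assembly) and 6th (GCA) columns mentioned in the message.
    """
    assembly, gca = row[4], row[5]
    # Listings may contain full paths, so compare on the base name only.
    available = {posixpath.basename(entry) for entry in listing}
    for name in candidate_dna_files(species, assembly, gca):
        if name in available:
            return name
    return None
```

The listing itself can be fetched with Python's standard `ftplib` (`FTP(host)`, `login()`, then `nlst(path)`); depending on the server, `nlst` may return bare names or full paths, which is why the sketch compares base names.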
3) Unique access to all Metazoan genomes
I understand that Metazoan genomes are released either in Ensembl or in EnsemblGenomes for historical reasons. Is there any prospect of providing access to all Metazoan genomes on a single FTP site (ftp://ftp.ensemblgenomes.org/pub/metazoa)? This would not preclude keeping the Ensembl server, but a priori it would seem logical to have all the Metazoa on metazoa.ensemblgenomes.org.
Many thanks,
Jacques van Helden
Aix-Marseille Université (AMU). 
Lab. Technological Advances for Genomics and Clinics (TAGC)
INSERM Unit U1090, 163, Avenue de Luminy, 13288 MARSEILLE cedex 09. France
Office: INSERM building, block 6
Tel: +33 4 91 82 87 49
Fax: +33 4 91 82 87 01
Web:  http://jacques.van-helden.perso.luminy.univ-amu.fr/
Email: Jacques.van-Helden at univ-amu.fr
    
    