[ensembl-dev] consistency between Ensembl and EnsemblGenomes FTP sites

Dan Staines dstaines at ebi.ac.uk
Mon Feb 27 11:11:14 GMT 2017


Hi Jacques,

> For several years I have been downloading genomes from Ensembl in order to install them in the Regulatory Sequence Analysis Tools (RSAT: http://rsat.eu/). I have used various access types (Perl API, REST Web services, FTP), and the most efficient way to download all the required information (basically, FASTA sequences + GTF annotations) is via the FTP site.
>
> However, I have some consistency problems with the FTP download:
>
> 1) Missing organism table on ftp://ftp.ensembl.org/
>
> On EnsemblGenomes, there is a table providing the parameters of the available genomes (name, TAXID, assembly, GCA identifier):
> 	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt
> I did not find any equivalent table for Ensembl.
> 	ftp://ftp.ensembl.org/pub/release-87/
>
> Could such a table be provided in the next releases?

This will appear in a future release once the same metadata database & 
API is deployed for Ensembl and Ensembl Genomes. I don't know when this 
is likely to be, but probably later this year.
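In the meantime, the Ensembl Genomes table can be read with a few lines of code. The sketch below assumes the file is tab-separated with a `#`-prefixed header line whose fields include `species`, `assembly` and `assembly_accession` (the 5th and 6th columns mentioned in point 2); those column names are assumptions based on the file's layout, not a documented contract, and `fetch_species_table` is a hypothetical helper.

```python
import urllib.request
from typing import Dict, List


def parse_species_table(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated species table whose first line is a '#'-prefixed header.

    Assumption: the header names the columns (e.g. name, species, division,
    taxonomy_id, assembly, assembly_accession, ...).
    """
    lines = text.splitlines()
    if not lines:
        return []
    header = lines[0].lstrip("#").split("\t")
    rows = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        # zip() stops at the shorter sequence, so stray trailing fields are ignored
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def fetch_species_table(url: str) -> List[Dict[str, str]]:
    """Hypothetical helper: download the table (urllib handles ftp:// URLs) and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_species_table(response.read().decode("utf-8"))
```

For example, `fetch_species_table("ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt")` would return one dictionary per genome, from which the assembly and GCA identifier can be read by column name rather than by position.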

> 2) Inconsistent file naming on ftp://ftp.ensemblgenomes.org
>
> For EnsemblGenomes, the file names are built differently depending on the species.
> For example, for Rhodnius prolixus they use the assembly ID
> 	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//rhodnius_prolixus/dna/
>
> but for Bombyx mori they use the GCA ID
> 	ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/fasta//bombyx_mori/dna
>
> This makes it very tricky for people who want to write a script to download each genome based on the fields of the EnsemblGenomes summary table (ftp://ftp.ensemblgenomes.org/pub/metazoa/release-34/species_EnsemblMetazoa.txt), since the file name is sometimes built from the 5th column and sometimes from the 6th column.
>
> Would it be possible to use a homogeneous file naming rule?

There is a uniform naming rule, based on the assembly.default meta key, 
so the file names should be entirely predictable for you. I don't know 
why you're seeing this inconsistency, though; I will look into it for 
the next release.
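Until the naming is made consistent, a download script can tolerate both variants by building a candidate file name from each column and keeping whichever one actually appears in the directory listing. This is a minimal sketch, assuming the FASTA files follow the pattern `<Species>.<label>.dna.toplevel.fa.gz` (species name capitalised, spaces in the label replaced by underscores); that pattern is inferred from typical FTP file names and is an assumption, not the documented rule, and both function names are hypothetical.

```python
from typing import Iterable, List, Optional


def candidate_fasta_names(species: str, assembly: str, accession: str,
                          seq_type: str = "dna", level: str = "toplevel") -> List[str]:
    """Build candidate FASTA file names from the assembly name and the GCA accession.

    Assumed pattern: <Species>.<label>.<seq_type>.<level>.fa.gz
    """
    prefix = species[:1].upper() + species[1:]  # e.g. foo_bar -> Foo_bar
    names: List[str] = []
    for label in (assembly, accession):         # 5th column first, then 6th
        if label:
            name = f"{prefix}.{label.replace(' ', '_')}.{seq_type}.{level}.fa.gz"
            if name not in names:               # avoid duplicates when both labels match
                names.append(name)
    return names


def pick_existing(candidates: Iterable[str], listing: Iterable[str]) -> Optional[str]:
    """Return the first candidate present in a directory listing, or None."""
    available = set(listing)
    for name in candidates:
        if name in available:
            return name
    return None
```

The directory listing itself could be obtained with the standard library, e.g. `ftplib.FTP("ftp.ensemblgenomes.org")`, then `login()` and `nlst("pub/metazoa/release-34/fasta/bombyx_mori/dna")`; checking the listing avoids guessing which column a given species uses.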

One of our plans for the future is to provide REST endpoints to give 
URLs for downloading files for different combinations of so this should 
make your life easier.

Cheers,

Dan.

-- 
Dan Staines, PhD
Genomics Technology Infrastructure Coordinator
EMBL-EBI, Wellcome Trust Genome Campus
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-492507
