[ensembl-dev] Species metadata file ("assembly" or "assembly_accession" labels) to species ftp paths mapping

Stefano Giorgetti sgiorgetti at ebi.ac.uk
Fri Jul 11 19:45:05 BST 2025


Dear Allan,

Thanks for your message, and happy to help.

Regarding your last question: to be honest, I am not aware of a file 
describing the entire directory tree.
It is possibly something for us to consider providing.

That said, the directory tree for FASTA is similar to the GFF3 one:

For plants
http://ftp.ensemblgenomes.org/pub/plants/release-60/fasta/<species>/[cdna,cds,dna,dna_index,ncrna,pep]

For bacteria
http://ftp.ensemblgenomes.org/pub/bacteria/release-60/fasta/<collection>/<species>/[cdna,cds,dna,dna_index,ncrna,pep]

The extra layer of complexity here comes from the type of sequence you 
are interested in; please see the options in the square brackets above.
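
As a minimal sketch of the two layouts above, the following Python helper 
assembles the FASTA directory URL for a stand-alone species or for a 
species in a collection. The function name fasta_dir_url and its 
parameters are illustrative and not part of any Ensembl tool, and the 
plant species name in the usage lines is only an example. The helper 
builds the string; it does not check that the directory exists.

```python
BASE = "http://ftp.ensemblgenomes.org/pub"

# Sequence types listed in the square brackets above.
SEQ_TYPES = ("cdna", "cds", "dna", "dna_index", "ncrna", "pep")


def fasta_dir_url(division, release, species, seq_type, collection=None):
    """Return the FASTA directory URL for one species and sequence type.

    Pass `collection` only for species that live in a collection
    (for example bacteria); leave it as None for stand-alone species.
    """
    if seq_type not in SEQ_TYPES:
        raise ValueError(f"unknown sequence type: {seq_type!r}")
    parts = [BASE, division, f"release-{release}", "fasta"]
    if collection:
        parts.append(collection)
    parts.extend([species, seq_type])
    return "/".join(parts) + "/"


if __name__ == "__main__":
    # Stand-alone species (species name is an illustrative example).
    print(fasta_dir_url("plants", 60, "arabidopsis_thaliana", "dna"))
    # Species in a collection, using the names quoted later in this thread.
    print(fasta_dir_url("bacteria", 60, "acetobacter_syzygii_gca_002276805",
                        "pep", collection="bacteria_60_collection"))
```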

Regrettably, there is no explicit (and convenient) mapping from assembly 
accession to FASTA files; for the time being you have to build it 
yourself from the species metadata file(s):

accession --> (collection,)species --> fasta files
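
The chain above could be sketched in Python roughly as follows. This 
assumes the species metadata file is tab-separated with a header row 
that contains the column names "species", "assembly_accession" and 
"core_db" (the fields mentioned in this thread); please check that 
against the actual file. The function names and the core_db pattern are 
illustrative assumptions, not an official Ensembl API.

```python
import csv
import re

BASE = "http://ftp.ensemblgenomes.org/pub"

# Assumed core_db shape for collections, based on the example
# "bacteria_60_collection_core_60_113_1" quoted in this thread.
COLLECTION_RE = re.compile(r"^(.+_collection)_core_\d+_\d+_\d+$")


def collection_from_core_db(core_db):
    """Return the collection name, or None for a stand-alone species."""
    match = COLLECTION_RE.match(core_db)
    return match.group(1) if match else None


def index_by_accession(lines):
    """Map assembly_accession -> (collection or None, species).

    `lines` is any iterable of text lines, e.g. an open metadata file.
    """
    index = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        accession = row.get("assembly_accession")
        species = row.get("species")
        if accession and species:
            collection = collection_from_core_db(row.get("core_db") or "")
            index[accession] = (collection, species)
    return index


def fasta_dir_url(division, release, collection, species, seq_type):
    """Build the FASTA directory URL; `collection` may be None."""
    parts = [BASE, division, f"release-{release}", "fasta"]
    if collection:
        parts.append(collection)
    parts.extend([species, seq_type])
    return "/".join(parts) + "/"
```

With a locally downloaded metadata file, one would open it in text mode, 
pass the file object to index_by_accession, look up the accession, and 
hand the resulting (collection, species) pair to fasta_dir_url together 
with the division, release and sequence type.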

Hope this helps.

Kind regards,

Stefano

On 11/07/2025 2:42 pm, Allan Kamau wrote:
> Dear Stefano,
>
> Thank you for your reply and advice.
>
> I am now constructing the ftp urls to gtf resources using data from 
> the species field and data extracted from core_db field as you suggested.
>
> Regarding fasta data, is there metadata containing the entire 
> directory tree of the entire ftp directory by which I could easily 
> identify the specific fasta file type for a given accession?
>
> Regards,
> Allan.
>
> On Thu, Jul 10, 2025 at 1:58 PM Stefano Giorgetti 
> <sgiorgetti at ebi.ac.uk> wrote:
>
>     Dear Allan,
>
>     Thanks for your email and for using Ensembl services.
>
>     We have 2 main cases: stand-alone species (for instance all the
>     plants) and species sharing a "collection DB", like bacteria.
>
>     For the stand-alone species - say plants - the path to the
>     (release 60) GTF would be
>     http://ftp.ensemblgenomes.org/pub/plants/release-60/gtf/<species>/
>     where <species> can be found from one of the species metadata files.
>
>     For species belonging to a collection - say bacteria - the path to
>     the (release 60) GTF would be
>     http://ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/<collection>/<species>/
>     Regrettably, there is no trivial way to get the collection the
>     species belongs to.
>     One hopefully not-too-cumbersome way would be to extract it from
>     the "core_db" field of the species metadata file.
>     For instance for "acetobacter_syzygii_gca_002276805", we have core
>     db "bacteria_60_collection_core_60_113_1", the collection name
>     would be "bacteria_60_collection"; thus giving
>     http://ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/bacteria_60_collection/acetobacter_syzygii_gca_002276805/
>
>     Hope it helps.
>     Any questions, please do not hesitate to ask.
>
>     Kind regards,
>     Stefano on behalf of the Ensembl team
>
>     On 10/07/2025 6:49 am, Allan Kamau wrote:
>>     Greetings,
>>
>>     Given an entry from one of the species metadata files such as
>>     "ftp.ensemblgenomes.org/pub/plants/release-60/species_EnsemblPlants.txt
>>     <http://ftp.ensemblgenomes.org/pub/plants/release-60/species_EnsemblPlants.txt>"
>>     I would like to determine the ftp path to the "gtf" data of the
>>     given species.
>>
>>     Is there such a mapping file or mechanism that I can use?
>>
>>     Or, in short, if I have an "assembly" value such as "ASM16007v2"
>>     or an "assembly_accession" value such as "GCA_000160075.2",
>>     is there a way to determine the ftp path to the gtf data which is
>>     "ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/bacteria_118_collection/abiotrophia_defectiva_atcc_49176_gca_000160075
>>     <http://ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/bacteria_118_collection/abiotrophia_defectiva_atcc_49176_gca_000160075>"
>>     in this case?
>>
>>     Regards,
>>     - Allan.