[ensembl-dev] Species metadata file ("assembly" or "assembly_accession" labels) to species ftp paths mapping

Allan Kamau kamauallan at gmail.com
Fri Jul 11 14:42:18 BST 2025


Dear Stefano,

Thank you for your reply and advice.

I am now constructing the FTP URLs to the GTF resources from the species
field and the collection name extracted from the core_db field, as you
suggested.
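
For reference, a minimal sketch of that construction in Python, following the
two path patterns from your reply. The function names, the "division"
argument (the short directory name such as "plants" or "bacteria"), the
regular expression, and the stand-alone core_db value in the usage note are
my own assumptions and not part of any Ensembl API.

```python
import re

# Base URL as given in the reply quoted below.
BASE = "http://ftp.ensemblgenomes.org/pub"


def collection_from_core_db(core_db):
    """Return the collection prefix of a core_db name, or None.

    Assumption: collection databases are named
    "<collection>_core_<...>", where <collection> ends in "_collection",
    e.g. "bacteria_60_collection_core_60_113_1".
    """
    match = re.match(r"^(.+_collection)_core_", core_db)
    return match.group(1) if match else None


def gtf_url(division, species, core_db, release=60):
    """Build the GTF directory URL for one row of a species metadata file.

    division: short directory name such as "plants" or "bacteria"
              (hypothetical parameter, supplied by the caller).
    species:  value of the "species" field.
    core_db:  value of the "core_db" field.
    """
    collection = collection_from_core_db(core_db)
    if collection is None:
        # Stand-alone species pattern.
        return f"{BASE}/{division}/release-{release}/gtf/{species}/"
    # Collection pattern.
    return f"{BASE}/release-{release}/{division}/gtf/{collection}/{species}/"
```

Calling gtf_url("bacteria", "acetobacter_syzygii_gca_002276805",
"bacteria_60_collection_core_60_113_1") reproduces the example URL from your
reply. A core_db value without "_collection_core_" in it (for instance a
made-up "arabidopsis_thaliana_core_60_113_11") falls through to the
stand-alone pattern.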

Regarding FASTA data, is there a metadata file listing the full directory
tree of the FTP site, from which I could easily identify the specific
FASTA file type for a given accession?

Regards,
Allan.

On Thu, Jul 10, 2025 at 1:58 PM Stefano Giorgetti <sgiorgetti at ebi.ac.uk>
wrote:

> Dear Allan,
>
> Thanks for your email and for using Ensembl services.
>
> We have 2 main cases: stand-alone species (for instance, all the plants)
> and species sharing a "collection DB", like bacteria.
>
> For the stand-alone species - say plants - the path to the (release 60)
> GTF would be
> http://ftp.ensemblgenomes.org/pub/plants/release-60/gtf/<species>/
> where <species> can be found from one of the species metadata files.
> For species belonging to a collection - say bacteria - the path to the
> (release 60) GTF would be
> http://ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/
> <collection>/<species>/
> Regrettably, there is no trivial way to get the collection the species
> belongs to.
> One hopefully not-too-cumbersome way would be to extract it from the
> "core_db" field of the species metadata file.
> For instance, for "acetobacter_syzygii_gca_002276805" we have core db
> "bacteria_60_collection_core_60_113_1", so the collection name would be
> "bacteria_60_collection", thus giving
> http://ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/bacteria_60_collection/acetobacter_syzygii_gca_002276805/
>
> Hope it helps.
> Any questions, please do not hesitate to ask.
>
> Kind regards,
> Stefano on behalf of the Ensembl team
>
> On 10/07/2025 6:49 am, Allan Kamau wrote:
>
> Greetings,
>
> Given an entry from one of the species metadata files such as "
> ftp.ensemblgenomes.org/pub/plants/release-60/species_EnsemblPlants.txt" I
> would like to determine the ftp path to the "gtf" data of the given
> species.
>
> Is there such a mapping file or mechanism that I can use?
>
> Or, in short, if I have an "assembly" value such as "ASM16007v2" or an
> "assembly_accession" label, for example "GCA_000160075.2", is there a way to
> determine the ftp path to the gtf data, which is "
> ftp.ensemblgenomes.org/pub/release-60/bacteria/gtf/bacteria_118_collection/abiotrophia_defectiva_atcc_49176_gca_000160075"
> in this case?
>
> Regards,
> - Allan.
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/
>
>