[ensembl-dev] Missing files on the FTP
Matthieu Muffato
mm49 at sanger.ac.uk
Fri Sep 12 15:08:48 BST 2025
Dear Ensembl team,
I parse https://ftp.ebi.ac.uk/pub/ensemblorganisms/species.json to find out what can be retrieved from the FTP. I would like to report that, of the ~65k files it references, ~3.5k don't seem to exist.
For instance:
"chromosomes.tsv.gz": "Kalanchoe_fedtschenkoi/GCA_002312845.1/genome/chromosomes.tsv.gz",
The directory https://ftp.ebi.ac.uk/pub/ensemblorganisms/Kalanchoe_fedtschenkoi/GCA_002312845.1/genome/ exists, but it contains no chromosomes.tsv.gz file.
I see this across a variety of file types (cdna.fa.gz, genes.embl.gz, regulation.gff, variation.vcf.gz, etc.). Sometimes the entire species directory is missing:
"genes.gtf.gz": "Melinaea_menophilus_n_ssp_AW-2005/GCA_918358695.1/ensembl/geneset/2022_07/genes.gtf.gz",
There is no https://ftp.ebi.ac.uk/pub/ensemblorganisms/Melinaea_menophilus_n_ssp_AW-2005/
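For anyone who wants to reproduce this kind of check, here is a minimal sketch. It makes assumptions that are not confirmed anywhere: that every string value in species.json containing a "/" (and no "://") is a path relative to the FTP root, and that a HEAD request over HTTPS is an acceptable way to test for existence. The function names collect_paths, url_exists and find_missing are made up for illustration.

```python
import urllib.error
import urllib.request

BASE_URL = "https://ftp.ebi.ac.uk/pub/ensemblorganisms/"


def collect_paths(node):
    """Yield string values that look like relative paths, at any nesting depth.

    Assumption: a string with a "/" and no "://" is a path relative to BASE_URL.
    """
    if isinstance(node, dict):
        for value in node.values():
            yield from collect_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_paths(item)
    elif isinstance(node, str) and "/" in node and "://" not in node:
        yield node


def url_exists(url, timeout=30):
    """Return True if a HEAD request succeeds, False on an HTTP error such as 404.

    Network-level failures (urllib.error.URLError) are not caught and propagate.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


def find_missing(data, exists=url_exists):
    """Return the referenced paths for which exists(BASE_URL + path) is false."""
    return [path for path in collect_paths(data) if not exists(BASE_URL + path)]
```

Calling find_missing on the parsed species.json would issue one request per referenced path, so with ~65k paths it is worth rate-limiting or parallelising with care. The exists parameter is only there so the logic can be exercised without network access.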
I’d like to understand whether the files are genuinely missing and may be added later, or whether species.json was malformed in the first place. I also wonder whether the opposite may happen: files present on the FTP but not listed in species.json.
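Both directions of the comparison come down to set differences once the two lists of paths are available. The sketch below assumes the list of paths actually present on the FTP has already been obtained by some other means (a recursive listing, for example), which is not shown; compare_listings and tally_by_filename are hypothetical helper names.

```python
from collections import Counter
import posixpath


def compare_listings(listed, present):
    """Compare paths referenced in species.json with paths found on the FTP.

    Returns (missing, unlisted): paths referenced but absent, and paths
    present but not referenced, each sorted.
    """
    listed, present = set(listed), set(present)
    missing = sorted(listed - present)
    unlisted = sorted(present - listed)
    return missing, unlisted


def tally_by_filename(paths):
    """Count paths per file name, e.g. how many genes.gtf.gz are affected."""
    return Counter(posixpath.basename(path) for path in paths)
```

Tallying the missing paths by file name is a quick way to see whether the gaps cluster on particular file types or are spread evenly.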
Kind regards,
Matthieu (he/him)
--
Informatics Infrastructure Team Lead – Tree of Life programme
Wellcome Sanger Institute