[ensembl-dev] Using the Ensembl REST API to determine FTP URLs for genomes

Kurt Wheeler kurt.wheeler91 at gmail.com
Thu Dec 19 15:10:50 GMT 2019


Hello,

I'm trying to figure out how to programmatically find this URL:
ftp://ftp.ensemblgenomes.org/pub/bacteria/release-45/fasta/bacteria_13_collection/pseudomonas_aeruginosa_pao1/dna/

I got that URL by going to
https://bacteria.ensembl.org/Pseudomonas_aeruginosa_pao1/Info/Index/ and
clicking the link labelled "Download DNA sequence (FASTA)". However, I can't
figure out how to get the REST API to return it, and I don't want to scrape
the HTML for the link.

Does anyone know how to find that URL for a given organism/strain?

Thanks,

- Kurt

P.S. I solved this problem for divisions other than bacteria by building
the URLs with information that the API does provide:
https://github.com/AlexsLemonade/refinebio/blob/dev/foreman/data_refinery_foreman/surveyor/transcriptome_index.py#L48

On the FTP server, however, the bacteria are split into collections, and I
haven't found a way to determine which collection a given genome belongs to.
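
To make the problem concrete, here is a rough sketch of how the URL at the
top of this message breaks down into parts. The function name and its
parameters are made up for illustration (this is not the code in the
refinebio link), and the layout without a collection directory is my
assumption about the non-bacteria divisions. The collection argument is
exactly the piece I don't know how to obtain:

```python
# Hypothetical helper: assembles the FASTA DNA directory URL from its parts.
# The host and path layout are taken from the bacteria URL quoted above.
FTP_BASE = "ftp://ftp.ensemblgenomes.org/pub"


def build_fasta_dna_url(division, release, species, collection=None):
    """Return the FTP directory URL for a genome's DNA FASTA files.

    collection is only needed for bacteria, e.g. "bacteria_13_collection";
    where that value comes from is the open question.
    """
    parts = [FTP_BASE, division, f"release-{release}", "fasta"]
    if collection:
        # Bacteria have an extra collection directory before the species.
        parts.append(collection)
    # Directory names on the FTP server are lower case.
    parts.extend([species.lower(), "dna"])
    return "/".join(parts) + "/"


bacteria_url = build_fasta_dna_url(
    "bacteria",
    45,
    "Pseudomonas_aeruginosa_pao1",
    collection="bacteria_13_collection",
)
print(bacteria_url)
```

With the collection hard-coded as above, this reproduces the URL I found by
hand, so everything except the collection name can be filled in
mechanically.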