[ensembl-dev] Get ftp url + path from a species name
Andy Yates
ayates at ebi.ac.uk
Tue Jan 17 09:51:29 GMT 2012
Hi Céline,
There are a number of elements you can extract from a core database's meta table which allow you to reconstruct the path to the protein dumps. For Ensembl databases you can use the meta keys:
* species.production_name
* schema_version
This lets you build up a path like:
ftp://ftp.ensembl.org/pub/release-${schema_version}/fasta/${species.production_name}/pep/
or for example:
ftp://ftp.ensembl.org/pub/release-65/fasta/ailuropoda_melanoleuca/pep/
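As a rough sketch of the substitution above (Python is used purely for illustration; the function name ensembl_pep_url is made up here, and both arguments are assumed to have already been read from the meta table):

```python
def ensembl_pep_url(production_name, schema_version):
    """Build the FTP path to the protein FASTA dumps of an Ensembl species.

    production_name -- value of the meta key species.production_name
    schema_version  -- value of the meta key schema_version (the release)
    """
    return "ftp://ftp.ensembl.org/pub/release-{0}/fasta/{1}/pep/".format(
        schema_version, production_name
    )


if __name__ == "__main__":
    # Reproduces the panda example given above.
    print(ensembl_pep_url("ailuropoda_melanoleuca", 65))
```

This only formats the string; it does not check that the directory actually exists on the FTP site.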
The solution is a bit harder for Ensembl Genomes. Each core database has the division under the meta key
* species.division
This can be used, but it needs some manipulation: for D. melanogaster, for example, you need to convert the meta value EnsemblMetazoa to metazoa. The remaining issue is the Ensembl Genomes release number, which for the moment has to come from a hardcoded Ensembl release -> EG release mapping, e.g. E!65 == EG12. You can then build the path up like so:
ftp://ftp.ensemblgenomes.org/pub/release-${eg_release}/${eg_division}/fasta/${species.production_name}/pep/
or for example:
ftp://ftp.ensemblgenomes.org/pub/release-11/metazoa/fasta/acyrthosiphon_pisum/pep/
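The Ensembl Genomes case could be sketched along these lines (again Python, for illustration only). The names ENSEMBL_TO_EG_RELEASE, eg_division_dir and eg_pep_url are made up here. The release table holds only the one pair mentioned above and would have to be extended by hand. Stripping the "Ensembl" prefix and lowercasing the rest is an assumption generalised from the single EnsemblMetazoa -> metazoa example, so check it against the other divisions before relying on it.

```python
# Hardcoded Ensembl release -> Ensembl Genomes release mapping.
# Only the pair quoted in this thread is filled in; extend by hand.
ENSEMBL_TO_EG_RELEASE = {65: 12}


def eg_division_dir(division):
    """Turn a species.division meta value into an FTP directory name.

    ASSUMPTION: every division follows the EnsemblMetazoa -> metazoa
    pattern (strip the "Ensembl" prefix, lowercase the remainder).
    """
    prefix = "Ensembl"
    if not division.startswith(prefix) or division == prefix:
        raise ValueError("unexpected division value: %r" % division)
    return division[len(prefix):].lower()


def eg_pep_url(production_name, division, schema_version):
    """Build the FTP path to the protein dumps of an Ensembl Genomes species."""
    # Raises KeyError if the Ensembl release is not in the hardcoded table.
    eg_release = ENSEMBL_TO_EG_RELEASE[int(schema_version)]
    return "ftp://ftp.ensemblgenomes.org/pub/release-{0}/{1}/fasta/{2}/pep/".format(
        eg_release, eg_division_dir(division), production_name
    )


if __name__ == "__main__":
    print(eg_pep_url("acyrthosiphon_pisum", "EnsemblMetazoa", 65))
```

An unknown release raises KeyError rather than guessing, since the Ensembl -> EG mapping cannot be derived from the meta table alone.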
I hope this helps you & apologies for the long wait for a reply.
Best regards,
Andy
On 5 Jan 2012, at 16:27, Celine Noirot wrote:
> Hi,
> I'm trying to get the path to download the protein sequences from a species name, but I don't know whether the species is in Ensembl, Plants, Bacteria or elsewhere ...
> Is there a way with the API to get the FTP URL and the path to the current version from the species name?
> Best,
> Céline
> --
>
> Céline Noirot
> Plateforme Bioinfo Genotoul- Unité BIA - INRA Toulouse 31326 Castanet-Tolosan
> Tel. 05 61 28 57 24
> http://bioinfo.genotoul.fr
>
>
---
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/