[ensembl-dev] REST API - unable to specify release version in requests

Kurt Wheeler kurt.wheeler91 at gmail.com
Mon May 21 19:35:28 BST 2018


TL;DR: I would like to be able to specify the release version I am
querying for when using the REST API, like so:

https://rest.ensembl.org/info/species?release=91&content-type=application/json

where the key part of that request is `?release=91`


Here's why:

I don't use Ensembl through the Perl API, so I can't run:

use Bio::EnsEMBL::ApiVersion;
printf( "The API version used is %s\n", software_version() );

to provide the API version I am using. However my issue is actually
directly linked to not being able to specify what version of the REST
API I am using. My project uses Ensembl's REST API to build an FTP URL
to then download from. Specifically, I use:

https://rest.ensembl.org/documentation/info/species

and

http://rest.ensemblgenomes.org/info/genomes/division/{division}?content-type=application/json
(replacing {division} with the division of Ensembl I am trying to
access)

From that response, I use a few fields to construct the URL. For
example, consider this species:

{
    division: "Ensembl",
    taxon_id: "7955",
    name: "danio_rerio",
    release: 92,
    display_name: "Zebrafish",
    accession: "GCA_000002035.4",
    strain_collection: null,
    common_name: "zebrafish",
    strain: null,
    aliases: [
        "drer",
        "danio rerio",
        "d_rerio",
        "danio",
        "zebrafish",
        "7955",
        "danrer",
        "drerio",
        "zfish"
    ],
    groups: [
        "core",
        "otherfeatures",
        "rnaseq",
        "variation",
        "funcgen"
    ],
    assembly: "GRCz11"
}

and its corresponding URLs for GTF and FASTA files:
ftp://ftp.ensembl.org/pub/release-92/gtf/danio_rerio/Danio_rerio.GRCz11.92.gtf.gz
ftp://ftp.ensembl.org/pub/release-92/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.toplevel.fa.gz
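
A minimal sketch of this kind of URL construction, in Python. It assumes
the species record has already been parsed into a dict with the fields
shown above. The names `FTP_ROOT` and `build_ftp_urls` are made up for
illustration and are not taken from the project's code, and the path
layout is inferred only from the two example URLs.

```python
FTP_ROOT = "ftp://ftp.ensembl.org/pub"  # assumed base, taken from the example URLs


def build_ftp_urls(species):
    """Return (gtf_url, fasta_url) for one species record from /info/species."""
    release = species["release"]
    name = species["name"]            # e.g. "danio_rerio"
    assembly = species["assembly"]    # e.g. "GRCz11"
    # File names use the species name with its first letter capitalized.
    file_stem = name.capitalize()     # "danio_rerio" -> "Danio_rerio"

    gtf_url = (
        f"{FTP_ROOT}/release-{release}/gtf/{name}/"
        f"{file_stem}.{assembly}.{release}.gtf.gz"
    )
    fasta_url = (
        f"{FTP_ROOT}/release-{release}/fasta/{name}/dna/"
        f"{file_stem}.{assembly}.dna.toplevel.fa.gz"
    )
    return gtf_url, fasta_url


zebrafish = {"name": "danio_rerio", "release": 92, "assembly": "GRCz11"}
gtf_url, fasta_url = build_ftp_urls(zebrafish)
```

Every piece of the path comes from the API response, so the result is
only valid when all fields describe the same release.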

We use a few fields to do this, but as an example consider the
`assembly` field. This field changed from `GRCz10` to `GRCz11` between
release 91 and 92. Therefore the old URLs for the files I am
interested in are:

ftp://ftp.ensembl.org/pub/release-91/gtf/danio_rerio/Danio_rerio.GRCz10.91.gtf.gz
ftp://ftp.ensembl.org/pub/release-91/fasta/danio_rerio/dna/Danio_rerio.GRCz10.dna.toplevel.fa.gz

Since my code was trying to use release 91, but was receiving
information about the species from the REST API regarding release 92,
it generated the URLs:

ftp://ftp.ensembl.org/pub/release-91/gtf/danio_rerio/Danio_rerio.GRCz11.91.gtf.gz
ftp://ftp.ensembl.org/pub/release-91/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.toplevel.fa.gz

which do not exist. The REST API will only return information about
the latest release, which means that until I update my Ensembl version
to 92, my code breaks because the URL I build doesn't exist.
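
One way to at least fail loudly instead of building a URL that doesn't
exist is to compare the `release` field in the response against the
release the code is pinned to. The following is only a sketch of that
idea in Python; `ReleaseMismatchError` and `check_release` are
hypothetical names, and it does not solve the underlying problem of
being unable to query an older release.

```python
class ReleaseMismatchError(RuntimeError):
    """Raised when the REST API describes a different release than the pinned one."""


def check_release(species, pinned_release):
    """Raise if the species record does not describe the pinned release."""
    reported = int(species["release"])
    if reported != pinned_release:
        raise ReleaseMismatchError(
            f"{species['name']}: REST API reports release {reported}, "
            f"but this code is pinned to release {pinned_release}; "
            "fields such as 'assembly' may not match the pinned release"
        )


# The record below mirrors the zebrafish response shown earlier.
record = {"name": "danio_rerio", "release": 92, "assembly": "GRCz11"}
check_release(record, 92)  # passes silently; check_release(record, 91) would raise
```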

The fact that it breaks my code is annoying, but still manageable.
However what concerns me even more is that I think this endangers the
reproducibility of data generated by my project. It is very important
to my project that users be able to determine exactly what version of
everything was used so that if anyone wants to check the validity of
our work, we can tell them exactly how to replicate what we did. If
the code we're using to build transcriptome indices breaks whenever a
new Ensembl version is released, then any data we processed using
those transcriptome indices cannot be replicated exactly because
there's no way to make our code run with the old release of Ensembl.

So to summarize, I am inquiring about the possibility of adding a
`release` query parameter to the REST API. For my use case it would be
sufficient to have it on the /info/species endpoint, but it seems like
it would probably make sense API-wide.

At the end of the day, we have figured out workarounds in case this
query parameter cannot be added, so this isn't the end of the world.
However, I think this is a feature that should be supported anyway.

Thanks,

- Kurt


