[ensembl-dev] FEATURE REQUEST: Improving annotation of chromosome synonyms

Noah Dukler ndukler at gmail.com
Mon Jul 20 17:46:08 BST 2020


Thanks Andy, I really appreciate this! I don't see much support for UCSC
names outside of human, though, and honestly that's the other identifier
we're most concerned about, since it's in common use when our users subset
the genome for simulation. Would it be possible to add the UCSC identifiers
for the other species as well? For us the priority would be the species
listed in this catalog: https://stdpopsim.readthedocs.io/en/latest/catalog.html
plus Bos taurus, which we're adding soon. Thanks again for your time!

Noah

On Mon, Jul 20, 2020 at 7:31 AM Andy Yates <ayates at ebi.ac.uk> wrote:

> I should also say that Jose is working with Ensembl, as part of GA4GH's
> refget standard (which Ensembl is also involved with), to implement
> reverse lookup of identifiers to GA4GH identifiers.
>
> In addition, Noah, synonyms are already available via the current REST
> API, just not via the xref endpoints; they are served by the
> /info/assembly endpoints. However, coverage will not be 100%. For example:
>
>
> http://rest.ensembl.org/info/assembly/homo_sapiens/X?content-type=application/json;synonyms=1
>
> Returns
>
> {
>    "length":156040895,
>    "is_chromosome":1,
>    "is_circular":0,
>    "synonyms":[
>       {
>          "name":"CM000685.2",
>          "dbname":"INSDC"
>       },
>       {
>          "dbname":"UCSC",
>          "name":"chrX"
>       },
>       {
>          "dbname":"RefSeq_genomic",
>          "name":"NC_000023.11"
>       }
>    ],
>    "assembly_name":"GRCh38",
>    "assembly_exception_type":"REF",
>    "coordinate_system":"chromosome"
> }
>
> It is also possible to do the reverse lookup since all synonyms can be
> used as valid sequence region names in queries:
>
>
> http://rest.ensembl.org/info/assembly/homo_sapiens/NC_000023.11?content-type=application/json;synonyms=1
>
> {
>    "is_chromosome":1,
>    "length":156040895,
>    "is_circular":0,
>    "synonyms":[
>       {
>          "name":"CM000685.2",
>          "dbname":"INSDC"
>       },
>       {
>          "name":"chrX",
>          "dbname":"UCSC"
>       },
>       {
>          "dbname":"RefSeq_genomic",
>          "name":"NC_000023.11"
>       }
>    ],
>    "assembly_name":"GRCh38",
>    "assembly_exception_type":"REF",
>    "coordinate_system":"chromosome"
> }
>
> You can also request this for a whole genome in one call using
> http://rest.ensembl.org/info/assembly/homo_sapiens?content-type=application/json;synonyms=1
> (I'll not put the response in here, for everyone's sake).
>
> Andy
>
>
> ------------
> Andrew Yates - Genomics Technology Infrastructure Team Leader
> The European Bioinformatics Institute (EMBL-EBI)
> Wellcome Genome Campus
> Hinxton, Cambridge
> CB10 1SD, United Kingdom
> Tel: +44-(0)1223-492538
> Fax: +44-(0)1223-494468
> Skype: andy.yates.ebi
> http://www.ebi.ac.uk/
> http://www.ensembl.org/
>
> > On 20 Jul 2020, at 12:20, jose miguel mut <jmmut at ebi.ac.uk> wrote:
> >
> > Hi Noah,
> >
> > Let me pop into the conversation. I work at EBI-EVA
> > (https://www.ebi.ac.uk/eva/) and we are working on a REST web service
> > called contig-alias that matches your use case. Given a chromosome name
> > or accession, you will get back its synonyms.
> >
> > It's in its early stages and we don't have anything usable yet, but we
> > plan to support GenBank and RefSeq chromosome accessions, as well as
> > chromosome names (like "chr1"), and possibly UCSC names and GA4GH refget
> > checksums. Some information about the assemblies will be available too.
> >
> > Let me know if you are willing to beta-test the system some time in the
> > coming months and provide any feedback you have. Also, you can tell us
> > which species/assemblies you would be interested in, so that we can
> > support them early and give them some extra focus.
> >
> > Regards
> > Jose
> >
> >
> > On 17/07/2020, Noah Dukler wrote:
> >
> > Subject:      [ensembl-dev] FEATURE REQUEST: Improving annotation of
> > chromosome synonyms
> > Date: Fri, 17 Jul 2020 13:41:44 -0400
> > From: Noah Dukler <ndukler at gmail.com>
> > Reply-To:     Ensembl developers list <dev at ensembl.org>
> > To:   dev at ensembl.org
> >
> >
> > Would it be possible for you to make alternative chromosome
> > nomenclatures available under the `xrefs` endpoint? Such a feature would
> > be immensely useful to a group I work with (stdpopsim) that is working
> > to standardize population genetic simulations and make realistic
> > simulations easier to run. As of now there are relatively few resources
> > for converting between different nomenclatures (e.g. UCSC <--> Ensembl
> > <--> NCBI <--> others). We are working on automating species-specific
> > annotations using the Ensembl REST API, but one of our issues is that
> > most chromosome synonyms are not available. The best resource I have
> > found for converting IDs is in the R Bioconductor package GenomeInfoDb
> > (https://github.com/terminiter/GenomeInfoDb/tree/release-3.3/inst/extdata/dataFiles).
> > Thank you for your time.
> >
> > Noah Dukler
> > Post-Doc
> > Siepel Lab
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info:
> > https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> > Ensembl Blog: http://www.ensembl.info/
>
>
>
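
To illustrate the /info/assembly lookups quoted above, here is a minimal
Python sketch; it is not part of the original thread. It builds the same
URLs Andy shows and pulls the synonyms out of a response record. The helper
names (assembly_url, synonyms_by_db, fetch_region) are hypothetical, and the
sample record is an abridged copy of Andy's example response. As he notes,
synonym coverage is not 100%, so a lookup for a given database (e.g. UCSC)
may come back empty for some species.

```python
import json
import urllib.request

# Base URL as written in the thread.
REST_SERVER = "http://rest.ensembl.org"


def assembly_url(species, region=None):
    """Build an /info/assembly URL with synonyms=1 (hypothetical helper)."""
    path = f"/info/assembly/{species}"
    if region is not None:
        path += f"/{region}"
    return f"{REST_SERVER}{path}?content-type=application/json;synonyms=1"


def synonyms_by_db(record):
    """Map dbname -> synonym name for one sequence-region record.

    Returns an empty dict when the record carries no synonyms.
    """
    return {s["dbname"]: s["name"] for s in record.get("synonyms", [])}


def fetch_region(species, region):
    """Fetch one region record from the REST API (needs network access)."""
    with urllib.request.urlopen(assembly_url(species, region)) as resp:
        return json.load(resp)


# Abridged copy of the example response quoted in the thread.
SAMPLE = {
    "length": 156040895,
    "synonyms": [
        {"name": "CM000685.2", "dbname": "INSDC"},
        {"dbname": "UCSC", "name": "chrX"},
        {"dbname": "RefSeq_genomic", "name": "NC_000023.11"},
    ],
    "assembly_name": "GRCh38",
    "coordinate_system": "chromosome",
}

names = synonyms_by_db(SAMPLE)
ucsc_name = names.get("UCSC")  # None when a species has no UCSC synonym
```

Because any synonym is accepted as a region name, calling
fetch_region("homo_sapiens", "NC_000023.11") should return the same record
as fetch_region("homo_sapiens", "X"), which gives the reverse lookup.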