[ensembl-dev] FEATURE REQUEST: Improving annotation of chromosome synonyms

Andy Yates ayates at ebi.ac.uk
Mon Jul 20 17:51:52 BST 2020


The synonyms are loaded directly into our databases, and therefore any additional support will require us to integrate this into a new data production run. The people likely to be involved in that load are on this thread, and I'll reach out to see how we can do this.

Andy

------------
Andrew Yates - Genomics Technology Infrastructure Team Leader
The European Bioinformatics Institute (EMBL-EBI)
Wellcome Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andy.yates.ebi
http://www.ebi.ac.uk/
http://www.ensembl.org/

> On 20 Jul 2020, at 17:46, Noah Dukler <ndukler at gmail.com> wrote:
> 
> Thanks Andy, I really appreciate this! I don't see much support for UCSC names outside humans, though, and honestly that's the other identifier we're most worried about our users relying on when they try to subset the genome for simulation. Would it be possible to add the UCSC identifiers to the other species as well? For us the priority would be the species listed in this catalog: https://stdpopsim.readthedocs.io/en/latest/catalog.html, plus Bos taurus, which we're adding soon. Thanks again for your time!
> 
> Noah
> 
> On Mon, Jul 20, 2020 at 7:31 AM Andy Yates <ayates at ebi.ac.uk> wrote:
> I should also mention that Jose is working with Ensembl, as part of GA4GH's refget standard (which Ensembl is also involved with), to implement reverse lookup from existing identifiers to GA4GH identifiers.
> 
> In addition, Noah, synonyms are already available via the current REST API, just not through the xref endpoints: they are exposed through the /info/assembly endpoints when synonyms=1 is set. Coverage will not be 100%, however. For example
> 
> http://rest.ensembl.org/info/assembly/homo_sapiens/X?content-type=application/json;synonyms=1
> 
> Returns
> 
> {
>    "length":156040895,
>    "is_chromosome":1,
>    "is_circular":0,
>    "synonyms":[
>       {
>          "name":"CM000685.2",
>          "dbname":"INSDC"
>       },
>       {
>          "dbname":"UCSC",
>          "name":"chrX"
>       },
>       {
>          "dbname":"RefSeq_genomic",
>          "name":"NC_000023.11"
>       }
>    ],
>    "assembly_name":"GRCh38",
>    "assembly_exception_type":"REF",
>    "coordinate_system":"chromosome"
> }
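The per-region lookup above can be scripted. Below is a minimal Python sketch using only the standard library. It builds the same /info/assembly URL (using ';' as the parameter separator, as in the examples) and flattens the synonyms array into a dict keyed by dbname. The function names are hypothetical. The code relies only on the response fields shown above (synonyms, each with dbname and name).

```python
import json
import urllib.request
from urllib.parse import quote

# Public Ensembl REST service; the examples above use http://, https:// is also served.
ENSEMBL_REST = "https://rest.ensembl.org"


def assembly_info_url(species: str, region: str) -> str:
    """Build an /info/assembly/:species/:region query with synonyms=1.

    Parameters are separated with ';', as in the example URLs in this thread.
    """
    return (f"{ENSEMBL_REST}/info/assembly/{quote(species)}/{quote(region)}"
            "?content-type=application/json;synonyms=1")


def fetch_region_info(species: str, region: str) -> dict:
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(assembly_info_url(species, region)) as resp:
        return json.load(resp)


def synonyms_by_source(info: dict) -> dict:
    """Map each synonym source ("dbname") to its synonym ("name").

    A region with no synonyms yields an empty dict. If a source were listed
    twice, the later entry would overwrite the earlier one.
    """
    return {s["dbname"]: s["name"] for s in info.get("synonyms", [])}
```

For example, `synonyms_by_source(fetch_region_info("homo_sapiens", "X")).get("UCSC")` should give `chrX`, provided the live service still returns the response shown above. Using `.get()` rather than indexing allows for regions that have no UCSC synonym, since coverage is incomplete.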
> 
> It is also possible to do the reverse lookup, since every synonym is accepted as a valid sequence region name in queries:
> 
> http://rest.ensembl.org/info/assembly/homo_sapiens/NC_000023.11?content-type=application/json;synonyms=1
> 
> {
>    "is_chromosome":1,
>    "length":156040895,
>    "is_circular":0,
>    "synonyms":[
>       {
>          "name":"CM000685.2",
>          "dbname":"INSDC"
>       },
>       {
>          "name":"chrX",
>          "dbname":"UCSC"
>       },
>       {
>          "dbname":"RefSeq_genomic",
>          "name":"NC_000023.11"
>       }
>    ],
>    "assembly_name":"GRCh38",
>    "assembly_exception_type":"REF",
>    "coordinate_system":"chromosome"
> }
> 
> You can also request this for every region of a genome in one call using http://rest.ensembl.org/info/assembly/homo_sapiens?content-type=application/json;synonyms=1. I won't paste the response here, for everyone's sake.
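For the stdpopsim use case, which needs to accept whatever chromosome name a user supplies, the whole-genome response could be turned into a lookup table from every synonym to the Ensembl region name. In the per-region responses above, the synonyms list does not include the Ensembl name ("X") itself, so the sketch adds each region's own name explicitly. This is a minimal sketch under an assumption not confirmed in this thread: that the whole-genome response lists regions under a top_level_region key, each with a name and, when synonyms=1 is set, a synonyms array shaped like the per-region ones. The function name build_alias_index is hypothetical.

```python
def build_alias_index(genome_info: dict) -> dict:
    """Map every known name of each top-level region to its Ensembl name.

    ASSUMPTION: regions are listed under "top_level_region", each with a
    "name" and (with synonyms=1) a "synonyms" list of {"dbname", "name"}.
    """
    index = {}
    for region in genome_info.get("top_level_region", []):
        ensembl_name = region["name"]
        # Ensembl names always map to themselves and override any synonym
        # that happens to collide with them.
        index[ensembl_name] = ensembl_name
        for syn in region.get("synonyms", []):
            # If two regions share a synonym, the first one seen wins.
            index.setdefault(syn["name"], ensembl_name)
    return index
```

With such a table, a user-supplied "chrX" or "NC_000023.11" would resolve to "X". Because coverage is incomplete, names missing from the table raise KeyError on lookup and are better reported to the user than guessed.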
> 
> Andy
> 
> > On 20 Jul 2020, at 12:20, jose miguel mut <jmmut at ebi.ac.uk> wrote:
> > 
> > Hi Noah,
> > 
> > Let me pop into the conversation. I work at EBI-EVA (https://www.ebi.ac.uk/eva/), and we are working on a REST web service called contig-alias that matches your use case: given a chromosome name or accession, it will return that chromosome's synonyms.
> > 
> > It's at an early stage and we don't have anything usable yet, but we plan to support GenBank and RefSeq chromosome accessions, as well as chromosome names (like "chr1"), and possibly UCSC names and GA4GH refget checksums. Some information about the assemblies will be available too.
> > 
> > Let me know if you would be willing to beta-test the system sometime in the next few months and give us any feedback. You can also tell us which species/assemblies you would be interested in, so that we can support them early and give them some extra focus.
> > 
> > Regards
> > Jose
> > 
> > 
> > On 17/07/2020, Noah Dukler wrote:
> > 
> > Subject:      [ensembl-dev] FEATURE REQUEST: Improving annotation of chromosome synonyms
> > Date: Fri, 17 Jul 2020 13:41:44 -0400
> > From: Noah Dukler <ndukler at gmail.com>
> > Reply-To:     Ensembl developers list <dev at ensembl.org>
> > To:   dev at ensembl.org
> > 
> > 
> > Would it be possible for you to make alternative chromosome nomenclatures available under the `xrefs` endpoint? Such a feature would be immensely useful to a group I work with (stdpopsim) that is working to standardize population genetic simulations and make realistic simulations easier to run. As of now there are relatively few resources for converting between different nomenclatures (e.g. UCSC <--> Ensembl <--> NCBI <--> others). We are working on automating species-specific annotations using the Ensembl REST API, but one of our issues is that most chromosome synonyms are not available. The best resource I have found for converting IDs is the R Bioconductor package GenomeInfoDb (https://github.com/terminiter/GenomeInfoDb/tree/release-3.3/inst/extdata/dataFiles). Thank you for your time.
> >  
> > Noah Dukler
> > Post-Doc
> > Siepel Lab
> > _______________________________________________
> > Dev mailing list    Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> > Ensembl Blog: http://www.ensembl.info/
> 




