[ensembl-dev] Question regarding gene Annotations discrepancies between GRCh38 and 37 on the latest API.

Andy Yates ayates at ebi.ac.uk
Wed Jul 1 14:49:10 BST 2015


Hi Duarte,

Maintaining consistent and up-to-date information for non-live
assemblies is a difficult task, especially in the case of cross
references, where a number of, if not most, external resources maintain
mappings to the assemblies on ensembl.org and not to the archived
versions. I understand the annoyance, but at the moment we feel it is
better to maintain the human GENCODE 19 gene set and its associated
annotation than to reprocess.
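
For anyone who needs to check which symbol each assembly carries for a
given stable ID, here is a minimal Python sketch against the REST
lookup/id endpoint. It is an illustration under stated assumptions, not
part of the Perl API: the function names are hypothetical, and it assumes
that both rest.ensembl.org and grch37.rest.ensembl.org are reachable and
that the response contains a display_name field.

```python
import json
import urllib.request

# Public Ensembl REST servers, keyed by assembly. The GRCh37 entry is the
# dedicated GRCh37 server (assumed to be reachable from your network).
REST_HOSTS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def lookup_url(stable_id, assembly):
    """Build the lookup/id URL for a stable ID on the given assembly."""
    host = REST_HOSTS[assembly]
    return f"{host}/lookup/id/{stable_id}?content-type=application/json"


def fetch_symbol(stable_id, assembly, timeout=30):
    """Return the display name reported for the gene (makes a network call)."""
    url = lookup_url(stable_id, assembly)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("display_name")


def symbols_differ(symbols):
    """Return True if the per-assembly symbols are not all identical."""
    return len(set(symbols.values())) > 1


# Example usage (requires network access, so it is left commented out):
# found = {asm: fetch_symbol("ENSG00000164818", asm) for asm in REST_HOSTS}
# print(found, "differ" if symbols_differ(found) else "same")
```

Keying the comparison on the stable ID rather than the symbol sidesteps
the renaming problem, since the ID is the same on both assemblies.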

Andy

Duarte Molha wrote:
> I understand... But shouldn't at least the genes that you have maintained
> from one assembly to the next be updated in terms of gene names? Or at
> least have the most up-to-date gene symbol added to the alias field?
>
> It is a bit annoying because we will not be able to search for the current
> gene name on GRCh37, since the gene will only be listed in the database
> under its previous symbol.
>
> =========================
>       Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 1 July 2015 at 11:58, mag <mr6 at ebi.ac.uk> wrote:
>
>     Dear Duarte,
>
>     The gene set in the GRCh37 database is a freeze from release 75.
>     The variation and regulation annotation is updated when major data
>     sets are released.
>
>     The gene set, however, is not updated, given that our major evidence
>     sources are updated to GRCh38.
>
>
>     Regards,
>     Magali
>
>
>     On 01/07/2015 11:03, Duarte Molha wrote:
>>     Dear developers
>>
>>     It was my understanding that Ensembl would be keeping 2 databases
>>     running in parallel for both GRCh38 and GRCh37 and so all genes
>>     would be receiving annotation updates and both databases could be
>>     queried with the latest and greatest Ensembl Perl API.
>>
>>     However, I am finding inconsistencies in gene annotation that have
>>     left me puzzled.
>>
>>     Take for example gene DNAAF5
>>
>>     http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=26013
>>
>>     If you query Ensembl GRCh38 V80, the gene is there, properly annotated:
>>
>>     http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000164818;r=7:726701-786475
>>
>>     However, on GRCh37 V80, the gene is still annotated with the
>>     previous HGNC symbol HEATR2
>>
>>     http://grch37.ensembl.org/Multi/Search/Results?q=HEATR2;site=ensembl;page=1
>>
>>     The Ensembl gene ID is the same in both cases:
>>
>>     ENSG00000164818
>>
>>     Is my thinking flawed? What is the reason for the out-of-date gene
>>     annotation for this gene?
>>
>>     Many thanks
>>
>>     Duarte
>>
>>
>>     _______________________________________________
>>     Dev mailing list    Dev at ensembl.org
>>     Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/

-- 
Andrew Yates - Genomics Technology Infrastructure Team Leader
European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andrewyatz
http://www.ensembl.org/



