[ensembl-dev] Question regarding gene annotation discrepancies between GRCh38 and GRCh37 in the latest API
Andy Yates
ayates at ebi.ac.uk
Wed Jul 1 14:49:10 BST 2015
Hi Duarte,
Maintaining consistent and up-to-date information for non-live
assemblies is a difficult task. This is especially true for
cross-references: many, if not most, external resources maintain their
mappings against the assemblies live on ensembl.org rather than against
archived versions. I understand the annoyance, but at the moment we feel
it is better to keep the human GENCODE19 gene set and its associated
annotation as they are than to reprocess them.
Andy
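One possible workaround for the search problem raised in the quoted messages (a current HGNC symbol that cannot be found on GRCh37) relies on the point made in the thread that the Ensembl stable ID, here ENSG00000164818, is shared between the two assemblies: resolve the current symbol on GRCh38, then look up that stable ID on GRCh37. The following is a minimal, unofficial sketch in Python, not something proposed in the thread. It assumes the public Ensembl REST servers (rest.ensembl.org and grch37.rest.ensembl.org) and their `xrefs/symbol` and `lookup/id` endpoints; the helper name `grch37_name_for_symbol` is hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: public Ensembl REST servers for the live (GRCh38) and GRCh37 assemblies.
GRCH38_REST = "https://rest.ensembl.org"
GRCH37_REST = "https://grch37.rest.ensembl.org"


def fetch_json(url):
    """GET a REST URL and decode its JSON response."""
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def grch37_name_for_symbol(symbol, species="homo_sapiens", fetch=fetch_json):
    """Hypothetical helper: map a current gene symbol to its GRCh37 record.

    Step 1 resolves the symbol to gene stable IDs on GRCh38; step 2 looks up
    each stable ID on GRCh37, where the gene may carry an older symbol.
    Returns a list of (stable_id, grch37_display_name) pairs; the name is
    None when the stable ID is not found on GRCh37.
    """
    symbol_url = f"{GRCH38_REST}/xrefs/symbol/{species}/{urllib.parse.quote(symbol)}"
    # Keep only gene-level cross-reference hits (skip transcripts etc.).
    gene_ids = [x["id"] for x in fetch(symbol_url) if x.get("type") == "gene"]

    results = []
    for stable_id in gene_ids:
        try:
            record = fetch(f"{GRCH37_REST}/lookup/id/{stable_id}")
            name = record.get("display_name")
        except urllib.error.HTTPError:
            name = None  # e.g. an error response: ID absent from GRCh37
        results.append((stable_id, name))
    return results
```

For example, `grch37_name_for_symbol("DNAAF5")` would be expected to return the shared stable ID together with whatever display name the GRCh37 database holds for it (HEATR2, according to this thread). The `fetch` parameter is injectable so the two-step lookup can be exercised without network access.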
Duarte Molha wrote:
> I understand, but shouldn't genes that you have carried over from one
> assembly to the next at least have their gene names updated? Or, failing
> that, could the most up-to-date gene symbol be added to the alias field?
>
> It is a bit annoying, because we cannot search GRCh37 for the correct
> gene name: the gene is only listed in the database under its previous
> symbol.
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 1 July 2015 at 11:58, mag <mr6 at ebi.ac.uk> wrote:
>
> Dear Duarte,
>
> The gene set in the GRCh37 database is frozen at release 75.
> The variation and regulation annotation is updated when major data
> sets are released.
>
> The gene set, however, is not updated, because our major evidence
> sources have moved to GRCh38.
>
>
> Regards,
> Magali
>
>
> On 01/07/2015 11:03, Duarte Molha wrote:
>> Dear developers
>>
>> It was my understanding that Ensembl would keep two databases
>> running in parallel, one for GRCh38 and one for GRCh37, so that all
>> genes would receive annotation updates and both databases could be
>> queried with the latest and greatest Ensembl Perl API.
>>
>> However, I am finding inconsistencies in gene annotation that have
>> left me puzzled.
>>
>> Take, for example, the gene DNAAF5:
>>
>> http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=26013
>>
>> If you query Ensembl GRCh38 (release 80), the gene is properly annotated:
>>
>> http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000164818;r=7:726701-786475
>>
>> However, on GRCh37 (release 80), the gene is still annotated with
>> its previous HGNC symbol, HEATR2:
>>
>> http://grch37.ensembl.org/Multi/Search/Results?q=HEATR2;site=ensembl;page=1
>>
>> The Ensembl gene ID is the same in both cases:
>>
>> ENSG00000164818
>>
>> Is my thinking flawed? What is the reason for the out-of-date gene
>> annotation for this gene?
>>
>> Many thanks
>>
>> Duarte
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog:http://www.ensembl.info/
--
Andrew Yates - Genomics Technology Infrastructure Team Leader
European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andrewyatz
http://www.ensembl.org/