[ensembl-dev] Annotation discrepancy

Ewan Birney birney at ebi.ac.uk
Fri Nov 19 15:08:58 GMT 2010


Ian -

At least for 150000, the HGNC symbol exists and is mapped via a
Havana-annotated gene:

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000243064;r=21:15646120-15735075

But as this is a non-coding gene, I suspect it does not get the normal
mapping.


The HGNC locus says that it is a pseudogene:

http://www.genenames.org/data/hgnc_data.php?hgnc_id=16022


As does the EntrezGene case:

http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=150000



So -

   I think at least some of these are pseudogenes with symbols. We should
probably, at the very least, inherit the EntrezGene ID from the
Havana->HGNC->EntrezGene linkage in these scenarios.
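
To make the suggested inheritance concrete, here is a minimal sketch of a
two-step lookup. It is illustrative only and is not the Ensembl xref
pipeline: the dictionaries and the function name inherit_entrez are
hypothetical, and the only real values are the IDs quoted in this thread
for ABCC13 (ENSG00000243064, HGNC ID 16022, EntrezGene 150000).

```python
from typing import Dict, Optional

# Hypothetical toy tables; only the IDs themselves come from this thread.
# Ensembl gene (Havana-annotated) -> HGNC ID
GENE_TO_HGNC: Dict[str, str] = {"ENSG00000243064": "HGNC:16022"}
# HGNC ID -> EntrezGene ID, as recorded on the HGNC side
HGNC_TO_ENTREZ: Dict[str, str] = {"HGNC:16022": "150000"}


def inherit_entrez(gene_id: str, direct: Dict[str, str]) -> Optional[str]:
    """Return the direct EntrezGene mapping if one exists; otherwise
    fall back to the gene -> HGNC -> EntrezGene chain."""
    if gene_id in direct:
        return direct[gene_id]
    hgnc_id = GENE_TO_HGNC.get(gene_id)
    if hgnc_id is None:
        return None  # no HGNC link, so nothing to inherit
    return HGNC_TO_ENTREZ.get(hgnc_id)


if __name__ == "__main__":
    # The non-coding gene has no direct mapping, so the ID is inherited.
    print(inherit_entrez("ENSG00000243064", direct={}))
```

A direct mapping, where one exists, takes precedence here; whether that is
the right priority for the real pipeline is an open question.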



The second one is the same:

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000234608;r=12:112277571-112280706


Gavin -

   I suspect this is the main issue here.
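
For anyone wanting to check this locally, the situation Ian describes in
the quoted message (EntrezGene entries present in the xref table but not
linked to any gene) can be sketched as a LEFT JOIN. This is a toy
in-memory SQLite database, not a real Ensembl core database: the table
and column names follow my understanding of the core schema (xref,
object_xref, external_db) and should be treated as assumptions, the
internal numeric IDs are made up, and build_toy_db and
unlinked_entrez_ids are hypothetical helper names.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with assumed, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE external_db (external_db_id INTEGER PRIMARY KEY,
                                  db_name TEXT);
        CREATE TABLE xref (xref_id INTEGER PRIMARY KEY,
                           external_db_id INTEGER,
                           dbprimary_acc TEXT,
                           display_label TEXT);
        CREATE TABLE object_xref (object_xref_id INTEGER PRIMARY KEY,
                                  ensembl_id INTEGER,
                                  ensembl_object_type TEXT,
                                  xref_id INTEGER);
        INSERT INTO external_db VALUES (1, 'EntrezGene');
        -- Accessions are from this thread; xref_id values are invented.
        INSERT INTO xref VALUES (10, 1, '150000', 'ABCC13');
        INSERT INTO xref VALUES (11, 1, '8693', 'GALNT4');
        INSERT INTO xref VALUES (12, 1, '100294398', 'LOC100294398');
        -- Only one xref is linked to a (made-up) internal gene ID.
        INSERT INTO object_xref VALUES (100, 1, 'Gene', 12);
        """
    )
    return conn


def unlinked_entrez_ids(conn: sqlite3.Connection) -> list:
    """List EntrezGene accessions that have no object_xref row."""
    rows = conn.execute(
        """
        SELECT x.dbprimary_acc
        FROM xref x
        JOIN external_db e ON e.external_db_id = x.external_db_id
        LEFT JOIN object_xref ox ON ox.xref_id = x.xref_id
        WHERE e.db_name = 'EntrezGene' AND ox.xref_id IS NULL
        ORDER BY x.dbprimary_acc
        """
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(unlinked_entrez_ids(build_toy_db()))
```

A search that only follows object_xref links would never see the unlinked
rows, which would explain an Entrez ID search returning nothing.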


On 19 Nov 2010, at 14:59, Ian Longden wrote:

> 150000, 51275, 55449, 57126, 503646, 8693 are all unmapped in human.
> These will have entries in the xref table but are not linked to any
> genes.
> I am not sure how the search is done, but this may affect it.
>
> 984:-
> Not sure what the problem is here: we find the genes of interest, but
> this gene also has other EntrezGene IDs.
>
> 8857:-
> as above
>
> 9026:-
> as above.
>
> Were you expecting only one EntrezGene ID per gene? In time I hope this
> becomes true, but as these are very similar the software cannot choose
> between them and uses both.
>
> I think the data is correct but maybe the search is not giving you
> exactly what you want.
> We need to look at having the unmapped cases searchable.
>
> -Ian.
>
> On Fri, Nov 19, 2010 at 11:55 AM, Oliver, Gavin
> <gavin.oliver at almacgroup.com> wrote:
>> I have a few more examples of discrepancies which will hopefully help.
>>
>>
>>
>> For all examples, the search was performed on Entrez ID but returned
>> nothing.  I have looked a bit deeper into a handful of examples.
>> Details below:
>>
>>
>>
>> Entrez ID 150000: associated gene symbol ABCC13 is in the database, but
>> with no associated Entrez ID.
>>
>> Entrez ID 51275: associated gene symbol C12orf47 is in the database, but
>> with no associated Entrez ID.
>>
>> Entrez ID 55449: associated gene symbol C14orf167 is in the database, but
>> with no associated Entrez ID.
>>
>> Entrez ID 57126: associated gene symbol CD177 is in the database, but
>> with no associated Entrez ID.
>>
>> Entrez ID 984: associated gene symbol CDK11B is not in the database.
>> CDK11A is in the database but is annotated as cyclin-dependent kinase
>> 11B with Entrez ID 100294398, which Entrez describes as LOC100294398
>> (cell division protein kinase 11B-like).
>>
>> Entrez ID 503646: neither this ID nor the associated gene symbol DPRXP5
>> is in the database.
>>
>> Entrez ID 8857: associated gene symbol FCGBP (Fc fragment of IgG binding
>> protein) is there, but with Entrez ID 100133944, which corresponds to
>> LOC100133944 (IgGFc-binding protein-like).
>>
>> Entrez ID 8693: neither this ID nor the associated gene symbol GALNT4
>> is in the database.
>>
>> Entrez ID 9026: gene symbol HIP1R (huntingtin interacting protein 1
>> related) is in the database, but with Entrez ID 100294412, which
>> corresponds to huntingtin-interacting protein 1-related protein-like.
>>
>>
>>
>> Best,
>>
>>
>>
>> Gavin
>>
>>
>>
>>
>>
>>
>>
>> ________________________________
>>
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org]
>> On Behalf Of Oliver, Gavin
>> Sent: 19 November 2010 10:29
>> To: dev at ensembl.org
>> Subject: [ensembl-dev] Annotation discrepancy
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I have been using Ensembl human for internal annotation of microarrays.
>>
>>
>>
>> Yesterday someone did a search for Entrez Gene ID 3336 in our database.
>> It returned no hits.
>>
>>
>>
>> When they searched with the gene symbol for this ID (HSPE1), they got
>> 5 hits, but the Entrez ID associated with the gene was 100132346 (and
>> not 3336 as would be expected).
>>
>>
>>
>> I ran a search for 100132346 against the Ensembl genome browser and it
>> brings back 2 genes on 2 different chromosomes.
>>
>>
>>
>> Can someone explain what might be happening here?
>>
>>
>>
>> Best,
>>
>>
>>
>> Gavin
>>
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> http://lists.ensembl.org/mailman/listinfo/dev
>>
>>
>