[ensembl-dev] Difficulty with finding genes by HUGO name in ensembl database

Michael Schuster michaels at ebi.ac.uk
Mon Jan 9 15:05:51 GMT 2012


Dear Harish,

This is an exceptional case, as the BAGE3 gene is reported to be in a  
location that has rather low sequence quality. A sequence similarity  
search suggests it is the processed transcript labelled  
RP11-763B22.9-001 (ENST00000444424) in the following display:

http://www.ensembl.org/Homo_sapiens/Location/View?db=core;h=BLAST_NEW:BLA_S6nFdHaJW!!20111117;r=1:148838249-148898456;contigviewbottom=das_DS_775=normal

Following the link should add a Genome Reference Consortium (GRC) DAS
track showing issues HG-980 (more than half of the BAGE3 gene does
not align) and HG-515 (assembly gap), which together suggest a larger
problem in this region.

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-980

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-515

So the GRC, which looks after the genome sequence for human, mouse and
eventually zebrafish, is aware of the issue and is awaiting further
experimental data.

Because the BAGE3 gene could not be placed in its entirety and no  
coding sequence has been assigned, the closest match for the RefSeq  
BAGE3 mRNA is a BAGE2 transcript model. This can be seen on the
following Transcript-Similarity page, where NM_182481.1 (BAGE3) matches
BAGE2-001 (ENST00000470054) with 98% similarity.

http://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000187172;r=21:10996026-11098980;t=ENST00000470054

This is most likely the reason why the search engine associated the  
string BAGE3 with this BAGE2 transcript and gene.

Regarding your question about the database schema, the following SQL
query shows how gene symbols are linked to genes. Please note the
object_xref.ensembl_object_type = 'Gene' constraint and the join
condition gene.gene_id = object_xref.ensembl_id. Depending on the
value in object_xref.ensembl_object_type, object_xref.ensembl_id can
instead link to transcript.transcript_id or
translation.translation_id. Gene symbols, however, are associated
with genes, hence the 'Gene' constraint.

mysql> select gene.stable_id, xref.display_label, xref.dbprimary_acc,
    ->        external_db.db_name
    -> from gene, object_xref, xref, external_db
    -> where gene.gene_id = object_xref.ensembl_id
    ->   and object_xref.ensembl_object_type = 'Gene'
    ->   and object_xref.xref_id = xref.xref_id
    ->   and xref.external_db_id = external_db.external_db_id
    ->   and xref.display_label = 'BAGE2';
+-----------------+---------------+---------------+---------+
| stable_id       | display_label | dbprimary_acc | db_name |
+-----------------+---------------+---------------+---------+
| ENSG00000187172 | BAGE2         | 15723         | HGNC    |
+-----------------+---------------+---------------+---------+
1 row in set (0.00 sec)

The place to see this on the web page would be here:

http://www.ensembl.org/Homo_sapiens/Gene/Matches?db=core;g=ENSG00000187172;r=21:11020842-11098925;t=ENST00000470054
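If you would rather run the lookup from a script, the same join can be
wrapped in a small function. The following is only a rough sketch: it
uses an in-memory SQLite database as a toy stand-in for the core
schema, modelling just the columns the query above touches. The
internal numeric ids, the helper names (build_demo_db,
find_genes_by_symbol) and the DEMO_TRANSCRIPT_LABEL row are made up
for illustration; only the BAGE2 row mirrors the result shown above.
Against a real MySQL core database you would swap in a MySQL driver
and its placeholder syntax.

```python
import sqlite3

# Toy stand-in for the four core-schema tables used by the query;
# only the columns that the query touches are modelled here.
SCHEMA = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE external_db (external_db_id INTEGER PRIMARY KEY, db_name TEXT);
CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, external_db_id INTEGER,
                   dbprimary_acc TEXT, display_label TEXT);
CREATE TABLE object_xref (object_xref_id INTEGER PRIMARY KEY,
                          ensembl_id INTEGER, ensembl_object_type TEXT,
                          xref_id INTEGER);
"""

# Same logic as the mysql query, written with explicit joins and a
# placeholder for the gene symbol.
QUERY = """
SELECT gene.stable_id, xref.display_label, xref.dbprimary_acc,
       external_db.db_name
FROM gene
JOIN object_xref ON object_xref.ensembl_id = gene.gene_id
                AND object_xref.ensembl_object_type = 'Gene'
JOIN xref        ON xref.xref_id = object_xref.xref_id
JOIN external_db ON external_db.external_db_id = xref.external_db_id
WHERE xref.display_label = ?
"""


def find_genes_by_symbol(conn, symbol):
    """Return (stable_id, display_label, dbprimary_acc, db_name) rows."""
    return conn.execute(QUERY, (symbol,)).fetchall()


def build_demo_db():
    """Create the toy database; all internal ids are invented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (1, 'ENSG00000187172')")
    conn.execute("INSERT INTO external_db VALUES (10, 'HGNC')")
    conn.execute("INSERT INTO xref VALUES (100, 10, '15723', 'BAGE2')")
    conn.execute("INSERT INTO object_xref VALUES (1000, 1, 'Gene', 100)")
    # Hypothetical transcript-level xref whose ensembl_id happens to be 1
    # as well; the 'Gene' constraint keeps it from matching gene 1.
    conn.execute(
        "INSERT INTO xref VALUES (101, 10, 'demo-acc', 'DEMO_TRANSCRIPT_LABEL')"
    )
    conn.execute("INSERT INTO object_xref VALUES (1001, 1, 'Transcript', 101)")
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(find_genes_by_symbol(demo, "BAGE2"))
    print(find_genes_by_symbol(demo, "DEMO_TRANSCRIPT_LABEL"))
```

The second lookup comes back empty in this toy database even though
the transcript-level row shares the internal id 1 with the gene, which
is why the ensembl_object_type constraint matters: without it, ids
from the gene, transcript and translation tables would be mixed up.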

Hope that helps,
Michael Schuster



On 27 Dec 2011, at 22:46, Harish Mahadevan wrote:

> Hi Ensembl dev team,
>
> I recently downloaded the ensembl database and am having some  
> difficulty finding genes by HUGO name. For example, there is a gene  
> named "BAGE3" that I couldn't find in the ensembl database;
> however, when I searched the ensembl web site (i.e. http://useast.ensembl.org/Homo_sapiens/Search/Details?species=Homo_sapiens;idx=Gene;end=2;q=BAGE3),
> I found that "BAGE3" appears to be the same gene as "BAGE2".
>
> Could you point me at the right direction for retrieving gene  
> records from the ensembl database (i.e. gene table in "core" schema)  
> by HUGO name?
>
> regards
> Harish
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

--
Michael Schuster, Ph.D.
Ensembl Genome Browser Project
Vertebrate Genomics Team
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD
United Kingdom

http://www.ensembl.org/






