[ensembl-dev] Gene Symbol
Andy Yates
ayates at ebi.ac.uk
Mon Feb 13 17:05:49 GMT 2012
Hi Nick,
Again it's an issue with synonyms. If we take the case of SF3B1 we get two hits back:
http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000115524;r=2:198256698-198299815
http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000087365;r=11:65818200-65836779;t=ENST00000528302
The first is the intended record. The second is SF3B2, but it carries the synonym SF3b1, which is matched because our MySQL tables are case insensitive. If you add a check on the display label, that should remove the remaining stragglers.
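A minimal sketch of such a display-label check, assuming the gene objects expose their display label through external_name() as Bio::EnsEMBL::Gene does. The helper name filter_by_display_label is hypothetical and not part of the Ensembl API:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): keep only the genes
# whose display label equals the requested symbol exactly. Perl's 'eq' is
# case sensitive, so a synonym hit such as SF3b1 on SF3B2 is dropped.
sub filter_by_display_label {
    my ($symbol, $genes) = @_;
    my @kept = grep {
        my $label = $_->external_name();    # display label of the gene
        defined $label && $label eq $symbol;
    } @{$genes};
    return \@kept;
}
```

With the adaptor from your loop this could be called as filter_by_display_label($gene_symbol, $gene_adaptor->fetch_all_by_external_name($gene_symbol)). Calling seq_region_name() and seq_region_start() on the genes that survive should then give the position.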
Andy
Andrew Yates Ensembl Core Software Project Leader
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensembl.org/
On 13 Feb 2012, at 16:26, Nick Fankhauser wrote:
> Yes, thanks a lot! This way it produces far fewer false hits,
> especially when I combine it with rejecting all MHC chromosomes.
>
> But there are still cases, for example SF3B1 and RAGE, where I get results
> from two different chromosomes for some reason. Do you know why this can
> still be the case?
>
> Nick
>
>
> On 13/02/12 17:00, Andy Yates wrote:
>> Hi Nick,
>>
>> My guess is that you're hitting an issue with external synonyms. The method you are using will consult all xrefs linked to a gene (along with transcripts and translations) as well as consulting the external_synonym table. In the case of CLK2 we have the following external synonyms linked to the term CLK2.
>>
>> synonym  db_name          dbprimary_acc       display_label
>> clk2     Vega_transcript  OTTHUMT00000364143  OTTHUMT00000364143
>> clk2     Vega_transcript  OTTHUMT00000365664  RP11-531A21.3-001
>> clk2     Vega_transcript  OTTHUMT00000272912  OTTHUMT00000272912
>> clk2     OTTG             OTTHUMG00000150164  OTTHUMG00000150164
>> CLK2     EntrezGene       9894                TELO2
>> clk2     HGNC             2069                CLK2
>>
>> If you change your query to limit by the external DB of the source, the number of hits will drop massively, e.g.
>>
>> my $genes = $gene_adaptor->fetch_all_by_external_name('CLK2', 'HGNC');
>>
>> All the best
>>
>> Andy
>>
>> On 13 Feb 2012, at 15:22, Nick Fankhauser wrote:
>>
>>> Hi!
>>>
>>> I'm trying to retrieve the chromosomal position for a list of
>>> gene-symbols. They are all official gene-symbols.
>>>
>>> Using a loop like this
>>>
>>> foreach my $gene (@{$gene_adaptor->fetch_all_by_external_name($gene_symbol)}) {
>>>
>>> I get one correct hit for some genes (e.g. USP8), but for others like
>>> CLK2, I get multiple results and have no idea how to select the correct
>>> one. Is there a way to get the position for just the official gene symbol?
>>>
>>> Thanks!
>>>
>>> _______________________________________________
>>> Dev mailing list Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/