[ensembl-dev] Hyphenated entrez xrefs?

Alexander Pico apico at gladstone.ucsf.edu
Sat Dec 14 20:19:22 GMT 2013


Hi Magali,

Thanks for the clarification. Our script actually extracts xrefs for a couple dozen external dbs per $gene, including UniProt, so I think that's why we use DBLinks rather than DBEntries. Is there documentation on which external dbs have xrefs associated with genes vs transcripts and translations? Or do I just need to run both periodically and compare the results?

In general, I'd like to offer the feedback that making up identifiers, such as 288264-201, and calling them Entrez Gene database xrefs is poor form. They are neither reliable nor useful as identifiers. Entrez Gene does not recognize the ID, and it breaks downstream applications that expect a proper ID.

I haven't seen this with any other external databases in the Ensembl xref tables yet. Are there other cases of manufactured IDs I should look out for in the DBLinks system? Is this practice isolated to Entrez Gene so far?
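
In the meantime, a downstream script could guard against these IDs with something like the following. This is only a minimal sketch: it assumes Entrez Gene IDs are always plain positive integers, and the helper name is_entrez_gene_id and the sample list are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the accession looks like a real Entrez Gene ID.
# Assumption: Entrez Gene IDs are plain positive integers with no suffix.
sub is_entrez_gene_id {
    my ($acc) = @_;
    return 0 unless defined $acc;
    return $acc =~ /\A[1-9][0-9]*\z/ ? 1 : 0;
}

# Sample accessions taken from the two xref rows in my original message.
my @accessions = ('288264', '288264-201');

# Keep only the accessions that pass the check.
my @kept = grep { is_entrez_gene_id($_) } @accessions;

print join(',', @kept), "\n";    # prints 288264
```

In our script, a check like this would presumably sit next to the dbname() test and be applied to $dbe->primary_id().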

Thanks!
 - Alex

On Dec 14, 2013, at 7:05 AM, mr6 at ebi.ac.uk wrote:

> Hi Alex,
> 
> These hyphenated extensions are used for transcripts.
> If a gene is associated with a given EntrezGene entry, we can use this to
> assign a name to all transcripts of that gene.
> To be able to distinguish those transcripts, we number them by adding
> -201, -202, etc.
> This is based on the numbering system already used for manual annotation.
> 
> In your query, you are using the method get_all_DBLinks, which will return
> all xrefs associated to the gene, as well as all DBEntries that are
> associated with the transcripts and corresponding translations of this
> gene.
> To retrieve only the DBEntries associated with the gene, you can use the
> method get_all_DBEntries.
> 
> For both methods, get_all_DBLinks and get_all_DBEntries, you can add the
> external_db_name as an argument.
> $gene->get_all_DBEntries('EntrezGene') will return only EntrezGene xrefs
> for this gene.
> More information on the methods can be found here:
> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Gene.html#a5aaf31a07a3d82c3841a411a0a55e81b
> 
> 
> Hope this helps,
> Magali
> 
> 
>> Dear Ensembl,
>> 
>> I've run across a number of examples of hyphenated entrez gene identifiers
>> in xref tables, starting back in release 72, for example:
>> 
>> rattus_norvegicus_core_72_5
>> 
>> +---------+----------------+---------------+---------------+---------+-----------------------------------------------+-----------+---------------+
>> | xref_id | external_db_id | dbprimary_acc | display_label | version | description                                   | info_type | info_text     |
>> +---------+----------------+---------------+---------------+---------+-----------------------------------------------+-----------+---------------+
>> |  576085 |           1300 | 288264        | Ifnar1        | 0       | interferon (alpha, beta and omega) receptor 1 | DEPENDENT |               |
>> | 1143738 |           1300 | 288264-201    | Ifnar1-201    | 0       | interferon (alpha, beta and omega) receptor 1 | MISC      | via gene name |
>> +---------+----------------+---------------+---------------+---------+-----------------------------------------------+-----------+---------------+
>> 
>> The first result is accurate, but the second one is apparently
>> manufactured. This entry breaks a number of downstream uses for xrefs,
>> since the "-201" is not part of the official ID format for Entrez Gene,
>> for example.
>> 
>> What are these? Are you planning on keeping these around in future xref
>> tables?
>> 
>> And how would you recommend avoiding these in xref queries using the Perl
>> API? Here's my current Perl pseudocode:
>> 
>> my $db_entries = $gene->get_all_DBLinks();
>> foreach my $dbe (@$db_entries) {
>> 	if ($dbe->dbname() eq 'EntrezGene') {
>> 		# Collect xref associated with $gene
>> 	}
>> }
>> 
>> What other filters or checks should I do to exclude the manufactured
>> identifiers associated with your Entrez Gene records?
>> 
>> Thanks!
>> - Alex
>> 
>> ----------------------------------------
>> Alexander Pico, PhD
>> NRNB Executive Director
>> Bioinformatics Assoc. Director
>> Gladstone Institutes
>> http://nrnb.org
>> http://gladstoneinstitutes.org
>> ----------------------------------------
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>> 