[ensembl-dev] Hyphenated entrez xrefs?

Alexander Pico apico at gladstone.ucsf.edu
Fri Dec 13 21:58:47 GMT 2013


Here are some counts of occurrences of hyphenated entrez genes in the xref table per species db where I've found these so far:

Bt - 287
Gg - 1138
Rn - 716
Xt - 731
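
For reference, here is a minimal sketch of the kind of check that separates these entries from ordinary ones. It assumes a valid Entrez Gene ID consists of digits only. The helper name is_plain_entrez_id is made up for illustration and is not part of the Ensembl API; the two accessions are the ones from the rat example quoted below.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): returns 1 when an
# accession is digits only, 0 when it carries a suffix such as "-201".
sub is_plain_entrez_id {
    my ($acc) = @_;
    return (defined($acc) && $acc =~ /\A[0-9]+\z/) ? 1 : 0;
}

# Accessions from the rat example in the quoted message.
my @accessions = ('288264', '288264-201');

my @plain      = grep {  is_plain_entrez_id($_) } @accessions;
my @hyphenated = grep { !is_plain_entrez_id($_) } @accessions;

print "plain: @plain\n";
print "hyphenated: @hyphenated\n";
```

Inside a get_all_DBLinks() loop the same test could presumably be applied to $dbe->primary_id(), though whether that is the recommended filter is exactly what I'm asking.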

 - Alex

On Dec 13, 2013, at 1:46 PM, Alexander Pico <apico at gladstone.ucsf.edu> wrote:

> Dear Ensembl,
> 
> I've run across a number of examples of hyphenated entrez gene identifiers in xref tables, starting back in release 72, for example:
> 
> rattus_norvegicus_core_72_5
> 
> +---------+----------------+---------------+---------------+---------+-----------------------------------------------+-----------+---------------+
> | xref_id | external_db_id | dbprimary_acc | display_label | version | description                                   | info_type | info_text     |
> +---------+----------------+---------------+---------------+---------+-----------------------------------------------+-----------+---------------+
> |  576085 |           1300 | 288264        | Ifnar1        | 0       | interferon (alpha, beta and omega) receptor 1 | DEPENDENT |               |
> | 1143738 |           1300 | 288264-201    | Ifnar1-201    | 0       | interferon (alpha, beta and omega) receptor 1 | MISC      | via gene name |
> +---------+----------------+---------------+---------------+---------+-----------------------------------------------+-----------+---------------+
> 
> The first result is accurate, but the second one is apparently manufactured. This entry breaks a number of downstream uses for xrefs, since the "-201" suffix is not part of the official Entrez Gene ID format.
> 
> What are these? Are you planning on keeping these around in future xref tables?
> 
> And how would you recommend avoiding these in xref queries using the Perl API? Here's my current Perl pseudocode:
> 
> my $db_entries = $gene->get_all_DBLinks();
> foreach my $dbe (@$db_entries) {
> 	if ($dbe->dbname() eq 'EntrezGene') {
> 		# Collect xref associated with $gene
> 	}
> }
>  
> What other filters or checks should I use to exclude the manufactured identifiers associated with your Entrez Gene records?
> 
> Thanks!
> - Alex
> 
> ----------------------------------------
> Alexander Pico, PhD
> NRNB Executive Director
> Bioinformatics Assoc. Director
> Gladstone Institutes
> http://nrnb.org
> http://gladstoneinstitutes.org
> ----------------------------------------
> 
