[ensembl-dev] xref mapping

Genomeo Dev genomeodev at gmail.com
Thu Feb 27 11:20:57 GMT 2014


Hi,

I am interested in getting a broad set of cross-references for Ensembl gene
IDs. I found two programmatic ways to do that; they give consistent results
but differ in the amount of detail. Using ENSG00000223972 as an example:
(1)
Using the Python example code for this REST API endpoint (
http://beta.rest.ensembl.org/documentation/info/xref_id)


    import json
    import sys

    import httplib2

    http = httplib2.Http(".cache")

    server = "http://beta.rest.ensembl.org"
    # Stable ID of the example gene discussed in this message
    ext = "/xrefs/id/ENSG00000223972?"
    resp, content = http.request(server + ext, method="GET",
                                 headers={"Content-Type": "application/json"})

    # Stop if the server did not answer with HTTP 200 (OK)
    if resp.status != 200:
        print("Invalid response: %s" % resp.status)
        sys.exit()

    # Parse the JSON body into Python lists and dicts
    decoded = json.loads(content)
    print(repr(decoded))


I get:

{"display_id":"OTTHUMG00000000961","primary_id":"OTTHUMG00000000961","version":"2","description":null,"dbname":"OTTG","synonyms":[],"info_type":"NONE","info_text":"","db_display_name":"Havana gene"}

{"primary_id":"Hs.714157","dbname":"UniGene","ensembl_identity":98,"synonyms":[],"ensembl_start":6,"xref_start":1,"xref_end":1639,"db_display_name":"UniGene","display_id":"Hs.714157","ensembl_end":1657,"version":"0","score":8055,"cigar_line":"1200M1D299M12D140M","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","xref_identity":97,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}

{"primary_id":"Hs.618434","dbname":"UniGene","ensembl_identity":58,"synonyms":[],"ensembl_start":669,"xref_start":1,"xref_end":974,"db_display_name":"UniGene","display_id":"Hs.618434","ensembl_end":1655,"version":"0","score":4757,"cigar_line":"537M1D299M12D138M","description":"Similar to DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 isoform 1, mRNA (cDNA clone IMAGE:6103207)","xref_identity":96,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}

{"display_id":"DDX11L1","primary_id":"37102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"HGNC","synonyms":[],"info_type":"DIRECT","info_text":"Generated via ensembl_manual","db_display_name":"HGNC Symbol"}

{"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}

{"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}

{"display_id":"ENSG00000223972","primary_id":"ENSG00000223972","version":"0","description":"","dbname":"ArrayExpress","synonyms":[],"info_type":"DIRECT","info_text":"","db_display_name":"ArrayExpress"}

{"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}

{"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}]

(2)

Using this Perl API code (based on
http://www.ensembl.org/info/docs/api/core/core_tutorial.html):

# Define a helper subroutine to print DBEntries
sub print_DBEntries
{
    my $db_entries = shift;

    foreach my $dbe ( @{$db_entries} ) {
        printf "\tXREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
    }
}

my $genes = $gene_adaptor->fetch_all_by_stable_id_list([@gene_list]);


...


print "GENE ", $gene->stable_id(), "\n";
print_DBEntries( $gene->get_all_DBEntries() );

I get:
XREF OTTHUMG00000000961 (OTTG)
XREF ENSG00000223972 (ArrayExpress)
XREF DDX11L1 (EntrezGene)
XREF DDX11L5 (EntrezGene)
XREF DDX11L1 (HGNC)
XREF Hs.618434 (UniGene)
XREF Hs.714157 (UniGene)
XREF DDX11L1 (WikiGene)
XREF DDX11L5 (WikiGene)
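
The claim that the two approaches agree can be checked mechanically by
reducing each REST record to the same "XREF display_id (dbname)" line that
the Perl helper prints. What follows is only a minimal sketch: the names
xref_lines, rest_records and perl_lines are made up for illustration, and
the records are the ones quoted in this message, trimmed to the two fields
that matter for the comparison.

```python
def xref_lines(records):
    """Return sorted 'XREF display_id (dbname)' lines for REST xref records."""
    # A set removes exact duplicates; sorting gives a stable order
    pairs = {(rec["dbname"], rec["display_id"]) for rec in records}
    return ["XREF %s (%s)" % (display_id, dbname)
            for dbname, display_id in sorted(pairs)]


# REST records from this message, reduced to display_id and dbname
# (hypothetical variable name; in practice this would be the decoded JSON)
rest_records = [
    {"display_id": "OTTHUMG00000000961", "dbname": "OTTG"},
    {"display_id": "Hs.714157", "dbname": "UniGene"},
    {"display_id": "Hs.618434", "dbname": "UniGene"},
    {"display_id": "DDX11L1", "dbname": "HGNC"},
    {"display_id": "DDX11L5", "dbname": "EntrezGene"},
    {"display_id": "DDX11L1", "dbname": "EntrezGene"},
    {"display_id": "ENSG00000223972", "dbname": "ArrayExpress"},
    {"display_id": "DDX11L5", "dbname": "WikiGene"},
    {"display_id": "DDX11L1", "dbname": "WikiGene"},
]

# Lines printed by the Perl script, with the leading tab removed
perl_lines = [
    "XREF OTTHUMG00000000961 (OTTG)",
    "XREF ENSG00000223972 (ArrayExpress)",
    "XREF DDX11L1 (EntrezGene)",
    "XREF DDX11L5 (EntrezGene)",
    "XREF DDX11L1 (HGNC)",
    "XREF Hs.618434 (UniGene)",
    "XREF Hs.714157 (UniGene)",
    "XREF DDX11L1 (WikiGene)",
    "XREF DDX11L5 (WikiGene)",
]

rest_set = set(xref_lines(rest_records))
perl_set = set(perl_lines)
print("only in REST:", sorted(rest_set - perl_set))
print("only in Perl:", sorted(perl_set - rest_set))
```

The comparison uses sets because the two tools list the xrefs in a
different order; only the content is expected to match.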


Questions:

1. Am I correct in saying that the REST code uses the latest Ensembl
release, while the Perl API code uses the Ensembl release currently
installed as part of the VM (I am using release 74)?

2. The REST code gives more extensive details (which I like) than the
Perl API code. Could you suggest a simple way to use the API to get the
same details?

3. Regarding the REST output format: is tab-separated text supported?

4. Is there a file in the Ensembl FTP area that contains pre-generated,
detailed cross-reference mappings for all current Ensembl genes?
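
Independently of which output formats the server supports, the decoded
JSON records can be flattened to tab-separated text on the client side.
This is a rough sketch only, not a statement about the REST service: the
function name xrefs_to_tsv and the choice of columns are assumptions made
for illustration, and the two sample records are cut down from the output
quoted in this message.

```python
import csv
import io


def xrefs_to_tsv(records, columns):
    """Render a list of xref dicts as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        row = []
        for col in columns:
            value = rec.get(col)  # fields absent from a record become empty
            if isinstance(value, list):
                value = ",".join(value)  # e.g. the "synonyms" list
            row.append("" if value is None else value)
        writer.writerow(row)
    return buf.getvalue()


# Two records from the quoted output, reduced to a few fields
sample = [
    {"display_id": "DDX11L1", "primary_id": "37102", "dbname": "HGNC",
     "info_type": "DIRECT", "synonyms": []},
    {"display_id": "Hs.714157", "primary_id": "Hs.714157",
     "dbname": "UniGene", "info_type": "SEQUENCE_MATCH", "synonyms": [],
     "score": 8055},
]
columns = ["dbname", "primary_id", "display_id", "info_type", "score"]
print(xrefs_to_tsv(sample, columns), end="")
```

Fields such as "score" only appear on sequence-match xrefs, so the sketch
leaves them blank for the other records rather than failing.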
-- 

Thanks,

G.