[ensembl-dev] xref mapping

mr6 at ebi.ac.uk
Tue Mar 25 21:16:43 GMT 2014


Hi Genomeo,

The xrefs endpoint returns the whole list of external references associated
with an Ensembl object.
This list can be filtered by external source name (in this case HGNC) with
the external_db parameter.
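
A minimal sketch of both ways to restrict the list to HGNC. It assumes the JSON layout shown in the quoted output below, where each xref is an object with `dbname` and `display_id` keys. `xrefs_url` builds a server-side filtered URL with the `external_db` parameter used in the quoted wget command, and `filter_by_source` does the same filtering client-side on an already decoded list. Both helper names are hypothetical.

```python
from urllib.parse import urlencode

SERVER = "http://beta.rest.ensembl.org"  # server used in this thread


def xrefs_url(stable_id, external_db=None):
    """Build an xrefs/id URL, optionally restricted to one external source."""
    url = f"{SERVER}/xrefs/id/{stable_id}"
    if external_db:
        url += "?" + urlencode({"external_db": external_db})
    return url


def filter_by_source(xrefs, dbname):
    """Keep only the decoded xref records whose 'dbname' matches exactly."""
    return [x for x in xrefs if x.get("dbname") == dbname]


# Two records trimmed from the ENSG00000223972 output quoted below.
xrefs = [
    {"dbname": "HGNC", "display_id": "DDX11L1", "primary_id": "37102"},
    {"dbname": "EntrezGene", "display_id": "DDX11L1", "primary_id": "100287102"},
]

print(xrefs_url("ENSG00000243485", "HGNC"))
print([x["display_id"] for x in filter_by_source(xrefs, "HGNC")])
```

Filtering on the server saves bandwidth for genes with many xrefs; filtering client-side is convenient when the full list has already been fetched for other sources.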

The lookup endpoint returns some information about the input object,
including its location and display name, but no external references.

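For this gene, the display name is an HGNC symbol. A gene has only one
display name, so lookup reports just one of the four HGNC symbols
(MIR1302-10); use the xrefs endpoint to get all of them.
An HGNC symbol is the display name for most of our genes, but it can also be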
- Uniprot names (http://beta.rest.ensembl.org/lookup/id/ENSG00000261163)
- RFAM (http://beta.rest.ensembl.org/lookup/id/ENSG00000252365)
- miRBase (http://beta.rest.ensembl.org/lookup/id/ENSG00000265031)
- Havana (http://beta.rest.ensembl.org/lookup/id/ENSG00000228741)
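
To see why lookup shows a single HGNC symbol for ENSG00000243485, the sketch below compares its one display name with the full HGNC list from the xrefs endpoint. The values are hand-copied from the quoted wget output; the `display_name` key for the lookup record is an assumption about the JSON field name, and `hgnc_symbols` is a hypothetical helper.

```python
# Values copied from the wget output quoted below for ENSG00000243485.
lookup = {"id": "ENSG00000243485", "display_name": "MIR1302-10", "biotype": "lincRNA"}
xrefs = [
    {"dbname": "HGNC", "display_id": "MIR1302-11"},
    {"dbname": "HGNC", "display_id": "MIR1302-10"},
    {"dbname": "HGNC", "display_id": "MIR1302-9"},
    {"dbname": "HGNC", "display_id": "MIR1302-2"},
]


def hgnc_symbols(records):
    """Return every HGNC symbol attached to the gene, in response order."""
    return [r["display_id"] for r in records if r["dbname"] == "HGNC"]


symbols = hgnc_symbols(xrefs)
# The display name is one symbol from that list, not the list itself.
print(len(symbols), lookup["display_name"] in symbols)
```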


Hope this helps,
Magali

> Hi,
>
> I was comparing the output of the lookup and xrefs endpoints of the
> Ensembl REST API for ENSG00000243485:
>
> wget -q --header='Content-type:application/json' \
>     'http://beta.rest.ensembl.org/xrefs/id/ENSG00000243485?external_db=HGNC' -O -
> ENSG00000243485 HGNC MIR1302-11 microRNA 1302-11 HGNC Symbol Generated via refseq_manual DEPENDENT 38246 hsa-mir-1302-11 0
> ENSG00000243485 HGNC MIR1302-10 microRNA 1302-10 HGNC Symbol Generated via refseq_manual DEPENDENT 38233 hsa-mir-1302-10 0
> ENSG00000243485 HGNC MIR1302-9 microRNA 1302-9 HGNC Symbol Generated via refseq_manual DEPENDENT 38218 hsa-mir-1302-9 0
> ENSG00000243485 HGNC MIR1302-2 microRNA 1302-2 HGNC Symbol Generated via refseq_manual DEPENDENT 35294 hsa-mir-1302-2, MIRN1302-2 0
>
> wget -q --header='Content-type:application/json' \
>     'http://beta.rest.ensembl.org/lookup/id/ENSG00000243485?expand=1' -O -
> ENSG00000243485 1 29554 31109 1 MIR1302-10 ensembl_havana ensembl_havana_lincrna microRNA 1302-10 [Source:HGNC Symbol;Acc:38233] lincRNA
>
> Why does the lookup command show only one of the four mapped HGNC
> symbols?
>
> Thanks,
>
> G.
>
>
> On 27 February 2014 11:20, Genomeo Dev <genomeodev at gmail.com> wrote:
>
>> Hi,
>>
>> I am interested in retrieving comprehensive cross-references for Ensembl
>> gene IDs. I found two programmatic ways to do this; they give consistent
>> results but different amounts of detail. Using ENSG00000223972 as an example:
>> (1)
>> Using this Python code for the REST API endpoint
>> (http://beta.rest.ensembl.org/documentation/info/xref_id):
>>
>>
>>     import json
>>     import sys
>>
>>     import httplib2
>>
>>     http = httplib2.Http(".cache")
>>
>>     server = "http://beta.rest.ensembl.org"
>>     # Same gene as in the output below
>>     ext = "/xrefs/id/ENSG00000223972?"
>>     resp, content = http.request(server + ext, method="GET",
>>                                  headers={"Content-Type": "application/json"})
>>
>>     # Stop if the server did not return HTTP 200 OK
>>     if resp.status != 200:
>>         print("Invalid response:", resp.status)
>>         sys.exit(1)
>>
>>     decoded = json.loads(content)
>>     print(repr(decoded))
>>
>>
>> I get:
>>
>> {"display_id":"OTTHUMG00000000961","primary_id":"OTTHUMG00000000961","version":"2","description":null,"dbname":"OTTG","synonyms":[],"info_type":"NONE","info_text":"","db_display_name":"Havana gene"}
>>
>> {"primary_id":"Hs.714157","dbname":"UniGene","ensembl_identity":98,"synonyms":[],"ensembl_start":6,"xref_start":1,"xref_end":1639,"db_display_name":"UniGene","display_id":"Hs.714157","ensembl_end":1657,"version":"0","score":8055,"cigar_line":"1200M1D299M12D140M","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","xref_identity":97,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}
>>
>> {"primary_id":"Hs.618434","dbname":"UniGene","ensembl_identity":58,"synonyms":[],"ensembl_start":669,"xref_start":1,"xref_end":974,"db_display_name":"UniGene","display_id":"Hs.618434","ensembl_end":1655,"version":"0","score":4757,"cigar_line":"537M1D299M12D138M","description":"Similar to DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 isoform 1, mRNA (cDNA clone IMAGE:6103207)","xref_identity":96,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}
>>
>> {"display_id":"DDX11L1","primary_id":"37102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"HGNC","synonyms":[],"info_type":"DIRECT","info_text":"Generated via ensembl_manual","db_display_name":"HGNC Symbol"}
>>
>> {"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}
>>
>> {"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}
>>
>> {"display_id":"ENSG00000223972","primary_id":"ENSG00000223972","version":"0","description":"","dbname":"ArrayExpress","synonyms":[],"info_type":"DIRECT","info_text":"","db_display_name":"ArrayExpress"}
>>
>> {"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}
>>
>> {"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}]
>>
>> (2)
>>
>> Using this Perl API code (based on
>> http://www.ensembl.org/info/docs/api/core/core_tutorial.html):
>>
>> # Define a helper subroutine to print DBEntries
>> sub print_DBEntries
>> {
>>     my $db_entries = shift;
>>
>>     foreach my $dbe ( @{$db_entries} ) {
>>         printf "\tXREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
>>     }
>> }
>>
>> my $genes = $gene_adaptor->fetch_all_by_stable_id_list([@gene_list]);
>>
>>
>> ...
>>
>>
>> print "GENE ", $gene->stable_id(), "\n";
>> print_DBEntries( $gene->get_all_DBEntries() );
>>
>> I get:
>> XREF OTTHUMG00000000961 (OTTG)
>> XREF ENSG00000223972 (ArrayExpress)
>> XREF DDX11L1 (EntrezGene)
>> XREF DDX11L5 (EntrezGene)
>> XREF DDX11L1 (HGNC)
>> XREF Hs.618434 (UniGene)
>> XREF Hs.714157 (UniGene)
>> XREF DDX11L1 (WikiGene)
>> XREF DDX11L5 (WikiGene)
>>
>>
>> Questions:
>>
>> 1. Am I correct that the REST code uses the latest Ensembl release, while
>> the API code uses the release installed in the VM (release 74 in my case)?
>>
>> 2. The REST code gives more extensive details (which I like) than the
>> Perl API code. Is there a simple way to get the same details through the
>> API?
>>
>> 3. REST output format: is tab-separated text supported?
>>
>> 4. Is there a file in the Ensembl FTP area that contains pre-generated,
>> detailed cross-reference mappings for all current Ensembl genes?
>> --
>>
>> Thanks,
>>
>> G.
>>
>
>
>
> --
> G.
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
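
Question 3 in the quoted message asks about tab-separated output. Whatever formats the REST server itself supports, a decoded xrefs response can also be flattened to tab-separated text on the client. This is a minimal sketch with Python's standard csv module; the column selection and the `xrefs_to_tsv` name are illustrative choices, and the two sample records are trimmed from the ENSG00000223972 output quoted above.

```python
import csv
import io

# Columns to export; any key present in the decoded JSON records would work.
COLUMNS = ["dbname", "primary_id", "display_id", "info_type", "description"]


def xrefs_to_tsv(gene_id, xrefs):
    """Flatten decoded xref records into tab-separated lines, one per xref."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + COLUMNS)
    for x in xrefs:
        # Missing keys and JSON nulls (None) become empty cells.
        writer.writerow([gene_id] + ["" if x.get(c) is None else x[c] for c in COLUMNS])
    return buf.getvalue()


xrefs = [
    {"dbname": "OTTG", "primary_id": "OTTHUMG00000000961",
     "display_id": "OTTHUMG00000000961", "info_type": "NONE", "description": None},
    {"dbname": "HGNC", "primary_id": "37102", "display_id": "DDX11L1",
     "info_type": "DIRECT",
     "description": "DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1"},
]
print(xrefs_to_tsv("ENSG00000223972", xrefs), end="")
```

Using the csv module rather than joining strings by hand means any field that happens to contain a tab or newline is quoted instead of silently breaking the column layout.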




