[ensembl-dev] xref mapping

Thu Feb 27 13:41:04 GMT 2014

Thanks very much for the useful answer.

I noticed that the cross-references also map to genes from organisms other
than that of the queried gene ID. Any comment on that?

Related to the previous question, I use the following REST Python code to
look up particular Ensembl IDs:

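import httplib2

http = httplib2.Http(".cache")
server = "http://beta.rest.ensembl.org"
pref = "/lookup/id/"
ext = "?"

# gene_ids.txt is a placeholder name: one Ensembl gene ID per line
with open("gene_ids.txt") as inputfile1:
        for line in inputfile1:
                geneid = line.strip()
                if not geneid:
                        continue  # skip blank lines

                resp, content = http.request(server + pref + geneid + ext, method="GET",
                                             headers={"Content-Type": "application/json"})

                if resp.status != 200:
                        # report the failure and move on to the next ID
                        print("%s\t%s\t%s" % (geneid, "Invalid response:", resp.status))
                        continue
                print("%s\t%s" % (geneid, content.decode("utf-8")))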

And I get this output:

ENSG00000223972
{"source":"ensembl_havana","object_type":"Gene","logic_name":"ensembl_havana_gene","species":"homo_sapiens","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1 [Source:HGNC Symbol;Acc:37102]","display_name":"DDX11L1","biotype":"pseudogene","end":14412,"seq_region_name":"1","db_type":"core","strand":1,"id":"ENSG00000223972","start":11869}

Which Perl API classes/methods would I use to get the same fields, i.e.:

source
object_type
logic_name
species
description
display_name
biotype
end
seq_region_name
db_type
strand
id
start
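On the REST side, the fields listed above can be pulled out of each /lookup/id/ response in a fixed order to give one tab-separated row per gene. This is a minimal sketch that assumes the JSON body has already been fetched, as in the loop above; `LOOKUP_FIELDS` and `lookup_to_row` are hypothetical names, and fields that are absent from a response (or null) become empty cells.

```python
import json

# Field order follows the list above; keys missing from a response become "".
LOOKUP_FIELDS = ["source", "object_type", "logic_name", "species",
                 "description", "display_name", "biotype", "end",
                 "seq_region_name", "db_type", "strand", "id", "start"]


def lookup_to_row(content, fields=LOOKUP_FIELDS):
    """Turn one /lookup/id/ JSON body into a single tab-separated line."""
    record = json.loads(content)
    values = []
    for field in fields:
        value = record.get(field)
        values.append("" if value is None else str(value))
    return "\t".join(values)


# Trimmed copy of the ENSG00000223972 response shown above.
content = ('{"id":"ENSG00000223972","display_name":"DDX11L1",'
           '"seq_region_name":"1","start":11869,"end":14412,"strand":1}')
print(lookup_to_row(content, ["id", "display_name", "seq_region_name", "start", "end"]))
```

Passing a shorter field list, as in the last line, is a simple way to keep only the columns of interest.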

Thanks,

G.

On 27 February 2014 11:39, mag <mr6 at ebi.ac.uk> wrote:

>  Hi Genomeo,
>
> The REST server only displays data from the current (latest) release.
> The release version can be found with this endpoint:
> http://beta.rest.ensembl.org/documentation/info/software
>
> To get more details with the Ensembl API, you only need to extend the
> print_DBEntries subroutine to print all the attributes you are looking for.
> The fields in the REST output correspond to these DBEntry methods:
> - display_id is $dbe->display_id()
> - primary_id is $dbe->primary_id()
> - version is $dbe->version()
> - description is $dbe->description()
> - dbname is $dbe->dbname()
> - synonyms is $dbe->get_all_synonyms()
> - info_type is $dbe->info_type()
> - info_text is $dbe->info_text()
> - db_display_name is $dbe->db_display_name()
>
> You can choose the format of the REST output.
> Details of all formats can be found in our user guide:
> http://beta.rest.ensembl.org/documentation/user_guide
> For tab-delimited output, content_type=text/x-gff3 is used, but it is only
> available for the /feature endpoint.
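Since tab-delimited output is only offered for /feature, one workaround is to convert the JSON from /xrefs/id/ locally. A minimal sketch, assuming the decoded response is a list of xref dictionaries like the ones shown further down in this thread; the column choice and the `xrefs_to_tsv` name are illustrative, list values such as `synonyms` are joined with `;`, and null values become empty cells.

```python
import csv
import io

# Columns present in every xref record of the REST output; sequence-match
# records carry extra keys (e.g. ensembl_identity) that are ignored here.
XREF_COLUMNS = ["primary_id", "display_id", "dbname", "db_display_name",
                "version", "info_type", "info_text", "description", "synonyms"]


def xrefs_to_tsv(xrefs, columns=XREF_COLUMNS):
    """Render a list of decoded /xrefs/id/ records as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)  # header row
    for xref in xrefs:
        row = []
        for col in columns:
            value = xref.get(col)
            if isinstance(value, list):
                value = ";".join(value)  # e.g. synonyms
            row.append("" if value is None else value)
        writer.writerow(row)
    return out.getvalue()
```

Using the csv module rather than a plain `"\t".join` means any value that itself contains a tab or newline gets quoted instead of silently breaking the row.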
>
> There is no file in the Ensembl ftp dumps that contains all the external
> references produced.
>
>
> Regards,
> Magali
>
>
> On 27/02/2014 11:20, Genomeo Dev wrote:
>
>  Hi,
>
>  I am interested in getting comprehensive cross-references for Ensembl gene
> IDs. I found two programmatic ways to do this, which give consistent results
> but different levels of detail. Using ENSG00000223972 as an example:
>  (1)
> Using this Python code for the REST API endpoint
> (http://beta.rest.ensembl.org/documentation/info/xref_id):
>
>
>    import json
>    import sys
>
>    import httplib2
>
>    http = httplib2.Http(".cache")
>
>    server = "http://beta.rest.ensembl.org"
>    ext = "/xrefs/id/ENSG00000223972?"
>    resp, content = http.request(server + ext, method="GET",
>                                 headers={"Content-Type": "application/json"})
>
>    if not resp.status == 200:
>        print("Invalid response: %s" % resp.status)
>        sys.exit()
>
>    decoded = json.loads(content)
>    print(repr(decoded))
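To line the REST result up against the Perl output further down, each decoded record can be printed in the same `XREF display_id (dbname)` form that `print_DBEntries` uses. A minimal sketch, assuming `decoded` is the list returned by `json.loads(content)` in the code above; `xref_lines` is a hypothetical helper, and sorting by dbname is an arbitrary choice to make the two listings easier to compare.

```python
import json


def xref_lines(decoded):
    """Format decoded /xrefs/id/ records like the Perl print_DBEntries output."""
    ordered = sorted(decoded, key=lambda x: (x["dbname"], x["display_id"]))
    return ["\tXREF %s (%s)" % (x["display_id"], x["dbname"]) for x in ordered]


# Two records trimmed from the REST response shown below.
content = ('[{"display_id":"OTTHUMG00000000961","dbname":"OTTG"},'
           '{"display_id":"DDX11L1","dbname":"HGNC"}]')
for line in xref_lines(json.loads(content)):
    print(line)
```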
>
>
>  I get:
>
>  {"display_id":"OTTHUMG00000000961","primary_id":"OTTHUMG00000000961","version":"2","description":null,"dbname":"OTTG","synonyms":[],"info_type":"NONE","info_text":"","db_display_name":"Havana gene"}
>
>  {"primary_id":"Hs.714157","dbname":"UniGene","ensembl_identity":98,"synonyms":[],"ensembl_start":6,"xref_start":1,"xref_end":1639,"db_display_name":"UniGene","display_id":"Hs.714157","ensembl_end":1657,"version":"0","score":8055,"cigar_line":"1200M1D299M12D140M","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","xref_identity":97,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}
>
>  {"primary_id":"Hs.618434","dbname":"UniGene","ensembl_identity":58,"synonyms":[],"ensembl_start":669,"xref_start":1,"xref_end":974,"db_display_name":"UniGene","display_id":"Hs.618434","ensembl_end":1655,"version":"0","score":4757,"cigar_line":"537M1D299M12D138M","description":"Similar to DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 isoform 1, mRNA (cDNA clone IMAGE:6103207)","xref_identity":96,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}
>
>  {"display_id":"DDX11L1","primary_id":"37102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"HGNC","synonyms":[],"info_type":"DIRECT","info_text":"Generated via ensembl_manual","db_display_name":"HGNC Symbol"}
>
>  {"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}
>
>  {"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}
>
>  {"display_id":"ENSG00000223972","primary_id":"ENSG00000223972","version":"0","description":"","dbname":"ArrayExpress","synonyms":[],"info_type":"DIRECT","info_text":"","db_display_name":"ArrayExpress"}
>
>  {"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}
>
>  {"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}]
>
>  (2)
>
>  Using this Perl API code (based on
> http://www.ensembl.org/info/docs/api/core/core_tutorial.html):
>
>  # Define a helper subroutine to print DBEntries
> sub print_DBEntries
> {
>     my $db_entries = shift;
>
>     foreach my $dbe ( @{$db_entries} ) {
>         printf "\tXREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
>     }
> }
>
> my $genes = $gene_adaptor->fetch_all_by_stable_id_list([@gene_list]);
>
>
> ...
>
>
> print "GENE ", $gene->stable_id(), "\n";
> print_DBEntries( $gene->get_all_DBEntries() );
>
>  I get:
> XREF OTTHUMG00000000961 (OTTG)
> XREF ENSG00000223972 (ArrayExpress)
> XREF DDX11L1 (EntrezGene)
> XREF DDX11L5 (EntrezGene)
> XREF DDX11L1 (HGNC)
> XREF Hs.618434 (UniGene)
> XREF Hs.714157 (UniGene)
> XREF DDX11L1 (WikiGene)
> XREF DDX11L5 (WikiGene)
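The observation that both methods give consistent results can be checked mechanically by reducing each listing to a set of (display_id, dbname) pairs. A minimal sketch under these assumptions: the Perl output is captured as text in the `XREF name (db)` form above, the REST response has already been decoded into a list of dicts, and `perl_pairs`/`rest_pairs` are illustrative names.

```python
import re

# Matches lines such as "XREF DDX11L1 (HGNC)", with optional leading whitespace.
XREF_LINE = re.compile(r"^\s*XREF\s+(\S+)\s+\((.+)\)\s*$")


def perl_pairs(text):
    """Collect (display_id, dbname) pairs from print_DBEntries output."""
    pairs = set()
    for line in text.splitlines():
        match = XREF_LINE.match(line)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def rest_pairs(decoded):
    """Collect (display_id, dbname) pairs from decoded /xrefs/id/ records."""
    return {(x["display_id"], x["dbname"]) for x in decoded}


perl_text = "GENE ENSG00000223972\n\tXREF DDX11L1 (HGNC)\n\tXREF Hs.714157 (UniGene)\n"
rest = [{"display_id": "DDX11L1", "dbname": "HGNC"},
        {"display_id": "Hs.714157", "dbname": "UniGene"}]
print(perl_pairs(perl_text) == rest_pairs(rest))
# Pairs found on only one side, if any, appear in the symmetric difference.
print(perl_pairs(perl_text) ^ rest_pairs(rest))
```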
>
>
>  Questions:
>
>  1. Am I correct that the REST code uses the latest Ensembl release,
> while the API code uses the release installed as part of the VM (I am
> using release 74)?
>
>  2. The REST code gives more extensive details (which I like) than the
> Perl API code. Could you suggest a simple way to get the same details with
> the API?
>
>  3. Regarding the REST output format: is tab-separated text supported?
>
>  4. Is there a file in the Ensembl FTP area that contains pre-generated,
> detailed cross-reference mappings for all current Ensembl genes?
> --
>
>  Thanks,
>
>  G.
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
>

-- 
G.