[ensembl-dev] xref mapping

mag mr6 at ebi.ac.uk
Thu Mar 27 09:31:56 GMT 2014


Hi Genomeo,

Indeed, we have an order of priority for sources.
We check the first source and, if there is a match, we keep it; otherwise,
we look at the next source.
The order for all species is as follows:
- official naming source where available (HGNC for human, MGI for mouse, 
RGD for rat, etc.)
- RFAM
- miRBase
- UniProt gene names
- EntrezGene names
- Havana names
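The first-match rule above can be sketched in Python as follows. This models only the order of sources described here; it is not Ensembl's xref pipeline code, and the source labels, the input dictionary format and the name `choose_display_name` are hypothetical. Choosing among several names from the same source (as with the four MIR1302 symbols) is not modelled; the sketch simply takes the first one.

```python
from typing import Dict, List, Optional

# Priority order as listed above; "official" stands for the species' naming
# authority (HGNC for human, MGI for mouse, RGD for rat, ...).
# These labels are illustrative, not Ensembl database names.
PRIORITY = ["official", "RFAM", "miRBase", "UniProt", "EntrezGene", "Havana"]


def choose_display_name(candidates: Dict[str, List[str]]) -> Optional[str]:
    """Return a name from the highest-priority source that has any match,
    or None if no source matched.

    `candidates` maps a source label to the names matched from that source
    (hypothetical input format)."""
    for source in PRIORITY:
        names = candidates.get(source, [])
        if names:
            # The first source with a match wins; lower sources are ignored.
            return names[0]
    return None
```

For example, a gene with both an EntrezGene and a Havana name would get the EntrezGene name, because EntrezGene is checked first.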

For ENSG00000243485, the other MIR1302 symbols also align to it because 
these are short, highly similar sequences, so they match well enough to be 
added as external references.
However, each of them has a better match to another Ensembl gene:
http://beta.rest.ensembl.org/lookup/symbol/homo_sapiens/MIR1302-2
http://beta.rest.ensembl.org/lookup/symbol/homo_sapiens/MIR1302-9
http://beta.rest.ensembl.org/lookup/symbol/homo_sapiens/MIR1302-11
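To check this for any symbol, one could query the `lookup/symbol` endpoint for each MIR1302 symbol and see which Ensembl gene it resolves to. The sketch below assumes the response is a JSON object whose `id` field holds the Ensembl stable ID, and that the host is `https://rest.ensembl.org` (the `beta.rest.ensembl.org` host used in this thread may no longer be available); the function names are hypothetical. The HTTP call is passed in as a parameter so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable

# Assumed current host; the thread used http://beta.rest.ensembl.org
SERVER = "https://rest.ensembl.org"


def fetch_json(url: str) -> dict:
    """GET `url` and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def best_gene_for_symbols(symbols: Iterable[str],
                          species: str = "homo_sapiens",
                          fetch: Callable[[str], dict] = fetch_json) -> Dict[str, str]:
    """Map each symbol to the Ensembl gene ID returned by lookup/symbol.

    Assumes the gene ID is in the response's "id" field."""
    result = {}
    for symbol in symbols:
        url = f"{SERVER}/lookup/symbol/{species}/{symbol}"
        result[symbol] = fetch(url)["id"]
    return result
```

Calling `best_gene_for_symbols(["MIR1302-2", "MIR1302-9", "MIR1302-11"])` and comparing each returned ID with ENSG00000243485 would show whether each symbol's best match lies on a different gene.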


Hope that helps,
Magali

On 26/03/2014 22:45, Genomeo Dev wrote:
> Hi Magali,
>
> Thanks for the response.
>
> Is there a rule for how the display name is assigned for a given 
> Ensembl gene ID? Something like: use the HGNC symbol if it exists, 
> otherwise UniProt, otherwise PFAM, otherwise miRBase, otherwise Havana.
>
> The other question: the display name of ENSG00000243485 is the HGNC 
> symbol MIR1302-10. How was this one HGNC symbol chosen from the set of 
> four mapped HGNC symbols retrievable with the xrefs command?
>
> G.
>
>
> On 25 March 2014 21:16, <mr6 at ebi.ac.uk> wrote:
>
>     Hi Genomeo,
>
>     The xrefs endpoint returns the whole list of external references
>     associated with an Ensembl object.
>     This can be filtered for a given external source name, in this
>     case HGNC.
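A minimal Python 3 sketch of the two ways to restrict xrefs to one source: asking the server through the `external_db` query parameter (as in the `?external_db=HGNC` URL quoted further down this thread), or filtering the full list client-side on each record's `dbname`. The host and function names are assumptions; the record keys match the JSON output shown later in the thread. Server-side matching of `external_db` may be looser than the exact comparison used here.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Assumed current host; the thread used http://beta.rest.ensembl.org
SERVER = "https://rest.ensembl.org"


def xrefs_url(stable_id: str, external_db: Optional[str] = None) -> str:
    """Build the xrefs/id URL, optionally asking the server to filter by source."""
    url = f"{SERVER}/xrefs/id/{stable_id}"
    if external_db:
        url += "?" + urllib.parse.urlencode({"external_db": external_db})
    return url


def fetch_xrefs(stable_id: str, external_db: Optional[str] = None) -> List[dict]:
    """Fetch the xref list (a JSON array of objects) for an Ensembl ID."""
    req = urllib.request.Request(xrefs_url(stable_id, external_db),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def filter_by_dbname(xrefs: List[dict], dbname: str) -> List[dict]:
    """Client-side filter: keep only records whose dbname equals `dbname`."""
    return [x for x in xrefs if x.get("dbname") == dbname]
```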
>
>     The lookup endpoint returns some information on the input object,
>     including its location and display name, but excluding any external
>     references.
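As a rough illustration of the split between the two endpoints, the sketch below takes an already-decoded `lookup/id` response and an already-decoded xrefs list, formats the location and display name, and lists which xref sources carry that same name. The key names `id`, `seq_region_name`, `start`, `end`, `strand` and `display_name` are assumptions about the lookup JSON (the lookup output later in this thread shows these values only as unlabelled columns); the function names are hypothetical.

```python
from typing import List


def summarise_lookup(gene: dict) -> str:
    """Format the ID, location and display name from a lookup/id response.

    Assumes keys: id, seq_region_name, start, end, strand, display_name."""
    return "{id} {seq_region_name}:{start}-{end}:{strand} {display_name}".format(**gene)


def display_name_sources(gene: dict, xrefs: List[dict]) -> List[str]:
    """List the xref sources (dbname) that have a display_id equal to the
    gene's display name, by matching against the separately fetched xrefs."""
    name = gene.get("display_name")
    return sorted({x["dbname"] for x in xrefs if x.get("display_id") == name})
```

With the values from the lookup output quoted later in this thread (chromosome 1, 29554-31109, strand 1, MIR1302-10), `summarise_lookup` would give `ENSG00000243485 1:29554-31109:1 MIR1302-10`.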
>
>     For this gene, the display name is an HGNC symbol.
>     This is the case for most of our genes, but it can also be a:
>     - UniProt name (http://beta.rest.ensembl.org/lookup/id/ENSG00000261163)
>     - RFAM name (http://beta.rest.ensembl.org/lookup/id/ENSG00000252365)
>     - miRBase name (http://beta.rest.ensembl.org/lookup/id/ENSG00000265031)
>     - Havana name (http://beta.rest.ensembl.org/lookup/id/ENSG00000228741)
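The lookup output quoted below in this thread shows the gene description ending in a tag such as `[Source:HGNC Symbol;Acc:38233]`. Assuming descriptions keep that format, a small helper can pull out the source name and accession. Note that this tag strictly describes where the description came from; that it agrees with the display-name source for ENSG00000243485 is an observation from this one example, not a documented guarantee. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a tag like "[Source:HGNC Symbol;Acc:38233]" anywhere in the text.
SOURCE_TAG = re.compile(r"\[Source:([^;\]]+);Acc:([^\]]+)\]")


def description_source(description: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (source name, accession) from an Ensembl gene description,
    or None if the description is empty or has no [Source:...;Acc:...] tag."""
    if not description:
        return None
    match = SOURCE_TAG.search(description)
    return (match.group(1), match.group(2)) if match else None
```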
>
>
>     Hope this helps,
>     Magali
>
>     > Hi,
>     >
>     > I was comparing the output of the lookup/id and xrefs/id commands
>     > of the Ensembl REST API for ENSG00000243485:
>     >
>     > wget -q --header='Content-type:application/json' 'http://beta.rest.ensembl.org/xrefs/id/ENSG00000243485?external_db=HGNC' -O -
>     > ENSG00000243485 HGNC MIR1302-11 microRNA 1302-11 HGNC Symbol Generated via refseq_manual DEPENDENT 38246 hsa-mir-1302-11 0
>     > ENSG00000243485 HGNC MIR1302-10 microRNA 1302-10 HGNC Symbol Generated via refseq_manual DEPENDENT 38233 hsa-mir-1302-10 0
>     > ENSG00000243485 HGNC MIR1302-9 microRNA 1302-9 HGNC Symbol Generated via refseq_manual DEPENDENT 38218 hsa-mir-1302-9 0
>     > ENSG00000243485 HGNC MIR1302-2 microRNA 1302-2 HGNC Symbol Generated via refseq_manual DEPENDENT 35294 hsa-mir-1302-2, MIRN1302-2 0
>     >
>     > wget -q --header='Content-type:application/json' 'http://beta.rest.ensembl.org/lookup/id/ENSG00000243485?expand=1' -O -
>     > ENSG00000243485 1 29554 31109 1 MIR1302-10 ensembl_havana ensembl_havana_lincrna microRNA 1302-10 [Source:HGNC Symbol;Acc:38233] lincRNA
>     >
>     > Why does the lookup command show only one of the four mapped HGNC
>     > symbols?
>     >
>     > Thanks,
>     >
>     > G.
>     >
>     >
>     > On 27 February 2014 11:20, Genomeo Dev <genomeodev at gmail.com> wrote:
>     >
>     >> Hi,
>     >>
>     >> I am interested in getting a broad set of cross-references for Ensembl
>     >> gene IDs. I found two programmatic ways to do this, which give
>     >> consistent results but different amounts of detail. Using
>     >> ENSG00000223972 as an example:
>     >> (1)
>     >> Using this Python code for the REST API endpoint
>     >> (http://beta.rest.ensembl.org/documentation/info/xref_id):
>     >>
>     >>
>     >>    import json
>     >>    import sys
>     >>
>     >>    import httplib2
>     >>
>     >>    # Cache responses on disk in ./.cache
>     >>    http = httplib2.Http(".cache")
>     >>
>     >>    server = "http://beta.rest.ensembl.org"
>     >>    # Gene whose xrefs are shown below
>     >>    ext = "/xrefs/id/ENSG00000223972"
>     >>    resp, content = http.request(server + ext, method="GET",
>     >>                                 headers={"Content-Type": "application/json"})
>     >>
>     >>    if resp.status != 200:
>     >>        print("Invalid response:", resp.status)
>     >>        sys.exit(1)
>     >>
>     >>    decoded = json.loads(content)
>     >>    print(repr(decoded))
>     >>
>     >>
>     >> I get:
>     >>
>     >>
>     >> {"display_id":"OTTHUMG00000000961","primary_id":"OTTHUMG00000000961","version":"2","description":null,"dbname":"OTTG","synonyms":[],"info_type":"NONE","info_text":"","db_display_name":"Havana gene"}
>     >>
>     >> {"primary_id":"Hs.714157","dbname":"UniGene","ensembl_identity":98,"synonyms":[],"ensembl_start":6,"xref_start":1,"xref_end":1639,"db_display_name":"UniGene","display_id":"Hs.714157","ensembl_end":1657,"version":"0","score":8055,"cigar_line":"1200M1D299M12D140M","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","xref_identity":97,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}
>     >>
>     >> {"primary_id":"Hs.618434","dbname":"UniGene","ensembl_identity":58,"synonyms":[],"ensembl_start":669,"xref_start":1,"xref_end":974,"db_display_name":"UniGene","display_id":"Hs.618434","ensembl_end":1655,"version":"0","score":4757,"cigar_line":"537M1D299M12D138M","description":"Similar to DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 isoform 1, mRNA (cDNA clone IMAGE:6103207)","xref_identity":96,"evalue":null,"info_text":"","info_type":"SEQUENCE_MATCH"}
>     >>
>     >> {"display_id":"DDX11L1","primary_id":"37102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"HGNC","synonyms":[],"info_type":"DIRECT","info_text":"Generated via ensembl_manual","db_display_name":"HGNC Symbol"}
>     >>
>     >> {"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}
>     >>
>     >> {"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"EntrezGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"EntrezGene"}
>     >>
>     >> {"display_id":"ENSG00000223972","primary_id":"ENSG00000223972","version":"0","description":"","dbname":"ArrayExpress","synonyms":[],"info_type":"DIRECT","info_text":"","db_display_name":"ArrayExpress"}
>     >>
>     >> {"display_id":"DDX11L5","primary_id":"100287596","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 5","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}
>     >>
>     >> {"display_id":"DDX11L1","primary_id":"100287102","version":"0","description":"DEAD/H (Asp-Glu-Ala-Asp/His) box helicase 11 like 1","dbname":"WikiGene","synonyms":[],"info_type":"DEPENDENT","info_text":"","db_display_name":"WikiGene"}]
>     >>
>     >> (2)
>     >>
>     >> Using this Perl API code (based on
>     >> http://www.ensembl.org/info/docs/api/core/core_tutorial.html):
>     >>
>     >> # Define a helper subroutine to print DBEntries
>     >> sub print_DBEntries
>     >> {
>     >>     my $db_entries = shift;
>     >>
>     >>     foreach my $dbe ( @{$db_entries} ) {
>     >>         printf "\tXREF %s (%s)\n", $dbe->display_id(), $dbe->dbname();
>     >>     }
>     >> }
>     >>
>     >> my $genes = $gene_adaptor->fetch_all_by_stable_id_list([@gene_list]);
>     >>
>     >>
>     >> ...
>     >>
>     >>
>     >> print "GENE ", $gene->stable_id(), "\n";
>     >> print_DBEntries( $gene->get_all_DBEntries() );
>     >>
>     >> I get:
>     >> XREF OTTHUMG00000000961 (OTTG)
>     >> XREF ENSG00000223972 (ArrayExpress)
>     >> XREF DDX11L1 (EntrezGene)
>     >> XREF DDX11L5 (EntrezGene)
>     >> XREF DDX11L1 (HGNC)
>     >> XREF Hs.618434 (UniGene)
>     >> XREF Hs.714157 (UniGene)
>     >> XREF DDX11L1 (WikiGene)
>     >> XREF DDX11L5 (WikiGene)
>     >>
>     >>
>     >> Questions:
>     >>
>     >> 1. Am I correct in saying that the REST code uses the latest Ensembl
>     >> release, while the API code uses the Ensembl release installed as
>     >> part of the VM (I am using release 74)?
>     >>
>     >> 2. The REST code gives more detail (which I like) than the Perl API
>     >> code. Could you suggest a simple way to use the API to get the same
>     >> details?
>     >>
>     >> 3. REST output format: is tab-separated text supported?
>     >>
>     >> 4. Is there a file in the Ensembl FTP area that contains pre-generated,
>     >> detailed cross-reference mappings for all current Ensembl genes?
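On question 3: independently of which output formats the server offers, the JSON records shown above can be turned into tab-separated text on the client side. A minimal sketch using the standard `csv` module; the column choice and the name `xrefs_to_tsv` are hypothetical, and it makes no claim about what the REST service itself supports.

```python
import csv
import io
from typing import Iterable, Sequence

# Columns present in every record of the JSON output above; sequence-match
# records carry extra fields (ensembl_identity, cigar_line, ...) that are
# simply not written unless added to `columns`.
DEFAULT_COLUMNS = ("dbname", "primary_id", "display_id", "description", "info_type")


def xrefs_to_tsv(xrefs: Iterable[dict], columns: Sequence[str] = DEFAULT_COLUMNS) -> str:
    """Render xref records as tab-separated text with a header row.
    Missing keys and JSON nulls (None) become empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for x in xrefs:
        writer.writerow(["" if x.get(c) is None else x.get(c) for c in columns])
    return buf.getvalue()
```

Passing `columns=("dbname", "display_id", "ensembl_identity", "xref_identity")` would, for example, keep the alignment identities of the UniGene records and leave those cells empty for the other sources.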
>     >> --
>     >>
>     >> Thanks,
>     >>
>     >> G.
>     >>
>     >
>     >
>     >
>     > --
>     > G.
>     > _______________________________________________
>     > Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>     > Posting guidelines and subscribe/unsubscribe info:
>     > http://lists.ensembl.org/mailman/listinfo/dev
>     > Ensembl Blog: http://www.ensembl.info/
>     >
>
>
>
>
>
>
> -- 
> G.
>
>


