[ensembl-dev] Ensembl ids and RefSeq ids

Rhoda Kinsella rhoda at ebi.ac.uk
Fri Aug 24 09:25:52 BST 2012


Hi Gustavo
You cannot get the RefSeq ID with the version attached in BioMart because we use the dbprimary_acc column from the xref table, which does not include the version. If you take a look at the public MySQL server (connection details are here: http://www.ensembl.org/info/data/mysql.html), you will see that the corresponding display_label contains the version. The display_label is what the website uses.

mysql> select * from xref where dbprimary_acc like "NM_203373%";
+---------+----------------+---------------+---------------+---------+-----------------------------------------------------------------------+-----------+--------------------+
| xref_id | external_db_id | dbprimary_acc | display_label | version | description                                                           | info_type | info_text          |
+---------+----------------+---------------+---------------+---------+-----------------------------------------------------------------------+-----------+--------------------+
|    1359 |           1800 | NM_203373     | NM_203373.2   | 2       | NULL                                                                  | NONE      |                    | 
| 4928719 |           1801 | NM_203373     | NM_203373.2   | 2       | Homo sapiens F-box and leucine-rich repeat protein 22 (FBXL22), mRNA. | DIRECT    | Generated via ccds | 
+---------+----------------+---------------+---------------+---------+-----------------------------------------------------------------------+-----------+--------------------+
2 rows in set (1.86 sec)
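
In both rows above, display_label is simply dbprimary_acc followed by a dot and the version column. As a minimal sketch, assuming that pattern holds for the RefSeq xrefs you are interested in, the versioned ID could be rebuilt from the two columns like this (Python; versioned_accession is a made-up helper name, and treating an empty or "0" version as "no version" is an assumption, not something the schema guarantees):

```python
def versioned_accession(dbprimary_acc, version):
    """Join an accession and its version as 'ACCESSION.VERSION'.

    Assumption: a missing, empty or "0" version means there is no
    version to append, so the bare accession is returned.
    """
    if version is None or str(version).strip() in ("", "0"):
        return dbprimary_acc
    return f"{dbprimary_acc}.{version}"


# Values taken from the xref rows shown above.
print(versioned_accession("NM_203373", "2"))  # NM_203373.2
```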

To get the information pertaining to your specific query you could do something like this:

mysql> select distinct t.stable_id, x.dbprimary_acc, x.display_label, x.version from transcript t, object_xref ox, xref x where x.dbprimary_acc like "NM_203373%" and ox.xref_id=x.xref_id and t.transcript_id=ox.ensembl_id and ox.ensembl_object_type="transcript";
+-----------------+---------------+---------------+---------+
| stable_id       | dbprimary_acc | display_label | version |
+-----------------+---------------+---------------+---------+
| ENST00000539570 | NM_203373     | NM_203373.2   | 2       | 
+-----------------+---------------+---------------+---------+
1 row in set (0.14 sec)
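
The same join can be tried out without a connection to the public server. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for MySQL, builds cut-down copies of the transcript, object_xref and xref tables holding just the rows shown above, and runs the query with explicit JOINs. The transcript_id value 1, the choice of which xref row is linked to the transcript, and the function names build_demo_db and refseq_versions are assumptions made for the example. Note that SQLite compares strings with = case-sensitively, so the demo stores and matches 'Transcript'.

```python
import sqlite3

# Same join as the mysql query above, written with explicit JOINs.
QUERY = """
    SELECT DISTINCT t.stable_id, x.dbprimary_acc, x.display_label, x.version
    FROM transcript t
    JOIN object_xref ox ON ox.ensembl_id = t.transcript_id
    JOIN xref x ON x.xref_id = ox.xref_id
    WHERE ox.ensembl_object_type = 'Transcript'
      AND x.dbprimary_acc LIKE ?
"""


def build_demo_db():
    """Create an in-memory database with cut-down copies of the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, stable_id TEXT);
        CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, external_db_id INTEGER,
                           dbprimary_acc TEXT, display_label TEXT, version TEXT);
        CREATE TABLE object_xref (ensembl_id INTEGER, ensembl_object_type TEXT,
                                  xref_id INTEGER);
    """)
    # transcript_id 1 is a made-up internal ID for the demo.
    conn.execute("INSERT INTO transcript VALUES (1, 'ENST00000539570')")
    # The two xref rows from the first query in this message.
    conn.executemany(
        "INSERT INTO xref VALUES (?, ?, ?, ?, ?)",
        [
            (1359, 1800, "NM_203373", "NM_203373.2", "2"),
            (4928719, 1801, "NM_203373", "NM_203373.2", "2"),
        ],
    )
    # Assumption: the transcript is linked to the second xref row.
    conn.execute("INSERT INTO object_xref VALUES (1, 'Transcript', 4928719)")
    return conn


def refseq_versions(conn, accession_prefix):
    """Return (stable_id, dbprimary_acc, display_label, version) rows."""
    return conn.execute(QUERY, (accession_prefix + "%",)).fetchall()


for row in refseq_versions(build_demo_db(), "NM_203373"):
    print(row)
```

Against the real database you would run the same SELECT through a MySQL client, and widen or drop the LIKE filter to cover more accessions.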


I hope that helps
Regards
Rhoda


On 24 Aug 2012, at 01:59, Gustavo Franca <gsfranca at gmail.com> wrote:

> Hi ib,
> 
> Thank you for all the information. I take your point. Although NM_203373.2 and NM_203373 refer to the same transcript, the .2 denotes the version of the RefSeq accession, which can change over time, so it is important for me to keep the correct RefSeq version associated with each Ensembl transcript. Anyway, I appreciate your kind help!
> 
> Best,
> Gustavo
> 
> On Thu, Aug 23, 2012 at 7:42 PM, i b <ibseq12 at gmail.com> wrote:
> Hi Gustavo,
> if you know which accession number you want in this "format" you can
> just look it up on Pubmed. The number after the accession, e.g. the 2
> in NM_203373.2, does not change the identity of the
> protein/transcript. I have only seen this on Pubmed, so I don't
> think you will find it, since NM_203373.2 is the same RefSeq entry
> as NM_203373.
> You can easily see this on Pubmed: under nucleotide, type
> NM_203373 or NM_203373.2 and the link is the same
> (http://www.ncbi.nlm.nih.gov/nuccore/NM_203373 and
> http://www.ncbi.nlm.nih.gov/nuccore/NM_203373.2)
> 
> hope it helps,
> ib
> 
> On Thu, Aug 23, 2012 at 10:34 PM, Gustavo Franca <gsfranca at gmail.com> wrote:
> > Hi ib,
> >
> > I chose RefSeq mRNA and RefSeq protein ID in Attributes -> External
> > References, but I still didn't get the RefSeq version. For example, via
> > BioMart, I got:
> >
> > ENSG00000259662    ENST00000539570     NP_976307     NM_203373
> >
> > Note that there are no RefSeq versions. Instead, I would like to get:
> >
> > ENSG00000259662    ENST00000539570    NP_976307.2   NM_203373.2
> >
> > Do you know how to get data this way?
> > Regards,
> > Gustavo
> >
> >
> > On Thu, Aug 23, 2012 at 6:18 PM, i b <ibseq12 at gmail.com> wrote:
> >>
> >> Hi Gustavo,
> >> I have done a similar thing on BioMart and it was OK. Did you choose
> >> RefSeq from the list in BioMart while doing the conversion? If I'm not
> >> mistaken, it should be under Attributes/External...
> >>
> >> Let me know; I might be able to try it and see if it works.
> >>
> >> regards,
> >> ib
> >>
> >> On Thu, Aug 23, 2012 at 9:52 PM, Gustavo Franca <gsfranca at gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I would like to retrieve a table with all human genes, containing
> >> > Ensembl Gene IDs, Ensembl Transcript IDs, Ensembl Protein IDs and their
> >> > respective RefSeq mRNA IDs and RefSeq peptide IDs. As an example, see
> >> > the last table (External references) shown on the KIR2DL5A page:
> >> >
> >> > http://www.ensembl.org/Homo_sapiens/Gene/Matches?g=ENSG00000215764;r=GL000209.1:7891-96246
> >> >
> >> > I realized that it is possible to get ID conversions via BioMart;
> >> > however, the BioMart output does not give me the version of the RefSeq
> >> > entry as shown in the above example (e.g. ENST00000344867    NP_055034.2
> >> > NM_014219.2). So, I would like to retrieve Ensembl transcripts and their
> >> > corresponding RefSeq versions.
> >> > Could anyone help me out with this?
> >> >
> >> > Thank you very much,
> >> > Gustavo
> >> >
> >> > _______________________________________________
> >> > Dev mailing list    Dev at ensembl.org
> >> > List admin (including subscribe/unsubscribe):
> >> > http://lists.ensembl.org/mailman/listinfo/dev
> >> > Ensembl Blog: http://www.ensembl.info/
> >> >

Rhoda Kinsella Ph.D.
Ensembl Production Project Leader,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton,
Cambridge,
CB10 1SD


