[ensembl-dev] Ensembl ids and RefSeq ids

Gustavo Franca gsfranca at gmail.com
Fri Aug 24 13:29:08 BST 2012


Hi Rhoda,
Thank you very much for the explanation. I will try to access via MySQL and
make the selections according to the example.
Regards,
Gustavo
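
A minimal sketch of the join described in the quoted reply below, rewritten with explicit JOINs and a bound LIKE pattern. It runs against an in-memory SQLite stand-in rather than the public MySQL server, and models only the columns the query touches. The helper names (build_toy_db, refseq_versions), the internal transcript_id value and the object_xref link are assumptions made for illustration; the xref rows are the two shown in the thread.

```python
import sqlite3

# Toy stand-in for the three tables used in the thread. Only the columns
# the query touches are modelled (an assumption, not the full schema).
SCHEMA = """
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, external_db_id INTEGER,
                   dbprimary_acc TEXT, display_label TEXT, version TEXT);
CREATE TABLE object_xref (ensembl_id INTEGER, ensembl_object_type TEXT,
                          xref_id INTEGER);
"""

# Same join as in the thread, with explicit JOINs and a bound pattern.
# SQLite compares strings case-sensitively with '=', so the toy data
# stores the object type exactly as 'Transcript'.
QUERY = """
SELECT DISTINCT t.stable_id, x.dbprimary_acc, x.display_label, x.version
FROM transcript t
JOIN object_xref ox ON ox.ensembl_id = t.transcript_id
                   AND ox.ensembl_object_type = 'Transcript'
JOIN xref x ON x.xref_id = ox.xref_id
WHERE x.dbprimary_acc LIKE ?
ORDER BY t.stable_id
"""


def build_toy_db():
    """Create an in-memory database holding the rows shown in the thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # transcript_id 1 is a made-up internal ID (hypothetical).
    conn.execute("INSERT INTO transcript VALUES (1, 'ENST00000539570')")
    conn.executemany(
        "INSERT INTO xref VALUES (?, ?, ?, ?, ?)",
        [
            (1359, 1800, "NM_203373", "NM_203373.2", "2"),
            (4928719, 1801, "NM_203373", "NM_203373.2", "2"),
        ],
    )
    # Which xref row is linked to the transcript is an assumption.
    conn.execute("INSERT INTO object_xref VALUES (1, 'Transcript', 4928719)")
    return conn


def refseq_versions(conn, pattern):
    """Return (stable_id, accession, versioned label, version) tuples."""
    return conn.execute(QUERY, (pattern,)).fetchall()


if __name__ == "__main__":
    for row in refseq_versions(build_toy_db(), "NM_203373%"):
        print("\t".join(row))
```

Passing a broader pattern to refseq_versions would return more transcripts in one go. Note that in LIKE the underscore is itself a single-character wildcard, so "NM_%" also matches accessions with any character in the third position.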

On Fri, Aug 24, 2012 at 5:25 AM, Rhoda Kinsella <rhoda at ebi.ac.uk> wrote:

> Hi Gustavo
> You cannot get the RefSeq ID with the version attached in BioMart because
> we use the dbprimary_acc column from the xref table, which does not
> include the version. If you take a look at the public MySQL server (see
> information on how to connect here:
> http://www.ensembl.org/info/data/mysql.html) you will see that the
> corresponding display_label contains the version. The display_label is
> what the website uses.
>
> mysql> select * from xref where dbprimary_acc like "NM_203373%";
>
> +---------+----------------+---------------+---------------+---------+-----------------------------------------------------------------------+-----------+--------------------+
> | xref_id | external_db_id | dbprimary_acc | display_label | version | description                                                           | info_type | info_text          |
> +---------+----------------+---------------+---------------+---------+-----------------------------------------------------------------------+-----------+--------------------+
> |    1359 |           1800 | NM_203373     | NM_203373.2   | 2       | NULL                                                                  | NONE      |                    |
> | 4928719 |           1801 | NM_203373     | NM_203373.2   | 2       | Homo sapiens F-box and leucine-rich repeat protein 22 (FBXL22), mRNA. | DIRECT    | Generated via ccds |
> +---------+----------------+---------------+---------------+---------+-----------------------------------------------------------------------+-----------+--------------------+
> 2 rows in set (1.86 sec)
>
> To get the information pertaining to your specific query you could do
> something like this:
>
> mysql> select distinct t.stable_id, x.dbprimary_acc, x.display_label, x.version
>     -> from transcript t, object_xref ox, xref x
>     -> where x.dbprimary_acc like "NM_203373%"
>     -> and ox.xref_id=x.xref_id
>     -> and t.transcript_id=ox.ensembl_id
>     -> and ox.ensembl_object_type="transcript";
> +-----------------+---------------+---------------+---------+
> | stable_id       | dbprimary_acc | display_label | version |
> +-----------------+---------------+---------------+---------+
> | ENST00000539570 | NM_203373     | NM_203373.2   | 2       |
> +-----------------+---------------+---------------+---------+
> 1 row in set (0.14 sec)
>
>
> I hope that helps
> Regards
> Rhoda
>
>
> On 24 Aug 2012, at 01:59, Gustavo Franca <gsfranca at gmail.com> wrote:
>
> Hi ib,
>
> Thank you for all the information. I see your point. Although
> NM_203373.2 and NM_203373 refer to the same transcript, the .2 denotes
> the version of the RefSeq record, which can change over time, so it is
> important for me to keep the correct RefSeq version associated with each
> Ensembl transcript. Anyway, I appreciate your kind help!
>
> Best,
> Gustavo
>
> On Thu, Aug 23, 2012 at 7:42 PM, i b <ibseq12 at gmail.com> wrote:
>
>> Hi Gustavo,
>> if you know which accession number you want in this "format", you can
>> just look it up at NCBI. The number after the accession, e.g. the 2 in
>> NM_203373.2, does not change the identity of the protein/transcript. I
>> have only seen it on the NCBI site, so I don't think you will find it,
>> since for RefSeq NM_203373.2 is the same as NM_203373.
>> You can easily check this: if you search NCBI Nucleotide for NM_203373
>> or NM_203373.2, you reach the same record
>> (http://www.ncbi.nlm.nih.gov/nuccore/NM_203373 and
>> http://www.ncbi.nlm.nih.gov/nuccore/NM_203373.2)
>>
>> hope it helps,
>> ib
>>
>> On Thu, Aug 23, 2012 at 10:34 PM, Gustavo Franca <gsfranca at gmail.com>
>> wrote:
>> > Hi ib,
>> >
>> > I chose RefSeq mRNA and RefSeq protein ID under Attributes -> External
>> > References, but I still didn't get the RefSeq version. For example, via
>> > BioMart, I got:
>> >
>> > ENSG00000259662    ENST00000539570     NP_976307     NM_203373
>> >
>> > Note that there are no RefSeq versions. Instead, I would like to get:
>> >
>> > ENSG00000259662    ENST00000539570    NP_976307.2   NM_203373.2
>> >
>> > Do you know how to get data this way?
>> > Regards,
>> > Gustavo
>> >
>> >
>> > On Thu, Aug 23, 2012 at 6:18 PM, i b <ibseq12 at gmail.com> wrote:
>> >>
>> >> Hi Gustavo,
>> >> I have done something similar in BioMart and it worked. Did you select
>> >> RefSeq from the list in BioMart when doing the conversion? If I'm not
>> >> mistaken, it should be under Attributes/External...
>> >>
>> >> Let me know; I might be able to try it and see if it works.
>> >>
>> >> regards,
>> >> ib
>> >>
>> >> On Thu, Aug 23, 2012 at 9:52 PM, Gustavo Franca <gsfranca at gmail.com>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > I would like to retrieve a table of all human genes containing Ensembl
>> >> > Gene IDs, Ensembl Transcript IDs, Ensembl Protein IDs and their
>> >> > respective RefSeq mRNA IDs and RefSeq peptide IDs. As an example, see
>> >> > the last table (External references) shown on the KIR2DL5A page:
>> >> >
>> >> > http://www.ensembl.org/Homo_sapiens/Gene/Matches?g=ENSG00000215764;r=GL000209.1:7891-96246
>> >> >
>> >> > I realized that ID conversions are possible via BioMart; however, the
>> >> > BioMart output does not give me the version of the RefSeq entry shown
>> >> > in the above example (e.g. ENST00000344867    NP_055034.2
>> >> > NM_014219.2). So, I would like to retrieve Ensembl transcripts and
>> >> > their corresponding RefSeq versions.
>> >> > Could anyone help me out with this?
>> >> >
>> >> > Thank you very much,
>> >> > Gustavo
>> >> >
>> >> > _______________________________________________
>> >> > Dev mailing list    Dev at ensembl.org
>> >> > List admin (including subscribe/unsubscribe):
>> >> > http://lists.ensembl.org/mailman/listinfo/dev
>> >> > Ensembl Blog: http://www.ensembl.info/
>> >> >
>> >>
>> >
>> >
>>
>>
>
>
>
> Rhoda Kinsella Ph.D.
> Ensembl Production Project Leader,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton,
> Cambridge,
> CB10 1SD
>
>