[ensembl-dev] question regarding refseq exons retrieval

mag mr6 at ebi.ac.uk
Tue Mar 10 16:20:08 GMT 2015


Hi Duarte,

It is important to bear in mind that Ensembl and RefSeq transcripts are 
different objects.

There is a large overlap between the two resources, but small 
differences in coding sequence and UTRs mean that there is not always a 
one-to-one mapping between an Ensembl transcript and a RefSeq transcript.
This also means that an Ensembl transcript might overlap some RefSeq 
exons, but not all.

In your use case, however, you should be able to get the information you 
want by replacing the following call:
    $gene->get_all_DBLinks('RefSeq_mRNA')
with:
    $transcript->get_all_DBEntries('RefSeq_mRNA')

RefSeq_mRNA corresponds to RefSeq transcripts, so we map those entries 
to Ensembl transcripts.
With your current script, you are fetching all genes where at least one 
transcript is mapped to a RefSeq transcript.
Instead, you can directly fetch only the transcripts which have a 
mapping to RefSeq.
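
As a rough sketch of that change (not tested against your script), the 
loop could look something like the following. The helper name 
refseq_exon_rows, the public server ensembldb.ensembl.org with the 
anonymous user, the 'Human' species alias, the "chr" prefix and the 
"<label>.<rank>" id format are assumptions made to mirror the output in 
your mail; the label is simply whatever display_id returns for the xref.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: one row per (RefSeq_mRNA xref, exon) for a gene.
# It only relies on the core API methods called below.
sub refseq_exon_rows {
    my ($gene) = @_;
    my @rows;
    for my $transcript (@{ $gene->get_all_Transcripts }) {
        # Ask the transcript (not the gene) for its RefSeq_mRNA xrefs
        # and skip transcripts that have none.
        my $xrefs = $transcript->get_all_DBEntries('RefSeq_mRNA');
        next unless @$xrefs;

        # Exons are returned in transcript order, so a running counter
        # gives the exon rank within this transcript.
        my $exons = $transcript->get_all_Exons;
        for my $xref (@$xrefs) {
            my $rank = 0;
            for my $exon (@$exons) {
                $rank++;
                push @rows, [
                    $xref->display_id . '.' . $rank,    # assumed id format
                    'chr' . $exon->seq_region_name,     # assumed "chr" prefix
                    $exon->seq_region_start,
                    $exon->seq_region_end,
                    $exon->strand > 0 ? '+' : '-',
                ];
            }
        }
    }
    return @rows;
}

# Only connect to the database when a gene name is given on the command line.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',    # assumption: public server
        -user => 'anonymous',
    );
    my $gene_adaptor = $registry->get_adaptor('Human', 'Core', 'Gene');
    for my $gene (@{ $gene_adaptor->fetch_all_by_external_name($ARGV[0]) }) {
        print join("\t", @$_), "\n" for refseq_exon_rows($gene);
    }
}
```

Because every transcript lists its own exons, an exon shared by two 
mapped transcripts should appear once under each RefSeq label, rather 
than only under the first one.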


Hope that helps,
Magali

On 10/03/2015 15:30, Duarte Molha wrote:
> Thanks Kieron
>
> But this still leaves me with a question.
>
> Say that I have a gene, and I retrieve the correct gene object from 
> the Ensembl database. How can I output only the transcripts that are 
> referenced in RefSeq, if not the way I have done it?
>
> If I go the normal way, the $gene->get_all_Transcripts() method will 
> retrieve all Ensembl transcripts. How can I limit it to only get 
> transcripts that are in RefSeq?
>
> Thanks
>
> Duarte
>
> =========================
>      Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 10 March 2015 at 15:22, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>     Dear Duarte,
>
>     The issue you have exposed is subtle. You seem to be printing
>     “exon stable IDs” but expecting them to be RefSeq accessions. Our
>     mistake was to use the RefSeq IDs as arbitrary identifiers for
>     internal use, but I must stress that what Ensembl calls a Stable ID
>     must never be assumed to have any meaning outside of an Ensembl
>     database. What you want are display labels. The exon labels were
>     generated by picking only the first of any possible RefSeq IDs,
>     hence you cannot get everything you want in this way.
>
>     The correct way to handle this in your code is to fetch the
>     transcript name and print that in each exon, as RefSeq IDs refer
>     to transcripts and not exons.
>
>
>     Regards,
>
>     Kieron
>
>
>     Kieron Taylor PhD.
>     Ensembl Core senior software developer
>
>     EMBL, European Bioinformatics Institute
>
>
>
>
>
>     > On 10 Mar 2015, at 11:57, Duarte Molha <duartemolha at gmail.com> wrote:
>     >
>     > Dear developers
>     >
>     > I have a script that I wrote (attached) that gets me the
>     > RefSeq exons for a given input gene.
>     >
>     > However, when I run this code with the gene ASXL1 as an example, the output is:
>     >
>     > test_query.pl ASXL1
>     >
>     > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand
>     > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +
>     >
>     >
>     > As you can see, I am missing some of the exons for transcript
>     > NM_015338.5.
>     > In this case, the first 3 exons of transcript NM_015338.5 are
>     > identical to those of NM_001164603.1, but I would expect to have
>     > them listed as:
>     >
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +
>     > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +
>     >
>     > Can you tell me what is wrong with my approach and how I can
>     > retrieve the missing data?
>     >
>     > Best regards
>     >
>     > Duarte
>     > <test_query.pl>
>     > _______________________________________________
>     > Dev mailing list    Dev at ensembl.org
>     > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>     > Ensembl Blog: http://www.ensembl.info/
>
>
