[ensembl-dev] question regarding refseq exons retrieval

mag mr6 at ebi.ac.uk
Wed Mar 11 09:35:30 GMT 2015


Hi Duarte,

I am not convinced all genes in Ensembl will have at least one mapping 
to RefSeq, but your snippet of code should work regardless.
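
For readers following the thread, the approach discussed here can be sketched roughly as below. It is only an illustration, not the script attached to the original post: the function name refseq_exon_rows and the "accession.rank" label format are assumptions made for this example. It uses get_all_Transcripts and get_all_DBEntries('RefSeq_mRNA') as named in the thread, plus get_all_Exons and the usual feature accessors (seq_region_name, start, end, strand). A gene with no RefSeq_mRNA cross-references simply yields an empty list.

```perl
use strict;
use warnings;

# Sketch: return one row per (RefSeq accession, exon) for every transcript of
# $gene that carries a RefSeq_mRNA cross-reference. Transcripts without such
# a cross-reference are skipped, so a gene with no RefSeq mapping gives an
# empty list rather than an error.
# The sub name and the "accession.rank" label are hypothetical choices.
sub refseq_exon_rows {
    my ($gene) = @_;
    my @rows;

    for my $transcript ( @{ $gene->get_all_Transcripts() } ) {

        # Passing the database name restricts the cross-references returned.
        my @accessions =
          map { $_->display_id() }
          @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };
        next unless @accessions;

        # Exons come back in transcript order, so index + 1 is the exon rank
        # within this Ensembl transcript.
        my @exons = @{ $transcript->get_all_Exons() };

        for my $acc (@accessions) {
            for my $i ( 0 .. $#exons ) {
                my $exon = $exons[$i];
                push @rows, {
                    label  => join( '.', $acc, $i + 1 ),
                    chr    => $exon->seq_region_name(),
                    start  => $exon->start(),
                    end    => $exon->end(),
                    strand => $exon->strand(),
                };
            }
        }
    }
    return @rows;
}
```

Called as my @rows = refseq_exon_rows($gene); on a gene object fetched from an Ensembl database, each row can then be printed tab-separated. Because the label is built from the transcript's cross-reference rather than from the exon stable ID, an exon shared by several transcripts is listed once per RefSeq accession. The rank follows the Ensembl transcript's exon order, which may not match the RefSeq transcript's own exon structure exactly.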


Regards,
Magali

On 10/03/2015 17:05, Duarte Molha wrote:
> Thanks ... I think I have understood
>
> Just confirm one thing for me ...
>
> If I get all Ensembl transcripts of any given gene, at least one of 
> those transcripts will have a database mapping to RefSeq, correct?
>
> for example ... consider the code:
>
> $transcripts = $gene->get_all_Transcripts();
> while ( my $transcript = shift @{$transcripts} ) {
>     my %transcripts_refseq_ids = ();
>     foreach my $dbe ( @{ $transcript->get_all_DBEntries() } ) {
>         if ( $dbe->dbname() eq "RefSeq_mRNA" ) {
>             $transcripts_refseq_ids{ $dbe->display_id() } = 1;
>         }
>     }
> }
>
> So by cycling through all Ensembl transcripts of a gene and checking 
> each one for a RefSeq mRNA entry, I should be able to pull out all 
> transcripts that map. Correct?
>
> Thanks
>
> Duarte
>
> =========================
>      Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 10 March 2015 at 16:20, mag <mr6 at ebi.ac.uk> wrote:
>
>     Hi Duarte,
>
>     It is important to bear in mind that Ensembl and RefSeq
>     transcripts are different objects.
>
>     There is a large overlap between the two resources, but small
>     differences in coding sequence and UTRs mean that there is not
>     always a one-to-one mapping between an Ensembl transcript and a
>     RefSeq transcript.
>     This also means that an Ensembl transcript might overlap some
>     RefSeq exons, but not all.
>
>     In your use-case however, you should be able to get the
>     information you want by replacing the following call:
>     $gene->get_all_DBLinks( 'RefSeq_mRNA')
>     with $transcript->get_all_DBEntries('RefSeq_mRNA')
>
>     RefSeq_mRNA corresponds to RefSeq transcripts, which we
>     consequently map to Ensembl transcripts.
>     With your current script, you are fetching all genes where at
>     least one transcript is mapped to a RefSeq transcript.
>     Instead, you can directly fetch only the transcripts which have a
>     mapping to RefSeq.
>
>
>     Hope that helps,
>     Magali
>
>     On 10/03/2015 15:30, Duarte Molha wrote:
>>     Thanks Kieron
>>
>>     But this still leaves me with a question.
>>
>>     Say that I have a gene, and I retrieve the correct gene object
>>     from the Ensembl database. How can I output only the transcripts
>>     that are referenced in RefSeq, if not the way I have done it?
>>
>>     If I go the normal way, the  $gene->get_all_Transcripts(); method
>>     will retrieve all ensembl transcripts. How can I limit it to only
>>     get transcripts that are refseq?
>>
>>     Thanks
>>
>>     Duarte
>>
>>     =========================
>>          Duarte Miguel Paulo Molha
>>     http://about.me/duarte
>>     =========================
>>
>>     On 10 March 2015 at 15:22, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>>
>>         Dear Duarte,
>>
>>         The issue you have exposed is subtle. You seem to be printing
>>         “exon stable IDs” but expecting them to be RefSeq accessions.
>>         Our mistake was to use the RefSeq IDs as arbitrary
>>         identifiers for internal use, but I must stress that what
>>         Ensembl calls a Stable ID must never be assumed to have any
>>         meaning outside of an Ensembl database. What you want are
>>         display labels. The exon labels were generated by picking
>>         only the first of any possible RefSeq IDs, hence you cannot
>>         get everything you want in this way.
>>
>>         The correct way to handle this in your code is to fetch the
>>         transcript name and print that in each exon, as RefSeq IDs
>>         refer to transcripts and not exons.
>>
>>
>>         Regards,
>>
>>         Kieron
>>
>>
>>         Kieron Taylor PhD.
>>         Ensembl Core senior software developer
>>
>>         EMBL, European Bioinformatics Institute
>>
>>
>>
>>
>>
>>         > On 10 Mar 2015, at 11:57, Duarte Molha <duartemolha at gmail.com> wrote:
>>         >
>>         > Dear developers
>>         >
>>         > I have a script that I wrote (attached) that gets me
>>         > the RefSeq exons for a given input gene.
>>         >
>>         > However, when I run this code using the gene ASXL1 as an
>>         > example, the output is:
>>         >
>>         > test_query.pl ASXL1
>>         >
>>         > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +
>>         >
>>         >
>>         > As you can see, I am missing some of the exons for
>>         > transcript NM_015338.5.
>>         > In this case, the first 3 exons of transcript NM_015338.5
>>         > are identical to those of NM_001164603.1, but I would expect
>>         > to have them listed as:
>>         >
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +
>>         > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +
>>         >
>>         > Can you tell me what is wrong with my approach and how I
>>         can retrieve the missing data?
>>         >
>>         > Best regards
>>         >
>>         > Duarte
>>         > <test_query.pl>
>>         > _______________________________________________
>>         > Dev mailing list Dev at ensembl.org
>>         > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>         > Ensembl Blog: http://www.ensembl.info/
