[ensembl-dev] question regarding refseq exons retrieval

Duarte Molha duartemolha at gmail.com
Tue Mar 10 17:05:07 GMT 2015


Thanks, I think I have understood.

Could you just confirm one thing for me?

If I get all Ensembl transcripts of any given gene, at least one of those
transcripts will have a database mapping to RefSeq, correct?

For example, consider this code:

    my $transcripts = $gene->get_all_Transcripts();
    foreach my $transcript ( @{$transcripts} ) {
        # RefSeq mRNA IDs mapped to this particular transcript
        my %transcripts_refseq_ids = ();
        foreach my $dbe ( @{ $transcript->get_all_DBEntries() } ) {
            # keep only cross-references to RefSeq mRNA records
            if ( $dbe->dbname() eq "RefSeq_mRNA" ) {
                $transcripts_refseq_ids{ $dbe->display_id() } = 1;
            }
        }
    }

By cycling through all Ensembl transcripts of a gene and checking each one
for a RefSeq mRNA entry, I should be able to pull out all the transcripts
that map to RefSeq. Correct?
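
A minimal sketch of the whole approach discussed in this thread, assuming
$gene is a Bio::EnsEMBL::Gene already fetched from the core database. The
helper name refseq_exon_rows and the row layout are made up for
illustration. It passes 'RefSeq_mRNA' to get_all_DBEntries(), as Magali
suggests below, and labels each exon with the transcript's RefSeq ID plus
the exon's rank, as Kieron suggests. The coordinates are those of the
Ensembl exons, which may differ from the RefSeq exons.

```perl
use strict;
use warnings;

# Sketch only: refseq_exon_rows is a hypothetical helper, not part of the
# Ensembl API. $gene is assumed to be a Bio::EnsEMBL::Gene, or any object
# offering the same methods.
sub refseq_exon_rows {
    my ($gene) = @_;
    my @rows;

    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {

        # Passing the external database name restricts the xrefs that are
        # returned, so no manual dbname() comparison is needed.
        my @refseq_ids =
          map { $_->display_id() }
          @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };

        # Skip Ensembl transcripts that have no RefSeq mRNA xref.
        next unless @refseq_ids;

        # Exons come back in transcript order, so a counter gives the rank.
        my $rank = 0;
        foreach my $exon ( @{ $transcript->get_all_Exons() } ) {
            $rank++;

            # One row per (RefSeq ID, exon) pair, so exons shared between
            # transcripts are listed once under each RefSeq ID.
            foreach my $id (@refseq_ids) {
                push @rows,
                  join( "\t",
                    "$id.$rank",    $exon->seq_region_name(),
                    $exon->start(), $exon->end(),
                    $exon->strand() );
            }
        }
    }
    return \@rows;
}
```

With a real gene object this could be used as
print "$_\n" for @{ refseq_exon_rows($gene) };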

Thanks

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On 10 March 2015 at 16:20, mag <mr6 at ebi.ac.uk> wrote:

>  Hi Duarte,
>
> It is important to bear in mind that Ensembl and RefSeq transcripts are
> different objects.
>
> There is a large overlap between the two resources, but small differences
> in coding sequence and UTRs mean that there is not always a one-to-one
> mapping between an Ensembl transcript and a RefSeq transcript.
> This also means that an Ensembl transcript might overlap some RefSeq
> exons, but not all.
>
> In your use-case however, you should be able to get the information you
> want by replacing the following call:
> $gene->get_all_DBLinks( 'RefSeq_mRNA')
> with $transcript->get_all_DBEntries('RefSeq_mRNA')
>
> RefSeq_mRNA corresponds to RefSeq transcripts, which we consequently map
> to Ensembl transcripts.
> With your current script, you are fetching all genes where at least one
> transcript is mapped to a RefSeq transcript.
> Instead, you can directly fetch only the transcripts which have a mapping
> to RefSeq.
>
>
> Hope that helps,
> Magali
>
> On 10/03/2015 15:30, Duarte Molha wrote:
>
> Thanks Kieron
>
>  But this still leaves me with a question.
>
>  Say that I have a gene, and I retrieve the correct gene object from the
> Ensembl database. How can I output only the transcripts that are referenced
> in RefSeq, if not the way I have done it?
>
>  If I go the normal way, the $gene->get_all_Transcripts() method will
> retrieve all Ensembl transcripts. How can I limit it to only get
> transcripts that are in RefSeq?
>
>  Thanks
>
>  Duarte
>
>  =========================
>      Duarte Miguel Paulo Molha
>           http://about.me/duarte
> =========================
>
> On 10 March 2015 at 15:22, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>> Dear Duarte,
>>
>> The issue you have exposed is subtle. You seem to be printing “exon
>> stable IDs” but expecting them to be RefSeq accessions. Our mistake was to
>> use the RefSeq IDs as arbitrary identifiers for internal use, but I must
>> stress that what Ensembl calls a Stable ID must never be assumed to have any
>> meaning outside of an Ensembl database. What you want are display labels.
>> The exon labels were generated by picking only the first of any possible
>> RefSeq IDs, hence you cannot get everything you want in this way.
>>
>> The correct way to handle this in your code is to fetch the transcript
>> name and print that in each exon, as RefSeq IDs refer to transcripts and
>> not exons.
>>
>>
>> Regards,
>>
>> Kieron
>>
>>
>> Kieron Taylor PhD.
>> Ensembl Core senior software developer
>>
>> EMBL, European Bioinformatics Institute
>>
>>
>>
>>
>>
>> > On 10 Mar 2015, at 11:57, Duarte Molha <duartemolha at gmail.com> wrote:
>> >
>> > Dear developers
>> >
>> > I have a script that I wrote (in attachment) that gets me the RefSeq
>> > exons for a given input gene.
>> >
>> > However, when I run it with the gene ASXL1 as an example, the output is:
>> >
>> > test_query.pl ASXL1
>> >
>> > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +
>> >
>> >
>> > As you can see, I am missing some of the exons for transcript
>> > NM_015338.5. In this case, the first three exons of transcript
>> > NM_015338.5 are identical to those of NM_001164603.1, but I would
>> > expect to have them listed as:
>> >
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +
>> >
>> > Can you tell me what is wrong with my approach and how I can retrieve
>> > the missing data?
>> >
>> > Best regards
>> >
>> > Duarte
>> > <test_query.pl>
>> > _______________________________________________
>> > Dev mailing list    Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> > Ensembl Blog: http://www.ensembl.info/
>>