[ensembl-dev] question regarding refseq exons retrieval
mag
mr6 at ebi.ac.uk
Wed Mar 11 09:35:30 GMT 2015
Hi Duarte,
I am not convinced all genes in Ensembl will have at least one mapping
to RefSeq, but your snippet of code should work regardless.
Regards,
Magali
On 10/03/2015 17:05, Duarte Molha wrote:
> Thanks... I think I have understood.
>
> Just to confirm one thing...
>
> If I get all Ensembl transcripts of any given gene, at least one of
> those transcripts will have a database mapping to RefSeq, correct?
>
> For example, consider the code:
>
> foreach my $transcript (@{ $gene->get_all_Transcripts() }) {
>     my %transcripts_refseq_ids = ();    # RefSeq IDs of this transcript
>     foreach my $dbe (@{ $transcript->get_all_DBEntries() }) {
>         if ($dbe->dbname() eq "RefSeq_mRNA") {
>             $transcripts_refseq_ids{ $dbe->display_id() } = 1;
>         }
>     }
> }
>
> I should be able to rely on this: by cycling through all Ensembl
> transcripts of a gene and checking for a RefSeq_mRNA entry, I will
> pull out all transcripts that map. Correct?
>
> Thanks
>
> Duarte
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 10 March 2015 at 16:20, mag <mr6 at ebi.ac.uk> wrote:
>
> Hi Duarte,
>
> It is important to bear in mind that Ensembl and RefSeq
> transcripts are different objects.
>
> There is a large overlap between the two resources, but small
> differences in coding sequence and UTRs mean that there is not
> always a one-to-one mapping between an Ensembl transcript and a
> RefSeq transcript.
> This also means that an Ensembl transcript might overlap some
> RefSeq exons, but not all.
>
> In your use case, however, you should be able to get the
> information you want by replacing the call
> $gene->get_all_DBLinks('RefSeq_mRNA')
> with $transcript->get_all_DBEntries('RefSeq_mRNA')
>
> RefSeq_mRNA corresponds to RefSeq transcripts, which we
> consequently map to Ensembl transcripts.
> With your current script, you are fetching all genes where at
> least one transcript is mapped to a RefSeq transcript.
> Instead, you can directly fetch only the transcripts which have a
> mapping to RefSeq.
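A minimal sketch of the difference Magali describes, written in Perl with plain hashes standing in for Ensembl gene and transcript objects so that it runs without a database connection. The gene and transcript names, the data layout and the subs has_refseq, gene_level and transcript_level are all hypothetical; in real code the xref list of each transcript would come from $transcript->get_all_DBEntries('RefSeq_mRNA').

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each gene lists its transcripts, and each
# transcript lists its xrefs as [dbname, display_id] pairs.
my %genes = (
    GENE_A => [
        { id => 'TX_A1', xrefs => [ [ 'RefSeq_mRNA',       'NM_000001.1' ] ] },
        { id => 'TX_A2', xrefs => [ [ 'Uniprot/SWISSPROT', 'P00001' ] ] },
    ],
    GENE_B => [
        { id => 'TX_B1', xrefs => [] },    # no RefSeq mapping at all
    ],
);

# True if a transcript carries at least one RefSeq_mRNA xref.
sub has_refseq {
    my ($transcript) = @_;
    return scalar grep { $_->[0] eq 'RefSeq_mRNA' } @{ $transcript->{xrefs} };
}

# Gene-level filter: keeps every transcript of a gene as soon as one of
# its transcripts maps, so unmapped transcripts slip through.
sub gene_level {
    my @kept;
    for my $gene (sort keys %genes) {
        my @tx = @{ $genes{$gene} };
        push @kept, map { $_->{id} } @tx if grep { has_refseq($_) } @tx;
    }
    return @kept;
}

# Transcript-level filter: keeps only the transcripts that map themselves.
sub transcript_level {
    my @kept;
    for my $gene (sort keys %genes) {
        push @kept, map { $_->{id} } grep { has_refseq($_) } @{ $genes{$gene} };
    }
    return @kept;
}

print "gene-level:       @{[ gene_level() ]}\n";
print "transcript-level: @{[ transcript_level() ]}\n";
```

With this toy data the gene-level filter returns TX_A1 and TX_A2, because GENE_A has one mapped transcript, while the transcript-level filter returns only TX_A1. GENE_B, which has no RefSeq mapping, simply contributes nothing to either list rather than causing an error.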
>
>
> Hope that helps,
> Magali
>
> On 10/03/2015 15:30, Duarte Molha wrote:
>> Thanks Kieron
>>
>> But this still leaves me with a question.
>>
>> Say that I have a gene and I retrieve the correct gene object
>> from the Ensembl database. How can I output only the transcripts
>> that are referenced in RefSeq, if not the way I have done it?
>>
>> If I go the normal way, the $gene->get_all_Transcripts() method
>> will retrieve all Ensembl transcripts. How can I limit it to only
>> the transcripts that are in RefSeq?
>>
>> Thanks
>>
>> Duarte
>>
>>
>> On 10 March 2015 at 15:22, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>>
>> Dear Duarte,
>>
>> The issue you have exposed is subtle. You seem to be printing
>> “exon stable IDs” but expecting them to be RefSeq accessions.
>> Our mistake was to use the RefSeq IDs as arbitrary
>> identifiers for internal use, but I must stress that what
>> Ensembl calls a Stable ID must never be assumed to have any
>> meaning outside of an Ensembl database. What you want are
>> display labels. The exon labels were generated by picking
>> only the first of any possible RefSeq IDs, hence you cannot
>> get everything you want in this way.
>>
>> The correct way to handle this in your code is to fetch the
>> transcript name and print it alongside each exon, as RefSeq IDs
>> refer to transcripts and not exons.
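A simplified model of the symptom and of Kieron's fix, again in Perl with plain hashes instead of Ensembl objects; the exon names, coordinates, the REFSEQ_1/REFSEQ_2 labels and both subs are hypothetical, and label_per_exon is only a toy model of the symptom, not how the Ensembl exon labels were actually built. Labelling each exon once, from the first transcript that uses it, drops the shared exons from the second transcript, which mirrors what happened to NM_015338.5 in the output below. Walking the transcripts instead, and labelling every exon with the transcript's RefSeq ID plus its rank, repeats shared exons under each transcript. With the real API the inner loop would iterate over $transcript->get_all_Exons(), which returns exons in transcript order, and print each exon's seq_region_start and seq_region_end.

```perl
use strict;
use warnings;

# Hypothetical exons as [start, end]; E1 and E2 are shared by both transcripts.
my %exons = (
    E1 => [ 100, 200 ],
    E2 => [ 300, 400 ],
    E3 => [ 500, 600 ],
    E4 => [ 700, 800 ],
);

# Hypothetical RefSeq transcripts, each listing its exons in transcript order.
my %transcripts = (
    REFSEQ_1 => [qw(E1 E2 E3)],
    REFSEQ_2 => [qw(E1 E2 E4)],
);

# Toy model of the symptom: each exon gets a single label, taken from the
# first transcript that uses it, so shared exons are emitted only once.
sub label_per_exon {
    my (%seen, @rows);
    for my $tx (sort keys %transcripts) {
        my $rank = 0;
        for my $exon (@{ $transcripts{$tx} }) {
            $rank++;
            next if $seen{$exon}++;
            push @rows, "$tx.$rank\t@{ $exons{$exon} }";
        }
    }
    return @rows;
}

# Per-transcript labelling: every exon of every transcript is emitted with
# "<transcript ID>.<exon rank>", so shared exons repeat under each transcript.
sub label_per_transcript {
    my @rows;
    for my $tx (sort keys %transcripts) {
        my $rank = 0;
        for my $exon (@{ $transcripts{$tx} }) {
            $rank++;
            push @rows, "$tx.$rank\t@{ $exons{$exon} }";
        }
    }
    return @rows;
}

print "$_\n" for label_per_transcript();
```

Here label_per_exon yields 4 rows and skips REFSEQ_2.1 and REFSEQ_2.2, while label_per_transcript yields 6 rows, including REFSEQ_2.1 and REFSEQ_2.2 with the same coordinates as REFSEQ_1.1 and REFSEQ_1.2.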
>>
>>
>> Regards,
>>
>> Kieron
>>
>>
>> Kieron Taylor PhD.
>> Ensembl Core senior software developer
>>
>> EMBL, European Bioinformatics Institute
>>
>> > On 10 Mar 2015, at 11:57, Duarte Molha <duartemolha at gmail.com> wrote:
>> >
>> > Dear developers,
>> >
>> > I have a script (attached) that gets me the RefSeq exons for a
>> > given input gene.
>> >
>> > However, when I run it using the gene ASXL1 as an example:
>> >
>> > test_query.pl ASXL1
>> >
>> > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +
>> >
>> >
>> > As you can see, I am missing some of the exons for transcript
>> > NM_015338.5. In this case, the first 3 exons of transcript
>> > NM_015338.5 are identical to those of NM_001164603.1, but I would
>> > expect to have them listed as:
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +
>> >
>> > Can you tell me what is wrong with my approach and how I can
>> > retrieve the missing data?
>> >
>> > Best regards
>> >
>> > Duarte
>> > <test_query.pl>
>> > _______________________________________________
>> > Dev mailing list Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info:
>> > http://lists.ensembl.org/mailman/listinfo/dev
>> > Ensembl Blog: http://www.ensembl.info/