[ensembl-dev] question regarding refseq exons retrieval
Duarte Molha
duartemolha at gmail.com
Tue Mar 10 17:05:07 GMT 2015
Thanks, I think I have understood.
Just confirm one thing for me:
if I get all Ensembl transcripts of any given gene, at least one of those
transcripts will have a database mapping to RefSeq, correct?
For example, consider the code:
    $transcripts = $gene->get_all_Transcripts();
    while ( my $transcript = shift @{$transcripts} ) {
        my %transcripts_refseq_ids = ();
        foreach my $dbe ( @{ $transcript->get_all_DBEntries() } ) {
            if ( $dbe->dbname() eq "RefSeq_mRNA" ) {
                $transcripts_refseq_ids{ $dbe->display_id() } = 1;
            }
        }
    }
Can I be confident that by cycling through all Ensembl transcripts of a
gene and checking each for a RefSeq mRNA entry, I will be able to pull out
all transcripts that map to RefSeq?
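
To make the question concrete, here is a minimal sketch of what I have in
mind. It assumes $gene is a Bio::EnsEMBL::Gene already fetched from the core
database, and that passing 'RefSeq_mRNA' to get_all_DBEntries() restricts
the xrefs as Magali describes. The helper name refseq_exon_rows and the
"<accession>.<exon rank>" label are my own illustration, not part of the
Ensembl API:

```perl
use strict;
use warnings;

# Hypothetical helper: for every transcript of $gene that has a RefSeq_mRNA
# xref, build one tab-separated row per exon, labelled with the RefSeq
# accession plus the exon's position within that transcript.
sub refseq_exon_rows {
    my ($gene) = @_;
    my @rows;

    foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
        # Only xrefs from the RefSeq_mRNA external database are requested.
        my @refseq = @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };
        next unless @refseq;    # no RefSeq mapping: skip this transcript

        my $exons = $transcript->get_all_Exons();
        foreach my $dbe (@refseq) {
            my $rank = 0;
            foreach my $exon ( @{$exons} ) {
                $rank++;
                push @rows, join( "\t",
                    $dbe->display_id() . '.' . $rank,
                    $exon->seq_region_name(),
                    $exon->seq_region_start(),
                    $exon->seq_region_end(),
                    $exon->seq_region_strand() );
            }
        }
    }
    return @rows;
}
```

Calling print "$_\n" for refseq_exon_rows($gene); would then list every exon
once per mapped RefSeq transcript, so exons shared between two RefSeq
accessions appear under both, subject to the caveat that the Ensembl exons
may not match the RefSeq exons exactly.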
Thanks
Duarte
=========================
Duarte Miguel Paulo Molha
http://about.me/duarte
=========================
On 10 March 2015 at 16:20, mag <mr6 at ebi.ac.uk> wrote:
> Hi Duarte,
>
> It is important to bear in mind that Ensembl and RefSeq transcripts are
> different objects.
>
> There is a large overlap between the two resources, but small differences
> in coding sequence and UTRs mean that there is not always a one-to-one
> mapping between an Ensembl transcript and a RefSeq transcript.
> This also means that an Ensembl transcript might overlap some RefSeq
> exons, but not all.
>
> In your use-case however, you should be able to get the information you
> want by replacing the following call:
> $gene->get_all_DBLinks( 'RefSeq_mRNA')
> with $transcript->get_all_DBEntries('RefSeq_mRNA')
>
> RefSeq_mRNA corresponds to RefSeq transcripts, which we consequently map
> to Ensembl transcripts.
> With your current script, you are fetching all genes where at least one
> transcript is mapped to a RefSeq transcript.
> Instead, you can directly fetch only the transcripts which have a mapping
> to RefSeq.
>
>
> Hope that helps,
> Magali
>
> On 10/03/2015 15:30, Duarte Molha wrote:
>
> Thanks Kieron
>
> But this still leaves me with a question.
>
> Say that I have a gene, and I retrieve the correct gene object from the
> Ensembl database. How can I output only the transcripts that are referenced
> in RefSeq, if not the way I have done it?
>
> If I go the normal way, the $gene->get_all_Transcripts(); method will
> retrieve all ensembl transcripts. How can I limit it to only get
> transcripts that are refseq?
>
> Thanks
>
> Duarte
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 10 March 2015 at 15:22, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>> Dear Duarte,
>>
>> The issue you have exposed is subtle. You seem to be printing “exon
>> stable IDs” but expecting them to be RefSeq accessions. Our mistake was to
>> use the RefSeq IDs as arbitrary identifiers for internal use, but I must
>> stress that what Ensembl calls a Stable ID must never be assumed to have any
>> meaning outside of an Ensembl database. What you want are display labels.
>> The exon labels were generated by picking only the first of any possible
>> RefSeq IDs, hence you cannot get everything you want in this way.
>>
>> The correct way to handle this in your code is to fetch the transcript
>> name and print that for each exon, as RefSeq IDs refer to transcripts and
>> not exons.
>>
>>
>> Regards,
>>
>> Kieron
>>
>>
>> Kieron Taylor PhD.
>> Ensembl Core senior software developer
>>
>> EMBL, European Bioinformatics Institute
>>
>>
>>
>>
>>
>> > On 10 Mar 2015, at 11:57, Duarte Molha <duartemolha at gmail.com> wrote:
>> >
>> > Dear developers
>> >
>> > I have a script that I wrote (in attachment) that gets me the RefSeq
>> > exons for a given input gene.
>> >
>> > However, when I run this code with the gene ASXL1 as an example, the output is:
>> >
>> > test_query.pl ASXL1
>> >
>> > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +
>> >
>> >
>> > As you can see, I am missing some of the exons for transcript
>> > NM_015338.5.
>> > In this case, the first three exons of transcript NM_015338.5 are
>> > identical to those of NM_001164603.1, but I would expect to have them
>> > listed as:
>> >
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +
>> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +
>> >
>> > Can you tell me what is wrong with my approach and how I can retrieve
>> > the missing data?
>> >
>> > Best regards
>> >
>> > Duarte
>> > <test_query.pl>
>> > _______________________________________________
>> > Dev mailing list Dev at ensembl.org
>> > Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> > Ensembl Blog: http://www.ensembl.info/