[ensembl-dev] question regarding refseq exons retrieval
mag
mr6 at ebi.ac.uk
Tue Mar 10 16:20:08 GMT 2015
Hi Duarte,
It is important to bear in mind that Ensembl and RefSeq transcripts are
different objects.
There is a large overlap between the two resources, but small
differences in coding sequence and UTRs mean that there is not always a
one-to-one mapping between an Ensembl transcript and a RefSeq transcript.
This also means that an Ensembl transcript might overlap some RefSeq
exons, but not all.
In your use case, however, you should be able to get the information you
want by replacing the call

    $gene->get_all_DBLinks('RefSeq_mRNA')

with

    $transcript->get_all_DBEntries('RefSeq_mRNA')

RefSeq_mRNA cross-references correspond to RefSeq transcripts, which we map
to Ensembl transcripts.
With your current script, you are fetching every gene in which at least one
transcript is mapped to a RefSeq transcript. Instead, you can directly fetch
only the transcripts that themselves have a mapping to RefSeq.
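A minimal sketch of this approach follows. It assumes the Ensembl Perl API is installed and that the public server ensembldb.ensembl.org (anonymous user, default port) serves the assembly you need; the ASXL1 coordinates in Duarte's message appear to be GRCh37, which is served on a separate port. The subroutine name refseq_exon_rows and the plain-hash layout are hypothetical, chosen for illustration. Each exon is labelled with the RefSeq ID of the transcript it belongs to plus its rank within that transcript, so an exon shared by two transcripts is printed once per RefSeq ID. Keep in mind that the coordinates printed are those of the Ensembl transcript mapped to each RefSeq ID, which, as noted above, may not match the RefSeq exons exactly.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn one gene's RefSeq-mapped transcripts into rows,
# one row per (RefSeq ID, exon). Each transcript is a plain hashref:
#   { refseq_ids => ['NM_...', ...], exons => [[chr, start, end, strand], ...] }
# Exons are assumed to be listed 5' to 3', so list position gives the rank.
sub refseq_exon_rows {
    my ($query, $gene_name, $biotype, $transcripts) = @_;
    my @rows;
    for my $tr (@$transcripts) {
        for my $refseq_id (@{ $tr->{refseq_ids} }) {
            my $rank = 0;
            for my $exon (@{ $tr->{exons} }) {
                my ($chr, $start, $end, $strand) = @$exon;
                $rank++;
                push @rows, join("\t",
                    $query, 'Exon', $gene_name, $biotype,
                    "$refseq_id.$rank", "chr$chr", $start, $end,
                    $strand > 0 ? '+' : '-');
            }
        }
    }
    return @rows;
}

# Fetch the gene and keep only transcripts with a RefSeq_mRNA cross-reference.
# Connection parameters are assumptions; adjust host/port for your assembly.
sub main {
    my ($query) = @_;
    require Bio::EnsEMBL::Registry;
    Bio::EnsEMBL::Registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $gene_adaptor =
        Bio::EnsEMBL::Registry->get_adaptor('human', 'core', 'gene');

    print join("\t", qw(QueryName feature_type common_name Biotype
                        id chr start end strand)), "\n";
    for my $gene (@{ $gene_adaptor->fetch_all_by_external_name($query) }) {
        my @transcripts;
        for my $transcript (@{ $gene->get_all_Transcripts }) {
            my @ids = map { $_->display_id }
                @{ $transcript->get_all_DBEntries('RefSeq_mRNA') };
            next unless @ids;    # skip transcripts with no RefSeq mapping
            push @transcripts, {
                refseq_ids => \@ids,
                exons => [ map {
                    [ $_->seq_region_name, $_->seq_region_start,
                      $_->seq_region_end, $_->seq_region_strand ]
                } @{ $transcript->get_all_Exons } ],
            };
        }
        print "$_\n"
            for refseq_exon_rows($query, $gene->external_name // $query,
                                 $gene->biotype, \@transcripts);
    }
}

# Only contact the database when a gene name is given on the command line.
main($ARGV[0]) if @ARGV;
```

Usage would be something like `perl refseq_exons.pl ASXL1` (the filename is hypothetical). Keeping the row formatting separate from the database calls means the labelling logic can be checked without a network connection.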
Hope that helps,
Magali
On 10/03/2015 15:30, Duarte Molha wrote:
> Thanks Kieron,
>
> But this still leaves me with a question.
>
> Say that I have a gene and I retrieve the correct gene object from
> the Ensembl database. How can I output only the transcripts that are
> referenced in RefSeq, if not the way I have done it?
>
> If I go the normal way, the $gene->get_all_Transcripts() method will
> retrieve all Ensembl transcripts. How can I limit it to only the
> transcripts that are in RefSeq?
>
> Thanks
>
> Duarte
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 10 March 2015 at 15:22, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
> Dear Duarte,
>
> The issue you have exposed is subtle. You seem to be printing
> “exon stable IDs” but expecting them to be RefSeq accessions. Our
> mistake was to use the RefSeq IDs as arbitrary identifiers for
> internal use, but I must stress that what Ensembl calls a Stable ID
> must never be assumed to have any meaning outside of an Ensembl
> database. What you want are display labels. The exon labels were
> generated by picking only the first of any possible RefSeq IDs, so
> an exon shared by several RefSeq transcripts carries only one of
> them, and you cannot get everything you want in this way.
>
> The correct way to handle this in your code is to fetch the
> transcript name and print it alongside each exon, as RefSeq IDs
> refer to transcripts and not exons.
>
>
> Regards,
>
> Kieron
>
>
> Kieron Taylor PhD.
> Ensembl Core senior software developer
>
> EMBL, European Bioinformatics Institute
>
> > On 10 Mar 2015, at 11:57, Duarte Molha <duartemolha at gmail.com> wrote:
> >
> > Dear developers
> >
> > I have a script that I wrote (attached) that gets me the RefSeq
> > exons for a given input gene.
> >
> > However, when I run this code with the gene ASXL1 as an example:
> >
> > test_query.pl ASXL1
> >
> > QueryName  feature_type  common_name  Biotype         id                chr    start     end       strand
> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.1  chr20  30946147  30946635  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.2  chr20  30954187  30954269  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.3  chr20  30955530  30955532  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_001164603.1.4  chr20  30956818  30956926  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.5     chr20  31015931  31016051  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.6     chr20  31016128  31016225  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.7     chr20  31017141  31017234  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.8     chr20  31017704  31017856  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.9     chr20  31019124  31019287  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.10    chr20  31019386  31019482  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.11    chr20  31020683  31020788  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.12    chr20  31021087  31021720  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.13    chr20  31022235  31027122  +
> >
> >
> > As you can see, I am missing some of the exons for transcript
> > NM_015338.5. In this case, the first 3 exons of transcript NM_015338.5
> > are identical to those of NM_001164603.1, but I would expect to have
> > them listed as:
> >
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.1     chr20  30946147  30946635  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.2     chr20  30954187  30954269  +
> > ASXL1      Exon          ASXL1        protein_coding  NM_015338.5.3     chr20  30955530  30955532  +
> >
> > Can you tell me what is wrong with my approach and how I can
> > retrieve the missing data?
> >
> > Best regards
> >
> > Duarte
> > <test_query.pl>
> > _______________________________________________
> > Dev mailing list Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> > Ensembl Blog: http://www.ensembl.info/