[ensembl-dev] Question regarding UTR retrieval from database

Andy Yates ayates at ebi.ac.uk
Fri Apr 17 11:16:19 BST 2015


Hi Duarte

Indeed, part of our move to GitHub was to encourage communication with
external developers by making the process easier for both parties. We
have been lucky to receive a number of enhancements across all Ensembl
projects, for which we are grateful to the wider Ensembl community.

As with all open source projects, appropriate contributions are appreciated.

Andy

Duarte Molha wrote:
> Well... that was probably the whole point of moving your development to
> GitHub, no?
>
> I don't know if you are taking pull requests from the general public,
> but if you aren't... you should. ;-)
>
> Thanks
>
> Duarte
>
> =========================
>       Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
> On 17 April 2015 at 09:51, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>
>     Hi Duarte,
>
>     Given the namespace of the method you’ve found, I would strongly
>     recommend caution. Bioinformatics formats are notoriously diverse
>     and loosely specified, hence code to handle them is often somewhat
>     bespoke. At any rate, that method is not helpful for your case.
>
>     Thank you for bringing your needs to our attention. We can perhaps
>     add support in future releases, should time allow. Convenience methods
>     are often missing from our API, due to a lack of apparent need and a
>     general shortage of developer time.
>
>     Kieron
>
>
>     Kieron Taylor PhD.
>     Ensembl Core senior software developer
>
>     EMBL, European Bioinformatics Institute
>
>
>
>
>
>      > On 16 Apr 2015, at 21:38, Duarte Molha <duartemolha at gmail.com> wrote:
>      >
>      > Actually, you do have an undocumented method called
>      > Bio::EnsEMBL::Utils::IO::GTFSerializer::get_all_UTR_features()
>      >
>      > What is that all about?
>      >
>      > =========================
>      >      Duarte Miguel Paulo Molha
>      > http://about.me/duarte
>      > =========================
>      >
>      > On 16 April 2015 at 21:35, Duarte Molha <duartemolha at gmail.com> wrote:
>      > Yes... I knew of that method... I had just forgotten it. I still think
>      > the reverse of it would be useful on its own:
>      > get_all_untranslateable_Exons.
>      >
>      >
>      > =========================
>      >      Duarte Miguel Paulo Molha
>      > http://about.me/duarte
>      > =========================
>      >
>      > On 16 April 2015 at 21:29, <mr6 at ebi.ac.uk> wrote:
>      > Hi Duarte,
>      >
>      > You might find the get_all_translateable_Exons method useful.
>      >
>     http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html#a17e718ddd3d054de7b358029e6d48d20
>      >
>      > This would correspond to the get_coding_regions you are looking for, as
>      > the exons returned are truncated to their coding region.
>      > For the get_noncoding_regions however, you would need to look at all the
>      > features from get_all_Exons that are not in get_all_translateable_Exons.
>      >
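A minimal sketch of the subtraction suggested above, assuming each exon's coding start and end are already known as genomic coordinates (for instance from coding_region_start($transcript) and its counterpart coding_region_end($transcript) on the exon). The sub name noncoding_parts and the plain-number inputs are illustrative and not part of the Ensembl API. The value 7777300 is a made-up placeholder, since the thread only gives the start of the third exon.

```perl
use strict;
use warnings;

# Return the parts of one exon that fall outside its coding region, as
# [start, end] pairs in genomic coordinates (start <= end on either strand).
# Pass undef for both coding bounds when the exon has no coding sequence.
sub noncoding_parts {
    my ($start, $end, $cds_start, $cds_end) = @_;
    return ([$start, $end]) unless defined $cds_start && defined $cds_end;

    my @parts;
    push @parts, [$start, $cds_start - 1] if $cds_start > $start;
    push @parts, [$cds_end + 1, $end]     if $cds_end < $end;
    return @parts;
}

# Third exon of the example: starts at 7777160, coding from 7777172.
# 7777300 is a placeholder for the unknown exon end and coding end.
my @parts = noncoding_parts(7777160, 7777300, 7777172, 7777300);
print join("\t", 'chr1', @$_), "\n" for @parts;
```

With these inputs the only non-coding part is 7777160 to 7777171. Whether a flank counts as 5' or 3' UTR depends on the strand of the transcript, which this sketch does not track.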
>      >
>      > Hope that helps,
>      > Magali
>      >
>      > > Thanks Kieron
>      > >
>      > > I understand your point of view... but I still think there is a case for a
>      > > couple of methods to be implemented in the transcript object:
>      > > @{$transcript->get_coding_regions} and
>      > > @{$transcript->get_noncoding_regions}
>      > >
>      > > Both returning feature objects. Am I the only one to find these useful? I
>      > > hope not :)
>      > >
>      > > Thanks
>      > >
>      > > Duarte
>      > >
>      > >
>      > >
>      > > =========================
>      > >      Duarte Miguel Paulo Molha
>      > > http://about.me/duarte
>      > > =========================
>      > >
>      > > On 16 April 2015 at 16:19, Kieron Taylor <ktaylor at ebi.ac.uk> wrote:
>      > >
>      > >> Hi Duarte,
>      > >>
>      > >> The coordinates you’re getting back are pre-splicing, so the span
>      > >> includes introns. The method you’re calling is from the Transcript
>      > >> class, hence the response is with reference to that object. If you’re
>      > >> after exon coordinates, you should be working with exon objects, for
>      > >> example by fetching the exons of the transcript and asking each one for
>      > >> coding_region_start($transcript) until numbers start appearing. Your
>      > >> workaround is also a valid approach.
>      > >>
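As a rough sketch of the workaround referred to above, each exon can be clipped to the pre-splicing UTR span and the overlapping part kept. This is plain interval arithmetic on numbers, not an Ensembl API method: the sub name exonic_parts_within is hypothetical, and in a real script the span would come from five_prime_utr_Feature and the exon bounds from the transcript's exon objects. Exons 1 and 2 and the start of exon 3 are taken from the original question; the end of exon 3 (7777300) is a made-up placeholder.

```perl
use strict;
use warnings;

# Clip each exon ([start, end]) to the given span and keep the overlap.
# Exons that do not touch the span at all are skipped.
sub exonic_parts_within {
    my ($span_start, $span_end, @exons) = @_;
    my @parts;
    for my $exon (@exons) {
        my ($start, $end) = @$exon;
        my $from = $start > $span_start ? $start : $span_start;
        my $to   = $end   < $span_end   ? $end   : $span_end;
        push @parts, [$from, $to] if $from <= $to;
    }
    return @parts;
}

# 5' UTR span reported in the original question: 7772707 to 7777171.
my @exons = ([7772707, 7773198], [7773442, 7773511], [7777160, 7777300]);
print join("\t", 'chr1', @$_), "\n"
    for exonic_parts_within(7772707, 7777171, @exons);
```

Under these assumptions the three lines printed match the intervals asked for in the original question.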
>      > >> My explanation isn’t very satisfactory, but we try to avoid writing
>      > >> methods that need complex return types, such as the list of lists
>      > >> required for your use case. More often than not, users require other
>      > >> attributes of the objects too, so you would still end up with a list of
>      > >> exons. I hope that helps.
>      > >>
>      > >> Regards,
>      > >>
>      > >> Kieron
>      > >>
>      > >>
>      > >> Kieron Taylor PhD.
>      > >> Ensembl Core senior software developer
>      > >>
>      > >> EMBL, European Bioinformatics Institute
>      > >>
>      > >>
>      > >>
>      > >>
>      > >>
>      > >> > On 16 Apr 2015, at 09:15, Duarte Molha <duartemolha at gmail.com> wrote:
>      > >> >
>      > >> > Anyone able to provide me some help on this?
>      > >> >
>      > >> > I have now found a way around this issue by finding the exonic regions
>      > >> > within the reported UTR, but would very much like to understand the
>      > >> > thinking behind this.
>      > >> >
>      > >> > Best regards
>      > >> >
>      > >> > Duarte
>      > >> >
>      > >> >
>      > >> > =========================
>      > >> >      Duarte Miguel Paulo Molha
>      > >> > http://about.me/duarte
>      > >> > =========================
>      > >> >
>      > >> > On 14 April 2015 at 13:40, Duarte Molha <duartemolha at gmail.com> wrote:
>      > >> > Dear Developers
>      > >> >
>      > >> > Please consider the transcript:
>      > >> >
>      > >> > ENST00000470357
>      > >> >
>      > >> > I am trying to retrieve the coordinates of the UTR regions of this
>      > >> > transcript. To this end I have a script that basically starts with the
>      > >> > transcript feature object $transcript:
>      > >> >
>      > >> > my $five_prime  = $transcript->five_prime_utr_Feature;
>      > >> >
>      > >> > $feature_params->{start} = $five_prime->start;
>      > >> > $feature_params->{end}  = $five_prime->end;
>      > >> >
>      > >> > However, in this case the script will output the coordinates from the
>      > >> > start of the 1st non-coding exon to the end of the non-coding portion of
>      > >> > the 3rd exon (chr1     7772707 7777171).
>      > >> > How can I change this so that the script will only output the
>      > >> > coordinates of the non-coding exon portions?
>      > >> >
>      > >> > In this case I would like to output:
>      > >> >
>      > >> > chr1  7772707 7773198
>      > >> > chr1  7773442 7773511
>      > >> > chr1  7777160 7777171
>      > >> >
>      > >> > Is there a simple way of achieving this?
>      > >> >
>      > >> > Many thanks
>      > >> >
>      > >> > Duarte
>      > >> >
>      > >> >
>      > >> > _______________________________________________
>      > >> > Dev mailing list Dev at ensembl.org
>      > >> > Posting guidelines and subscribe/unsubscribe info:
>      > >> > http://lists.ensembl.org/mailman/listinfo/dev
>      > >> > Ensembl Blog: http://www.ensembl.info/

-- 
Andrew Yates - Genomics Technology Infrastructure Team Leader
European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge
CB10 1SD, United Kingdom
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Skype: andrewyatz
http://www.ensembl.org/



