[ensembl-dev] bug in the Ensembl core API?

Andy Yates ayates at ebi.ac.uk
Thu May 2 17:07:32 BST 2013


Hi Electra,

I will see what we can do to reduce any confusion with this behaviour.
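
For anyone following the thread, here is a minimal sketch of the orientation rule discussed in the quoted messages below: a feature's strand is reported relative to the Slice it was fetched from, so it works out as the product of the feature's strand on the sequence region and the Slice's strand. The sketch is plain Perl and does not call the API; the helper strand_on_slice and the two genes are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API: the strand a feature is
# reported on when viewed from a slice. Both arguments are +1 or -1 and are
# given relative to the sequence region (e.g. the chromosome).
sub strand_on_slice {
    my ($seq_region_strand, $slice_strand) = @_;
    return $seq_region_strand * $slice_strand;
}

# Two made-up genes with their strands on the chromosome.
my @genes = (
    { id => 'geneA', seq_region_strand =>  1 },
    { id => 'geneB', seq_region_strand => -1 },
);

# Report each gene as seen from a forward and from a reverse slice.
for my $slice_strand (1, -1) {
    for my $gene (@genes) {
        printf "slice strand %2d: %s is reported on strand %2d\n",
            $slice_strand, $gene->{id},
            strand_on_slice($gene->{seq_region_strand}, $slice_strand);
    }
}

# Filtering on the chromosome strand does not depend on the slice orientation.
my @reverse_ids = map { $_->{id} } grep { $_->{seq_region_strand} == -1 } @genes;
print "genes on the chromosome -ve strand: @reverse_ids\n";
```

Filtering on the strand relative to the sequence region, as in the last two lines, gives the same answer whichever way the Slice is oriented. In the API itself, features provide a seq_region_strand() method for this purpose.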

Cheers,

Andy

Andrew Yates                   Ensembl Core Software Project Leader
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensembl.org/

On 2 May 2013, at 17:04, Electra Tapanari <et3 at sanger.ac.uk> wrote:

> Hi Andy,
> 
> Thanks for your mail.
> 
> I see your point. Maybe it would be good, then, to explain this more explicitly in the API documentation.
> 
> many thanks,
> Electra
> 
> On 02/05/13 16:52, Andy Yates wrote:
>> Hi Electra,
>> 
>> SliceAdaptor's fetch_by_region method allows you to specify a strand because we still need to be able to retrieve Slices with a -ve orientation; operations like -ve stranded DNA retrieval rely on this behaviour. As for making one method call rather than two, we try to keep the number of database operations to a minimum and prefer in-memory filters when a single strand is required. I would suggest switching away from -ve stranded Slices [1] in favour of an in-memory strand filter, like so:
>> 
>> my $slice = $slice_adaptor->fetch_by_region('chromosome', "3", 20000, 600000);
>> my @genes = grep { $_->strand() == -1 } @{$slice->get_all_Genes};
>> 
>> Sorry for the confusion this has caused,
>> 
>> Andy
>> 
>> [1] I say move away from -ve stranded Slices because your features will be reported with reference to the given Slice. So, on a -ve stranded Slice, features on the -ve strand of the sequence region will be reported on the +ve strand (and vice versa). This could confuse your filtering process.
>> 
>> On 2 May 2013, at 16:27, Electra Tapanari <et3 at sanger.ac.uk> wrote:
>> 
>>> Hi all,
>>> 
>>> thanks Bert and Dan for your replies.
>>> 
>>> I see. I just think that this is confusing, and I don't see the benefit of having the strand as a parameter in the "$slice_adaptor->fetch_by_region" method if it always returns both strands.
>>> 
>>> It would be more explicit if you passed the actual strand you are looking for and, if you are interested in both strands, called the method twice, once for the positive and once for the negative. In my opinion, of course...
>>> 
>>> Thanks for your input.
>>> 
>>> cheers,
>>> Electra
>>> 
>>> On 02/05/13 16:15, Daniel Hughes wrote:
>>>> The strand should refer to the local orientation of your slice object relative to the reference genome, not to a specific strand of the underlying sequence.
>>>> 
>>>> Dan.
>>>> 
>>>> 
>>>> Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
>>>> -------------------------------------------------------------------------------------
>>>> dsth at cantab.net
>>>> dsth at cpan.org
>>>> 
>>>> 
>>>> 2013/5/2 Bert Overduin <bert at ebi.ac.uk>
>>>> Hello Electra,
>>>> 
>>>> I don't think this is wrong or a bug since, as far as I know, a slice always encompasses both strands. So you should filter for features located on the forward or reverse strand after retrieving them from the slice.
>>>> 
>>>> But maybe I'm wrong ....
>>>> 
>>>> Cheers,
>>>> Bert
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, May 2, 2013 at 4:04 PM, Electra Tapanari <et3 at sanger.ac.uk> wrote:
>>>> 
>>>> 
>>>> 
>>>> Hi all,
>>>> 
>>>> I came across a problem when using the Ensembl core API.
>>>> 
>>>> When I fetch a slice for which I define chromosome, start, end and
>>>> strand, and then get objects from the slice (i.e. genes, transcripts,
>>>> introns, etc.), it returns objects from both strands.
>>>> 
>>>> 
>>>> I used this code to test this:
>>>> 
>>>>   use strict;
>>>>   use warnings;
>>>>   use Gencode::Default;
>>>> 
>>>>   # connect to database
>>>>   my $db = Gencode::Default->dbconnect("ens-livemirror", 3306,
>>>>       "homo_sapiens_core_71_37", "ensro", undef);
>>>> 
>>>>   my $sa = $db->get_SliceAdaptor();
>>>> 
>>>>   # request the region as a slice on the negative strand
>>>>   my $slice = $sa->fetch_by_region('chromosome', "3", 20000, 600000, -1)
>>>>       or die "\n 1.Couldn't get slice\n";
>>>> 
>>>>   foreach my $gene (@{$slice->get_all_Genes}) {
>>>>       # move the gene back onto chromosome coordinates before printing
>>>>       $gene = $gene->transform('chromosome');
>>>>       print $gene->stable_id."\t".$gene->strand."\n";
>>>>   }
>>>> 
>>>> 
>>>> This is the output:
>>>> 
>>>> ENSG00000223587    1
>>>> ENSG00000224918    -1
>>>> ENSG00000224318    -1
>>>> ENSG00000134121    1
>>>> ENSG00000252017    -1
>>>> ENSG00000231660    -1
>>>> ENSG00000234661    -1
>>>> ENSG00000224957    1
>>>> 
>>>> 
>>>> Isn't this wrong? It returns genes from both strands even though I am
>>>> specifying that I am interested in the slice on the negative strand only.
>>>> 
>>>> many thanks in advance,
>>>> Electra
>>>> 
>>>> -- 
>>>> Electra Tapanari
>>>> Bioinformatician-Gencode Data Coordinator
>>>> Havana Group
>>>> Morgan Building
>>>> Wellcome Trust Sanger Institute
>>>> Wellcome Trust Genome Campus
>>>> Hinxton
>>>> Cambridgeshire
>>>> CB10 1HH
>>>> 
>>>> E: electra.tapanari at sanger.ac.uk
>>>> T: +44 1223 496827
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Bert Overduin, Ph.D.
>>>> Vertebrate Genomics Team
>>>> 
>>>> EMBL - European Bioinformatics Institute
>>>> Wellcome Trust Genome Campus
>>>> Hinxton, Cambridge CB10 1SD
>>>> United Kingdom
>>>> 
>>>> http://www.ebi.ac.uk/~bert
>>>> 
>>>> Ensembl browser: http://www.ensembl.org
>>>> Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html
>>>> Blog: http://www.ensembl.info
>>>> YouTube: http://www.youtube.com/user/EnsemblHelpdesk
>>>> Facebook: http://www.facebook.com/Ensembl.org
>>>> Twitter: http://twitter.com/Ensembl
>>>> 
>>> 
>> 
> 
> 




