[ensembl-dev] accessing the tilepath entries programatically

Duarte Molha duartemolha at gmail.com
Tue Jul 7 09:20:14 BST 2015


Anyone? Could you help me understand why the changing behavior of the same
API between datasets?

=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On 3 July 2015 at 17:01, Duarte Molha <duartemolha at gmail.com> wrote:

> Thanks Magali
>
>
>
> Can you explain something to me?
>
>
>
> You are now keeping the api compatible with both GRCH37 and GRCH38. This
> is great because I can use my scripts with the latest API and not worry
> about having to use an older API to query the older assembly. However I do
> not understand why, in this case changing ‘clone_name’ to ‘Name’* works
> when querying GRCh38 but fails when querying GRCh37.*
>
>
>
>  Shouldn't the API calls be the same for both datasets. This means  I
> have to change my code depending on what database I am querying. Isn't this
> what the move to update the api for both datasets is trying to avoid?
>
>
> Best regards
>
>
> Duarte
>
>
>
> =========================
>      Duarte Miguel Paulo Molha
>          http://about.me/duarte
> =========================
>
> On 2 July 2015 at 16:40, mag <mr6 at ebi.ac.uk> wrote:
>
>>  Hi Duarte,
>>
>> Replacing 'clone_name' with 'Name' as Thibaut suggested works for me for
>> GRCh38.
>>
>> my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value( 'Name',
>> $query );
>>
>> while ( my $clone = shift @{$clones} ) {
>>   my $slice = $clone->slice();
>>   print join "\t", ("chr".$slice->seq_region_name(), $clone->start(),
>> $clone->end() , $query."\n");
>> }
>>
>>
>> Regards,
>> Magali
>>
>>
>> On 01/07/2015 18:15, Duarte Molha wrote:
>>
>> I would still appreciate some help with this query. If possible.
>> On 30 Jun 2015 16:29, "Duarte Molha" <duartemolha at gmail.com> wrote:
>>
>>>  Thibaut... Could you expand on how I can change my script to make it
>>> work with the new assembly?
>>> I have just realised that the reason I am no getting 60 BAC entries is
>>> because their are only present in GRCh38 and not on the GRCh37
>>>
>>>  Can you tell me how I can modify my script to work with the new
>>> assembly?
>>>
>>>  I don't seem to understand the projection method you are using.
>>>  Here is the relevant part of my script
>>>
>>>  my $mf_adaptor         = $registry->get_adaptor( 'Human', 'Core',
>>> 'MiscFeature' );
>>>
>>>  open (IN, ,"<", $options->{list})|| die "Could not open
>>> ".$options->{list}." for reading \n";
>>> my @input_queries = <IN>;
>>> close IN;
>>>
>>>  foreach my $query (@input_queries){
>>>  chomp $query;
>>>  my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value(
>>> 'clone_name', $query );
>>>
>>>  while ( my $clone = shift @{$clones} ) {
>>>  my $slice = $clone->slice();
>>>  print join "\t", ("chr".$slice->seq_region_name(), $clone->start(),
>>> $clone->end() , $query."\n");
>>>  }
>>> }
>>>
>>>
>>>  Best regards
>>>
>>>  Duarte
>>>
>>>  =========================
>>>      Duarte Miguel Paulo Molha
>>>           http://about.me/duarte
>>> =========================
>>>
>>> On 30 June 2015 at 15:46, Duarte Molha <duartemolha at gmail.com> wrote:
>>>
>>>> no. That does not get anything.
>>>>
>>>>
>>>>
>>>>  =========================
>>>>      Duarte Miguel Paulo Molha
>>>>           http://about.me/duarte
>>>> =========================
>>>>
>>>>   On 30 June 2015 at 14:50, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:
>>>>
>>>>> If you use name instead of clone_name, does it fetches the missing
>>>>> one?
>>>>>
>>>>>  Cheers
>>>>>  Thibaut
>>>>>
>>>>>  On 30 Jun 2015, at 14:27, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>
>>>>>  Yes I am using the GRCh37 Thibaut  ... so I am ok for now... but it
>>>>> is good to know this does not work with the latest assembly.
>>>>> However... can you please answer my question regarding the missing
>>>>> clones like  RP11-155D3 ... why can I not fetch this when it is
>>>>> clearly on the database?
>>>>>
>>>>>  Thanks
>>>>>
>>>>>  Duarte
>>>>>
>>>>>
>>>>>
>>>>>  =========================
>>>>>      Duarte Miguel Paulo Molha
>>>>>           http://about.me/duarte
>>>>> =========================
>>>>>
>>>>> On 30 June 2015 at 14:12, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:
>>>>>
>>>>>> My first question should have been which assembly are you using...
>>>>>>
>>>>>>  So yes this will work for GRCh37. Unfortunately it will not work
>>>>>> for GRCh38 but this is something that we will fix for release 82.
>>>>>>
>>>>>>  So in the case of GRCh38, it is still possible but more
>>>>>> complicated. It should work by getting the slice then projecting on the
>>>>>> clone coordinate system
>>>>>>
>>>>>>  $subSlice = $misc_clone->feature_Slice;
>>>>>> $projectionSegment = $subSlice->project('clone')
>>>>>>
>>>>>>  Cheers
>>>>>>  Thibaut
>>>>>>
>>>>>>  On 30 Jun 2015, at 13:56, Duarte Molha <duartemolha at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>  Nevermind... after searching for miscFeatures information I found
>>>>>> the relevant part in the api tutorial
>>>>>>
>>>>>>  Just for reference to anyone that has the same difficulties here is
>>>>>> the relevant portion of the code I used:
>>>>>> (please let me know if there is something I did wrong Thibaut)
>>>>>>
>>>>>>  my $mf_adaptor         = $registry->get_adaptor( 'Human', 'Core',
>>>>>> 'MiscFeature' );
>>>>>>
>>>>>>  open (IN, ,"<", $options->{list})|| die "Could not open
>>>>>> ".$options->{list}." for reading \n";
>>>>>> my @input_queries = <IN>;
>>>>>> close IN;
>>>>>>
>>>>>>  foreach my $query (@input_queries){
>>>>>>  chomp $query;
>>>>>>  my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value(
>>>>>> 'clone_name', $query );
>>>>>>
>>>>>>  while ( my $clone = shift @{$clones} ) {
>>>>>>  my $slice = $clone->slice();
>>>>>>  print join "\t", ("chr".$slice->seq_region_name(), $clone->start(),
>>>>>> $clone->end() , $query."\n");
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>  Best regards
>>>>>>
>>>>>>  Duarte
>>>>>>
>>>>>>  =========================
>>>>>>      Duarte Miguel Paulo Molha
>>>>>>           http://about.me/duarte
>>>>>> =========================
>>>>>>
>>>>>> On 30 June 2015 at 13:26, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>>
>>>>>>> Many thanks Thibaut
>>>>>>>
>>>>>>>  So... in regards to your question...
>>>>>>>
>>>>>>>  How can I query a specific clone and its correct coordinates if I
>>>>>>> know  the clone ID.
>>>>>>>
>>>>>>>  For example
>>>>>>>
>>>>>>>  assuming this clone:
>>>>>>>  RP11-100N21
>>>>>>>
>>>>>>>  In other words , how to I query the underlying clone dataset and
>>>>>>> output those clones in genomic coordinates?
>>>>>>>
>>>>>>>  Many thanks
>>>>>>>
>>>>>>>  Duarte
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  =========================
>>>>>>>      Duarte Miguel Paulo Molha
>>>>>>>           http://about.me/duarte
>>>>>>> =========================
>>>>>>>
>>>>>>>   On 30 June 2015 at 13:15, Thibaut Hourlier <thibaut at ebi.ac.uk>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Duarte,
>>>>>>>> The clone names are stored in the misc_* tables. So you need to use
>>>>>>>> the MiscFeatureAdaptor,
>>>>>>>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1MiscFeatureAdaptor.html
>>>>>>>> :
>>>>>>>>
>>>>>>>> my $misc_clones = $mfa->fetch_all_by_Slice_and_set_code('tilepath');
>>>>>>>> foreach my $clone (@$misc_clones) {
>>>>>>>>  print join("\t", $clone->slice->seq_region_name, $clone->start,
>>>>>>>> $clone->end, @{$clone->get_all_attribute_values('name')}), "\n";
>>>>>>>> }
>>>>>>>>
>>>>>>>> A warning though, this is the tilepath so the boundaries of the
>>>>>>>> clones are different from the contigs/clones in the assembly as sometimes
>>>>>>>> they didn't use the entire clone for the assembly
>>>>>>>>
>>>>>>>> Hope this help
>>>>>>>>
>>>>>>>> Thibaut
>>>>>>>>
>>>>>>>> > On 30 Jun 2015, at 11:50, Duarte Molha <duartemolha at gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > I used this code to get all the gebnomic coordinates of your
>>>>>>>> subcontigs:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > my @slices = @{ $slice_adaptor->fetch_all('clone') };
>>>>>>>> > foreach my $slice (@slices){
>>>>>>>> >       $progress->update();
>>>>>>>> >       my $clone_name =  $slice->seq_region_name();
>>>>>>>> >       my $projection = $slice->project('toplevel');
>>>>>>>> >       foreach my $segment ( @{$projection} ) {
>>>>>>>> >               my $to_slice = $segment->to_Slice();
>>>>>>>> >               print join "\t",
>>>>>>>> ("chr".$to_slice->seq_region_name(), $to_slice->start(), $to_slice->end(),
>>>>>>>> $clone_name."\n");
>>>>>>>> >       }
>>>>>>>> > }
>>>>>>>> >
>>>>>>>> > However, by doing this, the database does not fetch the original
>>>>>>>> clone name
>>>>>>>> >
>>>>>>>> > for example.. using this script I get
>>>>>>>> > chr4    47567235        47733411        AC092597.1
>>>>>>>> >
>>>>>>>> > However I would like to get :
>>>>>>>> >
>>>>>>>> > chr4    47567235        47733411        RP11-100N21
>>>>>>>> >
>>>>>>>> > Can someone explain what I am doing wrong?
>>>>>>>> >
>>>>>>>> > Thanks
>>>>>>>> >
>>>>>>>> > Duarte
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > =========================
>>>>>>>> >      Duarte Miguel Paulo Molha
>>>>>>>> >          http://about.me/duarte
>>>>>>>> > =========================
>>>>>>>> >
>>>>>>>> > On 30 June 2015 at 09:45, Duarte Molha <duartemolha at gmail.com>
>>>>>>>> wrote:
>>>>>>>> > Dear devs
>>>>>>>> >
>>>>>>>> > How can I search for a specific clone id present on your tilepath
>>>>>>>> >
>>>>>>>> > for example this: RP5-892C22
>>>>>>>> >
>>>>>>>> > I would like to use the perl API if possible
>>>>>>>> >
>>>>>>>> > Many thanks
>>>>>>>> >
>>>>>>>> > Duarte
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > =========================
>>>>>>>> >      Duarte Miguel Paulo Molha
>>>>>>>> >          http://about.me/duarte
>>>>>>>> > =========================
>>>>>>>> >
>>>>>>>>  > _______________________________________________
>>>>>>>> > Dev mailing list    Dev at ensembl.org
>>>>>>>> > Posting guidelines and subscribe/unsubscribe info:
>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>> > Ensembl Blog: http://www.ensembl.info/
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>  _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>
>>>>>>
>>>>>  _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20150707/4e2ee7d8/attachment.html>


More information about the Dev mailing list