[ensembl-dev] accessing the tilepath entries programmatically

Duarte Molha duartemolha at gmail.com
Fri Jul 3 17:01:24 BST 2015


Thanks Magali



Can you explain something to me?



You are now keeping the API compatible with both GRCh37 and GRCh38. This is
great because I can use my scripts with the latest API and not worry about
having to use an older API to query the older assembly. However, I do not
understand why, in this case, changing 'clone_name' to 'Name' works when
querying GRCh38 but fails when querying GRCh37.



Shouldn't the API calls be the same for both datasets? As it stands, I have
to change my code depending on which database I am querying. Isn't this what
the move to support both datasets with one API is trying to avoid?
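
In the meantime, the workaround I am considering is to try both attribute
codes in turn and keep whichever returns something. This is only a rough
sketch: `fetch_clones_by_name` is a helper name I made up (it is not part of
the Ensembl API), the two codes are simply the ones mentioned in this thread,
and I am assuming that `fetch_all_by_attribute_type_value` gives back an empty
array reference (or nothing) when there is no match.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API.
# Tries each candidate attribute code in order and returns the first
# non-empty result together with the code that matched.
# $mf_adaptor is assumed to be the MiscFeature adaptor from the registry.
sub fetch_clones_by_name {
    my ( $mf_adaptor, $query, @codes ) = @_;

    # Codes taken from this thread: 'clone_name' (GRCh37), 'Name' (GRCh38)
    @codes = ( 'clone_name', 'Name' ) unless @codes;

    foreach my $code (@codes) {
        my $clones =
          $mf_adaptor->fetch_all_by_attribute_type_value( $code, $query );
        return ( $clones, $code ) if $clones && @{$clones};
    }

    # Nothing matched under any of the codes
    return ( [], undef );
}
```

In my script the call would then become
`my ( $clones, $code ) = fetch_clones_by_name( $mf_adaptor, $query );`
and the rest of the loop stays the same. It hides the difference, but it does
not explain it, hence my question.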


Best regards


Duarte



=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================

On 2 July 2015 at 16:40, mag <mr6 at ebi.ac.uk> wrote:

>  Hi Duarte,
>
> Replacing 'clone_name' with 'Name' as Thibaut suggested works for me for
> GRCh38.
>
> my $clones = $mf_adaptor->fetch_all_by_attribute_type_value( 'Name', $query );
>
> while ( my $clone = shift @{$clones} ) {
>   my $slice = $clone->slice();
>   print join "\t", ("chr".$slice->seq_region_name(), $clone->start(), $clone->end(), $query."\n");
> }
>
>
> Regards,
> Magali
>
>
> On 01/07/2015 18:15, Duarte Molha wrote:
>
> I would still appreciate some help with this query, if possible.
> On 30 Jun 2015 16:29, "Duarte Molha" <duartemolha at gmail.com> wrote:
>
>>  Thibaut... Could you expand on how I can change my script to make it
>> work with the new assembly?
>> I have just realised that the reason I am not getting 60 BAC entries is
>> that they are only present in GRCh38 and not in GRCh37.
>>
>>  Can you tell me how I can modify my script to work with the new
>> assembly?
>>
>>  I don't seem to understand the projection method you are using.
>>  Here is the relevant part of my script:
>>
>>  my $mf_adaptor = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );
>>
>>  # Read one clone name per line from the input list
>>  open( my $in, '<', $options->{list} )
>>    or die "Could not open " . $options->{list} . " for reading: $!\n";
>>  chomp( my @input_queries = <$in> );
>>  close $in;
>>
>>  foreach my $query (@input_queries) {
>>    my $clones = $mf_adaptor->fetch_all_by_attribute_type_value( 'clone_name', $query );
>>
>>    # Print chromosome, start, end and the queried clone name, tab-separated
>>    foreach my $clone ( @{$clones} ) {
>>      my $slice = $clone->slice();
>>      print join( "\t", "chr" . $slice->seq_region_name(), $clone->start(), $clone->end(), $query ), "\n";
>>    }
>>  }
>>
>>
>>  Best regards
>>
>>  Duarte
>>
>>
>> On 30 June 2015 at 15:46, Duarte Molha <duartemolha at gmail.com> wrote:
>>
>>> No. That does not get anything.
>>>
>>>
>>>
>>>
>>>   On 30 June 2015 at 14:50, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:
>>>
>>>> If you use name instead of clone_name, does it fetch the missing one?
>>>>
>>>>  Cheers
>>>>  Thibaut
>>>>
>>>>  On 30 Jun 2015, at 14:27, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>
>>>>  Yes, I am using GRCh37, Thibaut, so I am OK for now, but it
>>>> is good to know this does not work with the latest assembly.
>>>> However, can you please answer my question regarding the missing
>>>> clones like RP11-155D3? Why can I not fetch this one when it is
>>>> clearly in the database?
>>>>
>>>>  Thanks
>>>>
>>>>  Duarte
>>>>
>>>>
>>>>
>>>>
>>>> On 30 June 2015 at 14:12, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:
>>>>
>>>>> My first question should have been which assembly you are using...
>>>>>
>>>>>  So yes, this will work for GRCh37. Unfortunately it will not work for
>>>>> GRCh38, but this is something that we will fix for release 82.
>>>>>
>>>>>  In the case of GRCh38, it is still possible but more complicated.
>>>>> It should work by getting the feature's slice and then projecting it
>>>>> onto the clone coordinate system:
>>>>>
>>>>>  my $subSlice = $misc_clone->feature_Slice;
>>>>>  my $projectionSegment = $subSlice->project('clone');  # array ref of projection segments
>>>>>
>>>>>  Cheers
>>>>>  Thibaut
>>>>>
>>>>>  On 30 Jun 2015, at 13:56, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>
>>>>>  Never mind... after searching for MiscFeature information I found
>>>>> the relevant part in the API tutorial.
>>>>>
>>>>>  Just for reference, for anyone who has the same difficulties, here is
>>>>> the relevant portion of the code I used
>>>>> (please let me know if there is something I did wrong, Thibaut):
>>>>>
>>>>>  my $mf_adaptor = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );
>>>>>
>>>>>  # Read one clone name per line from the input list
>>>>>  open( my $in, '<', $options->{list} )
>>>>>    or die "Could not open " . $options->{list} . " for reading: $!\n";
>>>>>  chomp( my @input_queries = <$in> );
>>>>>  close $in;
>>>>>
>>>>>  foreach my $query (@input_queries) {
>>>>>    my $clones = $mf_adaptor->fetch_all_by_attribute_type_value( 'clone_name', $query );
>>>>>
>>>>>    # Print chromosome, start, end and the queried clone name, tab-separated
>>>>>    foreach my $clone ( @{$clones} ) {
>>>>>      my $slice = $clone->slice();
>>>>>      print join( "\t", "chr" . $slice->seq_region_name(), $clone->start(), $clone->end(), $query ), "\n";
>>>>>    }
>>>>>  }
>>>>>
>>>>>
>>>>>  Best regards
>>>>>
>>>>>  Duarte
>>>>>
>>>>>
>>>>> On 30 June 2015 at 13:26, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>
>>>>>> Many thanks Thibaut
>>>>>>
>>>>>>  So, regarding the question:
>>>>>>
>>>>>>  How can I query a specific clone and get its correct coordinates if I
>>>>>> know the clone ID?
>>>>>>
>>>>>>  For example, assuming this clone:
>>>>>>  RP11-100N21
>>>>>>
>>>>>>  In other words, how do I query the underlying clone dataset and
>>>>>> output those clones in genomic coordinates?
>>>>>>
>>>>>>  Many thanks
>>>>>>
>>>>>>  Duarte
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   On 30 June 2015 at 13:15, Thibaut Hourlier <thibaut at ebi.ac.uk>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Duarte,
>>>>>>> The clone names are stored in the misc_* tables. So you need to use
>>>>>>> the MiscFeatureAdaptor,
>>>>>>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1MiscFeatureAdaptor.html
>>>>>>> :
>>>>>>>
>>>>>>> # $slice is the Bio::EnsEMBL::Slice you want to search
>>>>>>> my $misc_clones = $mfa->fetch_all_by_Slice_and_set_code($slice, 'tilepath');
>>>>>>> foreach my $clone (@$misc_clones) {
>>>>>>>   print join("\t", $clone->slice->seq_region_name, $clone->start, $clone->end, @{$clone->get_all_attribute_values('name')}), "\n";
>>>>>>> }
>>>>>>>
>>>>>>> A warning though: this is the tilepath, so the boundaries of the
>>>>>>> clones can differ from the contigs/clones in the assembly, as sometimes
>>>>>>> the entire clone was not used for the assembly.
>>>>>>>
>>>>>>> Hope this helps
>>>>>>>
>>>>>>> Thibaut
>>>>>>>
>>>>>>> > On 30 Jun 2015, at 11:50, Duarte Molha <duartemolha at gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > I used this code to get all the genomic coordinates of your
>>>>>>> > subcontigs:
>>>>>>> >
>>>>>>> >
>>>>>>> > my @slices = @{ $slice_adaptor->fetch_all('clone') };
>>>>>>> > foreach my $slice (@slices) {
>>>>>>> >     $progress->update();
>>>>>>> >     my $clone_name = $slice->seq_region_name();
>>>>>>> >     my $projection = $slice->project('toplevel');
>>>>>>> >     foreach my $segment ( @{$projection} ) {
>>>>>>> >         my $to_slice = $segment->to_Slice();
>>>>>>> >         print join "\t", ("chr".$to_slice->seq_region_name(), $to_slice->start(), $to_slice->end(), $clone_name."\n");
>>>>>>> >     }
>>>>>>> > }
>>>>>>> >
>>>>>>> > However, by doing this, I do not get the original clone name.
>>>>>>> >
>>>>>>> > For example, using this script I get:
>>>>>>> > chr4    47567235        47733411        AC092597.1
>>>>>>> >
>>>>>>> > However I would like to get :
>>>>>>> >
>>>>>>> > chr4    47567235        47733411        RP11-100N21
>>>>>>> >
>>>>>>> > Can someone explain what I am doing wrong?
>>>>>>> >
>>>>>>> > Thanks
>>>>>>> >
>>>>>>> > Duarte
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On 30 June 2015 at 09:45, Duarte Molha <duartemolha at gmail.com>
>>>>>>> wrote:
>>>>>>> > Dear devs
>>>>>>> >
>>>>>>> > How can I search for a specific clone ID present on your tilepath,
>>>>>>> > for example RP5-892C22?
>>>>>>> >
>>>>>>> > I would like to use the Perl API if possible.
>>>>>>> >
>>>>>>> > Many thanks
>>>>>>> >
>>>>>>> > Duarte
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>>  > _______________________________________________
>>>>>>> > Dev mailing list    Dev at ensembl.org
>>>>>>> > Posting guidelines and subscribe/unsubscribe info:
>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>> > Ensembl Blog: http://www.ensembl.info/