[ensembl-dev] accessing the tilepath entries programmatically

Duarte Molha duartemolha at gmail.com
Tue Jul 7 10:24:47 BST 2015


Thanks Thibaut

What I want to know is the genomic coordinates of the clones in the
assembly, i.e. the first of the two options you describe.

Cheers

Duarte

=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================
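
For anyone reading this thread later: the attribute-code mismatch discussed
in the quoted messages below (the lookup works with 'clone_name' on GRCh37
and with 'Name' on GRCh38) can be worked around, until it is fixed, by trying
each code in turn. The following is only a minimal sketch under assumptions,
not the Ensembl team's method: the helper names fetch_clones_by_name and
format_clone are made up for illustration, and the connection settings assume
the public server. Note that it prints the misc feature (tilepath)
coordinates, which, as Thibaut warns below, can differ from the clone
boundaries in the assembly.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Attribute codes to try, in order. These are the two codes mentioned in
# this thread; which one holds data depends on the database being queried.
my @ATTRIBUTE_CODES = ( 'clone_name', 'Name' );

# Hypothetical helper: return the misc features for the first attribute
# code that yields a hit, or an empty array reference if none does.
# $mf_adaptor only needs a fetch_all_by_attribute_type_value() method.
sub fetch_clones_by_name {
    my ( $mf_adaptor, $name, @codes ) = @_;
    @codes = @ATTRIBUTE_CODES unless @codes;
    foreach my $code (@codes) {
        my $clones =
          $mf_adaptor->fetch_all_by_attribute_type_value( $code, $name );
        return $clones if $clones && @{$clones};
    }
    return [];
}

# Hypothetical helper: format one clone as "chr<region> start end name",
# tab-separated, matching the output format used in this thread.
sub format_clone {
    my ( $clone, $name ) = @_;
    return join( "\t",
        'chr' . $clone->slice->seq_region_name,
        $clone->start, $clone->end, $name );
}

# Only contact the database when clone names are given on the command line.
if (@ARGV) {
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';

    # Assumption: the public server. As far as I know the GRCh37 databases
    # are served on a separate port (3337), which would need -port here.
    $registry->load_registry_from_db(
        -host => 'ensembldb.ensembl.org',
        -user => 'anonymous',
    );
    my $mf_adaptor = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );

    foreach my $name (@ARGV) {
        foreach my $clone ( @{ fetch_clones_by_name( $mf_adaptor, $name ) } ) {
            my $line = format_clone( $clone, $name );
            print "$line\n";
        }
    }
}
```

Saved under any file name, it could be run as "perl fetch_clones.pl
RP11-100N21". With no arguments it only defines the helpers and exits
without contacting the database.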

On 7 July 2015 at 09:55, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:

> Hi Duarte,
> Sorry for the late reply. This is not a problem with the API but a data
> problem.
> When you call $mf_adaptor->fetch_all_by_attribute_type_value(
> 'Name', $query ), you are asking for all the clones (misc features) whose
> 'Name' attribute matches $query. So if we haven't stored any data under
> this attribute, the API will return nothing, but it will have done its job.
> In this case we used a different attribute, which was an error. It will be
> fixed for release 82. Sorry for the problem it's causing you.
>
> Before replying to your previous question about my code snippet, I may
> have misunderstood what you wanted to do. By printing the clone
> coordinates, do you want to know the coordinates of the clones in the
> assembly or do you want to know where the clones overlap the assembly (this
> is what the tilepath is)?
>
> Regards
> Thibaut
>
> On 7 Jul 2015, at 09:20, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Anyone? Could you help me understand why the same API call behaves
> differently between datasets?
>
>
> On 3 July 2015 at 17:01, Duarte Molha <duartemolha at gmail.com> wrote:
>
>> Thanks Magali
>>
>>
>> Can you explain something to me?
>>
>>
>> You are now keeping the API compatible with both GRCh37 and GRCh38. This
>> is great because I can use my scripts with the latest API and not worry
>> about having to use an older API to query the older assembly. However, I do
>> not understand why, in this case, changing 'clone_name' to 'Name' works
>> when querying GRCh38 but fails when querying GRCh37.
>>
>>
>>  Shouldn't the API calls be the same for both datasets? As it stands, I
>> have to change my code depending on which database I am querying. Isn't this
>> what the move to update the API for both datasets is trying to avoid?
>>
>>
>> Best regards
>>
>>
>> Duarte
>>
>>
>> On 2 July 2015 at 16:40, mag <mr6 at ebi.ac.uk> wrote:
>>
>>>  Hi Duarte,
>>>
>>> Replacing 'clone_name' with 'Name' as Thibaut suggested works for me for
>>> GRCh38.
>>>
>>> my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value( 'Name',
>>> $query );
>>>
>>> while ( my $clone = shift @{$clones} ) {
>>>   my $slice = $clone->slice();
>>>   print join "\t", ("chr".$slice->seq_region_name(), $clone->start(),
>>> $clone->end() , $query."\n");
>>> }
>>>
>>>
>>> Regards,
>>> Magali
>>>
>>>
>>> On 01/07/2015 18:15, Duarte Molha wrote:
>>>
>>> I would still appreciate some help with this query. If possible.
>>> On 30 Jun 2015 16:29, "Duarte Molha" <duartemolha at gmail.com> wrote:
>>>
>>>>  Thibaut... Could you expand on how I can change my script to make it
>>>> work with the new assembly?
>>>> I have just realised that the reason I am not getting 60 BAC entries is
>>>> because they are only present in GRCh38 and not in GRCh37.
>>>>
>>>>  Can you tell me how I can modify my script to work with the new
>>>> assembly?
>>>>
>>>>  I don't seem to understand the projection method you are using.
>>>>  Here is the relevant part of my script
>>>>
>>>>  my $mf_adaptor = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );
>>>>
>>>>  # Read one clone name per line from the input list.
>>>>  open( my $in, '<', $options->{list} )
>>>>    or die "Could not open $options->{list} for reading: $!\n";
>>>>  chomp( my @input_queries = <$in> );
>>>>  close $in;
>>>>
>>>>  foreach my $query (@input_queries) {
>>>>    # Fetch the misc features whose 'clone_name' attribute equals $query.
>>>>    my $clones = $mf_adaptor->fetch_all_by_attribute_type_value(
>>>>      'clone_name', $query );
>>>>
>>>>    foreach my $clone ( @{$clones} ) {
>>>>      my $slice = $clone->slice();
>>>>      print join( "\t", "chr" . $slice->seq_region_name(),
>>>>        $clone->start(), $clone->end(), $query ), "\n";
>>>>    }
>>>>  }
>>>>
>>>>
>>>>  Best regards
>>>>
>>>>  Duarte
>>>>
>>>>
>>>> On 30 June 2015 at 15:46, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>
>>>>> No. That does not return anything.
>>>>>
>>>>>
>>>>>   On 30 June 2015 at 14:50, Thibaut Hourlier <thibaut at ebi.ac.uk>
>>>>> wrote:
>>>>>
>>>>>> If you use name instead of clone_name, does it fetch the missing
>>>>>> one?
>>>>>>
>>>>>>  Cheers
>>>>>>  Thibaut
>>>>>>
>>>>>>  On 30 Jun 2015, at 14:27, Duarte Molha <duartemolha at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>  Yes, I am using GRCh37, Thibaut, so I am OK for now, but it
>>>>>> is good to know this does not work with the latest assembly.
>>>>>> However, can you please answer my question regarding the missing
>>>>>> clones like RP11-155D3? Why can I not fetch this one when it is
>>>>>> clearly in the database?
>>>>>>
>>>>>>  Thanks
>>>>>>
>>>>>>  Duarte
>>>>>>
>>>>>>
>>>>>> On 30 June 2015 at 14:12, Thibaut Hourlier <thibaut at ebi.ac.uk> wrote:
>>>>>>
>>>>>>> My first question should have been which assembly are you using...
>>>>>>>
>>>>>>>  So yes this will work for GRCh37. Unfortunately it will not work
>>>>>>> for GRCh38 but this is something that we will fix for release 82.
>>>>>>>
>>>>>>>  So in the case of GRCh38, it is still possible but more
>>>>>>> complicated. It should work by getting the feature's slice and then
>>>>>>> projecting it onto the clone coordinate system:
>>>>>>>
>>>>>>>  my $subSlice = $misc_clone->feature_Slice;
>>>>>>> # project() returns a reference to an array of projection segments
>>>>>>> my $projectionSegments = $subSlice->project('clone');
>>>>>>>
>>>>>>>  Cheers
>>>>>>>  Thibaut
>>>>>>>
>>>>>>>  On 30 Jun 2015, at 13:56, Duarte Molha <duartemolha at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>  Never mind... after searching for MiscFeature information I found
>>>>>>> the relevant part in the API tutorial.
>>>>>>>
>>>>>>>  Just for reference, for anyone who has the same difficulty, here
>>>>>>> is the relevant portion of the code I used
>>>>>>> (please let me know if there is something I did wrong, Thibaut):
>>>>>>>
>>>>>>>  my $mf_adaptor = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );
>>>>>>>
>>>>>>>  # Read one clone name per line from the input list.
>>>>>>>  open( my $in, '<', $options->{list} )
>>>>>>>    or die "Could not open $options->{list} for reading: $!\n";
>>>>>>>  chomp( my @input_queries = <$in> );
>>>>>>>  close $in;
>>>>>>>
>>>>>>>  foreach my $query (@input_queries) {
>>>>>>>    # Fetch the misc features whose 'clone_name' attribute equals $query.
>>>>>>>    my $clones = $mf_adaptor->fetch_all_by_attribute_type_value(
>>>>>>>      'clone_name', $query );
>>>>>>>
>>>>>>>    foreach my $clone ( @{$clones} ) {
>>>>>>>      my $slice = $clone->slice();
>>>>>>>      print join( "\t", "chr" . $slice->seq_region_name(),
>>>>>>>        $clone->start(), $clone->end(), $query ), "\n";
>>>>>>>    }
>>>>>>>  }
>>>>>>>
>>>>>>>
>>>>>>>  Best regards
>>>>>>>
>>>>>>>  Duarte
>>>>>>>
>>>>>>>
>>>>>>> On 30 June 2015 at 13:26, Duarte Molha <duartemolha at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Many thanks Thibaut
>>>>>>>>
>>>>>>>>  So... with regard to your question...
>>>>>>>>
>>>>>>>>  How can I query a specific clone and get its correct coordinates if I
>>>>>>>> know the clone ID?
>>>>>>>>
>>>>>>>>  For example, assuming this clone:
>>>>>>>>  RP11-100N21
>>>>>>>>
>>>>>>>>  In other words, how do I query the underlying clone dataset and
>>>>>>>> output those clones in genomic coordinates?
>>>>>>>>
>>>>>>>>  Many thanks
>>>>>>>>
>>>>>>>>  Duarte
>>>>>>>>
>>>>>>>>
>>>>>>>>   On 30 June 2015 at 13:15, Thibaut Hourlier <thibaut at ebi.ac.uk>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Duarte,
>>>>>>>>> The clone names are stored in the misc_* tables. So you need to
>>>>>>>>> use the MiscFeatureAdaptor,
>>>>>>>>> http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1MiscFeatureAdaptor.html
>>>>>>>>> :
>>>>>>>>>
>>>>>>>>> # $slice is the Bio::EnsEMBL::Slice you want the tilepath clones for
>>>>>>>>> my $misc_clones =
>>>>>>>>> $mfa->fetch_all_by_Slice_and_set_code($slice, 'tilepath');
>>>>>>>>> foreach my $clone (@$misc_clones) {
>>>>>>>>>  print join("\t", $clone->slice->seq_region_name, $clone->start,
>>>>>>>>> $clone->end, @{$clone->get_all_attribute_values('name')}), "\n";
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> A warning though: this is the tilepath, so the boundaries of the
>>>>>>>>> clones differ from those of the contigs/clones in the assembly, because
>>>>>>>>> sometimes the entire clone was not used for the assembly.
>>>>>>>>>
>>>>>>>>> Hope this helps
>>>>>>>>>
>>>>>>>>> Thibaut
>>>>>>>>>
>>>>>>>>> > On 30 Jun 2015, at 11:50, Duarte Molha <duartemolha at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > I used this code to get all the genomic coordinates of your
>>>>>>>>> > subcontigs:
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > my @slices = @{ $slice_adaptor->fetch_all('clone') };
>>>>>>>>> > foreach my $slice (@slices){
>>>>>>>>> >       $progress->update();
>>>>>>>>> >       my $clone_name =  $slice->seq_region_name();
>>>>>>>>> >       my $projection = $slice->project('toplevel');
>>>>>>>>> >       foreach my $segment ( @{$projection} ) {
>>>>>>>>> >               my $to_slice = $segment->to_Slice();
>>>>>>>>> >               print join "\t", ("chr".$to_slice->seq_region_name(),
>>>>>>>>> >                   $to_slice->start(), $to_slice->end(), $clone_name."\n");
>>>>>>>>> >       }
>>>>>>>>> > }
>>>>>>>>> >
>>>>>>>>> > However, doing it this way does not give me the original clone
>>>>>>>>> > name.
>>>>>>>>> >
>>>>>>>>> > for example.. using this script I get
>>>>>>>>> > chr4    47567235        47733411        AC092597.1
>>>>>>>>> >
>>>>>>>>> > However I would like to get :
>>>>>>>>> >
>>>>>>>>> > chr4    47567235        47733411        RP11-100N21
>>>>>>>>> >
>>>>>>>>> > Can someone explain what I am doing wrong?
>>>>>>>>> >
>>>>>>>>> > Thanks
>>>>>>>>> >
>>>>>>>>> > Duarte
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On 30 June 2015 at 09:45, Duarte Molha <duartemolha at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> > Dear devs
>>>>>>>>> >
>>>>>>>>> > How can I search for a specific clone ID present in your tilepath,
>>>>>>>>> > for example RP5-892C22?
>>>>>>>>> >
>>>>>>>>> > I would like to use the perl API if possible
>>>>>>>>> >
>>>>>>>>> > Many thanks
>>>>>>>>> >
>>>>>>>>> > Duarte
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>  > _______________________________________________
>>>>>>>>> > Dev mailing list    Dev at ensembl.org
>>>>>>>>> > Posting guidelines and subscribe/unsubscribe info:
>>>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>>>> > Ensembl Blog: http://www.ensembl.info/