[ensembl-dev] accessing the tilepath entries programatically

Thibaut Hourlier thibaut at ebi.ac.uk
Tue Jul 7 09:55:10 BST 2015


Hi Duarte,
Sorry for the late reply. This is not a problem with the API but a data problem.
When you do the call $mf_adaptor->fetch_all_by_attribute_type_value( 'Name', $query ) you're asking for all the clones (misc features) which have the attribute 'Name'. So if we haven't store any data with this attribute, the API will return nothing but it will have done it's job. We used a different attribute in this case which was a error. It will be fixed for release 82. Sorry for the problem it's causing you.

Before replying to your previous question about my code snippet, I may have misunderstood what you wanted to do. By printing the clone coordinates, do you want to know the coordinates of the clones in the assembly or do you want to know where the clones overlap the assembly (this is what the tilepath is)?

Regards
Thibaut
> On 7 Jul 2015, at 09:20, Duarte Molha <duartemolha at gmail.com> wrote:
> 
> Anyone? Could you help me understand why the changing behavior of the same API between datasets?
> 
> =========================
>      Duarte Miguel Paulo Molha      
>          http://about.me/duarte <http://about.me/duarte>         
> =========================
> 
> On 3 July 2015 at 17:01, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
> Thanks Magali
> 
>  
> Can you explain something to me?
> 
>  
> You are now keeping the api compatible with both GRCH37 and GRCH38. This is great because I can use my scripts with the latest API and not worry about having to use an older API to query the older assembly. However I do not understand why, in this case changing ‘clone_name’ to ‘Name’ works when querying GRCh38 but fails when querying GRCh37.
> 
>  
>  Shouldn't the API calls be the same for both datasets. This means  I have to change my code depending on what database I am querying. Isn't this what the move to update the api for both datasets is trying to avoid?
> 
> 
> 
> Best regards
> 
> 
> 
> Duarte
> 
> 
> 
> 
> =========================
>      Duarte Miguel Paulo Molha      
>          http://about.me/duarte <http://about.me/duarte>         
> =========================
> 
> On 2 July 2015 at 16:40, mag <mr6 at ebi.ac.uk <mailto:mr6 at ebi.ac.uk>> wrote:
> Hi Duarte,
> 
> Replacing 'clone_name' with 'Name' as Thibaut suggested works for me for GRCh38.
> 
> my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value( 'Name', $query );
> 
> while ( my $clone = shift @{$clones} ) {
>   my $slice = $clone->slice();
>   print join "\t", ("chr".$slice->seq_region_name(), $clone->start(), $clone->end() , $query."\n");
> }
> 
> 
> Regards,
> Magali
> 
> 
> On 01/07/2015 18:15, Duarte Molha wrote:
>> I would still appreciate some help with this query. If possible.
>> 
>> On 30 Jun 2015 16:29, "Duarte Molha" <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>> Thibaut... Could you expand on how I can change my script to make it work with the new assembly?
>> I have just realised that the reason I am no getting 60 BAC entries is because their are only present in GRCh38 and not on the GRCh37
>> 
>> Can you tell me how I can modify my script to work with the new assembly?
>> 
>> I don't seem to understand the projection method you are using.
>> Here is the relevant part of my script 
>> 
>> my $mf_adaptor         = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );
>> 
>> open (IN, ,"<", $options->{list})|| die "Could not open ".$options->{list}." for reading \n";
>> my @input_queries = <IN>;
>> close IN;
>> 
>> foreach my $query (@input_queries){
>>  chomp $query;
>>  my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value( 'clone_name', $query );
>> 
>>  while ( my $clone = shift @{$clones} ) {
>>  my $slice = $clone->slice();
>>  print join "\t", ("chr".$slice->seq_region_name(), $clone->start(), $clone->end() , $query."\n"); 
>>  }
>> }
>> 
>> 
>> Best regards
>> 
>> Duarte
>> 
>> =========================
>>      Duarte Miguel Paulo Molha      
>>          http://about.me/duarte <http://about.me/duarte>         
>> =========================
>> 
>> On 30 June 2015 at 15:46, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>> no. That does not get anything.
>> 
>> 
>> 
>> =========================
>>      Duarte Miguel Paulo Molha      
>>          http://about.me/duarte <http://about.me/duarte>         
>> =========================
>> 
>> On 30 June 2015 at 14:50, Thibaut Hourlier <thibaut at ebi.ac.uk <mailto:thibaut at ebi.ac.uk>> wrote:
>> If you use name instead of clone_name, does it fetches the missing one?
>> 
>> Cheers
>> Thibaut
>> 
>>> On 30 Jun 2015, at 14:27, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>>> 
>>> Yes I am using the GRCh37 Thibaut  ... so I am ok for now... but it is good to know this does not work with the latest assembly. 
>>> However... can you please answer my question regarding the missing clones like  RP11-155D3 ... why can I not fetch this when it is clearly on the database?
>>> 
>>> Thanks
>>> 
>>> Duarte
>>> 
>>> 
>>> 
>>> =========================
>>>      Duarte Miguel Paulo Molha      
>>>          http://about.me/duarte <http://about.me/duarte>         
>>> =========================
>>> 
>>> On 30 June 2015 at 14:12, Thibaut Hourlier <thibaut at ebi.ac.uk <mailto:thibaut at ebi.ac.uk>> wrote:
>>> My first question should have been which assembly are you using...
>>> 
>>> So yes this will work for GRCh37. Unfortunately it will not work for GRCh38 but this is something that we will fix for release 82.
>>> 
>>> So in the case of GRCh38, it is still possible but more complicated. It should work by getting the slice then projecting on the clone coordinate system
>>> 
>>> $subSlice = $misc_clone->feature_Slice;
>>> $projectionSegment = $subSlice->project('clone')
>>> 
>>> Cheers
>>> Thibaut
>>> 
>>>> On 30 Jun 2015, at 13:56, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>>>> 
>>>> Nevermind... after searching for miscFeatures information I found the relevant part in the api tutorial
>>>> 
>>>> Just for reference to anyone that has the same difficulties here is the relevant portion of the code I used:
>>>> (please let me know if there is something I did wrong Thibaut)
>>>> 
>>>> my $mf_adaptor         = $registry->get_adaptor( 'Human', 'Core', 'MiscFeature' );
>>>> 
>>>> open (IN, ,"<", $options->{list})|| die "Could not open ".$options->{list}." for reading \n";
>>>> my @input_queries = <IN>;
>>>> close IN;
>>>> 
>>>> foreach my $query (@input_queries){
>>>> 
>>>>                                                           chomp $query;
>>>> 
>>>>                                                           my $clones =  $mf_adaptor->fetch_all_by_attribute_type_value( 'clone_name', $query );
>>>> 
>>>> 
>>>>                                                           while ( my $clone = shift @{$clones} ) {
>>>> 
>>>>                                                           my $slice = $clone->slice();
>>>> 
>>>>                                                           print join "\t", ("chr".$slice->seq_region_name(), $clone->start(), $clone->end() , $query."\n"); 
>>>> 
>>>>                                                           }
>>>> }
>>>> 
>>>> 
>>>> Best regards
>>>> 
>>>> Duarte
>>>> 
>>>> =========================
>>>>      Duarte Miguel Paulo Molha      
>>>>          http://about.me/duarte <http://about.me/duarte>         
>>>> =========================
>>>> 
>>>> On 30 June 2015 at 13:26, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>>>> Many thanks Thibaut
>>>> 
>>>> So... in regards to your question...
>>>> 
>>>> How can I query a specific clone and its correct coordinates if I know  the clone ID.
>>>> 
>>>> For example
>>>> 
>>>> assuming this clone:
>>>>  RP11-100N21
>>>> 
>>>> In other words , how to I query the underlying clone dataset and output those clones in genomic coordinates?
>>>> 
>>>> Many thanks
>>>> 
>>>> Duarte
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> =========================
>>>>      Duarte Miguel Paulo Molha      
>>>>          http://about.me/duarte <http://about.me/duarte>         
>>>> =========================
>>>> 
>>>> On 30 June 2015 at 13:15, Thibaut Hourlier <thibaut at ebi.ac.uk <mailto:thibaut at ebi.ac.uk>> wrote:
>>>> Hi Duarte,
>>>> The clone names are stored in the misc_* tables. So you need to use the MiscFeatureAdaptor, http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1MiscFeatureAdaptor.html <http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1MiscFeatureAdaptor.html>:
>>>> 
>>>> my $misc_clones = $mfa->fetch_all_by_Slice_and_set_code('tilepath');
>>>> foreach my $clone (@$misc_clones) {
>>>>  print join("\t", $clone->slice->seq_region_name, $clone->start, $clone->end, @{$clone->get_all_attribute_values('name')}), "\n";
>>>> }
>>>> 
>>>> A warning though, this is the tilepath so the boundaries of the clones are different from the contigs/clones in the assembly as sometimes they didn't use the entire clone for the assembly
>>>> 
>>>> Hope this help
>>>> 
>>>> Thibaut
>>>> 
>>>> > On 30 Jun 2015, at 11:50, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>>>> >
>>>> > I used this code to get all the gebnomic coordinates of your subcontigs:
>>>> >
>>>> >
>>>> > my @slices = @{ $slice_adaptor->fetch_all('clone') };
>>>> > foreach my $slice (@slices){
>>>> >       $progress->update();
>>>> >       my $clone_name =  $slice->seq_region_name();
>>>> >       my $projection = $slice->project('toplevel');
>>>> >       foreach my $segment ( @{$projection} ) {
>>>> >               my $to_slice = $segment->to_Slice();
>>>> >               print join "\t", ("chr".$to_slice->seq_region_name(), $to_slice->start(), $to_slice->end(), $clone_name."\n");
>>>> >       }
>>>> > }
>>>> >
>>>> > However, by doing this, the database does not fetch the original clone name
>>>> >
>>>> > for example.. using this script I get
>>>> > chr4    47567235        47733411        AC092597.1
>>>> >
>>>> > However I would like to get :
>>>> >
>>>> > chr4    47567235        47733411        RP11-100N21
>>>> >
>>>> > Can someone explain what I am doing wrong?
>>>> >
>>>> > Thanks
>>>> >
>>>> > Duarte
>>>> >
>>>> >
>>>> >
>>>> > =========================
>>>> >      Duarte Miguel Paulo Molha
>>>> >          http://about.me/duarte <http://about.me/duarte>
>>>> > =========================
>>>> >
>>>> > On 30 June 2015 at 09:45, Duarte Molha <duartemolha at gmail.com <mailto:duartemolha at gmail.com>> wrote:
>>>> > Dear devs
>>>> >
>>>> > How can I search for a specific clone id present on your tilepath
>>>> >
>>>> > for example this: RP5-892C22
>>>> >
>>>> > I would like to use the perl API if possible
>>>> >
>>>> > Many thanks
>>>> >
>>>> > Duarte
>>>> >
>>>> >
>>>> >
>>>> > =========================
>>>> >      Duarte Miguel Paulo Molha
>>>> >          http://about.me/duarte <http://about.me/duarte>
>>>> > =========================
>>>> >
>>>> > _______________________________________________
>>>> > Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>> > Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>>>> > Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>>>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>>>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
>>> 
>>> 
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
>> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
>> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org <mailto:Dev at ensembl.org>
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev <http://lists.ensembl.org/mailman/listinfo/dev>
> Ensembl Blog: http://www.ensembl.info/ <http://www.ensembl.info/>
> 
> 
> 
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20150707/63bf25ba/attachment.html>


More information about the Dev mailing list