[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Brandon Walts bwalts at ebi.ac.uk
Wed Jun 26 11:35:28 BST 2019


Hi Asier

It's great to hear that you've devised a fix. We would welcome either a 
PR against master or a copy of your changes, whichever you'd prefer. 
We'll be able to look at it in the next few days.

Best
-Brandon

On 26/06/2019 11:03, Asier Gonzalez wrote:
>
> Hi Brandon,
>
> Thank you for your response. Do you have an idea of when it could be 
> fixed? I mean, are we talking about weeks or months? I use a tool that 
> calls this function at least every two months so I have amended the 
> code to do what I believe it is supposed to do. I could share it with 
> you if it would help you, or I could open a PR if you accept them. I 
> understand that you may have other priorities, but at least I want to 
> make sure that the future version will do what mine already does.
>
> Best wishes,
> Asier
>
> On 26/06/2019 10:56, Brandon Walts wrote:
>>
>> Hi Asier
>>
>> We've had a chance to look into it and you are correct, this function 
>> is not working as described. As currently implemented, it will return 
>> more results than expected. It's on our list to fix, and we plan to 
>> get to it in the near future.
>>
>> Best
>> -Brandon
>>
>> On 26/06/2019 09:51, Asier Gonzalez wrote:
>>>
>>> Hi Brandon,
>>>
>>> Do you have any updates about this?
>>>
>>> Thanks,
>>> Asier
>>>
>>> On 07/06/2019 16:42, Brandon Walts wrote:
>>>>
>>>> Hi Asier
>>>>
>>>> Thanks for bringing this up. We will look into what's going on and 
>>>> see if there is a bug, if the documentation needs improvement, or both.
>>>>
>>>> Best
>>>> -Brandon
>>>>
>>>> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm troubleshooting a Perl tool that calls the Ensembl API with a 
>>>>> variant id and tries to find the gene with the closest 5' end 
>>>>> within a 500 kb window. The tool was written by a colleague and it 
>>>>> uses 
>>>>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
>>>>> like this:
>>>>>
>>>>> my @gene_list_for_feature = @{$gene_adaptor->fetch_all_by_outward_search(
>>>>>     -FEATURE    => $var_feature,
>>>>>     -RANGE      => 10000,
>>>>>     -MAX_RANGE  => 500000,
>>>>>     -LIMIT      => 40,
>>>>>     -FIVE_PRIME => 1)};
>>>>>
>>>>> According to the documentation of this function 
>>>>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
>>>>> it "Searches for features within the suggested -RANGE, and if it 
>>>>> finds none, expands the search area until it satisfies -LIMIT or 
>>>>> hits -MAX_RANGE". My understanding is that in my case it should 
>>>>> search first in a 10 kb window and, if there are no genes, 
>>>>> progressively expand it up to 500 kb unless it finds 40 
>>>>> features first. However, this is not the behaviour I am seeing. 
>>>>> The search range grows as follows: 10k, 20k, 60k, 240k and 1.20M. 
>>>>> Is this a bug or have I misunderstood what it does?
>>>>>
>>>>> I have looked into the code of this subroutine 
>>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
>>>>> and the search window grows factorially rather than linearly 
>>>>> because each step multiplies the previous value, instead of the 
>>>>> initial value, by an increasing factor:
>>>>>
>>>>> [L1452] $search_range = $search_range * $factor;
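
To make the growth concrete, here is a minimal sketch of a loop shaped like the one quoted above. It is not the Ensembl code: both subroutine names are hypothetical, the feature search and the -LIMIT check are left out, and the idea that $factor starts at 1 and increases by one on each pass is an assumption inferred from the observed sequence of ranges. The second subroutine shows one possible linear alternative (scaling the initial range), which is only one reading of the documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API: lists the ranges tried
# when each step multiplies the PREVIOUS range by an increasing factor.
# Assumption: $factor starts at 1 and is incremented once per iteration.
sub ranges_multiplying_previous {
    my ($range, $max_range) = @_;
    my ($search_range, $factor) = ($range, 1);
    my @tried;
    while ($search_range <= $max_range) {
        $search_range = $search_range * $factor;   # compounds on the previous value
        push @tried, $search_range;
        $factor++;
    }
    return @tried;
}

# Hypothetical linear alternative: multiply the INITIAL range by the factor
# and never go past $max_range.
sub ranges_multiplying_initial {
    my ($range, $max_range) = @_;
    my $factor = 1;
    my @tried;
    while ($range * $factor <= $max_range) {
        push @tried, $range * $factor;
        $factor++;
    }
    return @tried;
}

# prints "10000, 20000, 60000, 240000, 1200000"
print join(', ', ranges_multiplying_previous(10_000, 500_000)), "\n";

my @linear = ranges_multiplying_initial(10_000, 500_000);
printf "%d steps, last = %d\n", scalar @linear, $linear[-1];
```

Note that in the first variant the range is checked against the maximum before it is multiplied, which is why the last value tried (1.20M) can exceed -MAX_RANGE.
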
>>>>>
>>>>> In addition, it is not true that it only expands the range if it 
>>>>> does not find any features in the initial window. The while 
>>>>> statement keeps expanding until it has collected -LIMIT features:
>>>>>
>>>>> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
>>>>>
>>>>> I am also confused by the fact that, apparently, the features 
>>>>> found only need to be partially within the range. For instance, 
>>>>> ENSG00000150394 (CDH8) is found with the above parameters although 
>>>>> its 5' end is 1,338,771 bp away from the variant according 
>>>>> to the distance reported by the function. So it seems that the 
>>>>> feature is found because its 3' end is within the range although 
>>>>> the 5' end, which is what I am interested in, is not. This 
>>>>> seems to contradict what the documentation says 
>>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
>>>>> "When looking beyond the boundaries of the source Feature, the 
>>>>> distance is measured to the nearest end of that Feature to the 
>>>>> nearby Feature's nearest end."
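
As a rough illustration of the behaviour being asked for here, the sketch below filters candidates by the distance to their 5' end only. It is a sketch under stated assumptions, not how the adaptor works: plain hashes stand in for feature objects, both subroutine names are hypothetical, and the coordinates are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: position of a feature's 5' end from plain coordinates.
# On the forward strand (1) the 5' end is the start; on the reverse
# strand (-1) it is the end.
sub five_prime_pos {
    my ($feature) = @_;
    return $feature->{strand} >= 0 ? $feature->{start} : $feature->{end};
}

# Hypothetical filter: keep only features whose 5' end lies within
# $max_range of the query position $pos.
sub within_five_prime_range {
    my ($pos, $max_range, @features) = @_;
    return grep { abs(five_prime_pos($_) - $pos) <= $max_range } @features;
}

# Made-up coordinates, for illustration only.
my @features = (
    { name => 'geneA', start => 1_000_000, end => 1_050_000, strand =>  1 },
    # 3' end (start) is within 500 kb of the query, 5' end (end) is not.
    { name => 'geneB', start =>   400_000, end => 2_500_000, strand => -1 },
);

my @kept = within_five_prime_range(900_000, 500_000, @features);
print join(', ', map { $_->{name} } @kept), "\n";   # prints "geneA"
```

With this kind of filter, a long gene that only reaches into the window with its 3' end is dropped rather than reported.
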
>>>>>
>>>>> Any help will be much appreciated. I am happy to share code if you 
>>>>> think it would be useful.
>>>>>
>>>>> Thanks,
>>>>> Asier
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>>> Ensembl Blog: http://www.ensembl.info/