[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Asier Gonzalez gonzaleza at ebi.ac.uk
Fri Aug 9 13:46:42 BST 2019


Hi Brandon,

Apologies for the delay but I have finally had time to make the changes 
and open a PR (https://github.com/Ensembl/ensembl/pull/407).

Please let me know should there be any issues. I can confirm that I have 
used my version of the code and it now works as expected.

Best wishes,
Asier

On 26/06/2019 11:35, Brandon Walts wrote:
>
> Hi Asier
>
> It's great to hear that you've devised a fix. We would welcome a PR 
> against master, or to see your changes, whichever you'd prefer. We'll 
> be able to look at it in the next few days.
>
> Best
> -Brandon
>
> On 26/06/2019 11:03, Asier Gonzalez wrote:
>>
>> Hi Brandon,
>>
>> Thank you for your response. Do you have an idea of when it could be 
>> fixed? I mean, are we talking about weeks or months? I use a tool 
>> that calls this function at least every two months so I have amended 
>> the code to do what I believe it is supposed to do. I could share it 
>> with you if it would help you, or I could open a PR if you accept 
>> them. I understand that you may have other priorities, but at least I 
>> want to make sure that the future version will do what mine already does.
>>
>> Best wishes,
>> Asier
>>
>> On 26/06/2019 10:56, Brandon Walts wrote:
>>>
>>> Hi Asier
>>>
>>> We've had a chance to look into it and you are correct, this 
>>> function is not working as described. As currently implemented, it 
>>> will return more results than expected. It's on our list to fix, and 
>>> we plan to get to it in the near future.
>>>
>>> Best
>>> -Brandon
>>>
>>> On 26/06/2019 09:51, Asier Gonzalez wrote:
>>>>
>>>> Hi Brandon,
>>>>
>>>> Do you have any updates about this?
>>>>
>>>> Thanks,
>>>> Asier
>>>>
>>>> On 07/06/2019 16:42, Brandon Walts wrote:
>>>>>
>>>>> Hi Asier
>>>>>
>>>>> Thanks for bringing this up. We will look into what's going on and 
>>>>> see if there is a bug, if the documentation needs improvement, or 
>>>>> both.
>>>>>
>>>>> Best
>>>>> -Brandon
>>>>>
>>>>> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm troubleshooting a Perl tool that calls the Ensembl API with a 
>>>>>> variant id and tries to find the gene with the closest 5' end 
>>>>>> within a 500 kb window. The tool was written by a colleague and 
>>>>>> it uses 
>>>>>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
>>>>>> like this:
>>>>>>
>>>>>> my @gene_list_for_feature  = @{$gene_adaptor->fetch_all_by_outward_search(
>>>>>>                                                     -FEATURE => $var_feature,
>>>>>>                                                     -RANGE =>10000,
>>>>>>                                                     -MAX_RANGE =>500000,
>>>>>>                                                     -LIMIT =>40,
>>>>>>                                                     -FIVE_PRIME =>1)};
>>>>>>
>>>>>> According to the documentation of this function 
>>>>>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
>>>>>> it "Searches for features within the suggested -RANGE, and if it 
>>>>>> finds none, expands the search area until it satisfies -LIMIT or 
>>>>>> hits -MAX_RANGE". My understanding is that in my case it should 
>>>>>> search first in a 10 kb window and, if there are no genes, 
>>>>>> progressively expand it to up to 500 kb unless it finds 40 
>>>>>> features before. However, this is not the behaviour I am seeing, 
>>>>>> the search range grows like this: 10k, 20k, 60k, 240k and 1.20M. 
>>>>>> Is this a bug or have I misundertood what it does?
>>>>>>
>>>>>> I have looked into the code of this subroutine 
>>>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
>>>>>> and the search window growths exponentially because it multiplies 
>>>>>> the previous value instead of the initial value:
>>>>>>
>>>>>> [L1452] $search_range = $search_range * $factor;
>>>>>>
>>>>>> In addition, it is not true that it only expands the range if it 
>>>>>> does not find any features in the initial window, which is 
>>>>>> obvious from looking into the while statement:
>>>>>>
>>>>>> [L1451] while (scalar @results < $limit && $search_range <= 
>>>>>> $max_range) {
>>>>>>
>>>>>> I am also confused by the fact that, apparently, the found 
>>>>>> features only need to be partially within the range. For 
>>>>>> instance, ENSG00000150394 (CDH8) is found with the above 
>>>>>> parameters although its 5' prime end is 1,338,771 bp away from 
>>>>>> the variant according to the distance reported by the function. 
>>>>>> So, it seems that the feature is found because its 3' end is 
>>>>>> within the range although the 5' prime end, which is what I am 
>>>>>> interested in, is not. This somehow contradicts what the 
>>>>>> documentation says 
>>>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
>>>>>> "When looking beyond the boundaries of the source Feature, the 
>>>>>> distance is measured to the nearest end of that Feature to the 
>>>>>> nearby Feature's nearest end."
>>>>>>
>>>>>> Any help will be much appreciated. I am happy to share code if 
>>>>>> you think it would be useful.
>>>>>>
>>>>>> Thanks,
>>>>>> Asier
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing listDev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe info:https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>>>> Ensembl Blog:http://www.ensembl.info/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20190809/ecac0a3c/attachment.html>


More information about the Dev mailing list