[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Asier Gonzalez gonzaleza at ebi.ac.uk
Wed Jun 26 11:03:17 BST 2019


Hi Brandon,

Thank you for your response. Do you have an idea of when it could be 
fixed? I mean, are we talking about weeks or months? I use a tool that 
calls this function at least every two months so I have amended the code 
to do what I believe it is supposed to do. I could share it with you if 
it would help you, or I could open a PR if you accept them. I understand 
that you may have other priorities, but at least I want to make sure 
that the future version will do what mine already does.
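
To illustrate the difference, here is a minimal sketch of the two growth schemes. It is neither the amended code mentioned above nor the Ensembl implementation: search_windows is a hypothetical helper that only lists the window sizes that would be tried, and it assumes fewer than -LIMIT features are found, so the loop keeps expanding.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the Ensembl API: returns the window
# sizes that would be searched for a given start range and maximum range.
sub search_windows {
    my ($range, $max_range, $mode) = @_;
    my @windows;
    my $factor  = 1;
    my $current = $range;

    if ($mode eq 'cumulative') {
        # Mirrors the loop described in this thread: the previous window
        # is checked against the maximum, then multiplied by a growing factor.
        while ($current <= $max_range) {
            $current = $current * $factor;
            push @windows, $current;
            $factor++;
        }
    }
    else {
        # Linear alternative (an assumption about the intended behaviour):
        # always multiply the initial range and never exceed $max_range.
        while ($range * $factor <= $max_range) {
            push @windows, $range * $factor;
            $factor++;
        }
    }
    return @windows;
}

# prints "10000, 20000, 60000, 240000, 1200000"
print join(', ', search_windows(10_000, 500_000, 'cumulative')), "\n";

# prints "50 windows, last one 500000"
my @linear = search_windows(10_000, 500_000, 'linear');
print scalar(@linear), " windows, last one $linear[-1]\n";
```

The cumulative mode reproduces the window sizes reported below, including the final 1.20M window that overshoots -MAX_RANGE because the check is made before the multiplication.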

Best wishes,
Asier

On 26/06/2019 10:56, Brandon Walts wrote:
>
> Hi Asier
>
> We've had a chance to look into it and you are correct, this function 
> is not working as described. As currently implemented, it will return 
> more results than expected. It's on our list to fix, and we plan to 
> get to it in the near future.
>
> Best
> -Brandon
>
> On 26/06/2019 09:51, Asier Gonzalez wrote:
>>
>> Hi Brandon,
>>
>> Do you have any updates about this?
>>
>> Thanks,
>> Asier
>>
>> On 07/06/2019 16:42, Brandon Walts wrote:
>>>
>>> Hi Asier
>>>
>>> Thanks for bringing this up. We will look into what's going on and 
>>> see if there is a bug, if the documentation needs improvement, or both.
>>>
>>> Best
>>> -Brandon
>>>
>>> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm troubleshooting a Perl tool that calls the Ensembl API with a 
>>>> variant id and tries to find the gene with the closest 5' end 
>>>> within a 500 kb window. The tool was written by a colleague and it 
>>>> uses 
>>>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
>>>> like this:
>>>>
>>>> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>>>>     -FEATURE    => $var_feature,
>>>>     -RANGE      => 10000,
>>>>     -MAX_RANGE  => 500000,
>>>>     -LIMIT      => 40,
>>>>     -FIVE_PRIME => 1,
>>>> ) };
>>>>
>>>> According to the documentation of this function 
>>>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
>>>> it "Searches for features within the suggested -RANGE, and if it 
>>>> finds none, expands the search area until it satisfies -LIMIT or 
>>>> hits -MAX_RANGE". My understanding is that in my case it should 
>>>> search first in a 10 kb window and, if there are no genes, 
>>>> progressively expand it up to 500 kb unless it finds 40 features 
>>>> first. However, this is not the behaviour I am seeing. The search 
>>>> range grows like this: 10k, 20k, 60k, 240k and 1.20M. Is this a bug 
>>>> or have I misunderstood what it does?
>>>>
>>>> I have looked into the code of this subroutine 
>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
>>>> and the search window grows multiplicatively because each step 
>>>> multiplies the previous value instead of the initial value:
>>>>
>>>> [L1452] $search_range = $search_range * $factor;
>>>>
>>>> In addition, it is not true that it only expands the range if it 
>>>> does not find any features in the initial window, which is obvious 
>>>> from looking into the while statement:
>>>>
>>>> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
>>>>
>>>> I am also confused by the fact that, apparently, the found features 
>>>> only need to be partially within the range. For instance, 
>>>> ENSG00000150394 (CDH8) is found with the above parameters although 
>>>> its 5' end is 1,338,771 bp away from the variant according to 
>>>> the distance reported by the function. So, it seems that the 
>>>> feature is found because its 3' end is within the range although 
>>>> the 5' end, which is what I am interested in, is not. This 
>>>> somehow contradicts what the documentation says 
>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
>>>> "When looking beyond the boundaries of the source Feature, the 
>>>> distance is measured to the nearest end of that Feature to the 
>>>> nearby Feature's nearest end."
>>>>
>>>> Any help will be much appreciated. I am happy to share code if you 
>>>> think it would be useful.
>>>>
>>>> Thanks,
>>>> Asier