[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Brandon Walts bwalts at ebi.ac.uk
Wed Jun 26 10:56:20 BST 2019


Hi Asier

We've had a chance to look into it and you are correct: this function is 
not working as described. As currently implemented, it will return more 
results than expected. It's on our list to fix, and we plan to get to it 
in the near future.

Best
-Brandon

On 26/06/2019 09:51, Asier Gonzalez wrote:
>
> Hi Brandon,
>
> Do you have any updates about this?
>
> Thanks,
> Asier
>
> On 07/06/2019 16:42, Brandon Walts wrote:
>>
>> Hi Asier
>>
>> Thanks for bringing this up. We will look into what's going on and 
>> see if there is a bug, if the documentation needs improvement, or both.
>>
>> Best
>> -Brandon
>>
>> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>>
>>> Hi all,
>>>
>>> I'm troubleshooting a Perl tool that calls the Ensembl API with a 
>>> variant id and tries to find the gene with the closest 5' end within 
>>> a 500 kb window. The tool was written by a colleague and it uses 
>>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
>>> like this:
>>>
>>> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>>>     -FEATURE    => $var_feature,
>>>     -RANGE      => 10000,
>>>     -MAX_RANGE  => 500000,
>>>     -LIMIT      => 40,
>>>     -FIVE_PRIME => 1,
>>> ) };
>>>
>>> According to the documentation of this function 
>>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
>>> it "Searches for features within the suggested -RANGE, and if it 
>>> finds none, expands the search area until it satisfies -LIMIT or 
>>> hits -MAX_RANGE". My understanding is that in my case it should 
>>> search first in a 10 kb window and, if there are no genes, 
>>> progressively expand it up to 500 kb unless it finds 40 features 
>>> first. However, this is not the behaviour I am seeing. The search 
>>> range grows like this: 10k, 20k, 60k, 240k and 1.20M. Is this a bug 
>>> or have I misunderstood what it does?
>>>
>>> I have looked into the code of this subroutine 
>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
>>> and the search window grows exponentially because it multiplies 
>>> the previous value instead of the initial value:
>>>
>>> [L1452] $search_range = $search_range * $factor;
>>>
>>> In addition, it is not true that it only expands the range if it 
>>> does not find any features in the initial window, which is obvious 
>>> from looking into the while statement:
>>>
>>> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
>>>
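For readers following the thread, here is a minimal, self-contained Perl sketch of the two points above. It does not call the Ensembl API: count_in_window is a hypothetical stand-in for the real database search, and the assumption that the multiplier starts at 1 and increases by 1 per iteration is inferred from the reported sequence (10k, 20k, 60k, 240k, 1.20M), not copied from BaseFeatureAdaptor.pm. The second sub shows one possible reading of the documentation (multiply the initial range, never exceed -MAX_RANGE), not a planned fix.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database search: number of features
# inside a window of the given size (bp). Every window holds one
# feature here, so the limit of 40 is never reached.
sub count_in_window { my ($range) = @_; return 1; }

# Behaviour as reported in this thread: the PREVIOUS range is multiplied
# by a growing factor, and the max range is checked before the
# multiplication rather than after it.
# Assumption: the factor starts at 1 and increments by 1.
sub ranges_as_observed {
    my ($range, $max_range, $limit) = @_;
    my ($factor, $found, @tried) = (1, 0);
    while ($found < $limit && $range <= $max_range) {
        $range *= $factor;
        push @tried, $range;
        $found = count_in_window($range);
        $factor++;
    }
    return @tried;
}

# One possible alternative: multiply the INITIAL range, and stop before
# a window would exceed the max range.
sub ranges_linear {
    my ($initial, $max_range, $limit) = @_;
    my ($found, @tried) = (0);
    for (my $f = 1; $found < $limit && $initial * $f <= $max_range; $f++) {
        push @tried, $initial * $f;
        $found = count_in_window($initial * $f);
    }
    return @tried;
}

my @observed = ranges_as_observed(10_000, 500_000, 40);
my @linear   = ranges_linear(10_000, 500_000, 40);
print "observed: @observed\n";
print "linear:   ", scalar(@linear), " steps, last window $linear[-1] bp\n";
```

With these inputs the first sub tries windows of 10000, 20000, 60000, 240000 and 1200000 bp, matching the sequence reported above, and it keeps expanding even though the very first window already contains a feature. The second sub stops at exactly 500000 bp after 50 steps.
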
>>> I am also confused by the fact that, apparently, the found features 
>>> only need to be partially within the range. For instance, 
>>> ENSG00000150394 (CDH8) is found with the above parameters although 
>>> its 5' end is 1,338,771 bp away from the variant according to 
>>> the distance reported by the function. So it seems that the feature 
>>> is found because its 3' end is within the range, although the 5' 
>>> end, which is what I am interested in, is not. This somehow 
>>> contradicts what the documentation says 
>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
>>> "When looking beyond the boundaries of the source Feature, the 
>>> distance is measured to the nearest end of that Feature to the 
>>> nearby Feature's nearest end."
>>>
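Until the function behaves as documented, one possible workaround is to re-check the 5' distance on the caller's side. The sketch below is plain Perl under stated assumptions: the features are hypothetical hashes with name, start, end and strand keys (stand-ins for the coordinates of real feature objects), the coordinates are invented for illustration, and nearest_by_five_prime is a helper name made up here, not part of the Ensembl API.

```perl
use strict;
use warnings;

# 5' end of a feature: its start on the forward strand,
# its end on the reverse strand.
sub five_prime_pos {
    my ($f) = @_;
    return $f->{strand} >= 0 ? $f->{start} : $f->{end};
}

# Keep only features whose 5' end lies within $max_range bp of $pos,
# sorted nearest first. Each returned hash gains a 'dist' key.
sub nearest_by_five_prime {
    my ($pos, $max_range, @features) = @_;
    my @with_dist =
        map { +{ %$_, dist => abs(five_prime_pos($_) - $pos) } } @features;
    my @kept = sort { $a->{dist} <=> $b->{dist} }
               grep { $_->{dist} <= $max_range } @with_dist;
    return @kept;
}

# Invented example data: geneB overlaps the 500 kb window with its 3' end
# only, so a search that accepts partial overlap would still return it.
my @features = (
    { name => 'geneC', start => 1_300_000, end => 1_320_000, strand =>  1 },
    { name => 'geneB', start => 1_200_000, end => 2_600_000, strand => -1 },
    { name => 'geneA', start =>   900_000, end => 1_050_000, strand =>  1 },
);

my @hits = nearest_by_five_prime(1_000_000, 500_000, @features);
print "$_->{name}\t$_->{dist}\n" for @hits;
```

Here geneA (100000 bp) and geneC (300000 bp) are kept in that order, while geneB is dropped because its 5' end is 1600000 bp from the query position even though its 3' end is only 200000 bp away.
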
>>> Any help will be much appreciated. I am happy to share code if you 
>>> think it would be useful.
>>>
>>> Thanks,
>>> Asier
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> Ensembl Blog: http://www.ensembl.info/