[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Asier Gonzalez gonzaleza at ebi.ac.uk
Wed Jun 26 09:51:35 BST 2019


Hi Brandon,

Do you have any updates about this?

Thanks,
Asier

On 07/06/2019 16:42, Brandon Walts wrote:
>
> Hi Asier
>
> Thanks for bringing this up. We will look into what's going on and see 
> if there is a bug, if the documentation needs improvement, or both.
>
> Best
> -Brandon
>
> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>
>> Hi all,
>>
>> I'm troubleshooting a Perl tool that calls the Ensembl API with a 
>> variant id and tries to find the gene with the closest 5' end within 
>> a 500 kb window. The tool was written by a colleague and it uses 
>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
>> like this:
>>
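>> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>>     -FEATURE    => $var_feature,  # the variant's feature object
>>     -RANGE      => 10000,         # initial search window (bp)
>>     -MAX_RANGE  => 500000,        # largest window to search (bp)
>>     -LIMIT      => 40,            # stop expanding once 40 genes are found
>>     -FIVE_PRIME => 1,             # intended: look at the genes' 5' ends
>> ) };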
>>
>> According to the documentation of this function 
>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
>> it "Searches for features within the suggested -RANGE, and if it 
>> finds none, expands the search area until it satisfies -LIMIT or hits 
>> -MAX_RANGE". My understanding is that in my case it should search 
>> first in a 10 kb window and, only if there are no genes there, 
>> progressively expand it up to 500 kb, stopping early if it finds 40 
>> features. However, this is not the behaviour I am seeing: the search 
>> range grows as 10k, 20k, 60k, 240k and 1.2M (the last one exceeding 
>> -MAX_RANGE). Is this a bug, or have I misunderstood what it does?
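The reported window sizes can be reproduced with a small stand-alone Perl sketch. It assumes, based only on the observed sequence, that `$factor` starts at 1 and is incremented after each search; `observed_windows` and the `$count_at` callback (a stand-in for the database query) are hypothetical, not Ensembl code. Because the -MAX_RANGE check happens before the multiplication, the last window (1.2M) overshoots the 500 kb limit.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the window growth, not Ensembl source.
# Assumption: $factor starts at 1 and is incremented after every search.
sub observed_windows {
    my ($range, $max_range, $limit, $count_at) = @_;
    my ($factor, $found, @windows) = (1, 0);
    # The MAX_RANGE check happens *before* the range is multiplied.
    while ($found < $limit && $range <= $max_range) {
        $range = $range * $factor;     # multiplies the previous window
        push @windows, $range;
        $found = $count_at->($range);  # stand-in for the database query
        $factor++;
    }
    return @windows;
}

# Assume no window ever contains 40 genes.
my @w = observed_windows(10_000, 500_000, 40, sub { 0 });
print join(', ', @w), "\n";    # 10000, 20000, 60000, 240000, 1200000
```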
>>
>> I have looked into the code of this subroutine 
>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
>> and the search window grows much faster than linearly (10k x 1, 2, 6, 
>> 24, 120) because it multiplies the previous value instead of the 
>> initial value:
>>
>> [L1452] $search_range = $search_range * $factor;
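For comparison, here is a sketch of the behaviour the documentation seems to describe: the window grows linearly by multiplying the initial -RANGE rather than the previous window, and the final step is clamped to -MAX_RANGE. `linear_windows` is a hypothetical helper, not Ensembl code, and `$count_at` stands in for the database query.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical alternative, not Ensembl code: multiply the *initial* range
# (linear growth) and never search beyond MAX_RANGE.
sub linear_windows {
    my ($initial, $max_range, $limit, $count_at) = @_;
    my ($factor, $found, @windows) = (1, 0);
    while ($found < $limit) {
        my $range = min($initial * $factor, $max_range);
        push @windows, $range;
        $found = $count_at->($range);   # stand-in for the database query
        last if $range >= $max_range;   # MAX_RANGE has been searched
        $factor++;
    }
    return @windows;
}

# Assume no window ever contains 40 genes.
my @w = linear_windows(10_000, 500_000, 40, sub { 0 });
print scalar(@w), " windows, last = $w[-1]\n";   # 50 windows, last = 500000
```

Fixed 10 kb steps can mean up to 50 queries for a 500 kb search. A geometric schedule capped at -MAX_RANGE would need fewer queries while still respecting the limit.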
>>
>> In addition, the range is not expanded only when no features are found 
>> in the initial window. As the while condition shows, it keeps expanding 
>> as long as fewer than -LIMIT features have been found and the current 
>> range is still within -MAX_RANGE:
>>
>> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
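To see the effect of this condition, the sketch below counts how many windows are searched under two stopping rules. It uses made-up gene counts in which the first 10 kb window already contains 3 genes. The growth rule (`$factor` starting at 1) is an assumption inferred from the reported window sizes, and `windows_searched` and `%genes_in` are hypothetical.

```perl
use strict;
use warnings;

# Made-up gene counts per window size, for illustration only: the first
# 10 kb window already contains 3 genes.
my %genes_in = (10_000 => 3, 20_000 => 5, 60_000 => 9,
                240_000 => 25, 1_200_000 => 60);

# Count the windows searched before stopping. $keep_going decides, from
# the number of features found so far, whether to widen the window.
# Assumption: the window grows as range * factor, factor = 1, 2, 3, ...
sub windows_searched {
    my ($keep_going) = @_;
    my ($range, $max_range, $factor, $found, $n) = (10_000, 500_000, 1, 0, 0);
    while ($keep_going->($found) && $range <= $max_range) {
        $range *= $factor++;
        $found = $genes_in{$range};
        $n++;
    }
    return $n;
}

my $limit = 40;
# Reading of the docs: widen only if the window is still empty.
print windows_searched(sub { $_[0] == 0 }), "\n";       # 1
# Condition in the quoted while statement: widen until LIMIT is reached.
print windows_searched(sub { $_[0] < $limit }), "\n";   # 5
```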
>>
>> I am also confused by the fact that the features found apparently only 
>> need to be partially within the range. For instance, ENSG00000150394 
>> (CDH8) is found with the above parameters although its 5' end is 
>> 1,338,771 bp away from the variant, according to the distance reported 
>> by the function. So it seems that the feature is found because its 
>> 3' end is within the range, even though its 5' end, which is what I am 
>> interested in, is not. This seems to contradict what the documentation 
>> says 
>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
>> "When looking beyond the boundaries of the source Feature, the 
>> distance is measured to the nearest end of that Feature to the nearby 
>> Feature's nearest end."
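If what matters is the gene's 5' end, one option is to post-filter the returned genes by the distance from the variant to each gene's 5' end. The sketch below assumes genes are plain hashes with `start`, `end` and `strand` keys (with real Ensembl gene objects, the `start`, `end` and `strand` accessors would supply these values). It also assumes the variant is a single base at `$var_pos`. The helper names and coordinates are made up. On the reverse strand (-1) the 5' end is `end`, not `start`.

```perl
use strict;
use warnings;

# 5' end of a gene: its start on the forward strand (+1), its end on the
# reverse strand (-1), since start <= end regardless of strand.
sub five_prime_pos {
    my ($gene) = @_;
    return $gene->{strand} == 1 ? $gene->{start} : $gene->{end};
}

# Keep genes whose 5' end lies within $max_range bp of a single-base
# variant at $var_pos, closest first.
sub genes_by_five_prime_distance {
    my ($var_pos, $max_range, @genes) = @_;
    my @with_dist = map { +{ %$_, dist => abs(five_prime_pos($_) - $var_pos) } } @genes;
    return sort { $a->{dist} <=> $b->{dist} }
           grep { $_->{dist} <= $max_range } @with_dist;
}

# Made-up coordinates: GENE_A is on the reverse strand with its 3' end
# 10 kb from the variant but its 5' end 1.35 Mb away.
my @genes = (
    { id => 'GENE_A', start => 1_060_000, end => 2_400_000, strand => -1 },
    { id => 'GENE_B', start => 1_150_000, end => 1_160_000, strand =>  1 },
);
my @near = genes_by_five_prime_distance(1_050_000, 500_000, @genes);
print join(', ', map { "$_->{id}:$_->{dist}" } @near), "\n";   # GENE_B:100000
```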
>>
>> Any help will be much appreciated. I am happy to share code if you 
>> think it would be useful.
>>
>> Thanks,
>> Asier
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/

