[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Brandon Walts bwalts at ebi.ac.uk
Fri Jun 7 16:42:32 BST 2019


Hi Asier

Thanks for bringing this up. We will look into what's going on and see 
if there is a bug, if the documentation needs improvement, or both.

Best
-Brandon

On 07/06/2019 13:47, Asier Gonzalez wrote:
>
> Hi all,
>
> I'm troubleshooting a Perl tool that calls the Ensembl API with a 
> variant id and tries to find the gene with the closest 5' end within a 
> 500 kb window. The tool was written by a colleague and it uses 
> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
> like this:
>
> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>     -FEATURE    => $var_feature,
>     -RANGE      => 10000,
>     -MAX_RANGE  => 500000,
>     -LIMIT      => 40,
>     -FIVE_PRIME => 1,
> ) };
>
> According to the documentation of this function 
> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
> it "Searches for features within the suggested -RANGE, and if it finds 
> none, expands the search area until it satisfies -LIMIT or hits 
> -MAX_RANGE". My understanding is that in my case it should first
> search a 10 kb window and, if there are no genes, progressively expand
> it up to 500 kb unless it finds 40 features first. However, this is
> not the behaviour I am seeing: the search range grows as 10 kb, 20 kb,
> 60 kb, 240 kb and 1.2 Mb. Is this a bug, or have I misunderstood what
> it does?
>
> I have looked into the code of this subroutine 
> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
> and the search window grows factorially rather than linearly, because
> each pass multiplies the previous value instead of the initial value:
>
> [L1452] $search_range = $search_range * $factor;
>
> In addition, it is not true that it only expands the range if it
> finds no features in the initial window: it keeps expanding until
> -LIMIT features have been found, as the while statement shows:
>
> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
>
> I am also confused by the fact that, apparently, found features only
> need to be partially within the range. For instance, ENSG00000150394
> (CDH8) is found with the above parameters although its 5' end is
> 1,338,771 bp away from the variant according to the distance reported
> by the function. So it seems that the feature is found because its
> 3' end is within the range, although the 5' end, which is what I am
> interested in, is not. This seems to contradict what the documentation
> says
> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
> "When looking beyond the boundaries of the source Feature, the 
> distance is measured to the nearest end of that Feature to the nearby 
> Feature's nearest end."
>
> Any help will be much appreciated. I am happy to share code if you 
> think it would be useful.
>
> Thanks,
> Asier
>
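For anyone following the thread, the range sequence reported in the quoted message can be reproduced without a database connection. The sketch below is a standalone simulation, not the Ensembl code. It assumes, based on the two quoted source lines and the observed 10k, 20k, 60k, 240k, 1.20M sequence, that the multiplier starts at 1 and is incremented after each pass, and that -LIMIT is never reached. The name simulate_search_ranges is made up for illustration.

```perl
use strict;
use warnings;

# Standalone simulation; no Ensembl API calls are made.
# Assumptions: the multiplier starts at 1 and is incremented after every
# pass, and -LIMIT is never reached, so only -MAX_RANGE stops the loop.
sub simulate_search_ranges {
    my ($search_range, $max_range) = @_;
    my $factor = 1;
    my @searched;
    while ($search_range <= $max_range) {
        # Multiplies the previous range, not the initial one.
        $search_range = $search_range * $factor;
        push @searched, $search_range;
        $factor++;
    }
    return @searched;
}

# Prints: 10000, 20000, 60000, 240000, 1200000
print join(', ', simulate_search_ranges(10_000, 500_000)), "\n";
```

Under these assumptions the last window (1.2 Mb) exceeds -MAX_RANGE because the range is compared with -MAX_RANGE before it is multiplied. Whether the intended behaviour is linear growth capped at -MAX_RANGE is for the developers to confirm.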
>
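Until the behaviour is clarified, one possible workaround is to post-filter the returned genes by the position of their own 5' end. The sketch below uses plain hashes with made-up coordinates in place of gene objects, and the helper names five_prime_end and within_five_prime_window are hypothetical. It relies only on the convention that the 5' end of a forward-strand feature is its start and that of a reverse-strand feature is its end.

```perl
use strict;
use warnings;

# 5' end of a feature: its start on the forward strand (1),
# its end on the reverse strand (-1).
sub five_prime_end {
    my ($feature) = @_;
    return $feature->{strand} >= 0 ? $feature->{start} : $feature->{end};
}

# Keep only features whose 5' end lies within $window bp of $position.
sub within_five_prime_window {
    my ($position, $window, @features) = @_;
    return grep { abs(five_prime_end($_) - $position) <= $window } @features;
}

# Made-up coordinates for illustration only (not real gene positions).
my @candidates = (
    { id => 'geneA', start => 1_100_000, end => 1_150_000, strand =>  1 },
    { id => 'geneB', start => 1_300_000, end => 2_700_000, strand => -1 },
    { id => 'geneC', start =>   400_000, end =>   700_000, strand => -1 },
);
my $variant_pos = 1_000_000;

my @kept = within_five_prime_window($variant_pos, 500_000, @candidates);

# Prints: geneA, geneC
# geneB is dropped: its 3' end is 300 kb away, but its 5' end is 1.7 Mb away.
print join(', ', map { $_->{id} } @kept), "\n";
```

With real gene objects the same idea would presumably read the coordinates from the feature's seq_region_start, seq_region_end and seq_region_strand accessors rather than from hash keys.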