[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search
Brandon Walts
bwalts at ebi.ac.uk
Fri Jun 7 16:42:32 BST 2019
Hi Asier
Thanks for bringing this up. We will look into what's going on and see
if there is a bug, if the documentation needs improvement, or both.
Best
-Brandon
On 07/06/2019 13:47, Asier Gonzalez wrote:
>
> Hi all,
>
> I'm troubleshooting a Perl tool that calls the Ensembl API with a
> variant id and tries to find the gene with the closest 5' end within a
> 500 kb window. The tool was written by a colleague and it uses
> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search()
> like this:
>
> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>     -FEATURE    => $var_feature,
>     -RANGE      => 10000,
>     -MAX_RANGE  => 500000,
>     -LIMIT      => 40,
>     -FIVE_PRIME => 1) };
>
> According to the documentation of this function
> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a),
> it "Searches for features within the suggested -RANGE, and if it finds
> none, expands the search area until it satisfies -LIMIT or hits
> -MAX_RANGE". My understanding is that in my case it should search
> first in a 10 kb window and, if there are no genes, progressively
> expand it up to 500 kb unless it finds 40 features first. However,
> this is not the behaviour I am seeing: the search range grows like
> this: 10k, 20k, 60k, 240k and 1.20M. Is this a bug or have I
> misunderstood what it does?
>
> I have looked into the code of this subroutine
> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469)
> and the search window grows exponentially because it multiplies the
> previous value instead of the initial value:
>
> [L1452] $search_range = $search_range * $factor;
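
The reported sequence (10k, 20k, 60k, 240k, 1.20M) is what one would get if the multiplier itself goes up by one on every pass (1, 2, 3, 4, 5) and is applied to the previous range. That is an assumption inferred from the numbers above, not a quote from BaseFeatureAdaptor.pm. A minimal standalone sketch, with hypothetical helper names (cumulative_ranges, linear_ranges) and no Ensembl dependency, that contrasts this with multiplying the initial -RANGE instead:

```perl
use strict;
use warnings;

# Hypothetical helper: multiplies the PREVIOUS range by a growing factor.
# Assumes the factor starts at 1 and is incremented on every pass, and that
# no features are ever found, so -LIMIT never ends the loop early.
sub cumulative_ranges {
    my ($range, $max_range) = @_;
    my $search_range = $range;
    my $factor       = 1;
    my @visited;
    while ($search_range <= $max_range) {
        # The bound is checked BEFORE the multiplication, so the last
        # window can overshoot $max_range.
        $search_range = $search_range * $factor;
        push @visited, $search_range;
        $factor++;
    }
    return @visited;
}

# Hypothetical alternative: multiplies the INITIAL range by the factor and
# never searches a window larger than $max_range.
sub linear_ranges {
    my ($range, $max_range) = @_;
    my @visited;
    for (my $factor = 1; $range * $factor <= $max_range; $factor++) {
        push @visited, $range * $factor;
    }
    return @visited;
}

print join(', ', cumulative_ranges(10_000, 500_000)), "\n";
# prints: 10000, 20000, 60000, 240000, 1200000

my @linear = linear_ranges(10_000, 500_000);
print scalar(@linear), " steps, last = $linear[-1]\n";
# prints: 50 steps, last = 500000
```

Under these assumptions the final window (1.20M) exceeds -MAX_RANGE because the loop condition tests the range from the previous pass, which would also explain hits far outside 500 kb.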
>
> In addition, it is not true that it only expands the range if it does
> not find any features in the initial window, which is obvious from
> looking into the while statement:
>
> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
>
> I am also confused by the fact that, apparently, the found features
> only need to be partially within the range. For instance,
> ENSG00000150394 (CDH8) is found with the above parameters although its
> 5' end is 1,338,771 bp away from the variant according to the
> distance reported by the function. So, it seems that the feature is
> found because its 3' end is within the range although the 5' end,
> which is what I am interested in, is not. This somehow
> contradicts what the documentation says
> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491):
> "When looking beyond the boundaries of the source Feature, the
> distance is measured to the nearest end of that Feature to the nearby
> Feature's nearest end."
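
Until the intended behaviour is confirmed, one possible workaround is to post-filter the returned features by the distance to their 5' end. The sketch below is only an illustration: it uses plain hashes with made-up names and coordinates instead of Ensembl objects, and the helpers five_prime_position and filter_by_five_prime_distance are hypothetical. The only fact it relies on is that the 5' end of a feature is its start on the forward strand and its end on the reverse strand.

```perl
use strict;
use warnings;

# 5' end of a feature: start on the forward strand, end on the reverse strand.
sub five_prime_position {
    my ($feature) = @_;
    return $feature->{strand} >= 0 ? $feature->{start} : $feature->{end};
}

# Keep only the features whose 5' end lies within $max_range of $position,
# sorted from nearest to farthest.
sub filter_by_five_prime_distance {
    my ($position, $max_range, @features) = @_;
    my @with_distance =
        map { [ $_, abs(five_prime_position($_) - $position) ] } @features;
    return map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] }
           grep { $_->[1] <= $max_range } @with_distance;
}

# Made-up example data: a variant at position 1,000,000 and three genes.
my @genes = (
    { name => 'geneA', start => 1_200_000, end => 1_300_000, strand =>  1 },
    # 3' end is 100 kb away, but the 5' end is 1.4 Mb away
    { name => 'geneB', start => 1_100_000, end => 2_400_000, strand => -1 },
    { name => 'geneC', start =>   600_000, end =>   950_000, strand => -1 },
);

my @kept = filter_by_five_prime_distance(1_000_000, 500_000, @genes);
print join(', ', map { $_->{name} } @kept), "\n";
# prints: geneC, geneA
```

With real feature objects, the start, end and strand accessors would take the place of the hash keys, provided the variant and the genes are expressed in the same coordinate system.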
>
> Any help will be much appreciated. I am happy to share code if you
> think it would be useful.
>
> Thanks,
> Asier
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/