[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search

Asier Gonzalez gonzaleza at ebi.ac.uk
Fri Jun 7 13:47:02 BST 2019


Hi all,

I'm troubleshooting a Perl tool that calls the Ensembl API with a 
variant ID and tries to find the gene whose 5' end is closest to the 
variant within a 500 kb window. The tool was written by a colleague and 
uses 
Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search() 
like this:

my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
    -FEATURE    => $var_feature,
    -RANGE      => 10000,
    -MAX_RANGE  => 500000,
    -LIMIT      => 40,
    -FIVE_PRIME => 1,
) };

According to the documentation of this function 
(http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a), 
it "Searches for features within the suggested -RANGE, and if it finds 
none, expands the search area until it satisfies -LIMIT or hits 
-MAX_RANGE". My understanding is that, in my case, it should first search 
a 10 kb window and, if there are no genes, progressively expand it up to 
500 kb, stopping earlier if it finds 40 features. However, that is not 
what I see; the search range grows as 10 kb, 20 kb, 60 kb, 240 kb and 
1.2 Mb. Is this a bug, or have I misunderstood what it does?
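A quick arithmetic check on the windows listed above, using only those 
observed numbers: each window is the previous one multiplied by a factor 
that goes up by one at every step, so after n steps the window is 
10 kb x n! rather than 10 kb x n.

```perl
use strict;
use warnings;

# Window sizes (bp) observed in the tool's output.
my @observed = (10_000, 20_000, 60_000, 240_000, 1_200_000);

# Ratio of each window to the one before it.
my @ratios = map { $observed[$_] / $observed[$_ - 1] } 1 .. $#observed;
print join(', ', @ratios), "\n";   # 2, 3, 4, 5
```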

I have looked into the code of this subroutine 
(https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469) 
and the search window grows much faster than linearly (x2, x3, x4, x5 
in the example above) because it multiplies the previous value instead 
of the initial value:

[L1452] $search_range = $search_range * $factor;
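A minimal simulation of that line, assuming (this is not confirmed from 
the full source) that $factor starts at 1 and is incremented once per 
iteration, which is what the observed x2, x3, x4, x5 steps suggest, and 
using the $search_range <= $max_range test from the loop condition. No 
features are ever found here, so only the range test ends the loop. 
linear_windows is a hypothetical alternative, not Ensembl code, that 
multiplies the initial range instead, which is the growth I expected.

```perl
use strict;
use warnings;

# ASSUMPTION: $factor starts at 1 and increments once per pass. Only the
# multiplication line is quoted from BaseFeatureAdaptor.pm; no features are
# "found", so the -MAX_RANGE test alone ends the loop.
sub compounding_windows {
    my ($range, $max_range) = @_;
    my ($search_range, $factor, @windows) = ($range, 1);
    while ($search_range <= $max_range) {        # tested before multiplying
        $search_range = $search_range * $factor;  # previous value x factor
        push @windows, $search_range;
        $factor++;
    }
    return @windows;
}

# HYPOTHETICAL alternative: scale the *initial* range, so the window grows
# linearly and the last window never exceeds -MAX_RANGE.
sub linear_windows {
    my ($range, $max_range) = @_;
    my @windows;
    for (my $factor = 1; $range * $factor <= $max_range; $factor++) {
        push @windows, $range * $factor;
    }
    return @windows;
}

my @compounding = compounding_windows(10_000, 500_000);
my @linear      = linear_windows(10_000, 500_000);
print "compounding: ", join(', ', @compounding), "\n";
printf "linear: %d windows, %d to %d\n", scalar @linear, $linear[0], $linear[-1];
```

With these inputs the first line reproduces the observed sequence 
(10000, 20000, 60000, 240000, 1200000), while the linear version gives 
50 windows from 10000 to 500000.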

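In addition, the range is not expanded only when the initial window 
contains no features: the while condition keeps expanding it as long as 
fewer than -LIMIT features have been found. It also tests $search_range 
against $max_range before multiplying it, which is why the last window 
(1.2 Mb) overshoots the 500 kb maximum: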

[L1451] while (scalar @results < $limit && $search_range <= $max_range) {
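A sketch of what that condition implies, contrasting it with my reading 
of the documentation (expand only while nothing has been found). The 
loop mirrors the quoted while condition; as an assumption, $factor 
starts at 1 and increments once per pass. The toy counter, one feature 
per 10 kb of window, and the helper windows_searched are hypothetical 
and exist only for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: $factor starts at 1 and increments once per pass; the while
# condition mirrors the one quoted from BaseFeatureAdaptor.pm. $count_in is
# a hypothetical callback returning how many features a window contains.
sub windows_searched {
    my ($count_in, $range, $max_range, $limit, $stop_on_first_hit) = @_;
    my ($search_range, $factor, $found, @windows) = ($range, 1, 0);
    while ($found < $limit && $search_range <= $max_range) {
        $search_range = $search_range * $factor;
        $found = $count_in->($search_range);
        push @windows, $search_range;
        # My reading of the docs: stop as soon as anything is found.
        last if $stop_on_first_hit && $found > 0;
        $factor++;
    }
    return @windows;
}

# Toy density, invented for illustration: one feature per 10 kb of window.
my $toy = sub { int($_[0] / 10_000) };

my @as_quoted     = windows_searched($toy, 10_000, 500_000, 40, 0);
my @as_documented = windows_searched($toy, 10_000, 500_000, 40, 1);
print "as quoted:     ", join(', ', @as_quoted), "\n";
print "as documented: ", join(', ', @as_documented), "\n";
```

Even though the toy density yields a feature in the very first 10 kb 
window, the quoted condition keeps expanding until 40 features are found 
(here only in the 1.2 Mb window), while the documented reading stops at 
10 kb.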

I am also confused by the fact that, apparently, the features found only 
need to be partially within the range. For instance, ENSG00000150394 
(CDH8) is returned with the above parameters even though its 5' end is 
1,338,771 bp away from the variant, according to the distance reported 
by the function. So it seems that the feature is found because its 3' 
end is within the range, even though its 5' end, which is what I am 
interested in, is not. This seems to contradict what the documentation 
says
(https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491): 
"When looking beyond the boundaries of the source Feature, the distance 
is measured to the nearest end of that Feature to the nearby Feature's 
nearest end."
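If only the 5' end matters, one workaround is to post-filter the results 
by the distance from the variant to each gene's 5' end. A minimal sketch, 
assuming a single-base variant: the helper five_prime_within is 
hypothetical, and genes are plain hashrefs with hypothetical keys 
start/end/strand. With the API, those values would come from 
$gene->seq_region_start, $gene->seq_region_end and $gene->strand, and 
the variant position from $var_feature->seq_region_start.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper: keep genes whose 5' end lies within $max_dist bp of
# $var_pos and return [gene, distance] pairs, closest first. Genes are plain
# hashrefs with start/end/strand keys standing in for Ensembl gene objects.
sub five_prime_within {
    my ($var_pos, $max_dist, @genes) = @_;
    my @kept;
    for my $g (@genes) {
        # 5' end: start on the forward strand, end on the reverse strand.
        my $five_prime = $g->{strand} == 1 ? $g->{start} : $g->{end};
        my $dist = abs($five_prime - $var_pos);
        push @kept, [ $g, $dist ] if $dist <= $max_dist;
    }
    return sort { $a->[1] <=> $b->[1] } @kept;
}

# Toy coordinates, invented for illustration. 'spanning' overlaps the
# variant, but its 5' end is 900 kb away, so it is dropped.
my @hits = five_prime_within(1_000_000, 500_000,
    { id => 'fwd',      start => 1_100_000, end => 1_200_000, strand => 1  },
    { id => 'rev',      start =>   200_000, end => 1_050_000, strand => -1 },
    { id => 'spanning', start =>   100_000, end => 1_010_000, strand => 1  },
);
printf "%s %d\n", $_->[0]{id}, $_->[1] for @hits;
```

This prints "rev 50000" and then "fwd 100000"; the gene whose body covers 
the variant but whose 5' end is far away is excluded, which is the 
situation described above for CDH8.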

Any help will be much appreciated. I am happy to share code if you think 
it would be useful.

Thanks,
Asier
