[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search
Asier Gonzalez
gonzaleza at ebi.ac.uk
Wed Jun 26 09:51:35 BST 2019
Hi Brandon,
Do you have any updates about this?
Thanks,
Asier
On 07/06/2019 16:42, Brandon Walts wrote:
>
> Hi Asier
>
> Thanks for bringing this up. We will look into what's going on and see
> if there is a bug, if the documentation needs improvement, or both.
>
> Best
> -Brandon
>
> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>
>> Hi all,
>>
>> I'm troubleshooting a Perl tool that calls the Ensembl API with a
>> variant id and tries to find the gene with the closest 5' end within
>> a 500 kb window. The tool was written by a colleague and it uses
>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search()
>> like this:
>>
>> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>>     -FEATURE    => $var_feature,
>>     -RANGE      => 10000,
>>     -MAX_RANGE  => 500000,
>>     -LIMIT      => 40,
>>     -FIVE_PRIME => 1,
>> ) };
>>
>> According to the documentation of this function
>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a),
>> it "Searches for features within the suggested -RANGE, and if it
>> finds none, expands the search area until it satisfies -LIMIT or hits
>> -MAX_RANGE". My understanding is that in my case it should search
>> first in a 10 kb window and, if there are no genes, progressively
>> expand it up to 500 kb, stopping earlier if it finds 40 features.
>> However, this is not what I see. The search range grows as follows:
>> 10 kb, 20 kb, 60 kb, 240 kb and 1.2 Mb. Is this a bug, or have I
>> misunderstood what it does?
>>
>> I have looked into the code of this subroutine
>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469)
>> and the search window grows much faster than linearly (the sizes
>> above are 10 kb multiplied by 1, 2, 6, 24 and 120) because it
>> multiplies the previous value instead of the initial value:
>>
>> [L1452] $search_range = $search_range * $factor;
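To make the difference concrete, here is a small standalone Perl sketch (not Ensembl code) of the two growth rules. It assumes, consistent with the quoted line and the observed sizes, that `$factor` starts at 1 and is incremented once per pass, and it only models the case where -LIMIT is never reached. The helpers `ranges_compounding` and `ranges_from_initial` are hypothetical names.

```perl
use strict;
use warnings;

# Range sizes when each pass multiplies the PREVIOUS range by a factor
# that starts at 1 and grows by 1 per pass (assumed). The condition is
# checked before the multiplication, so the last size can exceed $max.
# The -LIMIT test is ignored here (assume it is never satisfied).
sub ranges_compounding {
    my ($initial, $max) = @_;
    my ($range, $factor, @sizes) = ($initial, 1);
    while ($range <= $max) {
        $range = $range * $factor;    # same update as the quoted L1452
        push @sizes, $range;
        $factor++;
    }
    return @sizes;
}

# Range sizes when each pass multiplies the INITIAL range instead:
# linear steps that never exceed $max.
sub ranges_from_initial {
    my ($initial, $max) = @_;
    my @sizes;
    for (my $factor = 1; $initial * $factor <= $max; $factor++) {
        push @sizes, $initial * $factor;
    }
    return @sizes;
}

my @compounding = ranges_compounding(10_000, 500_000);
my @linear      = ranges_from_initial(10_000, 500_000);
print join(', ', @compounding), "\n";    # 10000, 20000, 60000, 240000, 1200000
printf "%d steps: %d ... %d\n", scalar @linear, $linear[0], $linear[-1];    # 50 steps: 10000 ... 500000
```

The first rule reproduces the reported sequence exactly, including a final 1.2 Mb pass beyond -MAX_RANGE; the second gives the evenly spaced steps up to 500 kb that the documentation seems to imply.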
>>
>> In addition, the range is not expanded only when no features are
>> found in the initial window; as the while condition shows, it keeps
>> expanding as long as fewer than -LIMIT features have been found and
>> the range has not exceeded -MAX_RANGE:
>>
>> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
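A standalone simulation of that condition (again not Ensembl code) shows what it implies. It assumes, as the quoted L1452 and the observed sizes suggest, that the range is multiplied by a factor that starts at 1 and grows by 1 per pass; the gene distances and the `search_passes` helper are hypothetical. Finding a few genes in the first window does not stop the search, and because the test happens before the range is enlarged, the last pass can search beyond -MAX_RANGE.

```perl
use strict;
use warnings;

# Hypothetical distances (bp) from the variant to nearby genes.
my @gene_distances = (3_000, 8_000, 45_000, 150_000, 900_000);

# Replays the assumed loop: test the condition, enlarge the range,
# then count the genes whose distance falls within it.
sub search_passes {
    my ($range, $max_range, $limit, @distances) = @_;
    my ($factor, $found, @passes) = (1, 0);
    while ($found < $limit && $range <= $max_range) {
        $range = $range * $factor;
        $found = grep { $_ <= $range } @distances;    # count in scalar context
        push @passes, [$range, $found];
        $factor++;
    }
    return @passes;
}

for my $pass (search_passes(10_000, 500_000, 40, @gene_distances)) {
    printf "range %7d: %d genes\n", @$pass;
}
```

With -LIMIT at 40 the loop runs five passes even though two genes are already found in the first 10 kb, and the final 1,200,000 bp pass picks up the hypothetical gene at 900,000 bp although -MAX_RANGE is 500,000.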
>>
>> I am also confused by the fact that, apparently, the features found
>> only need to be partially within the range. For instance,
>> ENSG00000150394 (CDH8) is found with the above parameters although
>> its 5' end is 1,338,771 bp away from the variant according to the
>> distance reported by the function. So it seems that the feature is
>> found because its 3' end is within the range, even though its 5'
>> end, which is what I am interested in, is not. This seems to
>> contradict what the documentation says
>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491):
>> "When looking beyond the boundaries of the source Feature, the
>> distance is measured to the nearest end of that Feature to the nearby
>> Feature's nearest end."
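The distinction can be illustrated with a standalone check on made-up coordinates (not CDH8's real ones, and not how the Ensembl adaptor is implemented). `overlaps_window` and `five_prime_in_window` are hypothetical helpers; coordinates are assumed to be 1-based and inclusive, with strand 1 or -1. A gene passes an overlap test if any part of it lies in the window, but passes a 5'-end test only if its start (forward strand) or end (reverse strand) does.

```perl
use strict;
use warnings;

# True if any part of [start, end] lies inside [win_start, win_end].
sub overlaps_window {
    my ($gene, $win_start, $win_end) = @_;
    return $gene->{start} <= $win_end && $gene->{end} >= $win_start;
}

# True only if the gene's 5' end lies inside the window: its start on
# the forward strand (1), its end on the reverse strand (-1).
sub five_prime_in_window {
    my ($gene, $win_start, $win_end) = @_;
    my $five_prime = $gene->{strand} == 1 ? $gene->{start} : $gene->{end};
    return $five_prime >= $win_start && $five_prime <= $win_end;
}

# Hypothetical variant at 2,000,000 searched with a 1,200,000 bp radius.
my $variant = 2_000_000;
my ($win_start, $win_end) = ($variant - 1_200_000, $variant + 1_200_000);

# Hypothetical forward-strand gene: 3' end inside the window,
# 5' end 1,400,000 bp upstream of the variant.
my $gene = { start => 600_000, end => 1_500_000, strand => 1 };

printf "overlaps window: %s\n",
    overlaps_window($gene, $win_start, $win_end) ? 'yes' : 'no';    # yes
printf "5' end in window: %s\n",
    five_prime_in_window($gene, $win_start, $win_end) ? 'yes' : 'no';    # no
```

If the search selects candidates by overlap and only afterwards measures the 5' distance, a gene like this would be returned with a distance larger than the search range, which matches the pattern described above.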
>>
>> Any help will be much appreciated. I am happy to share code if you
>> think it would be useful.
>>
>> Thanks,
>> Asier
>>
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/