[ensembl-dev] Unexpected behaviour of fetch_all_by_outward_search
Asier Gonzalez
gonzaleza at ebi.ac.uk
Wed Jun 26 11:03:17 BST 2019
Hi Brandon,
Thank you for your response. Do you have an idea of when it could be
fixed? Are we talking about weeks or months? I run a tool that calls
this function at least every two months, so I have amended the code
to do what I believe it is supposed to do. I could share it with you if
that would help, or I could open a PR if you accept them. I understand
that you may have other priorities, but I would at least like to make
sure that the fixed version will behave the way mine already does.
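
To make the difference concrete, here is a minimal standalone sketch of
the two range progressions. It does not call the Ensembl API and it is
not the actual patch. The sub names are hypothetical, and the factor that
starts at 1 and increases by 1 per iteration is an assumption inferred
from the ranges I reported (10k, 20k, 60k, 240k, 1.20M). It also assumes
that -LIMIT is never reached, so only -MAX_RANGE stops the loop.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Ranges searched when the PREVIOUS range is multiplied by the factor.
# ASSUMPTION: the factor starts at 1 and grows by 1 per iteration.
sub observed_ranges {
    my ($range, $max_range) = @_;
    my @ranges;
    my $factor = 1;
    while ($range <= $max_range) {     # checked before the range is enlarged
        $range = $range * $factor;     # compounds on the previous value
        push @ranges, $range;
        $factor++;
    }
    return @ranges;
}

# Ranges I would expect: the INITIAL range times the factor, never
# exceeding the maximum range.
sub expected_ranges {
    my ($initial, $max_range) = @_;
    my @ranges;
    for (my $factor = 1; $initial * $factor <= $max_range; $factor++) {
        push @ranges, $initial * $factor;
    }
    return @ranges;
}

print join(', ', observed_ranges(10_000, 500_000)), "\n";
my @expected = expected_ranges(10_000, 500_000);
print scalar(@expected), " steps, last range = $expected[-1]\n";
```

Because the loop condition is tested before the range is enlarged, the
first progression can search beyond -MAX_RANGE (1.20M here), while the
second grows linearly and stops at 500 kb.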
Best wishes,
Asier
On 26/06/2019 10:56, Brandon Walts wrote:
>
> Hi Asier
>
> We've had a chance to look into it and you are correct, this function
> is not working as described. As currently implemented, it will return
> more results than expected. It's on our list to fix, and we plan to
> get to it in the near future.
>
> Best
> -Brandon
>
> On 26/06/2019 09:51, Asier Gonzalez wrote:
>>
>> Hi Brandon,
>>
>> Do you have any updates about this?
>>
>> Thanks,
>> Asier
>>
>> On 07/06/2019 16:42, Brandon Walts wrote:
>>>
>>> Hi Asier
>>>
>>> Thanks for bringing this up. We will look into what's going on and
>>> see if there is a bug, if the documentation needs improvement, or both.
>>>
>>> Best
>>> -Brandon
>>>
>>> On 07/06/2019 13:47, Asier Gonzalez wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm troubleshooting a Perl tool that calls the Ensembl API with a
>>>> variant id and tries to find the gene with the closest 5' end
>>>> within a 500 kb window. The tool was written by a colleague and it
>>>> uses
>>>> Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_outward_search()
>>>> like this:
>>>>
>>>> my @gene_list_for_feature = @{ $gene_adaptor->fetch_all_by_outward_search(
>>>>     -FEATURE    => $var_feature,
>>>>     -RANGE      => 10000,
>>>>     -MAX_RANGE  => 500000,
>>>>     -LIMIT      => 40,
>>>>     -FIVE_PRIME => 1,
>>>> ) };
>>>>
>>>> According to the documentation of this function
>>>> (http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1DBSQL_1_1BaseFeatureAdaptor.html#a76a51bc70828aaccb9435eda9a44b20a),
>>>> it "Searches for features within the suggested -RANGE, and if it
>>>> finds none, expands the search area until it satisfies -LIMIT or
>>>> hits -MAX_RANGE". My understanding is that in my case it should
>>>> search first in a 10 kb window and, if there are no genes,
>>>> progressively expand it up to 500 kb, stopping early if it finds
>>>> 40 features. However, this is not the behaviour I am seeing. The
>>>> search range grows like this: 10k, 20k, 60k, 240k and 1.20M. Is
>>>> this a bug or have I misunderstood what it does?
>>>>
>>>> I have looked into the code of this subroutine
>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1441-L1469)
>>>> and the search window grows exponentially because it multiplies
>>>> the previous value instead of the initial value:
>>>>
>>>> [L1452] $search_range = $search_range * $factor;
>>>>
>>>> In addition, it is not true that it only expands the range if it
>>>> does not find any features in the initial window. The while
>>>> condition keeps expanding the range as long as fewer than -LIMIT
>>>> features have been found:
>>>>
>>>> [L1451] while (scalar @results < $limit && $search_range <= $max_range) {
>>>>
>>>> I am also confused by the fact that, apparently, the found features
>>>> only need to be partially within the range. For instance,
>>>> ENSG00000150394 (CDH8) is found with the above parameters although
>>>> its 5' end is 1,338,771 bp away from the variant according to
>>>> the distance reported by the function. So it seems that the
>>>> feature is found because its 3' end is within the range although
>>>> the 5' end, which is what I am interested in, is not. This
>>>> seems to contradict what the documentation says
>>>> (https://github.com/Ensembl/ensembl/blob/release/96/modules/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm#L1490-L1491):
>>>> "When looking beyond the boundaries of the source Feature, the
>>>> distance is measured to the nearest end of that Feature to the
>>>> nearby Feature's nearest end."
>>>>
>>>> Any help will be much appreciated. I am happy to share code if you
>>>> think it would be useful.
>>>>
>>>> Thanks,
>>>> Asier