[ensembl-dev] VEP API returning 400 for memory error
Asier Gonzalez
gonzaleza at ebi.ac.uk
Wed Aug 21 16:16:03 BST 2019
Hi Anja,
I understand that there may be some data issues, which is why I capture
400 errors and flag them with a message in the output. I will amend the
code to retry when 400 errors occur as per your suggestion.
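A minimal sketch of the retry policy discussed in this thread, written in Python rather than the Perl of the actual script. It assumes that a 400 is worth a few attempts (since the memory allocation error clears on retry) and that a 400 which persists is a likely data issue to be flagged. The names query_with_retry and fetch_vep_by_id, the outcome labels, and the default attempt count and delay are all hypothetical and are not taken from snp_assignment.pl.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple


def fetch_vep_by_id(rsid: str) -> Tuple[int, str]:
    """Issue one GET request to the VEP id endpoint; return (status, body).

    Hypothetical helper. Network-level failures (urllib.error.URLError)
    are not caught here and will propagate to the caller.
    """
    url = (
        "https://rest.ensembl.org/vep/human/id/"
        f"{rsid}?content-type=application/json"
    )
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and the error body.
        return err.code, err.read().decode("utf-8", errors="replace")


def query_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Tuple[str, int, str]:
    """Call fetch() until it returns 200 or the attempts run out.

    Returns (outcome, status, body). The outcome labels are assumptions:
      "ok"             a 200 response was received
      "persistent_400" every attempt returned 400 (likely a data issue)
      "server_error"   every attempt returned 5XX
      "failed"         any other status; not retried in this sketch
                       (429 handling is assumed to live elsewhere)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = 0, ""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status == 200:
            return "ok", status, body
        retryable = status == 400 or 500 <= status < 600
        if not retryable:
            return "failed", status, body
        if attempt < max_attempts:
            # Wait before the next attempt, but not after the last one.
            time.sleep(delay_seconds)
    outcome = "persistent_400" if status == 400 else "server_error"
    return outcome, status, body
```

With these assumptions, a call such as query_with_retry(lambda: fetch_vep_by_id("rs1057518506")) would make up to three requests, and variants that end as "persistent_400" could be collected into the list of suspected data issues.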
Just to finish, do you want me to send you a list of the variants that
seem to have data issues?
Thank you,
Asier
On 21/08/2019 16:08, Anja Thormann wrote:
> Hi Asier,
>
> this is helpful. Apart from the memory errors you are actually seeing
> genuine problems with the input data. For the example variant you have
> sent we only have missing information and VEP cannot calculate any
> consequences with the incomplete data. In our production pipeline we
> are more permissive and try to calculate consequences even with
> incomplete data. As a side note, in our next release rs1057518506 has
> all required alleles present and VEP will be able to calculate
> consequences for it.
>
> To conclude, looping a few times over variants with 400 errors seems
> to be the way to go for now. We will look into catching memory limit
> errors better and returning a more appropriate error code. But keep in
> mind that there are variants for which VEP cannot calculate
> consequences, for example because of missing or wrong allele or
> location information.
>
> Best,
> Anja
>
>> On 21 Aug 2019, at 15:16, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
>>
>> Hi Anja,
>>
>> This is the part of the code that manages the queries:
>> https://github.com/opentargets/snp_to_gene/blob/master/snp_assignment.pl#L153-L179.
>> If you look further down you will see that it cannot be a 429 code as
>> they are handled explicitly. I could add repeats in case of 400 codes
>> if you think that it is the best solution. I have asked about the
>> memory allocation error because that is the one that gets resolved if
>> I try again but there are around 50 variants that always give a 400
>> code even though they exist in Ensembl. One example is rs1057518506
>> which is a deletion according to dbSNP
>> <https://www.ncbi.nlm.nih.gov/snp/rs1057518506>, is described as a
>> frameshift variant in Ensembl but it lacks alleles
>> <https://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=17:7220304-7221313;v=rs1057518506;vdb=variation;vf=371667648>
>> and the API returns a 400 code saying that the length of the
>> reference allele is 0
>> <https://rest.ensembl.org/vep/human/id/rs1057518506?content-type=application/json>.
>> I am happy to share these cases with you if you are interested in them.
>>
>> I understand that using the POST endpoints would be a better
>> solution, at least because we could use a single query to retrieve
>> data about multiple variants thus reducing the burden on the API.
>> However, I am afraid that this is a piece of code that we run twice
>> every two months and it is not a priority for us to refactor it
>> unless you have a good reason for me to convince my managers.
>>
>> Thank you,
>> Asier
>>
>> On 21/08/2019 14:54, Anja Thormann wrote:
>>> Hi Asier,
>>>
>>> I would like to take a look at your script please. I recommend for
>>> the first part that you use our VEP POST endpoints for region
>>> <https://rest.ensembl.org/documentation/info/vep_region_post> and id
>>> <https://rest.ensembl.org/documentation/info/vep_id_post>. At this
>>> point I recommend that you rerun your requests a few times on a 400
>>> error and if the requests keep failing contact us with details
>>> (variant id or region) of your failed requests. Can you rule out
>>> that the error message has a 429 code due to too many requests?
>>>
>>> Thank you,
>>> Anja
>>>
>>>> On 21 Aug 2019, at 14:28, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
>>>>
>>>> Hello Anja,
>>>>
>>>> Thank you for your email. This is a script that calls both the id
>>>> (/vep/human/id/) and region (/vep/human/region/) endpoints
>>>> depending on whether the variant is defined by an rsID or by genomic
>>>> coordinates. It calls the API thousands of times using GET
>>>> requests, once per variant. I don't see any other settings but I
>>>> can point you to the few code lines that control it on GitHub if
>>>> you want to have a look yourself.
>>>>
>>>> Please let me know if I can help. I just need to know whether this
>>>> is an expected behaviour as our script retries calling the API if
>>>> the response is a 5XX error but it passes if it's a 400 as it
>>>> should have been caused by the query and retrying should not make
>>>> any difference.
>>>>
>>>> Best wishes,
>>>> Asier
>>>>
>>>> On 21/08/2019 13:39, Anja Thormann wrote:
>>>>> Dear Asier,
>>>>>
>>>>> thank you for your feedback. Could you please let me know which
>>>>> VEP endpoint and settings you use? If you are using a POST
>>>>> endpoint, how many variant ids or regions are you sending? I
>>>>> assume that the problem happens at a point during the VEP
>>>>> calculation where it is difficult to differentiate the cause of
>>>>> the problem. But we will investigate further and hopefully be able
>>>>> to provide a more accurate error report.
>>>>>
>>>>> Best wishes,
>>>>> Anja
>>>>>
>>>>>> On 21 Aug 2019, at 12:47, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a script that calls the VEP API for a few thousand
>>>>>> variants and does some processing. The script captures HTTP
>>>>>> errors and messages and I have found a few cases where the 400
>>>>>> error is accompanied by an "ERROR: Cannot allocate memory"
>>>>>> message. If I query the API again with the ids that produced that
>>>>>> error I get results, so I understand that this is a temporary
>>>>>> issue. I could handle the 400 errors further to control these
>>>>>> cases, but I wonder whether this is expected or whether it is an
>>>>>> issue, as it sounds as if this should be a 5XX error instead of
>>>>>> a 400.
>>>>>>
>>>>>> Kind regards,
>>>>>> Asier
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>
>