[ensembl-dev] VEP API returning 400 for memory error

Asier Gonzalez gonzaleza at ebi.ac.uk
Wed Aug 21 16:16:03 BST 2019


Hi Anja,

I understand that there may be some data issues, which is why I capture 
400 errors and flag them with a message in the output. I will amend the 
code to retry when 400 errors occur as per your suggestion.
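
A minimal sketch of such a retry loop, written in Python for illustration rather than in the Perl of the actual script. The `fetch` callable, the `max_attempts` and `delay` parameters and the returned labels are all assumptions made for this example; none of them come from snp_assignment.pl or from the Ensembl REST API. The sketch retries 400 and 5XX responses a few times and flags a variant only if the 400 persists, which is what would be expected for a genuine data issue. Handling of 429 responses is left out here because the real script already deals with them separately.

```python
import time
from typing import Callable, Tuple

# Hypothetical signature: fetch(variant_id) -> (HTTP status code, response body).
Fetch = Callable[[str], Tuple[int, str]]


def query_with_retries(variant_id: str, fetch: Fetch,
                       max_attempts: int = 3, delay: float = 1.0):
    """Call fetch() until it succeeds or max_attempts is reached.

    Returns (label, body), where label is "ok", "flagged_400" (the 400
    persisted, so it is probably a data issue) or "failed".
    """
    status, body = 0, ""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(variant_id)
        if status == 200:
            return "ok", body
        # Assumption: 400 is treated as possibly transient, like a 5XX.
        retryable = status == 400 or 500 <= status < 600
        if not retryable:
            break
        if attempt < max_attempts:
            time.sleep(delay)  # brief pause before the next attempt
    if status == 400:
        return "flagged_400", body
    return "failed", body
```

Passing the HTTP call in as `fetch` keeps the retry policy separate from the request code, so it can be exercised without touching the live API.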

Just to finish, do you want me to send you a list of the variants that 
seem to have data issues?

Thank you,
Asier

On 21/08/2019 16:08, Anja Thormann wrote:
> Hi Asier,
>
> this is helpful. Apart from the memory errors, you are actually seeing 
> genuine problems with the input data. For the example variant you 
> sent, our data is incomplete and VEP cannot calculate any 
> consequences from it. In our production pipeline we are more 
> permissive and try to calculate consequences even with incomplete 
> data. As a side note, in our next release rs1057518506 has all 
> required alleles present and VEP will be able to calculate 
> consequences for it.
>
> To conclude, looping a few times over variants with 400 errors seems 
> to be the way to go for now. We will look into catching memory limit 
> errors better and returning a more appropriate error code. But keep 
> in mind that there are variants for which VEP cannot calculate 
> consequences, for example because of missing or wrong allele or 
> location information.
>
> Best,
> Anja
>
>> On 21 Aug 2019, at 15:16, Asier Gonzalez <gonzaleza at ebi.ac.uk 
>> <mailto:gonzaleza at ebi.ac.uk>> wrote:
>>
>> Hi Anja,
>>
>> This is the part of the code that manages the queries: 
>> https://github.com/opentargets/snp_to_gene/blob/master/snp_assignment.pl#L153-L179. 
>> If you look further down you will see that it cannot be a 429 code as 
>> they are handled explicitly. I could add repeats in case of 400 codes 
>> if you think that it is the best solution. I have asked about the 
>> memory allocation error because that is the one that gets resolved if 
>> I try again but there are around 50 variants that always give a 400 
>> code even though they exist in Ensembl. One example is rs1057518506 
>> which is a deletion according to dbSNP 
>> <https://www.ncbi.nlm.nih.gov/snp/rs1057518506>, is described as a 
>> frameshift variant in Ensembl but it lacks alleles 
>> <https://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=17:7220304-7221313;v=rs1057518506;vdb=variation;vf=371667648> 
>> and the API returns a 400 code saying that the length of the 
>> reference allele is 0 
>> <https://rest.ensembl.org/vep/human/id/rs1057518506?content-type=application/json>. 
>> I am happy to share these cases with you if you are interested in them.
>>
>> I understand that using the POST endpoints would be a better 
>> solution, at least because we could use a single query to retrieve 
>> data about multiple variants thus reducing the burden on the API. 
>> However, I am afraid that this is a piece of code that we run twice 
>> every two months and it is not a priority for us to refactor it 
>> unless you have a good reason for me to convince my managers.
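
For illustration, a rough Python sketch of how per-variant GET calls could be grouped into POST requests for the id endpoint. The `ids` payload key, the JSON headers and the batch size of 200 are assumptions that should be checked against the vep_id_post documentation linked in this thread; `batches` and `build_request` are hypothetical helper names. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request
from typing import Iterator, List

SERVER = "https://rest.ensembl.org"
BATCH_SIZE = 200  # assumed limit; check the endpoint documentation


def batches(ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_request(ids: List[str]) -> urllib.request.Request:
    """Build (but do not send) one POST request for a batch of rsIDs."""
    payload = json.dumps({"ids": ids}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        SERVER + "/vep/human/id",
        data=payload,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
```

Each request could then be sent with `urllib.request.urlopen(build_request(batch))` and the response decoded with `json.load`, so a few thousand variants would need a few dozen calls rather than one per variant.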
>>
>> Thank you,
>> Asier
>>
>> On 21/08/2019 14:54, Anja Thormann wrote:
>>> Hi Asier,
>>>
>>> I would like to take a look at your script please. I recommend for 
>>> the first part that you use our VEP POST endpoints for region 
>>> <https://rest.ensembl.org/documentation/info/vep_region_post> and id 
>>> <https://rest.ensembl.org/documentation/info/vep_id_post>. At this 
>>> point I recommend that you rerun your requests a few times on a 400 
>>> error and if the requests keep failing contact us with details 
>>> (variant id or region) of your failed requests. Can you rule out 
>>> that the error message has a 429 code due to too many requests?
>>>
>>> Thank you,
>>> Anja
>>>
>>>> On 21 Aug 2019, at 14:28, Asier Gonzalez <gonzaleza at ebi.ac.uk 
>>>> <mailto:gonzaleza at ebi.ac.uk>> wrote:
>>>>
>>>> Hello Anja,
>>>>
>>>> Thank you for your email. This is a script that calls both the id 
>>>> (/vep/human/id/) and region (/vep/human/region/) endpoints, 
>>>> depending on whether the variant is defined by an rsID or by 
>>>> genomic coordinates. It calls the API thousands of times using GET 
>>>> requests, once per variant. I don't see any other settings but I 
>>>> can point you to the few lines of code that control it on GitHub 
>>>> if you want to have a look yourself.
>>>>
>>>> Please let me know if I can help. I just need to know whether this 
>>>> is expected behaviour, as our script retries calling the API if 
>>>> the response is a 5XX error but moves on if it is a 400, since 
>>>> that should have been caused by the query and retrying should not 
>>>> make any difference.
>>>>
>>>> Best wishes,
>>>> Asier
>>>>
>>>> On 21/08/2019 13:39, Anja Thormann wrote:
>>>>> Dear Asier,
>>>>>
>>>>> thank you for your feedback. Could you please let me know which 
>>>>> VEP endpoint and settings you use? If you are using a POST 
>>>>> endpoint, how many variant ids or regions are you sending? I 
>>>>> assume that the problem happens at a point during the VEP 
>>>>> calculation where it is difficult to differentiate the cause of 
>>>>> the problem. But we will investigate further and hopefully be able 
>>>>> to provide a more accurate error report.
>>>>>
>>>>> Best wishes,
>>>>> Anja
>>>>>
>>>>>> On 21 Aug 2019, at 12:47, Asier Gonzalez <gonzaleza at ebi.ac.uk 
>>>>>> <mailto:gonzaleza at ebi.ac.uk>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a script that calls the VEP API for a few thousand 
>>>>>> variants and does some processing. The script captures HTTP 
>>>>>> errors and messages, and I have found a few cases where the 400 
>>>>>> error is accompanied by an "ERROR: Cannot allocate memory" 
>>>>>> message. If I query the API again with the ids that produced that 
>>>>>> error I get results, so I understand that this is a temporary 
>>>>>> issue. I could handle the 400 errors further to control these 
>>>>>> cases, but I wonder whether this is expected or whether it is an 
>>>>>> issue, as it sounds as though this should be a 5XX error instead 
>>>>>> of a 400.
>>>>>>
>>>>>> Kind regards,
>>>>>> Asier
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>>>> Posting guidelines and subscribe/unsubscribe info: 
>>>>>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>
>