[ensembl-dev] VEP API returning 400 for memory error

Anja Thormann anja at ebi.ac.uk
Wed Aug 21 16:46:48 BST 2019


Hi Asier,

I’d like to get the list of variants with data issues. However, I would be even more interested in the ones that fail because they hit a memory limit, if that is possible.

Thank you,
Anja
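
A minimal Python sketch of the triage discussed in this thread: retry responses that look transient, and keep separate lists of variants that hit the memory error and variants with genuine data issues. The function names, the `fetch` callable and the report keys are all hypothetical. The only detail taken from the thread is the "Cannot allocate memory" text that accompanied some 400 responses. The script under discussion appears to be Perl (snp_assignment.pl), so this is an illustration, not a patch.

```python
import time
from typing import Callable, Dict, List, Tuple

# Error text quoted in the thread for the transient 400 responses.
MEMORY_MARKER = "Cannot allocate memory"


def classify(status: int, message: str) -> str:
    """Return 'ok', 'retry' or 'data_issue' for a single response."""
    if 200 <= status < 300:
        return "ok"
    if status == 429 or status >= 500:
        return "retry"  # rate limiting or server-side failure
    if status == 400 and MEMORY_MARKER in message:
        return "retry"  # reported as 400 but resolved on retry in the thread
    return "data_issue"  # e.g. missing or wrong allele/location information


def triage(
    variants: List[str],
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical: returns (status, error text)
    max_attempts: int = 3,
    pause: float = 0.0,
) -> Dict[str, List[str]]:
    """Query each variant, retrying transient failures, and group the outcomes."""
    report: Dict[str, List[str]] = {
        "ok": [], "data_issue": [], "memory_limit": [], "gave_up": [],
    }
    for variant in variants:
        outcome = "retry"
        for _ in range(max_attempts):
            status, message = fetch(variant)
            outcome = classify(status, message)
            # Record every variant that hit the memory error at least once.
            if MEMORY_MARKER in message and variant not in report["memory_limit"]:
                report["memory_limit"].append(variant)
            if outcome != "retry":
                break
            time.sleep(pause)
        report["gave_up" if outcome == "retry" else outcome].append(variant)
    return report
```

Passing the HTTP call in as `fetch` keeps the retry policy separate from the transport. The `memory_limit` and `data_issue` lists correspond to the two sets of variants requested above.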

> On 21 Aug 2019, at 16:16, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
> 
> Hi Anja,
> 
> I understand that there may be some data issues, which is why I capture 400 errors and flag them with a message in the output. I will amend the code to retry when 400 errors occur, as per your suggestion.
> 
> Just to finish, do you want me to send you a list of the variants that seem to have data issues?
> 
> Thank you,
> Asier
> On 21/08/2019 16:08, Anja Thormann wrote:
>> Hi Asier,
>> 
>> This is helpful. Apart from the memory errors, you are actually seeing genuine problems with the input data. For the example variant you sent, we only have incomplete information, and VEP cannot calculate any consequences from it. In our production pipeline we are more permissive and try to calculate consequences even with incomplete data. As a side note, in our next release rs1057518506 has all required alleles present and VEP will be able to calculate consequences for it.
>> 
>> To conclude, looping a few times over variants with 400 errors seems to be the way to go for now. We will look into catching memory limit errors better and returning a more appropriate error code. But keep in mind that there are variants for which VEP cannot calculate consequences, for example because of missing or wrong allele or location information.
>> 
>> Best,
>> Anja
>> 
>>> On 21 Aug 2019, at 15:16, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
>>> 
>>> Hi Anja,
>>> 
>>> This is the part of the code that manages the queries: https://github.com/opentargets/snp_to_gene/blob/master/snp_assignment.pl#L153-L179. If you look further down you will see that it cannot be a 429 code, as those are handled explicitly. I could add retries for 400 codes if you think that is the best solution. I asked about the memory allocation error because that is the one that gets resolved if I try again, but there are around 50 variants that always give a 400 code even though they exist in Ensembl. One example is rs1057518506, which is a deletion according to dbSNP <https://www.ncbi.nlm.nih.gov/snp/rs1057518506> and is described as a frameshift variant in Ensembl, but it lacks alleles <https://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=17:7220304-7221313;v=rs1057518506;vdb=variation;vf=371667648> and the API returns a 400 code saying that the length of the reference allele is 0 <https://rest.ensembl.org/vep/human/id/rs1057518506?content-type=application/json>. I am happy to share these cases with you if you are interested in them.
>>> I understand that using the POST endpoints would be a better solution, if only because we could use a single query to retrieve data about multiple variants, thus reducing the burden on the API. However, I am afraid that this is a piece of code that we run twice every two months, and refactoring it is not a priority for us unless you can give me a good reason to convince my managers.
>>> 
>>> Thank you,
>>> Asier
>>> On 21/08/2019 14:54, Anja Thormann wrote:
>>>> Hi Asier,
>>>> 
>>>> I would like to take a look at your script please. I recommend for the first part that you use our VEP POST endpoints for region <https://rest.ensembl.org/documentation/info/vep_region_post> and id <https://rest.ensembl.org/documentation/info/vep_id_post>. At this point I recommend that you rerun your requests a few times on a 400 error and if the requests keep failing contact us with details (variant id or region) of your failed requests. Can you rule out that the error message has a 429 code due to too many requests?
>>>> 
>>>> Thank you,
>>>> Anja
>>>> 
>>>>> On 21 Aug 2019, at 14:28, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
>>>>> 
>>>>> Hello Anja,
>>>>> 
>>>>> Thank you for your email. This is a script that calls both the id (/vep/human/id/) and region (/vep/human/region/) endpoints, depending on whether the variant is defined by an rsID or by genomic coordinates. It calls the API thousands of times using GET requests, once per variant. I don't see any other settings, but I can point you to the few lines of code that control it on GitHub if you want to have a look yourself.
>>>>> 
>>>>> Please let me know if I can help. I just need to know whether this is expected behaviour: our script retries the API call if the response is a 5XX error, but it moves on if it is a 400, since that should have been caused by the query and retrying should not make any difference.
>>>>> 
>>>>> Best wishes,
>>>>> Asier
>>>>> 
>>>>> On 21/08/2019 13:39, Anja Thormann wrote:
>>>>>> Dear Asier,
>>>>>> 
>>>>>> thank you for your feedback. Could you please let me know which VEP endpoint and settings you use? If you are using a POST endpoint, how many variant ids or regions are you sending? I assume that the problem happens at a point during the VEP calculation where it is difficult to differentiate the cause of the problem. But we will investigate further and hopefully be able to provide a more accurate error report.
>>>>>> 
>>>>>> Best wishes,
>>>>>> Anja
>>>>>> 
>>>>>>> On 21 Aug 2019, at 12:47, Asier Gonzalez <gonzaleza at ebi.ac.uk> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I have a script that calls the VEP API for a few thousand variants and does some processing. The script captures HTTP errors and messages, and I have found a few cases where the 400 error is accompanied by an "ERROR: Cannot allocate memory" message. If I query the API again with the ids that produced that error I get results, so I understand that this is a temporary issue. I could extend the handling of 400 errors to cover these cases, but I wonder whether this is expected or whether it is an issue, as it sounds like this should be a 5XX error instead of a 400.
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> Asier
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>> 

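The POST endpoints recommended in this thread take several variants per request. The Python sketch below only builds the batched requests for the id endpoint and sends nothing, so it can be inspected offline. The helper names are hypothetical. The `{"ids": [...]}` body follows the Ensembl REST documentation for the VEP id POST endpoint as far as I know. The batch size of 200 is an assumption and should be checked against the documented maximum POST size.

```python
import json
import urllib.request
from typing import Iterator, List

SERVER = "https://rest.ensembl.org"


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_id_request(ids: List[str]) -> urllib.request.Request:
    """Build (but do not send) one POST request for a batch of variant ids."""
    body = json.dumps({"ids": ids}).encode("utf-8")
    return urllib.request.Request(
        SERVER + "/vep/human/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def build_requests(ids: List[str], batch_size: int = 200) -> List[urllib.request.Request]:
    """Split the ids into batches; batch_size is an assumed limit, not a verified one."""
    return [build_id_request(batch) for batch in chunks(ids, batch_size)]
```

Each request can then be sent with `urllib.request.urlopen(request)`. A few thousand variants would need a few dozen calls at this batch size, compared with one GET per variant.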