[ensembl-dev] VEP API and parsing multiallelic variants

Joseph A Prinz joseph.prinz at duke.edu
Thu Oct 19 14:35:43 BST 2017


Hi Will,

Now that you mention it, I see that this method has been added to the ExAC plugin. I did not realize it also returned minimized alleles--perfect!

Lots of thanks to you and the dev team!

Best,
Joey

________________________________________
From: William McLaren <wm2 at ebi.ac.uk>
Sent: Thursday, October 19, 2017 8:37:01 AM
To: Joseph A Prinz; Ensembl developers list
Subject: Re: [ensembl-dev] VEP API and parsing multiallelic variants

Hi Joey,

We’ve recently implemented a utility method, get_matched_variant_alleles() [1], that you can use to do just this. VEP itself uses the same method to match alleles between pairs of variants that may be minimised to different degrees. I would not recommend using split_variants() or similar, as they are intended only for use within the VEP code.
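
To illustrate the idea of matching alleles across different levels of minimisation, here is a minimal Python sketch. It is not the Ensembl implementation and does not call the Ensembl API: the names trim_allele and match_alleles, the dict layout, and the choice to trim shared leading bases before trailing ones are all assumptions made for the example.

```python
def trim_allele(pos, ref, alt):
    """Minimise one REF/ALT pair (hypothetical helper, not an Ensembl call).

    Shared leading bases are removed first (advancing the position),
    then shared trailing bases. An empty allele is returned as "-".
    """
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"


def match_alleles(a, b):
    """Pair up ALT alleles of a and b that are identical once minimised.

    a and b are dicts with keys "pos", "ref" and "alts" (assumed layout).
    Returns a list of (alt_in_a, alt_in_b) tuples.
    """
    # Index every minimised allele of b by its (pos, ref, alt) key.
    b_index = {trim_allele(b["pos"], b["ref"], alt): alt for alt in b["alts"]}
    matches = []
    for alt in a["alts"]:
        key = trim_allele(a["pos"], a["ref"], alt)
        if key in b_index:
            matches.append((alt, b_index[key]))
    return matches


# An already minimised SNV versus a multiallelic VCF record that
# carries an extra anchor base because it also contains a deletion.
vep_variant = {"pos": 101, "ref": "A", "alts": ["G"]}
vcf_record = {"pos": 100, "ref": "CA", "alts": ["CG", "C"]}
matches = match_alleles(vep_variant, vcf_record)
print(matches)
```

Each ALT allele is minimised independently against REF, so a multiallelic record does not need to be split into separate records before comparison.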

Note that this is not a complete solution to the issue; if the variants fall in repetitive sequence, the method will miss such matches, as it does not have the context of the surrounding sequence.
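
The repetitive-sequence caveat can be shown with a toy example. The sketch below is an assumption-laden illustration in Python, not part of the Ensembl API: REFERENCE is an invented sequence and left_align is a hypothetical helper. It shows two trimmed descriptions of the same deletion in a CA repeat that differ in position, and that only shifting them left against the reference (the kind of normalisation tools such as bcftools norm perform) makes them identical.

```python
# Invented reference, 1-based coordinates: G C A C A C A T
REFERENCE = "GCACACAT"


def left_align(pos, ref, alt, reference):
    """Shift a trimmed pure insertion or deletion as far left as possible.

    Empty alleles are passed as "". The variant is moved one base left
    while the reference base before it equals the last base of the
    inserted or deleted sequence. Other variant types are returned as-is.
    """
    while pos > 1:
        prev = reference[pos - 2]  # reference base just before pos
        if ref and not alt and ref[-1] == prev:      # deletion
            ref = prev + ref[:-1]
        elif alt and not ref and alt[-1] == prev:    # insertion
            alt = prev + alt[:-1]
        else:
            break
        pos -= 1
    return pos, ref, alt


# Two trimmed descriptions of "delete one AC unit from the repeat".
first = (3, "AC", "")
second = (5, "AC", "")

# Without sequence context the two keys simply differ.
same_without_context = first == second

# With the reference available, both shift to the same leftmost form.
aligned_first = left_align(*first, REFERENCE)
aligned_second = left_align(*second, REFERENCE)
print(same_without_context, aligned_first == aligned_second)
```

Both descriptions produce the same sequence, GCACAT, yet a comparison based on position and allele strings alone treats them as different variants.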

[2] and [3] show how we use it in a few places in our code, for reference.

Hope that helps

Will McLaren
Ensembl Variation


[1] : http://www.ensembl.org/info/docs/Doxygen/variation-api/classBio_1_1EnsEMBL_1_1Variation_1_1Utils_1_1Sequence.html#a82c641995b21a46e8d9af3af0b329753
[2] : https://github.com/Ensembl/ensembl-vep/blob/release/90/modules/Bio/EnsEMBL/VEP/AnnotationType/Variation.pm#L125-L186
[3] : https://github.com/Ensembl/ensembl-vep/blob/release/90/modules/Bio/EnsEMBL/VEP/AnnotationSource/File/VCF.pm#L275-L336



On 18 October 2017 at 8:32:34 pm, Joseph A Prinz (joseph.prinz at duke.edu) wrote:

Hi ensembl devs!

I am writing a plugin for VEP that will process a VCF file using the BaseVepTabixPlugin class.
My goal is to be able to match alleles between the output of VEP and the overlapping portions of the VCF file being processed by the plugin.

To this end, I have been using the ExAC plugin as a rough guide, and am using Utils::VEP parse_line to parse the tabix results.
I have noticed that parse_line will not try to minimize variants that are multiallelic (this case is excluded by minimize_variants called by parse_line).
The private method split_variants looks like what I am looking for, but the only public method invoking it is get_all_consequences, and this may be cumbersome for a large VCF file.

What would be the most efficient way to leverage the API to transform alleles of a VCF to be comparable with those being produced by VEP?
Also, would you consider adding split_variants as an optional parameter to parse_line?

Thank you!
Joey
_______________________________________________
Dev mailing list Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/


