[ensembl-dev] how to get original VCF file's POS from variant effect predictor output?

Michael Yourshaw myourshaw at ucla.edu
Wed Aug 3 21:01:42 BST 2011


To associate the output of the Variant Effect Predictor (VEP) with the original VCF 4.0 file, I need to be able to determine the value of the VCF POS field from the data in the VEP output. How can I do this?

At the risk of revealing my ignorance of VCF format and algebra, I think the following works, but it depends on there never being a VCF record where len(REF) == len(ALT) == 2, and I am not sure this is a safe assumption:

	Get chromStart and chromEnd from the VEP Location field (if Location has no end coordinate, set chromEnd = chromStart). The Uploaded variation field can't be used, because it might get turned into an rs ID.

	if chromStart == chromEnd: # SNV or indel with len(REF) == 2
		POS = chromStart
	elif chromStart == chromEnd-1: # indel with len(REF) == 1 and len(ALT) > 2 (if len(REF) == len(ALT) == 2, POS would be chromStart)
		POS = chromEnd
	else:
		POS = chromStart-1
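
For anyone who wants to try the rule, here is a minimal Python sketch of it. It assumes the Location field looks like chr:start or chr:start-end; the function name vcf_pos_from_location is hypothetical and not part of VEP. It only encodes the tentative rule above, including the untested assumption about len(REF) == len(ALT) == 2, so it is not confirmed VEP behavior.

```python
def vcf_pos_from_location(location):
    """Guess the VCF POS from a VEP Location string (tentative rule)."""
    # Split "chr:start" or "chr:start-end" into its parts.
    chrom, _, coords = location.rpartition(":")
    start_text, _, end_text = coords.partition("-")
    chrom_start = int(start_text)
    # Assumption: a missing end coordinate means end == start.
    chrom_end = int(end_text) if end_text else chrom_start

    if chrom_start == chrom_end:
        # SNV, or indel with len(REF) == 2
        pos = chrom_start
    elif chrom_start == chrom_end - 1:
        # indel with len(REF) == 1
        pos = chrom_end
    else:
        pos = chrom_start - 1
    return chrom, pos
```

The chromosome is returned together with the position so the pair can be used as a lookup key into the original VCF records.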


ॐ

Michael Yourshaw
UCLA Geffen School of Medicine
Department of Human Genetics, Nelson Lab
695 Charles E Young Drive S
Gonda 5554
Los Angeles CA 90095-8348 USA
myourshaw at ucla.edu
970.691.8299
