[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals
Duarte Molha
Duarte.Molha at ogt.com
Tue Aug 27 14:50:06 BST 2013
My apologies Will, but I think you are missing a bit of my problem.
This is not a non-variant variation for the remaining individuals/samples. Why are they not being output?
Cheers
Duarte
From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Will McLaren
Sent: 27 August 2013 14:45
To: Ensembl developers list
Subject: Re: [ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals
Hi Duarte,
Thanks for raising this. There's an interesting quirk here which seems to be what you're looking at. However, I _do_ see lines of output for the other 8 individuals in the file.
What is it you would expect to see for sample9? Would you expect that line to be excluded from the output?
The reason it is shown is because you are using most_severe, which forces the VEP to give the most severe consequence per variant (which I would generally advise against using!) - when using --individual each individual/variant combination is considered as an independent variant.
The reason it is intergenic_variant is because that is the "default" consequence - since the locus is non-variant for sample9, it does not go through the consequence prediction, but because you are forcing it to be printed out with most_severe, the VEP has to default to using intergenic_variant.
I could see two solutions - either excluding the line (since it is non-variant), or having some sort of "no consequence" type - which I am loath to do as this doesn't fit into our SO schema.
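As an illustration of the first option only, here is a minimal Python sketch that drops such lines as a post-processing step on the tab-delimited output. It is not how the VEP itself works; the function name drop_non_variant_rows and the carriers set are hypothetical, and the rows are assumed to have already been parsed into dicts keyed by the column names from the output header.

```python
# Hypothetical post-processing sketch; not part of the VEP.
def drop_non_variant_rows(rows, carriers):
    """Keep only rows whose (Location, IND) pair is a known ALT carrier.

    rows:     list of dicts keyed by the output column names
    carriers: set of (location, sample) pairs for samples carrying ALT
    """
    return [row for row in rows if (row["Location"], row["IND"]) in carriers]


# The single output row reported in this thread.
rows = [
    {
        "Uploaded_variation": "1_876499_A",
        "Location": "1:876499",
        "Consequence": "intergenic_variant",
        "IND": "sample9",
    }
]

# sample1..sample8 carry the ALT allele at this locus; sample9 does not.
carriers = {("1:876499", f"sample{i}") for i in range(1, 9)}

filtered = drop_non_variant_rows(rows, carriers)
```

With the data from this thread, the sample9 row is removed because sample9 is not among the carriers.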
Will
On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
Dear Developers
I believe there is another bug in the VEP when dealing with input VCFs with multiple individuals...
Please take a look at this VCF input and the corresponding output:
INPUT VCF line:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
1 876499 . A G 2900.87 PASS AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2 GT:AD:DP:GQ:PL 1/1:0,9:9:24.07:303,24,0 1/1:0,10:10:27.09:365,27,0 0/1:5,4:9:99:104,0,166 1/1:0,7:7:18.04:220,18,0 1/1:0,16:16:39.13:534,39,0 1/1:0,12:12:30.10:407,30,0 1/1:0,14:14:39.13:535,39,0 1/1:0,15:15:36.12:483,36,0 ./.
OUTPUT annotation file:
#Uploaded_variation Location Existing_variation Allele ZYG Gene Feature Feature_type Consequence GMAF IND
1_876499_A 1:876499 rs4372192 - HOM - - - intergenic_variant A:0.0824 sample9
As you can see, the annotation output only contains 1 line and it is for the individual that has no genotype call (./.)
Also, the variation name does not contain the ref/alt allele information, unlike all other variations. I would expect it to be called 1_876499_A/G
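To make the expectation concrete, here is a minimal Python sketch that classifies each sample's genotype on the input line above and builds the name I would expect. The helper names classify_genotype and expected_variant_name are hypothetical, the INFO column is abbreviated for brevity, and GT is assumed to be the first FORMAT subfield (as it is here). This is plain Python, not VEP code.

```python
# Hypothetical helpers for illustration; not part of the VEP.
def classify_genotype(sample_field):
    """Classify a VCF sample column by its GT subfield (assumed first)."""
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    if all(a == "." for a in alleles):
        return "no_call"        # e.g. ./.
    if all(a in ("0", ".") for a in alleles):
        return "non_variant"    # e.g. 0/0
    return "variant"            # at least one ALT allele, e.g. 0/1 or 1/1


def expected_variant_name(chrom, pos, ref, alt):
    """Build a name of the form CHROM_POS_REF/ALT."""
    return f"{chrom}_{pos}_{ref}/{alt}"


# The input line from this message, with the INFO column abbreviated.
record = (
    "1 876499 . A G 2900.87 PASS AC=15;AF=0.938;AN=16 GT:AD:DP:GQ:PL "
    "1/1:0,9:9:24.07:303,24,0 1/1:0,10:10:27.09:365,27,0 "
    "0/1:5,4:9:99:104,0,166 1/1:0,7:7:18.04:220,18,0 "
    "1/1:0,16:16:39.13:534,39,0 1/1:0,12:12:30.10:407,30,0 "
    "1/1:0,14:14:39.13:535,39,0 1/1:0,15:15:36.12:483,36,0 ./."
).split()

samples = [f"sample{i}" for i in range(1, 10)]
calls = {name: classify_genotype(field)
         for name, field in zip(samples, record[9:])}
name = expected_variant_name(record[0], record[1], record[3], record[4])
```

Here sample1 to sample8 come out as "variant" and only sample9 is a "no_call", which is why I would expect eight annotated lines rather than the single sample9 line.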
For reference here are the config options I used:
host [internalserver]
user [user]
password [password]
db_version 72
port 3306
species homo_sapiens
####### runtime options #############
buffer_size 40000
most_severe 1
check_existing 1
check_alleles 1
individual all
fork 6
verbose 1
gmaf 1
filter_common 1
fields Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND
####### cache stuff #############
cache 1
dir_plugins /NGS_Test/vep_72_testing/Plugins/
dir_cache /ReferenceData/vep_cache
# cache_region_size 1MB
#offline 1
# skip_db_check 1
_______________________________________________
Dev mailing list Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/