[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals
Will McLaren
wm2 at ebi.ac.uk
Tue Aug 27 15:28:17 BST 2013
Ah, sorry, I wasn't testing with --filter_common.
However, if I add --filter_common, I don't see any lines in the output, and
the VEP reports all 9 variants/individuals are being filtered
out. rs4372192 overlaps and has a frequency of 0.08.
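
For readers following along, here is a rough sketch of the kind of check --filter_common implies for this variant. It is not VEP's code: the function names are made up for illustration, and the 1% cutoff is an assumption about the default threshold.

```python
# Illustrative sketch only; this is not VEP's implementation.
# ASSUMED_THRESHOLD (1%) is an assumption about the default cutoff,
# and both function names are hypothetical.
ASSUMED_THRESHOLD = 0.01


def parse_gmaf(gmaf):
    """Split a GMAF string such as 'A:0.0824' into (allele, frequency)."""
    allele, freq = gmaf.split(":")
    return allele, float(freq)


def would_be_filtered(gmaf, threshold=ASSUMED_THRESHOLD):
    """Return True if the co-located variant's frequency exceeds the cutoff."""
    if gmaf in ("", "-"):
        # No frequency recorded, so there is nothing to filter on.
        return False
    _, freq = parse_gmaf(gmaf)
    return freq > threshold


# GMAF value for rs4372192 as quoted in the output further down the thread.
print(would_be_filtered("A:0.0824"))
```

Under that assumed cutoff, a frequency of about 0.08 is well above the threshold, which is consistent with every individual's line being filtered out.
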
Perhaps your API and/or script is out of date? Though I still don't see the
problem even if I test it with v71...
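
As a sanity check on the input itself, the genotype columns of the VCF line quoted below can be classified per sample. This is only a sketch with hypothetical helper names; the genotype strings are copied from the quoted line, and nothing here reflects how the VEP parses VCFs internally.

```python
# Sketch: classify each sample's genotype from the quoted VCF line.
# classify() and allele_counts() are hypothetical helpers, not VEP functions.
SAMPLE_NAMES = ["sample%d" % i for i in range(1, 10)]
SAMPLE_FIELDS = [
    "1/1:0,9:9:24.07:303,24,0",
    "1/1:0,10:10:27.09:365,27,0",
    "0/1:5,4:9:99:104,0,166",
    "1/1:0,7:7:18.04:220,18,0",
    "1/1:0,16:16:39.13:534,39,0",
    "1/1:0,12:12:30.10:407,30,0",
    "1/1:0,14:14:39.13:535,39,0",
    "1/1:0,15:15:36.12:483,36,0",
    "./.",
]


def split_alleles(field):
    """Return the allele indices from the GT subfield (first in FORMAT)."""
    gt = field.split(":")[0]
    return gt.replace("|", "/").split("/")


def classify(field):
    """Label a genotype as no-call, hom-ref, hom-alt or het."""
    alleles = split_alleles(field)
    if "." in alleles:
        return "no-call"
    if all(a == "0" for a in alleles):
        return "hom-ref"
    if len(set(alleles)) == 1:
        return "hom-alt"
    return "het"


def allele_counts(fields):
    """Count ALT alleles (AC) and called alleles (AN) across samples."""
    ac = an = 0
    for field in fields:
        called = [a for a in split_alleles(field) if a != "."]
        an += len(called)
        ac += sum(1 for a in called if a != "0")
    return ac, an


calls = {name: classify(f) for name, f in zip(SAMPLE_NAMES, SAMPLE_FIELDS)}
for name in SAMPLE_NAMES:
    print(name, calls[name])
print("AC, AN =", allele_counts(SAMPLE_FIELDS))
```

The counts agree with AC=15 and AN=16 in the INFO column: eight samples carry the G allele and sample9 is the only one without a genotype call.
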
Will
On 27 August 2013 15:03, Duarte Molha <duartemolha at gmail.com> wrote:
> My script does not output any information for the remaining 8 samples. If
> I understand correctly from your email, your script is outputting the other
> samples.
>
> What might be causing the discrepancy?
>
> Best regards
>
> Duarte
>
>
> =========================
> Duarte Miguel Paulo Molha
> http://about.me/duarte
> =========================
>
>
> On Tue, Aug 27, 2013 at 2:50 PM, Duarte Molha <Duarte.Molha at ogt.com> wrote:
>
>> My apologies Will, but I think you are missing a bit of my problem.
>>
>> This is not a non-variant variation for the remaining individuals/samples.
>> Why are they not being output?
>>
>> Cheers
>>
>> Duarte
>>
>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
>> Behalf Of Will McLaren
>> Sent: 27 August 2013 14:45
>> To: Ensembl developers list
>> Subject: Re: [ensembl-dev] bug in the VEP annotation of VCFs with
>> multiple individuals
>>
>> Hi Duarte,
>>
>> Thanks for raising this. There's an interesting quirk here which seems to
>> be what you're looking at. However, I _do_ see lines of output for the
>> other 8 individuals in the file.
>>
>> What is it you would expect to see for sample9? Would you expect that
>> line to be excluded from the output?
>>
>> It is shown because you are using most_severe, which forces the VEP to
>> give the most severe consequence per variant (which I would generally
>> advise against using!) - when using --individual, each individual/variant
>> combination is considered as an independent variant.
>>
>> It is intergenic_variant because that is the "default" consequence -
>> since the locus is non-variant for sample9, it does not go through the
>> consequence prediction, but because you are forcing it to be printed out
>> with most_severe, the VEP has to default to using intergenic_variant.
>>
>> I could see two solutions - either excluding the line (since it is
>> non-variant), or having some sort of "no consequence" type - which I am
>> loath to do as this doesn't fit into our SO schema.
>>
>> Will
>>
>> On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
>>
>> Dear Developers
>>
>> I believe there is another bug in the VEP when dealing with input VCFs
>> with multiple individuals...
>>
>> Please take a look at this VCF input and the corresponding output:
>>
>> INPUT VCF line:
>>
>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
>> sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
>>
>> 1 876499 . A G 2900.87 PASS
>> AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2
>> GT:AD:DP:GQ:PL 1/1:0,9:9:24.07:303,24,0
>> 1/1:0,10:10:27.09:365,27,0 0/1:5,4:9:99:104,0,166
>> 1/1:0,7:7:18.04:220,18,0 1/1:0,16:16:39.13:534,39,0
>> 1/1:0,12:12:30.10:407,30,0 1/1:0,14:14:39.13:535,39,0
>> 1/1:0,15:15:36.12:483,36,0 ./.
>>
>> OUTPUT annotation file:
>>
>> #Uploaded_variation  Location  Existing_variation  Allele  ZYG  Gene  Feature  Feature_type  Consequence         GMAF      IND
>> 1_876499_A           1:876499  rs4372192           -       HOM  -     -        -             intergenic_variant  A:0.0824  sample9
>>
>> As you can see, the annotation output only contains 1 line and it is for
>> the individual that has no genotype call (./.)
>>
>> Also, the variation name does not contain the ref/alt allele information,
>> as all other variations do. I would expect it to be called
>> 1_876499_A/G
>>
>> For reference here are the config options I used:
>>
>> host [internalserver]
>> user [user]
>> password [password]
>> db_version 72
>> port 3306
>> species homo_sapiens
>>
>> ####### runtime options #############
>> buffer_size 40000
>> most_severe 1
>> check_existing 1
>> check_alleles 1
>> individual all
>> fork 6
>> verbose 1
>> gmaf 1
>> filter_common 1
>> fields Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND
>>
>> ####### cache stuff #############
>> cache 1
>> dir_plugins /NGS_Test/vep_72_testing/Plugins/
>> dir_cache /ReferenceData/vep_cache
>> # cache_region_size 1MB
>> #offline 1
>> # skip_db_check 1
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>