[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Duarte Molha duartemolha at gmail.com
Tue Aug 27 15:03:20 BST 2013


My script does not output any information for the remaining 8 samples. If I
understand correctly from your email, your script is outputting the other
samples.

What might be causing the discrepancy?
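
For anyone following along, the genotype calls on the quoted VCF line can be checked independently of the VEP. Below is a minimal Python sketch, not VEP code: the function name classify_genotypes and the three labels are assumptions made for illustration, and the header and record are copied from the VCF line quoted further down in this thread. It only reports which samples carry a non-reference allele, which are homozygous reference, and which have no call.

```python
# Illustrative only: classify_genotypes is a hypothetical helper, not part of the VEP.
HEADER = ("#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
          "sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9")

RECORD = ("1 876499 . A G 2900.87 PASS "
          "AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;"
          "HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;"
          "ReadPosRankSum=-0.482;SB=-1653.87;set=variant2 "
          "GT:AD:DP:GQ:PL 1/1:0,9:9:24.07:303,24,0 1/1:0,10:10:27.09:365,27,0 "
          "0/1:5,4:9:99:104,0,166 1/1:0,7:7:18.04:220,18,0 "
          "1/1:0,16:16:39.13:534,39,0 1/1:0,12:12:30.10:407,30,0 "
          "1/1:0,14:14:39.13:535,39,0 1/1:0,15:15:36.12:483,36,0 ./.")


def classify_genotypes(header_line, record_line):
    """Map each sample name to 'variant', 'non-variant' or 'missing'."""
    header = header_line.lstrip("#").split()
    fields = record_line.split()
    samples = header[9:]                      # sample columns start after FORMAT
    gt_index = fields[8].split(":").index("GT")
    calls = {}
    for name, sample_field in zip(samples, fields[9:]):
        gt = sample_field.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")
        if all(a == "." for a in alleles):
            calls[name] = "missing"           # no genotype call, e.g. ./.
        elif any(a not in ("0", ".") for a in alleles):
            calls[name] = "variant"           # at least one non-reference allele
        else:
            calls[name] = "non-variant"       # homozygous reference
    return calls


calls = classify_genotypes(HEADER, RECORD)
for sample, status in calls.items():
    print(sample, status)
```

On this record the sketch labels sample1 to sample8 as variant and sample9 as missing, which is why output would be expected for the first 8 samples.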

Best regards

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================


On Tue, Aug 27, 2013 at 2:50 PM, Duarte Molha <Duarte.Molha at ogt.com> wrote:

> My apologies Will, but I think you are missing part of my problem.
>
> This is not a non-variant variation for the remaining individuals/samples.
> Why are they not being output?
>
> Cheers
>
> Duarte
>
> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
> Behalf Of Will McLaren
> Sent: 27 August 2013 14:45
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] bug in the VEP annotation of VCFs with
> multiple individuals
>
> Hi Duarte,
>
> Thanks for raising this. There's an interesting quirk here which seems to
> be what you're looking at. However, I _do_ see lines of output for the
> other 8 individuals in the file.
>
> What is it you would expect to see for sample9? Would you expect that line
> to be excluded from the output?
>
> The reason it is shown is that you are using most_severe, which forces
> the VEP to give the most severe consequence per variant (which I would
> generally advise against using!). When using --individual, each
> individual/variant combination is considered as an independent variant.
>
> The reason it is intergenic_variant is that this is the "default"
> consequence: since the locus is non-variant for sample9, it does not go
> through the consequence prediction, but because you are forcing it to be
> printed out with most_severe, the VEP has to default to using
> intergenic_variant.
>
> I could see two solutions: either excluding the line (since it is
> non-variant), or having some sort of "no consequence" type, which I am
> loath to do as this doesn't fit into our SO schema.
>
> Will
>
> On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
>
> Dear Developers
>
> I believe there is another bug in the VEP when dealing with input VCFs
> with multiple individuals...
>
> Please take a look at this VCF input and the corresponding output:
>
> INPUT VCF line:
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
> 1       876499  .       A       G       2900.87 PASS    AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2  GT:AD:DP:GQ:PL  1/1:0,9:9:24.07:303,24,0        1/1:0,10:10:27.09:365,27,0      0/1:5,4:9:99:104,0,166  1/1:0,7:7:18.04:220,18,0        1/1:0,16:16:39.13:534,39,0      1/1:0,12:12:30.10:407,30,0      1/1:0,14:14:39.13:535,39,0      1/1:0,15:15:36.12:483,36,0      ./.
>
> OUTPUT annotation file:
>
> #Uploaded_variation     Location        Existing_variation      Allele  ZYG     Gene    Feature Feature_type    Consequence     GMAF    IND
> 1_876499_A      1:876499        rs4372192       -       HOM     -       -       -       intergenic_variant      A:0.0824        sample9
>
> As you can see, the annotation output contains only 1 line, and it is for
> the individual that has no genotype call (./.)
>
> Also, unlike all other variations, the variation name does not contain
> the ref/alt allele information. I would expect it to be called
> 1_876499_A/G
>
> For reference, here are the config options I used:
>
> host              [internalserver]
> user              [user]
> password          [password]
> db_version        72
> port              3306
> species           homo_sapiens
>
> #######     runtime options  #############
> buffer_size       40000
> most_severe       1
> check_existing    1
> check_alleles     1
> individual        all
> fork              6
> verbose           1
> gmaf              1
> filter_common     1
> fields            Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND
>
> #######     cache stuff   #############
> cache             1
> dir_plugins       /NGS_Test/vep_72_testing/Plugins/
> dir_cache         /ReferenceData/vep_cache
> # cache_region_size       1MB
> #offline                  1
> # skip_db_check           1
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>