[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Duarte Molha Duarte.Molha at ogt.com
Mon Sep 2 14:10:15 BST 2013


Thanks Will

From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Will McLaren
Sent: 02 September 2013 14:09
To: Ensembl developers list
Subject: Re: [ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Hi Duarte,

Yes, agreed, it will continue as is.

Will

On 2 September 2013 14:07, Duarte Molha <duartemolha at gmail.com> wrote:
Dear Will

Do you agree with my request, or do you think the default behaviour should continue?
I just want to know so that I can plan accordingly. If this behaviour is not going to change, I will need to make some changes to my own script to account for it. Thanks

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================
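
A downstream script could account for the unchanged behaviour by dropping annotation rows for individuals that carry no ALT allele at a site. The following is only a minimal Python sketch under stated assumptions: the function names are hypothetical, columns are split on whitespace (the archive did not preserve tabs), the output contains Location and IND columns as in the example from this thread, and Location matches CHROM:POS, which holds for simple SNVs but not necessarily for indels.

```python
def carries_alt(gt):
    """True if a VCF GT string such as '0/1' or '1|1' has a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in (".", "0") for a in alleles)


def variant_samples(vcf_header, vcf_record):
    """Return the set of (location, sample) pairs that carry an ALT allele."""
    names = vcf_header.lstrip("#").split()[9:]
    fields = vcf_record.split()
    chrom, pos, fmt = fields[0], fields[1], fields[8]
    gt_index = fmt.split(":").index("GT")
    keep = set()
    for name, call in zip(names, fields[9:]):
        gt = call.split(":")[gt_index]
        if carries_alt(gt):
            keep.add((f"{chrom}:{pos}", name))
    return keep


def drop_non_variant_rows(vep_lines, keep):
    """Yield header lines plus rows whose (Location, IND) pair is in keep."""
    columns = None
    for line in vep_lines:
        if line.startswith("#"):
            # The last '#' line before the data is taken as the column header.
            columns = line.lstrip("#").split()
            yield line
            continue
        row = dict(zip(columns, line.split()))
        if (row["Location"], row["IND"]) in keep:
            yield line
```

Applied to the example in this thread, the single sample9 row would be dropped, because sample9 has a ./. genotype.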

On Tue, Aug 27, 2013 at 4:05 PM, Duarte Molha <duartemolha at gmail.com> wrote:
Hi Will

I now understand the issue.

And no... I do not think you should add a no_consequence type, but I do think that it should not print out anything for that sample, even with the --most_severe flag on.

Since it is a non-variant, it should not output anything. Without --filter_common and with the --most_severe flag on, the output should be 7 lines of annotation, one for each sample that has that variation genotyped.

Does this make sense? I rarely use the --most_severe flag anyway, so it does not affect me much, but it might mislead people who are not aware of this default behaviour. In this case, for example, this is a non-coding exon variant, and the output for that single sample might lead many people to errors in their analysis.

Best regards

Duarte



On Tue, Aug 27, 2013 at 3:28 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
Ah, sorry, I wasn't testing with --filter_common.

However, if I add --filter_common, I don't see any lines in the output, and the VEP reports all 9 variants/individuals are being filtered out. rs4372192 overlaps and has a frequency of 0.08.

Perhaps your API and/or script is out of date? Though I still don't see the problem even if I test it with v71...

Will

On 27 August 2013 15:03, Duarte Molha <duartemolha at gmail.com> wrote:
My script does not output any information for the remaining 8 samples. If I understand correctly from your email, your script is outputting the other samples.

What might be causing the discrepancy?

Best regards

Duarte



On Tue, Aug 27, 2013 at 2:50 PM, Duarte Molha <Duarte.Molha at ogt.com> wrote:
My apologies Will, but I think you are missing a bit of my problem.

This is not a non-variant variation for the remaining individuals/samples. Why are they not being output?

Cheers

Duarte



From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Will McLaren
Sent: 27 August 2013 14:45
To: Ensembl developers list
Subject: Re: [ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Hi Duarte,

Thanks for raising this. There's an interesting quirk here which seems to be what you're looking at. However, I _do_ see lines of output for the other 8 individuals in the file.

What is it you would expect to see for sample9? Would you expect that line to be excluded from the output?

The reason it is shown is because you are using most_severe, which forces the VEP to give the most severe consequence per variant (which I would generally advise against using!) - when using --individual each individual/variant combination is considered as an independent variant.

The reason it is intergenic_variant is because that is the "default" consequence - since the locus is non-variant for sample9, it does not go through the consequence prediction, but because you are forcing it to be printed out with most_severe, the VEP has to default to using intergenic_variant.

I could see two solutions: either excluding the line (since it is non-variant), or having some sort of "no consequence" type, which I am loath to do as this doesn't fit into our SO schema.

Will
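
To see which individuals are non-variant at a locus, and so would skip consequence prediction as described above, the genotype strings can be classified directly. This is a rough Python sketch, not the VEP's own logic: HOM and HET mirror the ZYG values in the output, while HOMREF and NOCALL are labels assumed here for illustration.

```python
from collections import Counter


def zygosity(gt):
    """Classify a VCF GT string; any missing allele counts as a no-call."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "NOCALL"
    if all(a == "0" for a in alleles):
        return "HOMREF"
    return "HOM" if len(set(alleles)) == 1 else "HET"


# GT values for sample1..sample9 from the VCF line in this thread.
calls = ["1/1", "1/1", "0/1", "1/1", "1/1", "1/1", "1/1", "1/1", "./."]
summary = Counter(zygosity(gt) for gt in calls)
print(dict(summary))  # {'HOM': 7, 'HET': 1, 'NOCALL': 1}
```

Only the NOCALL and HOMREF classes are non-variant; in this example that is sample9 alone.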

On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
Dear Developers

I believe there is another bug in the VEP when dealing with input VCFs with multiple individuals...
Please take a look at this VCF input and the corresponding output:

INPUT VCF line:


#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9

1       876499  .       A       G       2900.87 PASS    AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2    GT:AD:DP:GQ:PL  1/1:0,9:9:24.07:303,24,0        1/1:0,10:10:27.09:365,27,0      0/1:5,4:9:99:104,0,166  1/1:0,7:7:18.04:220,18,0        1/1:0,16:16:39.13:534,39,0      1/1:0,12:12:30.10:407,30,0      1/1:0,14:14:39.13:535,39,0      1/1:0,15:15:36.12:483,36,0      ./.

OUTPUT annotation file:

#Uploaded_variation     Location        Existing_variation      Allele  ZYG     Gene    Feature Feature_type    Consequence     GMAF    IND

1_876499_A      1:876499        rs4372192       -       HOM     -       -       -       intergenic_variant      A:0.0824         sample9

As you can see, the annotation output contains only 1 line, and it is for the individual that has no genotype call (./.).

Also, the variation name does not contain the ref/alt allele information, as it does for all other variations. I would expect it to be called 1_876499_A/G.
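
For comparison, the expected name can be derived from the first five VCF columns. This is a small Python sketch assuming the CHROM_POS_REF/ALT pattern quoted above applies whenever the ID column is "."; the function name is hypothetical, and indels, whose coordinates may be adjusted, are not handled.

```python
def expected_name(vcf_record):
    """Build the CHROM_POS_REF/ALT name for a record with no ID of its own."""
    chrom, pos, vid, ref, alt = vcf_record.split()[:5]
    if vid != ".":
        return vid  # keep an identifier supplied in the VCF
    # Assumption: multiple ALT alleles are joined with '/' as well.
    return f"{chrom}_{pos}_{ref}/{alt.replace(',', '/')}"


print(expected_name("1 876499 . A G 2900.87 PASS"))  # 1_876499_A/G
```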

For reference here are the config options I used:


host              [internalserver]
user              [user]
password          [password]
db_version        72
port              3306
species           homo_sapiens

#######     runtime options  #############
buffer_size       40000
most_severe       1
check_existing    1
check_alleles     1
individual        all
fork              6
verbose           1
gmaf              1
filter_common     1
fields            Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND

#######     cache stuff   #############
cache             1
dir_plugins       /NGS_Test/vep_72_testing/Plugins/
dir_cache         /ReferenceData/vep_cache
# cache_region_size       1MB
#offline                  1
# skip_db_check           1
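
A config in this layout (one option name and value per line, with # starting a comment) can be checked for the option combination discussed in this thread. The sketch below is illustrative Python, not part of the VEP; both function names are hypothetical, and a name given without a value is assumed to mean "1".

```python
def read_vep_config(text):
    """Parse 'name value' lines, skipping blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        # Assumption: a bare option name is treated as switched on.
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else "1"
    return options


def risky_combination(options):
    """True when most_severe is on together with individual."""
    return options.get("most_severe") == "1" and "individual" in options
```

With the options listed above, risky_combination would return True, flagging a run in which a non-variant individual can be reported as intergenic_variant.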



_______________________________________________
Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/


