[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Will McLaren wm2 at ebi.ac.uk
Mon Sep 2 14:09:22 BST 2013


Hi Duarte,

Yes, agreed, it will continue as is.

Will
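
For anyone who, like Duarte, needs to adjust a downstream script to account for this behaviour, the following is a minimal Python sketch, not part of the VEP itself. It drops output rows for individuals that do not carry the variant. It assumes tab-delimited VEP output whose header line starts with "#Uploaded_variation" and includes the Location and IND columns (as requested through the "fields" option in this thread). The function name drop_non_carrier_rows and the "carriers" set of (location, sample) pairs are hypothetical; the set would have to be built separately from the genotypes in the input VCF.

```python
def drop_non_carrier_rows(vep_text, carriers):
    """Keep only VEP rows whose (Location, IND) pair is in `carriers`.

    vep_text -- tab-delimited VEP output containing a header line that
                starts with '#Uploaded_variation'
    carriers -- set of (location, sample) tuples known to carry a
                non-reference allele (hypothetical input, built elsewhere)
    """
    kept = []
    header = None
    for line in vep_text.splitlines():
        if line.startswith("#Uploaded_variation"):
            # Column names, with the leading '#' removed
            header = line.lstrip("#").split("\t")
            kept.append(line)
            continue
        if line.startswith("#") or not line.strip():
            # Pass other comment lines and blank lines through unchanged
            kept.append(line)
            continue
        if header is None:
            raise ValueError("data row seen before the column header")
        row = dict(zip(header, line.split("\t")))
        if (row["Location"], row["IND"].strip()) in carriers:
            kept.append(line)
    return "\n".join(kept)
```

With a carrier set that does not contain ("1:876499", "sample9"), the intergenic_variant row reported for sample9 later in this thread would be removed, while rows for the genotyped samples would be kept.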


On 2 September 2013 14:07, Duarte Molha <duartemolha at gmail.com> wrote:

> Dear Will
>
> Do you agree with my request, or do you think the default behaviour should
> continue?
> I just want to know so that I can plan accordingly. If this behaviour is
> not to be changed, I will need to make some changes to my own script to
> account for it. Thanks
>
> Duarte
>
>
> =========================
>      Duarte Miguel Paulo Molha
>          http://about.me/duarte
> =========================
>
>
> On Tue, Aug 27, 2013 at 4:05 PM, Duarte Molha <duartemolha at gmail.com> wrote:
>
>> Hi Will
>>
>> I now understand the issue.
>>
>> And no... I do not think you should add a no_consequence type, but I do
>> think that it should not print out anything for that sample, even with the
>> --most_severe flag on.
>>
>> Since it is a non-variant, it should not output anything. Without
>> --filter_common and with the --most_severe flag on, the output should be 7
>> lines of annotation, one for each sample that has that variation genotyped.
>>
>> Does this make sense? I rarely use the --most_severe flag anyway, so it
>> does not affect me much, but it might mislead people who are not
>> aware of this default behaviour. In this case, for example, this is a
>> non-coding exon variant, and the output for that single sample might lead
>> to errors of analysis for many people.
>>
>> Best regards
>>
>> Duarte
>>
>>
>> =========================
>>      Duarte Miguel Paulo Molha
>>          http://about.me/duarte
>> =========================
>>
>>
>> On Tue, Aug 27, 2013 at 3:28 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>>> Ah, sorry, I wasn't testing with --filter_common.
>>>
>>> However, if I add --filter_common, I don't see any lines in the output,
>>> and the VEP reports all 9 variants/individuals are being filtered
>>> out. rs4372192 overlaps and has a frequency of 0.08.
>>>
>>> Perhaps your API and/or script is out of date? Though I still don't see
>>> the problem even if I test it with v71...
>>>
>>> Will
>>>
>>>
>>> On 27 August 2013 15:03, Duarte Molha <duartemolha at gmail.com> wrote:
>>>
>>>> My script does not output any information for the remaining 8 samples.
>>>> If I understand correctly from your email, your script is outputting the
>>>> other samples.
>>>>
>>>> What might be causing the discrepancy?
>>>>
>>>> Best regards
>>>>
>>>> Duarte
>>>>
>>>>
>>>> =========================
>>>>      Duarte Miguel Paulo Molha
>>>>          http://about.me/duarte
>>>> =========================
>>>>
>>>>
>>>> On Tue, Aug 27, 2013 at 2:50 PM, Duarte Molha <Duarte.Molha at ogt.com> wrote:
>>>>
>>>>> My apologies Will, but I think you are missing a bit of my problem.
>>>>>
>>>>> This is not a non-variant variation for the remaining individuals/samples.
>>>>> Why are they not being output?
>>>>>
>>>>> Cheers
>>>>>
>>>>> Duarte
>>>>>
>>>>> *From:* dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] *On
>>>>> Behalf Of *Will McLaren
>>>>> *Sent:* 27 August 2013 14:45
>>>>> *To:* Ensembl developers list
>>>>> *Subject:* Re: [ensembl-dev] bug in the VEP annotation of VCFs with
>>>>> multiple individuals
>>>>>
>>>>> Hi Duarte,
>>>>>
>>>>> Thanks for raising this. There's an interesting quirk here which seems
>>>>> to be what you're looking at. However, I _do_ see lines of output for the
>>>>> other 8 individuals in the file.
>>>>>
>>>>> What is it you would expect to see for sample9? Would you expect that
>>>>> line to be excluded from the output?
>>>>>
>>>>> The reason it is shown is because you are using most_severe, which
>>>>> forces the VEP to give the most severe consequence per variant (which I
>>>>> would generally advise against using!) - when using --individual each
>>>>> individual/variant combination is considered as an independent variant.
>>>>>
>>>>> The reason it is intergenic_variant is because that is the "default"
>>>>> consequence - since the locus is non-variant for sample9, it does not go
>>>>> through the consequence prediction, but because you are forcing it to be
>>>>> printed out with most_severe, the VEP has to default to using
>>>>> intergenic_variant.
>>>>>
>>>>> I could see two solutions - either excluding the line (since it is
>>>>> non-variant), or having some sort of "no consequence" type - which I am
>>>>> loath to do as this doesn't fit in to our SO schema.
>>>>>
>>>>> Will
>>>>>
>>>>> On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>>
>>>>> Dear Developers
>>>>>
>>>>> I believe there is another bug in the VEP when dealing with input VCFs
>>>>> with multiple individuals...
>>>>>
>>>>> Please take a look at this VCF input and the corresponding output:
>>>>>
>>>>> INPUT VCF line:
>>>>>
>>>>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
>>>>> 1       876499  .       A       G       2900.87 PASS    AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2       GT:AD:DP:GQ:PL  1/1:0,9:9:24.07:303,24,0        1/1:0,10:10:27.09:365,27,0      0/1:5,4:9:99:104,0,166  1/1:0,7:7:18.04:220,18,0        1/1:0,16:16:39.13:534,39,0      1/1:0,12:12:30.10:407,30,0      1/1:0,14:14:39.13:535,39,0      1/1:0,15:15:36.12:483,36,0      ./.
>>>>>
>>>>> OUTPUT annotation file:
>>>>>
>>>>> #Uploaded_variation     Location        Existing_variation      Allele  ZYG     Gene    Feature Feature_type    Consequence     GMAF    IND
>>>>> 1_876499_A      1:876499        rs4372192       -       HOM     -       -       -       intergenic_variant      A:0.0824        sample9
>>>>>
>>>>> As you can see, the annotation output only contains 1 line and it is
>>>>> for the individual that has no genotype call (./.)
>>>>>
>>>>> Also, the variation name does not contain the ref/alt allele
>>>>> information, unlike all other variations. I would expect it to be
>>>>> called 1_876499_A/G
>>>>>
>>>>> For reference here are the config options I used:
>>>>>
>>>>> host              [internalserver]
>>>>> user              [user]
>>>>> password          [password]
>>>>> db_version        72
>>>>> port              3306
>>>>> species           homo_sapiens
>>>>>
>>>>> #######     runtime options  #############
>>>>> buffer_size       40000
>>>>> most_severe       1
>>>>> check_existing    1
>>>>> check_alleles     1
>>>>> individual        all
>>>>> fork              6
>>>>> verbose           1
>>>>> gmaf              1
>>>>> filter_common     1
>>>>> fields            Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND
>>>>>
>>>>> #######     cache stuff   #############
>>>>> cache             1
>>>>> dir_plugins       /NGS_Test/vep_72_testing/Plugins/
>>>>> dir_cache         /ReferenceData/vep_cache
>>>>> # cache_region_size       1MB
>>>>> #offline                  1
>>>>> # skip_db_check           1
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
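
To make the expectation in this thread concrete, here is a minimal Python sketch (an illustration only, not how the VEP processes --individual internally) that lists which samples in a multi-sample VCF record carry at least one non-reference allele. It assumes a tab-delimited VCF with GT present in the FORMAT column; the function name "carriers" is hypothetical, and the INFO field of the record from this thread is shortened because the function does not use it.

```python
import re


def carriers(header_line, record_line):
    """Return the sample names that have at least one non-reference allele."""
    # Sample names follow the nine fixed columns (#CHROM ... FORMAT)
    samples = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    # Position of the GT key within the FORMAT column
    gt_index = fields[8].split(":").index("GT")
    found = []
    for name, call in zip(samples, fields[9:]):
        genotype = call.split(":")[gt_index]
        # Alleles are separated by '/' (unphased) or '|' (phased)
        alleles = re.split(r"[/|]", genotype)
        # '0' is the reference allele and '.' is a missing call
        if any(allele not in ("0", ".") for allele in alleles):
            found.append(name)
    return found


header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    + ["sample%d" % i for i in range(1, 10)]
)
record = "\t".join([
    "1", "876499", ".", "A", "G", "2900.87", "PASS",
    "AC=15;AF=0.938;AN=16",  # INFO shortened; not used by carriers()
    "GT:AD:DP:GQ:PL",
    "1/1:0,9:9:24.07:303,24,0", "1/1:0,10:10:27.09:365,27,0",
    "0/1:5,4:9:99:104,0,166", "1/1:0,7:7:18.04:220,18,0",
    "1/1:0,16:16:39.13:534,39,0", "1/1:0,12:12:30.10:407,30,0",
    "1/1:0,14:14:39.13:535,39,0", "1/1:0,15:15:36.12:483,36,0",
    "./.",
])
print(carriers(header, record))
```

For the record quoted in this thread, the function lists sample1 through sample8 and leaves out sample9, whose call is ./. and which is therefore the only sample with no variant genotype at this position.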