[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Duarte Molha duartemolha at gmail.com
Mon Sep 2 14:07:48 BST 2013


Dear Will

Do you agree with my request, or do you think the default behaviour should
continue?
I just want to know so that I can plan accordingly. If this behaviour is not
going to change, I will need to modify my own script to account
for it. Thanks
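
A script-side change of this kind could look roughly like the sketch below, which drops annotation rows whose IND sample has no genotype call at that location. The function name `drop_no_call_rows` and the `no_calls` mapping (Location string to a set of sample names) are hypothetical; the column names are the ones listed in the `fields` option quoted further down.

```python
def drop_no_call_rows(lines, no_calls):
    """Yield VEP output lines, skipping rows for samples without a genotype call.

    lines    -- iterable of tab-separated VEP output lines, including the
                header line that starts with '#Uploaded_variation'
    no_calls -- hypothetical mapping: Location string -> set of sample names
                whose genotype is './.' at that location
    """
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Uploaded_variation"):
            # Remember the column names so rows can be read by name.
            header = line.lstrip("#").split("\t")
            yield line
            continue
        if line.startswith("#") or header is None or not line:
            yield line  # pass other comment or blank lines through untouched
            continue
        row = dict(zip(header, line.split("\t")))
        if row.get("IND") in no_calls.get(row.get("Location"), set()):
            continue  # sample has no call here: suppress the row
        yield line
```

The `no_calls` mapping would have to be built from the input VCF, since the output row alone does not say whether the sample was genotyped.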

Duarte


=========================
     Duarte Miguel Paulo Molha
         http://about.me/duarte
=========================


On Tue, Aug 27, 2013 at 4:05 PM, Duarte Molha <duartemolha at gmail.com> wrote:

> Hi Will
>
> I now understand the issue.
>
> And no... I do not think you should add a no_consequence type, but I do
> think that the VEP should not print anything for that sample, even with the
> --most_severe flag on.
>
> Since it is a non-variant, it should not output anything. With
> --filter_common off and --most_severe on, the output should be 7
> lines of annotation, one for each sample that has that variation genotyped.
>
> Does this make sense? I rarely use the --most_severe flag anyway, so it does
> not affect me much, but it might mislead people who are not aware
> of this default behaviour. In this case, for example, this is a non-coding
> exon variant, and the output for that single sample could cause analysis
> errors for many people.
>
> Best regards
>
> Duarte
>
>
> On Tue, Aug 27, 2013 at 3:28 PM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Ah, sorry, I wasn't testing with --filter_common.
>>
>> However, if I add --filter_common, I don't see any lines in the output,
>> and the VEP reports all 9 variants/individuals are being filtered
>> out. rs4372192 overlaps and has a frequency of 0.08.
>>
>> Perhaps your API and/or script is out of date? Though I still don't see
>> the problem even if I test it with v71...
>>
>> Will
>>
>>
>> On 27 August 2013 15:03, Duarte Molha <duartemolha at gmail.com> wrote:
>>
>>> My script does not output any information for the remaining 8 samples.
>>> If I understand correctly from your email, your script is outputting the
>>> other samples.
>>>
>>> What might be causing the discrepancy?
>>>
>>> Best regards
>>>
>>> Duarte
>>>
>>>
>>> On Tue, Aug 27, 2013 at 2:50 PM, Duarte Molha <Duarte.Molha at ogt.com> wrote:
>>>
>>>> My apologies Will, but I think you are missing part of my problem.
>>>>
>>>> This is not a non-variant variation for the remaining individuals/samples.
>>>> Why are they not being output?
>>>>
>>>> Cheers
>>>>
>>>> Duarte
>>>>
>>>> From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On
>>>> Behalf Of Will McLaren
>>>> Sent: 27 August 2013 14:45
>>>> To: Ensembl developers list
>>>> Subject: Re: [ensembl-dev] bug in the VEP annotation of VCFs with
>>>> multiple individuals
>>>>
>>>> Hi Duarte,
>>>>
>>>> Thanks for raising this. There's an interesting quirk here which seems
>>>> to be what you're looking at. However, I _do_ see lines of output for the
>>>> other 8 individuals in the file.
>>>>
>>>> What is it you would expect to see for sample9? Would you expect that
>>>> line to be excluded from the output?
>>>>
>>>> The reason it is shown is that you are using --most_severe, which
>>>> forces the VEP to give the most severe consequence per variant (which I
>>>> would generally advise against using!). When using --individual, each
>>>> individual/variant combination is considered as an independent variant.
>>>>
>>>> The reason it is intergenic_variant is that this is the "default"
>>>> consequence. Since the locus is non-variant for sample9, it does not go
>>>> through the consequence prediction, but because you are forcing it to be
>>>> printed out with --most_severe, the VEP has to default to using
>>>> intergenic_variant.
>>>>
>>>> I could see two solutions: either excluding the line (since it is
>>>> non-variant), or having some sort of "no consequence" type, which I am
>>>> loath to do as this doesn't fit into our SO schema.
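
The behaviour described above can be pictured with a toy model. This is not VEP code: `annotate_per_individual`, its arguments and the `skip_non_variant` switch are all hypothetical. The only facts taken from the thread are that each individual/variant pair is handled independently, and that a pair which skips consequence prediction falls back to `intergenic_variant` when a line is forced out. The switch illustrates the first of the two solutions (excluding the line).

```python
DEFAULT_CONSEQUENCE = "intergenic_variant"  # fallback named in the thread


def is_variant(gt):
    """True if a GT string such as '0/1' or '1|1' carries a non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in (".", "0") for a in alleles)


def annotate_per_individual(genotypes, predicted, skip_non_variant=False):
    """Toy model: one output row per individual for a single site.

    genotypes -- dict of sample name -> GT string
    predicted -- consequence a real predictor would assign to variant samples
    """
    rows = []
    for sample, gt in genotypes.items():
        if is_variant(gt):
            rows.append((sample, predicted))
        elif not skip_non_variant:
            # No prediction was run, but a row is forced out anyway.
            rows.append((sample, DEFAULT_CONSEQUENCE))
    return rows
```

With `skip_non_variant=True` a no-call sample such as sample9 produces no row at all, which is the outcome requested later in the thread.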
>>>>
>>>> Will
>>>>
>>>> On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
>>>>
>>>> Dear Developers
>>>>
>>>> I believe there is another bug in the VEP when dealing with input VCFs
>>>> with multiple individuals...
>>>>
>>>> Please take a look at this VCF input and the corresponding output:
>>>>
>>>> INPUT VCF line:
>>>>
>>>> #CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  sample1  sample2  sample3  sample4  sample5  sample6  sample7  sample8  sample9
>>>> 1  876499  .  A  G  2900.87  PASS  AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2  GT:AD:DP:GQ:PL  1/1:0,9:9:24.07:303,24,0  1/1:0,10:10:27.09:365,27,0  0/1:5,4:9:99:104,0,166  1/1:0,7:7:18.04:220,18,0  1/1:0,16:16:39.13:534,39,0  1/1:0,12:12:30.10:407,30,0  1/1:0,14:14:39.13:535,39,0  1/1:0,15:15:36.12:483,36,0  ./.
>>>>
>>>> OUTPUT annotation file:
>>>>
>>>> #Uploaded_variation  Location  Existing_variation  Allele  ZYG  Gene  Feature  Feature_type  Consequence         GMAF      IND
>>>> 1_876499_A           1:876499  rs4372192           -       HOM  -     -        -             intergenic_variant  A:0.0824  sample9
>>>>
>>>> As you can see, the annotation output contains only 1 line, and it is
>>>> for the individual that has no genotype call (./.).
>>>>
>>>> Also, the variation name does not contain the ref/alt allele
>>>> information, unlike all other variations. I would expect it to be
>>>> called 1_876499_A/G.
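
To make the input easier to check by eye, here is a small sketch that splits a VCF data line into per-sample genotypes and builds the CHROM_POS_REF/ALT style name the report expects. The helper names are hypothetical, and `expected_name` simply encodes the expectation stated above rather than documented VEP behaviour. The usage example keeps only three of the nine samples and shortens the INFO column.

```python
def sample_genotypes(header_line, data_line):
    """Map each sample name to its GT value for one VCF data line.

    Real VCF files are tab-delimited; splitting on any whitespace also
    copes with text pasted from an email.
    """
    header = header_line.lstrip("#").split()
    fields = data_line.split()
    samples = header[9:]           # columns after FORMAT are samples
    fmt = fields[8].split(":")     # e.g. GT:AD:DP:GQ:PL
    gt_index = fmt.index("GT")     # GT is first when present, so './.' works
    return {
        name: call.split(":")[gt_index]
        for name, call in zip(samples, fields[9:])
    }


def expected_name(data_line):
    """Build a CHROM_POS_REF/ALT name from the first five VCF columns."""
    chrom, pos, _id, ref, alt = data_line.split()[:5]
    return f"{chrom}_{pos}_{ref}/{alt}"


header = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample3 sample9"
line = ("1 876499 . A G 2900.87 PASS AC=15 GT:AD:DP:GQ:PL "
        "1/1:0,9:9:24.07:303,24,0 0/1:5,4:9:99:104,0,166 ./.")

genotypes = sample_genotypes(header, line)
no_calls = [s for s, gt in genotypes.items() if gt in ("./.", ".")]
print(no_calls)             # ['sample9']
print(expected_name(line))  # 1_876499_A/G
```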
>>>>
>>>> ** **
>>>>
>>>> For reference here are the config options I used:****
>>>>
>>>> ** **
>>>>
>>>> host
>>>> [internalserver]user
>>>> [user]****
>>>>
>>>> password                                            [password]****
>>>>
>>>> db_version        72            ****
>>>>
>>>> port                                                       3306 ****
>>>>
>>>> species                                                 homo_sapiens***
>>>> *
>>>>
>>>>  ****
>>>>
>>>> #######     runtime options  #############****
>>>>
>>>> buffer_size                                         40000****
>>>>
>>>> most_severe                     1****
>>>>
>>>> check_existing                  1****
>>>>
>>>> check_alleles                     1****
>>>>
>>>> individual                                             all****
>>>>
>>>> fork                                                        6****
>>>>
>>>>  verbose                                                               1
>>>> ****
>>>>
>>>>  gmaf                                                      1****
>>>>
>>>> filter_common                  1****
>>>>
>>>> fields
>>>> Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND
>>>> ****
>>>>
>>>> ** **
>>>>
>>>> #######     cache stuff   ############# ****
>>>>
>>>> cache                                                    1****
>>>>
>>>> dir_plugins
>>>> /NGS_Test/vep_72_testing/Plugins/****
>>>>
>>>> dir_cache
>>>> /ReferenceData/vep_cache****
>>>>
>>>> # cache_region_size       1MB****
>>>>
>>>> #offline                                                1****
>>>>
>>>> # skip_db_check                              1****
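
For readers more used to command-line flags, the sketch below turns a whitespace-separated `option value` file like the one above into an argument list. It assumes that a value of 1 marks a plain switch and that lines starting with # are comments, which is how the file in the post reads. `config_to_args` is a hypothetical helper, not part of the VEP.

```python
def config_to_args(text):
    """Convert 'option value' lines into a list of command-line arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        option = parts[0]
        value = parts[1].strip() if len(parts) > 1 else "1"
        args.append(f"--{option}")
        if value != "1":
            args.append(value)  # options written as '1' are treated as switches
    return args


example = """
most_severe     1
individual      all
fork            6
# offline       1
"""
print(config_to_args(example))
# ['--most_severe', '--individual', 'all', '--fork', '6']
```

An option whose real value is 1, such as `fork 1`, would be misread as a switch here; a fuller version would need a list of known switches.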
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/