[ensembl-dev] bug in the VEP annotation of VCFs with multiple individuals

Will McLaren wm2 at ebi.ac.uk
Tue Aug 27 15:28:17 BST 2013


Ah, sorry, I wasn't testing with --filter_common.

However, if I add --filter_common, I don't see any lines in the output, and
the VEP reports that all 9 variant/individual combinations are being filtered
out. The co-located variant rs4372192 has a frequency of 0.08.
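
As a rough illustration of why a record like this would be dropped, here is a
minimal Python sketch of a frequency check on the GMAF field. The 0.01 cutoff
and the function names are assumptions for illustration, not taken from the
VEP code; check the documentation for your version for the actual
--filter_common behaviour.

```python
# Assumption: a variant counts as "common" when the global minor allele
# frequency of its co-located existing variant is above this cutoff.
ASSUMED_COMMON_THRESHOLD = 0.01


def parse_gmaf(field):
    """Split a GMAF field such as 'A:0.0824' into (allele, frequency)."""
    allele, freq = field.strip().split(":")
    return allele, float(freq)


def is_common(field, threshold=ASSUMED_COMMON_THRESHOLD):
    """Return True if the GMAF field reports a frequency above the threshold."""
    if field.strip() in ("", "-"):
        # No frequency available, so there is nothing to filter on.
        return False
    _, freq = parse_gmaf(field)
    return freq > threshold


# GMAF value for rs4372192 as it appears in the quoted output.
print(is_common("A:0.0824"))
```

With the assumed cutoff, 0.0824 is well above the threshold, which would be
consistent with every line for this position being filtered.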

Perhaps your API and/or script is out of date? Though I still don't see the
problem even if I test it with v71...
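
To compare what each installation should be reporting, it may help to tally
the genotypes in the quoted VCF record per sample. The sketch below is Python,
with the GT values copied from that record. The NO_CALL and HOM_REF labels and
the function name are mine, and it only mirrors the idea that a sample with no
called alternate allele is non-variant at the locus, not the VEP's actual
logic.

```python
SAMPLES = ["sample%d" % i for i in range(1, 10)]

# GT subfield of each sample column, in order, from the quoted VCF record.
GENOTYPES = ["1/1", "1/1", "0/1", "1/1", "1/1", "1/1", "1/1", "1/1", "./."]


def classify(gt):
    """Label a diploid GT string; NO_CALL and HOM_REF are illustrative labels."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "NO_CALL"
    if set(alleles) == {"0"}:
        return "HOM_REF"
    return "HOM" if len(set(alleles)) == 1 else "HET"


calls = {sample: classify(gt) for sample, gt in zip(SAMPLES, GENOTYPES)}

# Number of called ALT alleles across all samples.
alt_count = sum(gt.count("1") for gt in GENOTYPES)

for sample in SAMPLES:
    print(sample, calls[sample])
print("ALT alleles called:", alt_count)
```

The tally gives 15 called ALT alleles, which agrees with AC=15 in the INFO
column, and sample9 is the only sample without a call.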

Will


On 27 August 2013 15:03, Duarte Molha <duartemolha at gmail.com> wrote:

> My script does not output any information for the remaining 8 samples. If
> I understand correctly from your email, your script is outputting the other
> samples.
>
> What might be causing the discrepancy?
>
> Best regards
>
> Duarte
>
>
> =========================
>      Duarte Miguel Paulo Molha
>          http://about.me/duarte
> =========================
>
>
> On Tue, Aug 27, 2013 at 2:50 PM, Duarte Molha <Duarte.Molha at ogt.com> wrote:
>
>> My apologies Will, but I think you are missing a bit of my problem.
>>
>> This is not a non-variant variation for the remaining individuals/samples.
>> Why are they not being output?
>>
>> Cheers
>>
>> Duarte
>>
>> *From:* dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] *On
>> Behalf Of *Will McLaren
>> *Sent:* 27 August 2013 14:45
>> *To:* Ensembl developers list
>> *Subject:* Re: [ensembl-dev] bug in the VEP annotation of VCFs with
>> multiple individuals
>>
>> Hi Duarte,
>>
>> Thanks for raising this. There's an interesting quirk here which seems to
>> be what you're looking at. However, I _do_ see lines of output for the
>> other 8 individuals in the file.
>>
>> What is it you would expect to see for sample9? Would you expect that
>> line to be excluded from the output?
>>
>> The reason it is shown is because you are using most_severe, which forces
>> the VEP to give the most severe consequence per variant (which I would
>> generally advise against using!) - when using --individual each
>> individual/variant combination is considered as an independent variant.
>>
>> The reason it is intergenic_variant is because that is the "default"
>> consequence - since the locus is non-variant for sample9, it does not go
>> through the consequence prediction, but because you are forcing it to be
>> printed out with most_severe, the VEP has to default to using
>> intergenic_variant.
>>
>> I could see two solutions: either excluding the line (since it is
>> non-variant), or having some sort of "no consequence" type, which I am
>> loath to do as this doesn't fit into our SO schema.
>>
>> Will
>>
>> On 27 August 2013 12:16, Duarte Molha <duartemolha at gmail.com> wrote:
>>
>> Dear Developers
>>
>> I believe there is another bug in the VEP when dealing with input VCFs
>> with multiple individuals...
>>
>> Please take a look at this VCF input and the corresponding output:
>>
>> INPUT VCF line:
>>
>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
>>
>> 1       876499  .       A       G       2900.87 PASS    AC=15;AF=0.938;AN=16;BaseQRankSum=1.636;DP=92;Dels=0.00;FS=0.000;HRun=6;HaplotypeScore=0.4159;MQ=59.36;MQ0=0;MQRankSum=1.274;QD=31.53;ReadPosRankSum=-0.482;SB=-1653.87;set=variant2  GT:AD:DP:GQ:PL  1/1:0,9:9:24.07:303,24,0        1/1:0,10:10:27.09:365,27,0      0/1:5,4:9:99:104,0,166  1/1:0,7:7:18.04:220,18,0        1/1:0,16:16:39.13:534,39,0      1/1:0,12:12:30.10:407,30,0      1/1:0,14:14:39.13:535,39,0      1/1:0,15:15:36.12:483,36,0      ./.
>>
>> OUTPUT annotation file:
>>
>> #Uploaded_variation     Location        Existing_variation      Allele  ZYG     Gene    Feature Feature_type    Consequence     GMAF    IND
>>
>> 1_876499_A      1:876499        rs4372192       -       HOM     -       -       -       intergenic_variant      A:0.0824        sample9
>>
>> As you can see, the annotation output only contains 1 line and it is for
>> the individual that has no genotype call (./.)
>>
>> Also, the variation name does not contain the ref/alt allele information,
>> as all other variations do. I would expect it to be called
>> 1_876499_A/G
>>
>> For reference here are the config options I used:
>>
>> host            [internalserver]
>> user            [user]
>> password        [password]
>> db_version      72
>> port            3306
>> species         homo_sapiens
>>
>> #######     runtime options  #############
>> buffer_size     40000
>> most_severe     1
>> check_existing  1
>> check_alleles   1
>> individual      all
>> fork            6
>> verbose         1
>> gmaf            1
>> filter_common   1
>> fields          Uploaded_variation,Location,Existing_variation,Allele,ZYG,Gene,Feature,Feature_type,Consequence,GMAF,IND
>>
>> #######     cache stuff   #############
>> cache           1
>> dir_plugins     /NGS_Test/vep_72_testing/Plugins/
>> dir_cache       /ReferenceData/vep_cache
>> # cache_region_size       1MB
>> #offline                  1
>> # skip_db_check           1
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/