[ensembl-dev] VEP does not annotate all rows in a samtools VCF

Marlies Dolezal marlies.dolezal at gmail.com
Mon Apr 28 11:58:34 BST 2014


Hi Will,
I produced a couple of files with only 1000 lines each.
The cause lies in my samtools VCF files: they contain "variants" whose ref
and alt alleles differ in the number of Ns, and VEP correctly omits these.
Sorry to have bothered you with my problem!

regards marlies
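
For anyone trying to find which input lines account for such a gap between
"Lines of input read" and "Variants processed", here is a minimal sketch in
Python. It only labels candidate lines; it does not reproduce VEP's own
filtering. The names classify_vcf_line and summarise are hypothetical, and
treating alleles that contain N as skipped is an assumption based on the
observation above, not on VEP documentation.

```python
import io
from collections import Counter

BASES = set("ACGTN")

def classify_vcf_line(line):
    """Return a coarse label for one line of a tab-delimited VCF file."""
    if line.startswith("#"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return "malformed"
    ref, alt = fields[3].upper(), fields[4].upper()
    if alt == ".":
        # Non-variant site: "." in the ALT column.
        return "non_variant"
    alleles = [ref] + alt.split(",")
    # Assumption: plain-sequence alleles containing N are the ones of interest.
    # Symbolic alleles such as <INS> are not treated as sequence.
    if any(set(a) <= BASES and "N" in a for a in alleles):
        return "contains_N"
    return "variant"

def summarise(handle):
    """Count labels over all non-empty lines of an open VCF file."""
    return Counter(classify_vcf_line(l) for l in handle if l.strip())

# Made-up example records, not taken from the files discussed in this thread.
sample = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "21\t26960070\t.\tG\t.\t.\t.\t.\n"
    "21\t100\t.\tNNN\tNN\t50\t.\t.\n"
    "21\t200\t.\tA\tT\t50\t.\t.\n"
)
counts = summarise(io.StringIO(sample))
for label, n in sorted(counts.items()):
    print(label, n)
```

To run it on a real file, pass an open handle instead of the StringIO
object, for example summarise(open("Chr1.vcf")), where the file name is
only a placeholder.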


2014-04-28 11:49 GMT+02:00 Will McLaren <wm2 at ebi.ac.uk>:

> I can't think of anything else - perhaps you can send me a snippet of your
> input that recreates the problem?
>
> e.g. a file with 1000 lines that the VEP reports annotating only 995
>
> Will
>
>
> On 28 April 2014 10:01, Marlies Dolezal <marlies.dolezal at gmail.com> wrote:
>
>> hi Will,
>> thanks for your quick reply!
>> unfortunately using the --allow_non_variant option gave exactly the same
>> output.
>>
>>
>> -offline --allow_non_variant --force_overwrite --species bos_taurus
>> --fork 16 --input_file Chr$i.vcf --o Chr$i.vep
>>
>> Start time 2014-04-28 10:27:05
>> End time 2014-04-28 10:36:57
>> Run time 592 seconds
>>
>>
>>
>> Lines of input read 900857
>> Variants processed 900241
>> Variants remaining after filtering 900241
>>
>>
>> any other ideas?
>> thanks again!
>> regards, marlies
>>
>>
>> 2014-04-28 10:24 GMT+02:00 Will McLaren <wm2 at ebi.ac.uk>:
>>
>> >
>>
>> > Hello,
>> >
>> > It's possible you have some non-variant lines in your VCF; these will
>> > have a "." as the ALT allele column, something like:
>> >
>> > 21      26960070        .     G       .       .       .       .
>> >
>> > By default the VEP ignores these. You can force the VEP to allow them
>> > through (though they still won't be annotated) using --allow_non_variant.
>> >
>> > Regards
>> >
>> > Will McLaren
>> > Ensembl Variation
>> >
>> >
>> > On 28 April 2014 08:52, Marlies Dolezal <marlies.dolezal at gmail.com> wrote:
>> >>
>> >> hi all,
>> >>
>> >> i am using the latest VEP version 75 (API)(75) to annotate samtools
>> >> VCF files.
>> >>
>> >> -offline --force_overwrite --species bos_taurus --fork 16 --input_file
>> >> Chr$i.vcf --o Chr$i.vep
>> >>
>> >>
>> >> the General statistics section of the VEP_summary.html tells me that
>> >> all lines of my vcfs are read in, but only a subset of these are processed.
>> >> eg:
>> >> Lines of input read 900857
>> >> Variants processed 900241
>> >>
>> >> the difference in lines does not correspond to header/comment lines
>> >> only.
>> >>
>> >> where can i find out which variants are not processed to try to figure
>> >> out why they are not processed?
>> >>
>> >> thanks a lot in advance
>> >> regards Marlies
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Dr. Marlies Dolezal
>> >> 1030 Vienna
>> >> Austria/Europe
>> >>
>> >> marlies.dolezal(at)gmail.com
>> >>
>> >> “The great tragedy of science is the slaying of a beautiful hypothesis
>> >> by an ugly fact.”
>> >> Thomas Henry Huxley
>> >> (1825-1895)
>> >>
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Dev mailing list    Dev at ensembl.org
>> >> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> >> Ensembl Blog: http://www.ensembl.info/
>> >>
>> >
>>
>

