[ensembl-dev] Variant Effect Predictor and VCF output

cj5 at sanger.ac.uk cj5 at sanger.ac.uk
Thu Nov 10 14:02:39 GMT 2011


Hi Will,

> Thanks for sharing, that's really useful.
>
> Would it be OK if I implemented a similar thing in the VEP?

Absolutely. Please also be aware of the outstanding sort bug in the VEP:

http://lists.ensembl.org/pipermail/dev/2011-October/001706.html

Until this is fixed, the script needs to be preceded by an awk/sort step to
guarantee that the text file is in chromosome order.
I use:

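# Print the header lines first (reading only the first 100 lines is faster;
# assumes there are fewer than 100 header lines)
head -100 "$1" | grep '^#'
# Zero-pad single-digit chromosomes so that a plain text sort gives chromosome
# order, keeping every line tab-delimited, then strip the padding again
grep -v '^#' "$1" | awk 'BEGIN {OFS="\t"} $2 ~ "^[1-9]:"{$2="0"$2} {print $0}' | sort -k2 |
awk 'BEGIN {OFS="\t"} $2~"^0"{$2 = substr($2,2)} {print $0}'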
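For illustration, the workaround can be wrapped in a shell function and run on a few made-up VEP-style lines. This is a minimal sketch: the function name `sort_vep_output` and the sample data are hypothetical, it assumes the location (`chrom:pos`) is the second whitespace-separated column as in the VEP's default output, and it sets OFS to a tab in both awk steps so that every line keeps the same delimiter when `sort -k2` compares keys (with a mix of tabs and spaces the resulting order can depend on the locale).

```shell
#!/bin/sh
# Hypothetical wrapper around the sort workaround: header lines first,
# then data lines sorted into chromosome order.
sort_vep_output() {
    # Header lines first (assumes fewer than 100 of them)
    head -n 100 "$1" | grep '^#'
    # Pad chromosomes 1-9 to 01-09, text-sort from the location column
    # onwards, then remove the padding; OFS keeps every line tab-delimited
    grep -v '^#' "$1" |
        awk 'BEGIN {OFS="\t"} $2 ~ "^[1-9]:" {$2 = "0" $2} {print $0}' |
        sort -k2 |
        awk 'BEGIN {OFS="\t"} $2 ~ "^0" {$2 = substr($2, 2)} {print $0}'
}

# Made-up input in which chromosome 10 appears before chromosomes 2 and 1
tmp=$(mktemp)
printf '#Uploaded_variation\tLocation\tAllele\n' > "$tmp"
printf 'var1\t10:500\tA\nvar2\t2:100\tT\nvar3\t1:300\tG\n' >> "$tmp"
sort_vep_output "$tmp"   # header, then var3 (1:300), var2 (2:100), var1 (10:500)
rm -f "$tmp"
```

Chromosome 10 is never padded, so it sorts after the padded 01 to 09; names such as X, Y or MT sort after the digits in a plain text sort.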


> I could change CSQ to something different if you think that would be
> more sensible, or is there a precedent for using this in the VCF world?
>

That's fine; there is no consensus regarding the INFO field name.

Thanks
Chris


> On 10 November 2011 13:41,  <cj5 at sanger.ac.uk> wrote:
>> Hi,
>> For the UK10K project we are using the following script, which
>> optionally adds GERP and Grantham Matrix scores:
>>
>> https://github.com/VertebrateResequencing/vr-codebase/blob/develop/scripts/vcf2consequences_vep
>>
>> regards
>> Chris Joyce
>> Wellcome Trust Sanger Institute
>>
>>
>>> Hi Fedor,
>>>
>>> Currently there is no standard way to describe consequences in VCF;
>>> the main issue to overcome is that our output format provides one line
>>> per variant/allele/transcript, whereas VCF mandates one line per
>>> variant. This means we'd have to squeeze an awful lot of information
>>> into the INFO column of the VCF.
>>>
>>> We should, however, be able to provide at least summary level
>>> information in the INFO field, and this is what we will look into
>>> doing, as we have had several requests for VCF output to be a feature
>>> of the VEP.
>>>
>>> I am not aware of any tools to do the conversion. However, with a
>>> simple Perl script and the VEP's --most_severe or --summary options
>>> (both of which output only one line per variant), you should be able
>>> to combine the original VCF with the output.
>>>
>>> Hope this helps
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>> On 9 November 2011 20:03, Fedor Gusev <gusevfe at gmail.com> wrote:
>>>> Hello everyone.
>>>>
>>>> Why is it not possible for the VEP to output a VCF file? Are there
>>>> any tools to convert the output to VCF?
>>>>
>>>> --
>>>> Kind regards,
>>>> Fedor Gusev.
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> List admin (including subscribe/unsubscribe):
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/
>>>>
>>>
>>
>>
>
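Will's suggestion above, joining the original VCF with one-line-per-variant VEP output from --most_severe, could look roughly like this. The sketch uses awk rather than Perl to stay with the same tools as the sort workaround, and it rests on several assumptions: SNVs only, matching CHROM:POS against the VEP Location column (column 2), Consequence in column 7, and identical chromosome naming in both files. The function name `add_vep_to_vcf` is hypothetical, and CSQ is used as the INFO key only because, as noted above, there is no agreed name.

```shell
#!/bin/sh
# Hypothetical helper: add a per-variant consequence from one-line-per-variant
# VEP output (e.g. run with --most_severe) to the INFO column of a VCF.
# Assumptions: SNVs only; VEP column 2 is Location ("chrom:pos") and
# column 7 is Consequence; chromosome names match in both files.
add_vep_to_vcf() {    # usage: add_vep_to_vcf vep_output.txt input.vcf
    awk -F'\t' 'BEGIN { OFS = "\t" }
        FILENAME == ARGV[1] {                  # first file: VEP output
            if ($0 !~ /^#/) { c = $7; gsub(",", "\\&", c); csq[$2] = c }
            next
        }
        /^##/ { print; next }                  # VCF meta-information lines
        /^#CHROM/ {                            # declare the new INFO key
            print "##INFO=<ID=CSQ,Number=1,Type=String,Description=\"Consequence from the VEP\">"
            print; next
        }
        {
            key = $1 ":" $2                    # CHROM:POS, compared to Location
            if (key in csq) $8 = ($8 == "." ? "" : $8 ";") "CSQ=" csq[key]
            print
        }
    ' "$1" "$2"
}
```

For example, `add_vep_to_vcf vep_most_severe.txt input.vcf > annotated.vcf` (file names hypothetical). Indels would need extra handling, since the VEP and VCF describe their positions differently.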