[ensembl-dev] Variant Effect Predictor and VCF output

Will McLaren wm2 at ebi.ac.uk
Thu Nov 10 14:16:12 GMT 2011


Thanks again Chris, I will fix that bug for this release.

Will

On 10 November 2011 14:02,  <cj5 at sanger.ac.uk> wrote:
> Hi Will,
>
>> Thanks for sharing, that's really useful.
>>
>> Would it be OK if I implemented a similar thing in the VEP?
>
> Absolutely. Please also be aware of the outstanding sort bug in the VEP:
>
> http://lists.ensembl.org/pipermail/dev/2011-October/001706.html
>
> Until this is fixed, the script needs to be preceded by an awk/sort step to
> guarantee that the text file is in chromosome order.
> I use:
>
> head -100 $1 | grep '^#'    # speed things up, assume less than 100 header lines
> grep -v '^#' $1 | awk '$2 ~ "^[1-9]:"{$2="0"$2} {print $0}' | sort -k2 |
> awk 'BEGIN {OFS="\t"} $2~"^0"{$2 = substr($2,2)} {print $0}'
>
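The workaround above zero-pads single-digit chromosome names so that a plain text sort groups them correctly. A minimal sketch of a variant of that idea follows; it is an illustration, not the script Chris uses. It assumes tab-separated VEP output whose second column is a location of the form chr:pos or chr:start-end, and the function name sort_vep_by_location is hypothetical. Unlike a plain sort -k2, it also orders positions numerically within each chromosome.

```shell
# Sketch only: sort VEP text output by chromosome, then by start position.
# Assumption: tab-separated input, column 2 looks like chr:pos or chr:start-end.
# sort_vep_by_location is a hypothetical name, not part of the VEP.
sort_vep_by_location() {
    # Header lines first, in their original order.
    sed -n '/^#/p' "$1"
    # Prepend two temporary sort keys (padded chromosome, start position),
    # sort on them, then cut the keys off again.
    sed '/^#/d' "$1" |
        awk 'BEGIN { FS = OFS = "\t" }
             {
                 split($2, loc, /[:-]/)
                 chrom = loc[1]
                 # Zero-pad numeric chromosomes so that 2 sorts before 10.
                 if (chrom ~ /^[0-9]+$/) chrom = sprintf("%03d", chrom)
                 print chrom, loc[2], $0
             }' |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
        cut -f3-
}
```

Usage would be along the lines of: sort_vep_by_location vep_output.txt > vep_output.sorted.txt. Numeric chromosomes come first in ascending order, followed by named ones (MT, X, Y) in byte order.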
>
>> I could change CSQ to something different if you think that would be
>> more sensible, or is there a precedent for using this in the VCF world?
>>
>
> Fine, there is no consensus regarding the INFO field name.
>
> Thanks
> Chris
>
>
>> On 10 November 2011 13:41,  <cj5 at sanger.ac.uk> wrote:
>>> Hi,
>>> For the UK10K project we are using the following script, which
>>> optionally adds GERP and Grantham Matrix scores:
>>>
>>> https://github.com/VertebrateResequencing/vr-codebase/blob/develop/scripts/vcf2consequences_vep
>>>
>>> regards
>>> Chris Joyce
>>> Wellcome Trust Sanger Institute
>>>
>>>
>>>> Hi Fedor,
>>>>
>>>> Currently there is no standard way to describe consequences in VCF;
>>>> the main issue to overcome is that our output format provides one line
>>>> per variant/allele/transcript, whereas VCF mandates one line per
>>>> variant. This means we'd have to squeeze an awful lot of information
>>>> into the INFO column of the VCF.
>>>>
>>>> We should, however, be able to provide at least summary level
>>>> information in the INFO field, and this is what we will look into
>>>> doing, as we have had several requests for VCF output to be a feature
>>>> of the VEP.
>>>>
>>>> I am not aware of any tools to convert; however, I think that with a
>>>> simple Perl script and the --most_severe or --summary options in the
>>>> VEP (both of which output only one line per variant), you should be
>>>> able to combine the original VCF with the output.
>>>>
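A minimal sketch of the combining step suggested above, written in shell/awk rather than Perl to match the other snippet in this thread. It is an illustration under several assumptions that the thread does not state: the VEP output has one line per variant, its second column is a location of the form chr:pos, its seventh column is the consequence, and only simple substitutions are handled (indel coordinates may not match the VCF POS). The function name annotate_vcf_with_vep is hypothetical; the CSQ tag is the one discussed in this thread.

```shell
# Sketch only: copy the VEP consequence into the VCF INFO column as CSQ=...
# Assumptions: $1 is VEP output with one line per variant (Location in
# column 2, Consequence in column 7); $2 is the original VCF.
# annotate_vcf_with_vep is a hypothetical name, not part of the VEP.
annotate_vcf_with_vep() {
    awk 'BEGIN { FS = OFS = "\t" }
         FNR == NR {                      # first file: VEP output
             if ($0 !~ /^#/) csq[$2] = $7 # remember consequence per location
             next
         }
         /^#CHROM/ {                      # declare the new INFO tag once
             print "##INFO=<ID=CSQ,Number=.,Type=String,Description=\"Consequence from VEP\">"
         }
         /^#/ { print; next }             # VCF header lines pass through
         {
             key = $1 ":" $2              # CHROM:POS, matched against Location
             if (key in csq)
                 $8 = ($8 == "." ? "" : $8 ";") "CSQ=" csq[key]
             print
         }' "$1" "$2"
}
```

Usage would be along the lines of: annotate_vcf_with_vep vep_summary.txt input.vcf > annotated.vcf. Variants without a matching VEP line are passed through unchanged.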
>>>> Hope this helps
>>>>
>>>> Will McLaren
>>>> Ensembl Variation
>>>>
>>>> On 9 November 2011 20:03, Fedor Gusev <gusevfe at gmail.com> wrote:
>>>>> Hello everyone.
>>>>>
>>>>> How come it is not possible for the VEP to output a VCF file? Are there
>>>>> any tools to convert the output to VCF?
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>> Fedor Gusev.
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> List admin (including subscribe/unsubscribe):
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>
>>>
>>>
>>> --
>>>  The Wellcome Trust Sanger Institute is operated by Genome Research
>>>  Limited, a charity registered in England with number 1021457 and a
>>>  company registered in England with number 2742969, whose registered
>>>  office is 215 Euston Road, London, NW1 2BE.
>>>
>>
>



