[ensembl-dev] VEP vcf annotation

Diana Lemos dlemos at ebi.ac.uk
Fri Jul 2 12:01:40 BST 2021


Thanks for sharing the command.

Your two commands are not the same. The command that generates the VCF 
output includes the option --pick, which picks one consequence per 
variant according to the criteria described here: 
https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick

This option is not being used in your second command. If you remove 
--pick you should have the same number of consequences in both outputs.
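
One way to check this on the annotated VCF is to count the consequence 
blocks per variant. The following is only a minimal sketch: the function 
name csq_symbols is hypothetical, and it assumes the CSQ header line 
lists the pipe-separated field names after "Format: " and that 
consequence blocks within CSQ are separated by commas.

```shell
# Hypothetical helper: for each variant in a VEP-annotated VCF, print
# position, number of CSQ blocks, and the distinct SYMBOL values.
# Assumes the CSQ header line names its fields after "Format: ".
csq_symbols() {
  awk -F'\t' '
    /^##INFO=<ID=CSQ/ {
      # Find the position of SYMBOL among the pipe-separated field names.
      fmt = $0
      sub(/.*Format: /, "", fmt)
      sub(/".*/, "", fmt)
      n = split(fmt, names, "|")
      for (i = 1; i <= n; i++) if (names[i] == "SYMBOL") sym = i
      next
    }
    /^#/ { next }
    {
      # Pull the CSQ value out of the INFO column (column 8).
      csq = ""
      m = split($8, info, ";")
      for (i = 1; i <= m; i++) if (info[i] ~ /^CSQ=/) csq = substr(info[i], 5)
      # Each comma-separated block is one consequence.
      k = split(csq, blocks, ",")
      out = ""
      split("", seen)
      for (i = 1; i <= k; i++) {
        split(blocks[i], f, "|")
        s = f[sym]
        if (s != "" && !(s in seen)) {
          seen[s] = 1
          out = out (out == "" ? "" : ",") s
        }
      }
      print $1 ":" $2 "\t" k "\t" out
    }
  ' "$@"
}
```

For example, "csq_symbols CRC15_CRC15_normal_tumor_vep.vcf" should report 
one block per variant for output produced with --pick, and several 
blocks (and possibly several symbols) once --pick is removed.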


Best wishes,

Diana


On 02/07/2021 11:45, Dietmar Rieder wrote:
> Hi,
>
> here is the command for the vcf output:
>
> vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
>     -o CRC15_CRC15_normal_tumor_vep.vcf \
>     --fork 16 \
>     --stats_file CRC15_CRC15_normal_tumor_vep_summary.html \
>     --species homo_sapiens \
>     --assembly GRCh38 \
>     --offline \
>     --cache \
>     --cache_version 103 \
>     --dir /data/databases/vep_cache \
>     --dir_cache /data/databases/databases/vep_cache \
>     --hgvs \
>     --fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
>     --pick --plugin Frameshift --plugin Wildtype \
>     --plugin ProteinSeqs,CRC15_CRC15_normal_tumor_reference.fa,CRC15_CRC15_normal_tumor_mutated.fa \
>     --symbol --terms SO --transcript_version --tsl \
>     --vcf 2> vep_errors_1.txt
>
>
>
> and this is the command for the table output:
>
> vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
>     -o CRC15_CRC15_normal_hc_vep.txt \
>     --fork 16 \
>     --stats_file CRC15_CRC15_normal_hc_vep_summary.html \
>     --species homo_sapiens \
>     --assembly GRCh38 \
>     --offline \
>     --dir /data/databases/vep_cache \
>     --cache \
>     --cache_version 103 \
>     --dir_cache /data/databases/vep_cache \
>     --fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
>     --format "vcf" \
>     --everything \
>     --tab 2> vep_errors.txt
>
> Best
>    Dietmar
>
> On 7/2/21 12:18 PM, Diana Lemos wrote:
>> Hi Dietmar,
>>
>> I'm unable to reproduce the issue. Could you please send me the VEP 
>> command you are running?
>>
>>
>> Thanks
>>
>> Diana
>>
>>
>> On 02/07/2021 10:52, Dietmar Rieder wrote:
>>> Hi,
>>>
>>> we are using VEP (103) to annotate our VCFs and we just stumbled over 
>>> the situation that for the mutation chr5_112838250_C/T 
>>> (chr5:112838250) we get 7 annotated transcript variants in the gene 
>>> with SYMBOL APC and one in the "gene" with SYMBOL AC008575.1, in the 
>>> VEP txt output, which is fine.
>>>
>>> BUT
>>>
>>> when we use --vcf to get an annotated VCF file, the mutation at 
>>> that position is annotated only with the SYMBOL AC008575.1.
>>> This is problematic because the canonical gene here is APC (a known 
>>> driver gene in CRC), and we miss it when parsing the VCF.
>>>
>>> Would it be possible to add all gene symbols to the SYMBOL field in 
>>> the CSQ of the vcf?
>>>
>>> Thanks
>>>   Dietmar
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe 
>>> info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> Ensembl Blog: http://www.ensembl.info/
>
>



