[ensembl-dev] VEP vcf annotation
Diana Lemos
dlemos at ebi.ac.uk
Fri Jul 2 12:01:40 BST 2021
Thanks for sharing the command.
Your two commands are not the same. The command that generates the VCF
output uses the option --pick, which picks one consequence per variant
according to the criteria described here:
https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#pick
This option is not being used in your second command. If you remove
--pick you should have the same number of consequences in both outputs.
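
To check this after rerunning, it can help to count the CSQ entries per
variant and list the gene symbols they carry. The following is only a
minimal sketch, not part of VEP: the function name csq_symbols is
hypothetical, and it assumes the VCF header contains the usual
"##INFO=<ID=CSQ ... Format: ...|SYMBOL|...>" line written when --symbol
(or --everything) is used.

```shell
# Hypothetical helper (not part of VEP): for each variant in a
# VEP-annotated VCF read from stdin, print
#   CHROM  POS  number_of_CSQ_entries  unique_SYMBOLs
csq_symbols() {
  awk -F'\t' '
    # The CSQ header line tells us which pipe-separated field is SYMBOL.
    /^##INFO=<ID=CSQ/ {
      fmt = $0
      sub(/.*Format: /, "", fmt)   # keep only the field list
      sub(/".*/, "", fmt)          # drop the closing quote and bracket
      n = split(fmt, names, "|")
      for (i = 1; i <= n; i++) if (names[i] == "SYMBOL") sym_col = i
      next
    }
    /^#/ { next }                  # skip all other header lines
    {
      # Find the CSQ key inside the INFO column (column 8).
      csq = ""
      m = split($8, info, ";")
      for (i = 1; i <= m; i++) if (info[i] ~ /^CSQ=/) csq = substr(info[i], 5)
      if (csq == "") next

      # One comma-separated entry per reported consequence.
      k = split(csq, entries, ",")
      out = ""
      split("", seen)              # reset the set of symbols already seen
      for (j = 1; j <= k; j++) {
        split(entries[j], f, "|")
        s = f[sym_col]
        if (s != "" && !(s in seen)) {
          seen[s] = 1
          out = (out == "" ? s : out "," s)
        }
      }
      print $1 "\t" $2 "\t" k "\t" out
    }
  '
}

# Example use on the VCF from the first command:
#   csq_symbols < CRC15_CRC15_normal_tumor_vep.vcf
```

With --pick in place, each variant should show a single CSQ entry; without
it, the count should match the number of rows for that variant in the tab
output.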
Best wishes,
Diana
On 02/07/2021 11:45, Dietmar Rieder wrote:
> Hi,
>
> here is the command for the vcf output:
>
> vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
> -o CRC15_CRC15_normal_tumor_vep.vcf \
> --fork 16 \
> --stats_file CRC15_CRC15_normal_tumor_vep_summary.html \
> --species homo_sapiens \
> --assembly GRCh38 \
> --offline \
> --cache \
> --cache_version 103 \
> --dir /data/databases/vep_cache \
> --dir_cache /data/databases/databases/vep_cache \
> --hgvs \
> --fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
> --pick --plugin Frameshift --plugin Wildtype \
> --plugin ProteinSeqs,CRC15_CRC15_normal_tumor_reference.fa,CRC15_CRC15_normal_tumor_mutated.fa \
> --symbol --terms SO --transcript_version --tsl \
> --vcf 2> vep_errors_1.txt
>
>
>
> and this is the command for the table output:
>
> vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
> -o CRC15_CRC15_normal_hc_vep.txt \
> --fork 16 \
> --stats_file CRC15_CRC15_normal_hc_vep_summary.html \
> --species homo_sapiens \
> --assembly GRCh38 \
> --offline \
> --dir /data/databases/vep_cache \
> --cache \
> --cache_version 103 \
> --dir_cache /data/databases/vep_cache \
> --fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
> --format "vcf" \
> --everything \
> --tab 2> vep_errors.txt
>
> Best
> Dietmar
>
> On 7/2/21 12:18 PM, Diana Lemos wrote:
>> Hi Dietmar,
>>
>> I'm unable to reproduce the issue. Could you please send me the VEP
>> command you are running?
>>
>>
>> Thanks
>>
>> Diana
>>
>>
>> On 02/07/2021 10:52, Dietmar Rieder wrote:
>>> Hi,
>>>
>>> we are using VEP (103) to annotate our VCFs and we just stumbled over
>>> the situation that for the mutation chr5_112838250_C/T
>>> (chr5:112838250) we get 7 annotated transcript variants in the gene
>>> with SYMBOL APC and one in the "gene" with SYMBOL AC008575.1, in the
>>> VEP txt output, which is fine.
>>>
>>> BUT
>>>
>>> when we use --vcf to get an annotated VCF file, the mutation at
>>> that position is annotated only with the SYMBOL AC008575.1.
>>> This is problematic, because the canonical gene here is APC (a known
>>> driver gene in CRC) and we miss it when parsing the VCF.
>>>
>>> Would it be possible to add all gene symbols to the SYMBOL field in
>>> the CSQ of the vcf?
>>>
>>> Thanks
>>> Dietmar
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>>> Ensembl Blog: http://www.ensembl.info/
>
>