[ensembl-dev] VEP vcf annotation

Dietmar Rieder dietmar.rieder at i-med.ac.at
Fri Jul 2 11:45:12 BST 2021


Hi,

here is the command for the vcf output:

vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
     -o CRC15_CRC15_normal_tumor_vep.vcf \
     --fork 16 \
     --stats_file CRC15_CRC15_normal_tumor_vep_summary.html \
     --species homo_sapiens \
     --assembly GRCh38 \
     --offline \
     --cache \
     --cache_version 103 \
     --dir /data/databases/vep_cache \
     --dir_cache /data/databases/databases/vep_cache \
     --hgvs \
     --fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
     --pick --plugin Frameshift --plugin Wildtype \
     --plugin ProteinSeqs,CRC15_CRC15_normal_tumor_reference.fa,CRC15_CRC15_normal_tumor_mutated.fa \
     --symbol --terms SO --transcript_version --tsl \
     --vcf 2> vep_errors_1.txt



and this is the command for the table output:

vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
     -o CRC15_CRC15_normal_hc_vep.txt \
     --fork 16 \
     --stats_file CRC15_CRC15_normal_hc_vep_summary.html \
     --species homo_sapiens \
     --assembly GRCh38 \
     --offline \
     --dir /data/databases/vep_cache \
     --cache \
     --cache_version 103 \
     --dir_cache /data/databases/vep_cache \
     --fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
     --format "vcf" \
     --everything \
     --tab 2> vep_errors.txt
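
One difference between the two commands is that only the first passes --pick, which, as far as I understand the VEP documentation, restricts the output to a single consequence per variant; that may be related to what we see. For reference, this is roughly how the CSQ field of the annotated VCF could be parsed to collect every SYMBOL per variant. It is only a minimal sketch: the function names are made up for illustration, and it assumes the usual "##INFO=<ID=CSQ,... Format: A|B|C" header line written by VEP and an uncompressed VCF.

```python
import re


def csq_fields(header_line):
    """Return the CSQ sub-field names listed after 'Format: ' in the header."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format: ...' part found in CSQ header line")
    return match.group(1).strip().split("|")


def symbols_per_variant(vcf_lines):
    """Map (chrom, pos, ref, alt) to the list of distinct SYMBOL values in CSQ.

    vcf_lines is any iterable of text lines (hypothetical helper, not part
    of VEP). Records without a CSQ entry are skipped.
    """
    fields = None
    result = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            fields = csq_fields(line)
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        csq = next((kv[4:] for kv in info.split(";") if kv.startswith("CSQ=")), None)
        if csq is None or fields is None:
            continue
        idx = fields.index("SYMBOL")
        symbols = []
        # One comma-separated entry per annotated transcript/feature.
        for entry in csq.split(","):
            parts = entry.split("|")
            symbol = parts[idx] if idx < len(parts) else ""
            if symbol and symbol not in symbols:
                symbols.append(symbol)
        result[(chrom, int(pos), ref, alt)] = symbols
    return result

# Possible usage (file name taken from the command above):
# with open("CRC15_CRC15_normal_tumor_vep.vcf") as handle:
#     print(symbols_per_variant(handle))
```

With --pick in place this would still return only one symbol per variant, since only one CSQ entry is written.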

Best
    Dietmar

On 7/2/21 12:18 PM, Diana Lemos wrote:
> Hi Dietmar,
> 
> I'm unable to reproduce the issue. Could you please send me the VEP 
> command you are running?
> 
> 
> Thanks
> 
> Diana
> 
> 
> On 02/07/2021 10:52, Dietmar Rieder wrote:
>> Hi,
>>
>> we are using VEP (103) to annotate our VCFs and we just stumbled over 
>> the situation that for the mutation chr5_112838250_C/T 
>> (chr5:112838250) we get 7 annotated transcript variants in the gene 
>> with SYMBOL APC and one in the "gene" with SYMBOL AC008575.1, in the 
>> VEP txt output, which is fine.
>>
>> BUT
>>
>> when we use --vcf to get an annotated VCF file, the mutation at 
>> that position is annotated only with the SYMBOL AC008575.1.
>> This is problematic, because the canonical gene here is APC (a known 
>> driver gene in CRC) and we miss it when parsing the VCF.
>>
>> Would it be possible to add all gene symbols to the SYMBOL field in 
>> the CSQ of the vcf?
>>
>> Thanks
>>   Dietmar
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/


-- 
_________________________________________
D i e t m a r  R i e d e r, Mag.Dr.
Head of HPC/Bioinformatics facility
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rieder at i-med.ac.at
Web:   http://www.icbi.at

