[ensembl-dev] VEP vcf annotation
Dietmar Rieder
dietmar.rieder at i-med.ac.at
Fri Jul 2 11:45:12 BST 2021
Hi,
here is the command for the vcf output:
vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
-o CRC15_CRC15_normal_tumor_vep.vcf \
--fork 16 \
--stats_file CRC15_CRC15_normal_tumor_vep_summary.html \
--species homo_sapiens \
--assembly GRCh38 \
--offline \
--cache \
--cache_version 103 \
--dir /data/databases/vep_cache \
--dir_cache /data/databases/databases/vep_cache \
--hgvs \
--fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
--pick --plugin Frameshift --plugin Wildtype \
--plugin ProteinSeqs,CRC15_CRC15_normal_tumor_reference.fa,CRC15_CRC15_normal_tumor_mutated.fa \
--symbol --terms SO --transcript_version --tsl \
--vcf 2> vep_errors_1.txt
and this is the command for the table output:
vep -i CRC15_CRC15_normal_Somatic.hc.vcf.gz \
-o CRC15_CRC15_normal_hc_vep.txt \
--fork 16 \
--stats_file CRC15_CRC15_normal_hc_vep_summary.html \
--species homo_sapiens \
--assembly GRCh38 \
--offline \
--dir /data/databases/vep_cache \
--cache \
--cache_version 103 \
--dir_cache /data/databases/vep_cache \
--fasta /data/databases/vep_cache/homo_sapiens/103_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz \
--format "vcf" \
--everything \
--tab 2> vep_errors.txt
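
To compare the two outputs, something like the following Python sketch could list every SYMBOL found in the CSQ field of each VCF record. It is only a minimal illustration: the function names are made up, the toy record uses placeholder consequence and transcript values rather than real VEP output, and it assumes the CSQ header line carries the usual "Format: ..." description. One visible difference between the two commands above is that the first passes --pick and the second does not; as far as the VEP documentation goes, --pick reports a single consequence block per variant, so that may be worth ruling out first.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line found")


def symbols_per_variant(vcf_lines):
    """Map (chrom, pos, ref, alt) to the sorted SYMBOL values found in CSQ."""
    lines = [line.rstrip("\n") for line in vcf_lines]
    fields = csq_fields(line for line in lines if line.startswith("##"))
    idx = fields.index("SYMBOL")
    result = {}
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip header and empty lines
        cols = line.split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        symbols = set()
        for entry in info.split(";"):
            if entry.startswith("CSQ="):
                # One comma-separated block per annotated transcript
                for block in entry[4:].split(","):
                    parts = block.split("|")
                    if len(parts) > idx and parts[idx]:
                        symbols.add(parts[idx])
        result[(chrom, int(pos), ref, alt)] = sorted(symbols)
    return result


# Toy input: consequence and transcript values are placeholders, not real VEP output.
toy = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|Feature">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr5\t112838250\t.\tC\tT\t.\tPASS\tCSQ=T|csq_a|APC|tx_1,T|csq_b|AC008575.1|tx_2",
]

symbols = symbols_per_variant(toy)
print(symbols)
```

On a real file the lines could be read with gzip.open(path, "rt") or open(path), and a variant whose list contains only AC008575.1 would show that APC is missing from the VCF itself and not just lost in downstream parsing.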
Best
Dietmar
On 7/2/21 12:18 PM, Diana Lemos wrote:
> Hi Dietmar,
>
> I'm unable to reproduce the issue. Could you please send me the VEP
> command you are running?
>
>
> Thanks
>
> Diana
>
>
> On 02/07/2021 10:52, Dietmar Rieder wrote:
>> Hi,
>>
>> we are using VEP (103) to annotate our VCFs and we just stumbled over
>> the situation that for the mutation chr5_112838250_C/T
>> (chr5:112838250) we get 7 annotated transcript variants in the gene
>> with SYMBOL APC and one in the "gene" with SYMBOL AC008575.1, in the
>> VEP txt output, which is fine.
>>
>> BUT
>>
>> when we use --vcf to get an annotated VCF file, the mutation at
>> that position is annotated only with the SYMBOL AC008575.1.
>> This is problematic, because the canonical gene here is APC (a known
>> driver gene in CRC) and we miss it when parsing the VCF.
>>
>> Would it be possible to add all gene symbols to the SYMBOL field in
>> the CSQ of the vcf?
>>
>> Thanks
>> Dietmar
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
--
_________________________________________
D i e t m a r R i e d e r, Mag.Dr.
Head of HPC/Bioinformatics facility
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402
Fax: +43 512 9003 73100
Email: dietmar.rieder at i-med.ac.at
Web: http://www.icbi.at