[ensembl-dev] Split a vcf site by transcript

Tue Apr 5 09:26:30 BST 2016

Hi there,

I'm running a local installation of the Variant Effect Predictor (VEP), 
release 83, and I have a question.
This is my command line:

perl variant_effect_predictor.pl \
  -i $path/20160317_Cardio_GATK_Filter.vcf \
  -o $path/20160317_ANN_Cardio.vcf \
  --stats_file $path/20160317_ANN_Cardio.html \
  --cache --assembly GRCh37 --offline --force_overwrite -v \
  --variant_class --sift b --poly b \
  --vcf_info_field ANN --hgvs --protein --canonical --check_existing \
  --gmaf --pubmed --species homo_sapiens --failed 1 --vcf --plugin LoFtool

I'm trying to get VCF output with one line per transcript, because I'd 
like to filter the annotated file afterwards using a list of 
transcripts like this:

ENST00000299421
ENST00000372980
ENST00000310706
ENST00000252321
ENST00000315987

With the --pick option I get only one transcript per position, but I 
want all transcripts, each on its own line, as in the .txt output but 
in VCF format, like this example:

chr1    25890050    .    A    G    20396.77    PASS 
AC=15;AF=0.625;AN=24;BaseQRankSum=3.48;ClippingRankSum=0.973;DP=909;ExcessHet=5.5287;FS=0.000;InbreedingCoeff=-0.2456;MLEAC=15;MLEAF=0.625;MQ=60.00;MQRankSum=0.167;QD=23.05;ReadPosRankSum=0.394;SOR=0.709;ANN=G|upstream_gene_variant|MODIFIER|LDLRAP1|ENSG00000157978|Transcript|ENST00000470950|processed_transcript||||||||||rs6661159|1|1032|1|SNV|HGNC|18640||||||G:0.4393|||| 
GT:AD:DP:GQ:PL    0/1:34,54:88:99:1817,0,946

chr1    25890050    .    A    G    20396.77    PASS 
AC=15;AF=0.625;AN=24;BaseQRankSum=3.48;ClippingRankSum=0.973;DP=909;ExcessHet=5.5287;FS=0.000;InbreedingCoeff=-0.2456;MLEAC=15;MLEAF=0.625;MQ=60.00;MQRankSum=0.167;QD=23.05;ReadPosRankSum=0.394;SOR=0.709;ANN=G|intron_variant&non_coding_transcript_variant|MODIFIER|LDLRAP1|ENSG00000157978|Transcript|ENST00000484476|processed_transcript||1/3|ENST00000484476.1:n.339-102A>G|||||||rs6661159|1||1|SNV|HGNC|18640||||||G:0.4393|||| 
GT:AD:DP:GQ:PL    0/1:34,54:88:99:1817,0,946
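
To make the request concrete, here is a rough Python sketch of the kind 
of post-processing I have in mind. It is not something VEP provides, and 
it rests on a few assumptions: the columns are tab-separated, the ANN 
INFO key holds comma-separated per-transcript entries, and the 
transcript ID can be found as the pipe-delimited field that looks like 
an ENST identifier. The function name split_by_transcript and its 
parameters are made up for illustration.

```python
import re

# Assumption: transcript IDs appear as a whole pipe-delimited field, e.g. ENST00000470950
ENST = re.compile(r"ENST\d+")


def split_by_transcript(vcf_line, keep=None, info_key="ANN"):
    """Return one VCF line per annotation entry found under info_key.

    keep: optional set of transcript IDs; entries for other transcripts are dropped.
    Header lines and records without the key are returned unchanged.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return [line]

    cols = line.split("\t")
    info = cols[7].split(";")
    prefix = info_key + "="

    # Locate the annotation item inside the INFO column
    idx = next((i for i, item in enumerate(info) if item.startswith(prefix)), None)
    if idx is None:
        return [line]

    out = []
    for entry in info[idx][len(prefix):].split(","):
        fields = entry.split("|")
        transcript = next((f for f in fields if ENST.fullmatch(f)), None)
        if keep is not None and transcript not in keep:
            continue
        # Rebuild the record with a single annotation entry, same INFO order
        new_info = list(info)
        new_info[idx] = prefix + entry
        new_cols = list(cols)
        new_cols[7] = ";".join(new_info)
        out.append("\t".join(new_cols))
    return out
```

Calling it on each line of the annotated file with 
keep={"ENST00000299421", "ENST00000372980"} would give the filtered, 
one-transcript-per-line output I described, but I'd prefer a supported 
VEP option if one exists.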

One last question: is it also possible to split the annotation of 
multiallelic sites, or would you suggest normalizing them beforehand?

Thank you for your help,

Matteo