[ensembl-dev] VEP predictor 2.5
nathalie
nac at sanger.ac.uk
Tue May 15 15:31:54 BST 2012
On 15/05/12 15:06, Will McLaren wrote:
> Hi Nathalie,
Hi Will,
>
> There isn't an option to do this, as doing so would disrupt the VCF
> formatting and make the VCF incompatible with other software. You can
> see the definitions of VCF here http://www.1000genomes.org/node/101.
Sure, in fact I want a text file, not a VCF file.
> If you wanted to you could use the tr UNIX command to change the pipes to tabs:
>
> tr "|" "\t" < my_output.txt > my_changed_output.txt
>
> However, I'd urge caution doing this; the VCF format output has one
> line per variant, so you will find multiple "blocks" of consequence
> information separated by commas (each block corresponds to one
> transcript that the variant overlaps), and in doing the above tr
> command you will get different numbers of columns on each line of
> output.
>
> In the default format (without --vcf) you get one line of output per
> variant and transcript combination (equivalent to one of the blocks
> from the VCF output).
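To illustrate the difference between the two layouts, here is a minimal Python sketch that expands the CSQ entry of a VCF INFO column into one row per block, which is roughly what the default format gives you. The field order is copied from the CSQ "Format:" header in the output quoted further down; the function name expand_csq is made up for this example and is not part of VEP.

```python
# Field order as listed in the CSQ "Format:" description of the VCF header
# quoted in this thread; a different set of VEP options would change it.
CSQ_FIELDS = (
    "Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|"
    "CDS_position|Protein_position|Amino_acids|Codons|"
    "Existing_variation|PolyPhen|SIFT|HGNC|ENSP"
).split("|")


def expand_csq(info):
    """Return one dict per consequence block found in a VCF INFO string.

    Hypothetical helper: INFO entries are assumed to be separated by ';'
    and consequence blocks by ','.
    """
    rows = []
    for entry in info.split(";"):
        if not entry.startswith("CSQ="):
            continue
        for block in entry[len("CSQ="):].split(","):
            values = block.split("|")
            # Pad short blocks so every row has the same number of columns.
            values += [""] * (len(CSQ_FIELDS) - len(values))
            rows.append(dict(zip(CSQ_FIELDS, values)))
    return rows
```

Each returned row has the same 15 keys, so writing the rows out tab-separated avoids the ragged columns that a plain tr over the whole line would produce.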
>
> What exactly are you trying to achieve? You might be able to write a
> simple Perl (or any other language) script to parse the default format
> output and combine it with your original VCF in a way that suits you
> better.
I would just like to give my end users an "easier" tab-delimited
file (to open with Excel) that contains all the info from the default
output as well as the GT, AD, DP, GQ and PL values, like this:
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra GT:AD:DP:GQ:PL
rs2480683 1:50661411 T ENSG00000162374 ENST00000371824 Transcript SYNONYMOUS_CODING 944 687 229 P ccC/ccT rs2480683 ENSP=ENSP00000360889;HGNC=ELAVL4 0/1:2,3:6:63.59:81,0,64
rs2480683 1:50661411 T ENSG00000162374 ENST00000448907 Transcript SYNONYMOUS_CODING 847 696 232 P ccC/ccT rs2480683 ENSP=ENSP00000399939;HGNC=ELAVL4 0/1:2,3:6:63.59:81,0,64
Indeed, a script that merges my VCF with the VEP default-format
output would give me the format I want.
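A minimal sketch of such a merge in Python, under two assumptions that should be checked against real data: the first column of the default output (Uploaded_variation) matches the ID column of the VCF, as it does for the rs identifiers in the examples here, and both files are tab-delimited. The function names load_vcf_samples and merge are hypothetical. Only the first sample column is used.

```python
def load_vcf_samples(vcf_lines):
    """Map VCF ID -> (FORMAT, first sample column) for each data line."""
    samples = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE...
        if len(cols) >= 10:
            samples[cols[2]] = (cols[8], cols[9])
    return samples


def merge(vep_lines, vcf_lines):
    """Yield VEP default-format lines with FORMAT and sample data appended.

    Assumption: Uploaded_variation (first VEP column) equals the VCF ID.
    Variants without a match get two empty columns.
    """
    samples = load_vcf_samples(vcf_lines)
    for line in vep_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" meta lines
        cols = line.split("\t")
        if line.startswith("#"):
            # Column header line: add names for the two new columns.
            yield "\t".join(cols + ["FORMAT", "SAMPLE"])
        else:
            fmt, sample = samples.get(cols[0], ("", ""))
            yield "\t".join(cols + [fmt, sample])


# Example use (file names are placeholders):
# with open("out.VEP2.5") as vep, open("in.vcf") as vcf, \
#         open("merged.txt", "w") as out:
#     for row in merge(vep, vcf):
#         out.write(row + "\n")
```

The sketch appends two columns (the FORMAT string and the sample value) rather than using the FORMAT string as a column header, because FORMAT can differ from line to line in a VCF.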
Nat
> Cheers
>
> Will
>
> On 15 May 2012 14:51, nathalie<nac at sanger.ac.uk> wrote:
>> Hi,
>>
>> I am using VEP 2.5 with my VCF files, and I want VCF output with all
>> consequences, but separated by tabs instead of the | separator.
>>
>> This is the command I use:
>> ./variant_effect_predictor.pl -i in -o out.VEP2.5 -sift=b -polyphen=b
>> --check_existing --hgnc --gene --protein --vcf
>>
>> I would like a mix of the output you get without the --vcf option
>> and the CHROM POS ID REF ALT QUAL FILTER and GT:AD:DP:GQ:PL columns
>> from the VCF file.
>>
>> This is my output file:
>> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|PolyPhen|SIFT|HGNC|ENSP">
>> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MKS97
>> Chr1 5002713 rs6669905 T C 51.36 PASS CSQ=C||||INTERGENIC||||||rs6669905|||| GT:AD:DP:GQ:PL 0/1:2,3:5:63.59:81,0,64
>> Chr1 5036174 rs6689050 G C 31.02 PASS CSQ=C||||INTERGENIC|||||||||| GT:AD:DP:GQ:PL 0/1:2,2:4:61.01:61,0,67
>> Chr1 50162840 rs6689057 C T 408.03 PASS CSQ=T|ENSG00000186094|ENST00000371839|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000360905,T|ENSG00000186094|ENST00000411952|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000411423,T|ENSG00000186094|ENST00000497451|Transcript|WITHIN_NON_CODING_GENE&INTRONIC||||||rs6689057|||AGBL4|,T|ENSG00000186094|ENST00000371838|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000360904,T|ENSG00000186094|ENST00000371836|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000360902 GT:AD:DP:GQ:PL 0/1:11,14:25:99:438,0,353
>>
>>
>>
>> Without the --vcf option, the output looks like this:
>> #Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
>> rs2480683 1:50661411 T ENSG00000162374 ENST00000371824 Transcript SYNONYMOUS_CODING 944 687 229 P ccC/ccT rs2480683 ENSP=ENSP00000360889;HGNC=ELAVL4
>> rs2480683 1:50661411 T ENSG00000162374 ENST00000448907 Transcript SYNONYMOUS_CODING 847 696 232 P ccC/ccT rs2480683 ENSP=ENSP00000399939;HGNC=ELAVL4
>> rs2480683 1:50661411 T ENSG00000162374 ENST00000371821 Transcript SYNONYMOUS_CODING 1017 702 234 P ccC/ccT rs2480683 ENSP=ENSP00000360886;H
>>
>>
>> I would like to see the VCF columns (below) followed by the rest of
>> the columns from the example just above:
>> CHROM POS ID REF ALT QUAL FILTER
>> Chr1 5002713 rs6669905 T C 51.36 PASS GT:AD:DP:GQ:PL 0/1:2,3:5:63.59:81,0,64
>>
>>
>> Is there an option in VEP 2.5 to do this? Could you help?
>> Thanks,
>> Nat
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/