[ensembl-dev] VEP predictor 2.5

nathalie nac at sanger.ac.uk
Tue May 15 15:31:54 BST 2012


On 15/05/12 15:06, Will McLaren wrote:
> Hi Nathalie,
Hi Will,
>
> There isn't an option to do this, as doing so would disrupt the VCF
> formatting and make the VCF incompatible with other software. You can
> see the definitions of VCF here http://www.1000genomes.org/node/101.
Sure. In fact, I want a text file, not a VCF file.
> If you wanted to you could use the tr UNIX command to change the pipes to tabs:
>
> tr "|" "\t" < my_output.txt > my_changed_output.txt
>
> However, I'd urge caution doing this; the VCF format output has one
> line per variant, so you will find multiple "blocks" of consequence
> information separated by commas (each block corresponds to one
> transcript that the variant overlaps), and in doing the above tr
> command you will get different numbers of columns on each line of
> output.
>
> In the default format (without --vcf) you get one line of output per
> variant and transcript combination (equivalent to one of the blocks
> from the VCF output).
>
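To illustrate the point about comma-separated blocks, here is a minimal Python sketch that expands one --vcf output line into one tab-separated row per transcript block, so that every row ends up with the same number of columns. It assumes the CSQ field order given in the ##INFO header of the output quoted further down in this thread, and that INFO is the 8th VCF column. The names csq_blocks and explode are made up for this example and are not part of VEP.

```python
# Field names copied from the "Format:" string in the ##INFO=<ID=CSQ,...> header.
CSQ_FIELDS = (
    "Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|"
    "Protein_position|Amino_acids|Codons|Existing_variation|PolyPhen|SIFT|HGNC|ENSP"
).split("|")


def csq_blocks(info):
    """Return one dict per transcript block found in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            return [
                dict(zip(CSQ_FIELDS, block.split("|")))
                for block in entry[len("CSQ="):].split(",")
            ]
    return []  # no CSQ annotation in this INFO string


def explode(vcf_line):
    """Turn one VCF data line into one tab-separated row per CSQ block."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Columns 1-7 are CHROM..FILTER, column 8 is INFO, the rest is FORMAT + samples.
    fixed, info, rest = cols[:7], cols[7], cols[8:]
    return [
        "\t".join(fixed + [block.get(name, "") for name in CSQ_FIELDS] + rest)
        for block in csq_blocks(info)
    ]
```

Applied to the rs6689057 line from the original message, explode would return five rows, one for each transcript the variant overlaps. Empty CSQ fields stay as empty cells, which keeps the columns aligned when the result is opened in a spreadsheet.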
> What exactly are you trying to achieve? It might be you could write a
> simple perl (or any other language) script to parse the default format
> output to combine it with your original VCF in a way that suits you
> better.


I would just like to give my end users an "easier" tab-delimited file
(to open with Excel) where they can get all the info from the default
output as well as the GT, AD, DP, GQ... info, like this:
#Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra	GT:AD:DP:GQ:PL
rs2480683	1:50661411	T	ENSG00000162374	ENST00000371824	Transcript	SYNONYMOUS_CODING	944	687	229	P	ccC/ccT	rs2480683	ENSP=ENSP00000360889;HGNC=ELAVL4	0/1:2,3:6:63.59:81,0,64
rs2480683	1:50661411	T	ENSG00000162374	ENST00000448907	Transcript	SYNONYMOUS_CODING	847	696	232	P	ccC/ccT	rs2480683	ENSP=ENSP00000399939;HGNC=ELAVL4	0/1:2,3:6:63.59:81,0,64

Indeed, a script that merges my VCF with the VEP default-format output
would give me the format I want.
Nat
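
A minimal Python sketch of such a merge script, under stated assumptions: every VCF record has a unique ID in column 3, VEP echoes that ID in the first column (Uploaded_variation) of its default output as in the rs2480683 example above, and only the first sample column is wanted. The function names load_genotypes and merge and the appended column headers FORMAT and SAMPLE are invented for this example and are not part of VEP.

```python
def load_genotypes(vcf_lines):
    """Map each VCF ID to its (FORMAT, first-sample) columns."""
    genotypes = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the #CHROM header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 10:  # need FORMAT (col 9) and one sample (col 10)
            genotypes[cols[2]] = (cols[8], cols[9])
    return genotypes


def merge(vep_lines, genotypes):
    """Yield VEP default-format rows with FORMAT and sample columns appended."""
    for line in vep_lines:
        line = line.rstrip("\n")
        if line.startswith("#Uploaded_variation"):
            yield line + "\tFORMAT\tSAMPLE"  # extend the column header
        elif line.startswith("#") or not line:
            yield line  # pass other header lines through unchanged
        else:
            variant_id = line.split("\t", 1)[0]
            # Variants missing from the VCF get two empty cells.
            fmt, sample = genotypes.get(variant_id, ("", ""))
            yield f"{line}\t{fmt}\t{sample}"
```

Both functions take any iterable of lines, so open file handles can be passed in directly and each line yielded by merge written to the output file followed by a newline. If the VCF has "." in the ID column, the key would need to be built from chromosome and position instead, which this sketch does not attempt.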

> Cheers
>
> Will
>
> On 15 May 2012 14:51, nathalie<nac at sanger.ac.uk>  wrote:
>> Hi,
>>
>> I am using VEP 2.5 with my VCF files, and I want VCF output with all
>> consequences separated by tabs instead of the | separator.
>>
>> This is the command I use:
>> ./variant_effect_predictor.pl -i in  -o out.VEP2.5 -sift=b -polyphen=b
>> --check_existing --hgnc --gene --protein --vcf
>>
>> I would like a mix of the output you get without the --vcf option and the
>> CHROM, POS, ID, REF, ALT, QUAL, FILTER and GT:AD:DP:GQ:PL columns from the
>> VCF file.
>>
>> This is my output file:
>> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|PolyPhen|SIFT|HGNC|ENSP">
>> #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	MKS97
>> Chr1	5002713	rs6669905	T	C	51.36	PASS	CSQ=C||||INTERGENIC||||||rs6669905||||	GT:AD:DP:GQ:PL	0/1:2,3:5:63.59:81,0,64
>> Chr1	5036174	rs6689050	G	C	31.02	PASS	CSQ=C||||INTERGENIC||||||||||	GT:AD:DP:GQ:PL	0/1:2,2:4:61.01:61,0,67
>> Chr1	50162840	rs6689057	C	T	408.03	PASS	CSQ=T|ENSG00000186094|ENST00000371839|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000360905,T|ENSG00000186094|ENST00000411952|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000411423,T|ENSG00000186094|ENST00000497451|Transcript|WITHIN_NON_CODING_GENE&INTRONIC||||||rs6689057|||AGBL4|,T|ENSG00000186094|ENST00000371838|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000360904,T|ENSG00000186094|ENST00000371836|Transcript|INTRONIC||||||rs6689057|||AGBL4|ENSP00000360902	GT:AD:DP:GQ:PL	0/1:11,14:25:99:438,0,353
>>
>>
>>
>> Without the --vcf option, the output looks like this:
>> #Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
>> rs2480683	1:50661411	T	ENSG00000162374	ENST00000371824	Transcript	SYNONYMOUS_CODING	944	687	229	P	ccC/ccT	rs2480683	ENSP=ENSP00000360889;HGNC=ELAVL4
>> rs2480683	1:50661411	T	ENSG00000162374	ENST00000448907	Transcript	SYNONYMOUS_CODING	847	696	232	P	ccC/ccT	rs2480683	ENSP=ENSP00000399939;HGNC=ELAVL4
>> rs2480683	1:50661411	T	ENSG00000162374	ENST00000371821	Transcript	SYNONYMOUS_CODING	1017	702	234	P	ccC/ccT	rs2480683	ENSP=ENSP00000360886;H
>>
>>
>> I would like to see the VCF columns (below) together with the rest of the
>> file, as in the example just above:
>> CHROM	POS	ID	REF	ALT	QUAL	FILTER
>> Chr1	5002713	rs6669905	T	C	51.36	PASS	GT:AD:DP:GQ:PL	0/1:2,3:5:63.59:81,0,64
>>
>>
>> Is there an option in VEP 2.5 to do this? Could you help?
>> Thanks,
>> Nat
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe):
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/




