[ensembl-dev] VEP Extra output information

Will McLaren wm2 at ebi.ac.uk
Wed Apr 17 15:50:29 BST 2013


Yes, you can customise the fields used and the order they appear in
with --fields; this applies to both VCF and the normal tab-delimited
output.

The delimiter is hardcoded, I'm afraid, but I'm not sure what you'd
pick if you did decide to change it. ";" and "," are already used by
the VCF spec, and ":" appears in HGVS notations and other fields.
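
To illustrate the clash, here is a minimal Python sketch that checks a few candidate delimiters against field values copied from the example record quoted further down in this thread. The helper name colliding_values and the candidates list are illustrative assumptions only; nothing here is part of the VEP.

```python
# Field values copied from the example CSQ record quoted later in this thread.
values = [
    "ENST00000307301.7:c.1001C>T",    # HGVSc
    "ENSP00000305682.7:p.Thr334Met",  # HGVSp
    "A:0.0005",                       # GMAF
    "Low_complexity_(Seg):Seg",       # DOMAINS
    "T/M",                            # Amino_acids
    "tolerated(0.06)",                # SIFT
]


def colliding_values(delimiter, values):
    """Return the values that already contain the candidate delimiter."""
    return [v for v in values if delimiter in v]


# Hypothetical candidates; ";" and "," are left out because VCF already uses them.
candidates = [":", "/", "|"]
for delimiter in candidates:
    hits = colliding_values(delimiter, values)
    print(repr(delimiter), "collides with", len(hits), "value(s)")
```

On this small sample, "|" is the only candidate of the three that does not turn up inside a value.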

If you did want to change it, you'd just need to edit lines 1272 and
1275 of ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm.

Will



On 17 April 2013 15:32, Guillermo Marco Puche
<guillermo.marco at sistemasgenomicos.com> wrote:
> Hello Will,
>
>
> On 04/17/13 14:46, Will McLaren wrote:
>
> Hello,
>
> It's difficult (well, in fact impossible) to provide an example where
> every field is populated, since some field types are mutually
> exclusive, depending on the feature type overlapped (for example, you
> will never see the CELL_TYPE field populated for a variant/transcript
> combination).
>
> If you are interested in this for the purposes of how it looks for a
> parser, you really want to be looking at the header line added to the
> VCF by the VEP:
>
> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
> predicted by VEP. Format:
> Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
>
> This lists the fields that are added in order. Using this you should
> be able to parse what appears in the body of the file.
>
> Here's an example using a bunch of plugins and with the "--everything"
> flag switched on:
>
> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
> predicted by VEP. Format:
> Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
> 21      26960070        rs116645811     G       A       .       .
>
> CSQ=|||||||||||||||||||||||||||||||||||,A|ENSG00000154719|ENST00000352957|Transcript|intron_variant||||||rs116645811||9/9|MRPL39||||||||||A:0.0005|ENSP00000284967||CCDS13573.1|ENST00000352957.4:c.969+1077C>T|||||0.840||ENSP00000284967|,A|ENSG00000154719|ENST00000307301|Transcript|missense_variant|1043|1001|334|T/M|aCg/aTg|rs116645811|10/11||MRPL39|||||||YES|tolerated(0.06)|benign(0.001)|A:0.0005|ENSP00000305682|Low_complexity_(Seg):Seg|CCDS33522.1|ENST00000307301.7:c.1001C>T|ENSP00000305682.7:p.Thr334Met||-1|Neutral(0.940)|0.840||ENSP00000305682|
>
> I like this. It won't be so hard to parse it.
>
> If I'm not wrong, I can even choose the field order with the "--fields" flag.
> Does this only work for the regular tab-delimited VEP output file, or does it
> work with VCF output too?
>
> The only thing I don't like is that the delimiter character "|" is also
> used to fill empty fields. It would be great to be able to change the
> delimiter to another special character so that parsing is much easier.
>
>
> Thank you.
>
> Best regards,
> Guillermo.
>
> This is from input:
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
> 21      26960070        rs116645811     G       A       .       .       .
>
> using the command line:
>
> perl variant_effect_predictor.pl -i test.txt -force -database
> -everything -vcf -plugin Blosum62 -plugin Carol -plugin Conservation
> -plugin LD -plugin ProteinDomains -plugin TSSDistance
>
> Hope this is a bit clearer!
>
> Will
>
> On 17 April 2013 11:25, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello,
>
> I'm looking for an example *.vcf output with ALL the "Extra" parameters.
> I've generated some with the VEP script, but some extras, such as HGNC, are
> never generated.
>
> A few lines of VCF with all values would be enough, since I'm planning to
> parse the "Extra" column.
>
> It would also be great if it included most of the plugin outputs :)
>
> Thank you :)
>
> Best regards,
> Guillermo.
>
>
> On 04/16/13 18:00, Guillermo Marco Puche wrote:
>
> On 04/16/13 14:49, Will McLaren wrote:
>
> Hi Guillermo,
>
> There are two distinct ways you can add additional data to the output
> from the VEP.
>
> 1) Custom annotations - here you simply provide the VEP with a
> tabix-indexed position-based data file, and the VEP does the work of
> finding overlaps with your variant input and the data from the file.
>
> 2) Plugins - you write the code to add to or manipulate the internal
> data structures used by the VEP. In its simplest form, a plugin can
> simply look up an attribute of some object and add it to the
> output.
>
> Writing a plugin requires a basic understanding of the Ensembl API,
> but getting a basic plugin working requires only a very small amount
> of code.
>
> Since additional data is being obtained from multiple sources (APIs, files,
> etc.), I guess plugins are the only way to go for me.
>
> The documentation
> (http://www.ensembl.org/info/docs/variation/vep/vep_script.html#plugins)
> explains all of this, but the best way to see how plugins work is to
> look at the existing plugins at
> https://github.com/ensembl-variation/VEP_plugins. I'd suggest looking
> at Conservation.pm and ProteinSeqs.pm as some relatively simple
> examples of retrieving additional data from the API.
>
> Where do packages like "package Conservation;" come from?
>
> You should note that using VCF output you will see repeated elements
> in the INFO field added, since the plugin gets run once for every
> variant/transcript overlap; all data appear under the CSQ field in the
> INFO column. Currently there is no way for the VEP via plugins to add
> separate INFO fields, however this is something we are looking into,
> and in fact would be relatively easy to "hack" in for someone
> determined enough (see subroutine vf_list_to_cons in
> Bio::EnsEMBL::Variation::Utils::VEP).
>
> I'll look further into this tomorrow since I have to go now.
>
> A workaround could be to simply generate a temp file with the extra columns
> and, at the end, merge the original VCF from the VEP script with the plugin
> output for the additional columns.
>
> Maybe I misunderstood you. Please correct me if I'm wrong.
>
> Hope this helps, and feel free to ask further questions!
>
> Will McLaren
> Ensembl Variation
>
> Thank you so much.
>
> Best regards,
> Guillermo.
>
> On 16 April 2013 12:58, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello,
>
> I need to develop some extra features for VEP.
>
> My input files are in VCF format, and so is my output.
>
> But I want to add several additional columns for extra data to the VCF output.
>
> For example, AA conservation score, Biobase description, Biobase link, MAF
> populations, flanking sequence, gene description, InterPro_ID and more.
>
> I've been reading the documents and I'm a bit confused about "Custom
> annotations".
> I think that, since the data I want is extra in the output and not in the
> input, what I should do is develop several plugins to obtain all the values
> I need.
>
> I think most of them can be obtained through the Ensembl API, even though I'm
> new to this. Others will require more involved coding.
>
> I hope someone can clarify this matter for me a bit.
>
> Thank you.
>
> Best regards,
> Guillermo.
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/