[ensembl-dev] VEP Extra output information

Will McLaren wm2 at ebi.ac.uk
Wed Apr 17 13:46:49 BST 2013


Hello,

It's difficult (well, in fact impossible) to provide an example where
every field is populated, since some fields are mutually exclusive,
depending on the type of feature overlapped (for example, you will
never see the CELL_TYPE field populated for a variant/transcript
combination).

If you want to know what the output looks like so that you can write a
parser, the thing to look at is the header line that the VEP adds to
the VCF:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
predicted by VEP. Format:
Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">

This lists the fields in the order in which they are added. Using
this, you should be able to parse what appears in the body of the file.
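
As a rough sketch of that first step, the field names can be pulled out
of the header with a few lines of Python. The function name
csq_field_names is made up for illustration and is not part of the VEP;
the sketch assumes the header sits on a single line in the real file
(it is only wrapped in this email) and that the field list follows the
text "Format:".

```python
import re

def csq_field_names(header_line):
    """Return the ordered CSQ sub-field names from a VEP CSQ header line.

    Hypothetical helper: assumes the names follow 'Format:' and are
    separated by '|', as in the header shown above.
    """
    match = re.search(r'Format:\s*([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in CSQ header")
    return [name.strip() for name in match.group(1).split("|")]

# The header from this email, joined back into one line.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as '
    'predicted by VEP. Format: '
    'Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|'
    'Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|'
    'MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|'
    'CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|'
    'BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">'
)

fields = csq_field_names(header)
print(len(fields), fields[:5])
```

Reading the names from the header, rather than hard-coding them, means
the parser keeps working when a different set of flags or plugins
changes the field list.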

Here's an example using a bunch of plugins and with the "--everything"
flag switched on:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
predicted by VEP. Format:
Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
21      26960070        rs116645811     G       A       .       .
 CSQ=|||||||||||||||||||||||||||||||||||,A|ENSG00000154719|ENST00000352957|Transcript|intron_variant||||||rs116645811||9/9|MRPL39||||||||||A:0.0005|ENSP00000284967||CCDS13573.1|ENST00000352957.4:c.969+1077C>T|||||0.840||ENSP00000284967|,A|ENSG00000154719|ENST00000307301|Transcript|missense_variant|1043|1001|334|T/M|aCg/aTg|rs116645811|10/11||MRPL39|||||||YES|tolerated(0.06)|benign(0.001)|A:0.0005|ENSP00000305682|Low_complexity_(Seg):Seg|CCDS33522.1|ENST00000307301.7:c.1001C>T|ENSP00000305682.7:p.Thr334Met||-1|Neutral(0.940)|0.840||ENSP00000305682|

This is from input:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
21      26960070        rs116645811     G       A       .       .       .

using the command line:

perl variant_effect_predictor.pl -i test.txt --force --database \
    --everything --vcf --plugin Blosum62 --plugin Carol --plugin Conservation \
    --plugin LD --plugin ProteinDomains --plugin TSSDistance
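
To illustrate how the body of such a file might then be parsed, here is
a minimal Python sketch that splits a CSQ value into one dictionary per
annotation block. The function name parse_csq is hypothetical, and the
example uses only the first five fields of the output above to keep it
short. It assumes that blocks are separated by commas, that values
within a block are separated by "|", and that neither character occurs
inside a value. Blocks in which every value is empty (like the first
one in the example output) are skipped.

```python
def parse_csq(info, field_names):
    """Split the CSQ entry of a VCF INFO string into one dict per block.

    Hypothetical helper: assumes ',' separates blocks and '|' separates
    values, and that neither appears inside a value.
    """
    csq_value = None
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "CSQ":
            csq_value = value
            break
    if csq_value is None:
        return []

    records = []
    for block in csq_value.split(","):
        values = block.split("|")
        if len(values) != len(field_names):
            raise ValueError(
                "expected %d values, got %d" % (len(field_names), len(values))
            )
        if not any(values):
            continue  # skip a block in which every field is empty
        records.append(dict(zip(field_names, values)))
    return records

# First five header fields and the matching leading values from the
# example output above (shortened for illustration).
fields = ["Allele", "Gene", "Feature", "Feature_type", "Consequence"]
info = (
    "CSQ=||||,"
    "A|ENSG00000154719|ENST00000352957|Transcript|intron_variant,"
    "A|ENSG00000154719|ENST00000307301|Transcript|missense_variant"
)

records = parse_csq(info, fields)
for record in records:
    print(record["Feature"], record["Consequence"])
```

Fields that do not apply to a given block simply come back as empty
strings, so a parser written this way does not need an example in which
every field is populated.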

Hope this is a bit clearer!

Will

On 17 April 2013 11:25, Guillermo Marco Puche
<guillermo.marco at sistemasgenomicos.com> wrote:
> Hello,
>
> I'm looking for an example *.vcf output with ALL the "Extra" parameters.
> I've generated some with the VEP script, but I'm missing some extras that
> are never generated, like HGNC.
>
> A few lines of VCF with all values would be enough, since I'm planning to
> parse the "Extra" column.
>
> It would also be great if it included the output of most of the plugins :)
>
> Thank you :)
>
> Best regards,
> Guillermo.
>
>
> On 04/16/13 18:00, Guillermo Marco Puche wrote:
>
> On 04/16/13 14:49, Will McLaren wrote:
>
> Hi Guillermo,
>
> There are two distinct ways you can add additional data to the output
> from the VEP.
>
> 1) Custom annotations - here you simply provide the VEP with a
> tabix-indexed position-based data file, and the VEP does the work of
> finding overlaps with your variant input and the data from the file.
>
> 2) Plugins - you write the code to add to or manipulate the internal
> data structures used by the VEP. In its simplest form, a plugin can be
> simply looking up an attribute of some object and adding it to the
> output.
>
> Writing a plugin requires a basic understanding of the Ensembl API,
> but getting a basic plugin working requires only a very small amount
> of code.
>
> Since the additional data is being obtained from multiple sources (APIs,
> files, etc.), I guess plugins are the only way to go for me.
>
> The documentation
> (http://www.ensembl.org/info/docs/variation/vep/vep_script.html#plugins)
> explains all of this, but the best way to see how plugins work is to
> look at the existing plugins at
> https://github.com/ensembl-variation/VEP_plugins. I'd suggest looking
> at Conservation.pm and ProteinSeqs.pm as some relatively simple
> examples of retrieving additional data from the API.
>
> Where do packages like "package Conservation;" come from?
>
> You should note that using VCF output you will see repeated elements
> in the INFO field added, since the plugin gets run once for every
> variant/transcript overlap; all data appear under the CSQ field in the
> INFO column. Currently there is no way for the VEP via plugins to add
> separate INFO fields, however this is something we are looking into,
> and in fact would be relatively easy to "hack" in for someone
> determined enough (see subroutine vf_list_to_cons in
> Bio::EnsEMBL::Variation::Utils::VEP).
>
> I'll look further into this tomorrow, since I have to go now.
>
> A workaround could be to generate a temp file with the extra columns and,
> at the end, merge the original VCF from the VEP script with the output
> from the plugins to get the additional columns.
>
> Maybe I misunderstood you. Please correct me if I'm wrong.
>
> Hope this helps, and feel free to ask further questions!
>
> Will McLaren
> Ensembl Variation
>
> Thank you so much.
>
> Best regards,
> Guillermo.
>
> On 16 April 2013 12:58, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>
> Hello,
>
> I need to develop some extra features for VEP.
>
> My input files are in VCF format, and so is my output.
>
> But I want to add several additional columns of extra data to the VCF
> output.
>
> For example: AA conservation score, Biobase description, Biobase link, MAF
> populations, flanking sequence, gene description, InterPro_ID and more.
>
> I've been reading the documents and I'm a bit confused about "Custom
> annotations".
> I think that since the data I want is extra in the output and not in the
> input, what I should do is develop several plugins to obtain all the
> values I need.
>
> I think most of them can be obtained through the Ensembl API, even though
> I'm new to this. Others will require more hard-coding.
>
> I hope someone can clarify this for me.
>
> Thank you.
>
> Best regards,
> Guillermo.
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/



