[ensembl-dev] VEP Extra output information

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Wed Apr 17 15:53:52 BST 2013


Again, thank you so much!

I'm looking further into VCFtools; it may well be the easiest and most
standard way to parse VCF output from VEP.

Thank you.

Best regards,
Guillermo.

On 04/17/13 16:50, Will McLaren wrote:
> Yes, you can customise the fields used and the order they appear in
> with --fields; this applies to both VCF and the normal tab-delimited
> output.
>
> The delimiter is hardcoded I'm afraid, but I'm not sure what you'd
> pick if you did decide to change it. ";" and "," are already used by
> the VCF spec, and ":" appears in HGVS notations and other fields.
>
> If you did want to change it, you'd just need to edit lines 1272 and
> 1275 of ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm.
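
A small Python sketch of the delimiter point above, not VEP code. The HGVS value is taken from the example record quoted further down; the "block" string is a shortened, hand-built CSQ block used only for illustration.

```python
# HGVSc value copied from the example record in this thread.
hgvsc = "ENST00000307301.7:c.1001C>T"

# Shortened, hand-built CSQ block (illustrative; a real block has many more fields).
block = "A|ENSG00000154719|ENST00000307301|Transcript|missense_variant||" + hgvsc

# Splitting on "|" keeps every value whole; an empty string marks an unpopulated field.
by_pipe = block.split("|")
print(by_pipe)

# If ":" were the delimiter instead, the HGVS value itself would be cut in two.
by_colon = hgvsc.split(":")
print(by_colon)
```

So an empty field is just two adjacent delimiters, and its position is preserved after splitting.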
>
> Will
>
>
>
> On 17 April 2013 15:32, Guillermo Marco Puche
> <guillermo.marco at sistemasgenomicos.com> wrote:
>> Hello Will,
>>
>>
>> On 04/17/13 14:46, Will McLaren wrote:
>>
>> Hello,
>>
>> It's difficult (well, in fact impossible) to provide an example where
>> every field is populated, since some field types are mutually
>> exclusive dependent on the feature type overlapped (for example, you
>> will never see the CELL_TYPE field populated for a variant/transcript
>> combination).
>>
>> If you are interested in this for the purposes of how it looks for a
>> parser, you really want to be looking at the header line added to the
>> VCF by the VEP:
>>
>> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
>> predicted by VEP. Format:
>> Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
>>
>> This lists the fields that are added in order. Using this you should
>> be able to parse what appears in the body of the file.
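
As a rough illustration of reading the field order from that header line, here is a minimal Python sketch. The function name csq_field_names is hypothetical, and the header string is shortened to five fields for readability; it assumes the field list follows "Format:" and ends at the closing quote, as in the header shown above.

```python
import re

def csq_field_names(header_line):
    """Return the ordered CSQ sub-field names from a VEP ##INFO header line."""
    # Capture everything after "Format:" up to the closing quote or bracket.
    match = re.search(r'Format:\s*([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in header line")
    return [name.strip() for name in match.group(1).split("|")]

# Shortened, illustrative header (the real one lists many more fields).
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as '
          'predicted by VEP. Format: Allele|Gene|Feature|Feature_type|Consequence">')
fields = csq_field_names(header)
print(fields)
```

Reading the names from the header rather than hardcoding them should keep a parser working when the field set changes, for example when plugins are added or removed.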
>>
>> Here's an example using a bunch of plugins and with the "--everything"
>> flag switched on:
>>
>> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
>> predicted by VEP. Format:
>> Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>> 21      26960070        rs116645811     G       A       .       .       CSQ=|||||||||||||||||||||||||||||||||||,A|ENSG00000154719|ENST00000352957|Transcript|intron_variant||||||rs116645811||9/9|MRPL39||||||||||A:0.0005|ENSP00000284967||CCDS13573.1|ENST00000352957.4:c.969+1077C>T|||||0.840||ENSP00000284967|,A|ENSG00000154719|ENST00000307301|Transcript|missense_variant|1043|1001|334|T/M|aCg/aTg|rs116645811|10/11||MRPL39|||||||YES|tolerated(0.06)|benign(0.001)|A:0.0005|ENSP00000305682|Low_complexity_(Seg):Seg|CCDS33522.1|ENST00000307301.7:c.1001C>T|ENSP00000305682.7:p.Thr334Met||-1|Neutral(0.940)|0.840||ENSP00000305682|
>>
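
A minimal Python sketch of parsing a record like the one above, assuming the field names have already been read from the CSQ header line. The function name parse_csq is hypothetical, and the field list and INFO value are cut down to six fields for illustration (the values themselves come from the example record). It assumes blocks are separated by "," and fields by "|", as in the example.

```python
def parse_csq(info, field_names):
    """Split the CSQ entry of a VCF INFO string into one dict per block."""
    for item in info.split(";"):           # INFO entries are ';'-separated
        if item.startswith("CSQ="):
            csq = item[len("CSQ="):]
            break
    else:
        return []                           # no CSQ entry on this record
    records = []
    for block in csq.split(","):           # one block per variant/feature pair
        values = block.split("|")          # empty strings mark unpopulated fields
        if len(values) != len(field_names):
            raise ValueError("CSQ block does not match the header field list")
        records.append(dict(zip(field_names, values)))
    return records

# Illustrative only: a cut-down field list and a matching cut-down INFO value.
fields = ["Allele", "Gene", "Feature", "Feature_type", "Consequence", "HGNC"]
info = ("CSQ=A|ENSG00000154719|ENST00000352957|Transcript|intron_variant|MRPL39,"
        "A|ENSG00000154719|ENST00000307301|Transcript|missense_variant|MRPL39")
rows = parse_csq(info, fields)
for row in rows:
    print(row["Feature"], row["Consequence"], row["HGNC"])
```

A block whose fields are all empty, like the first one in the example record, would come back as a dict of empty strings; a caller may want to skip those.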
>> I like this. It won't be so hard to parse it.
>>
>> If I'm not wrong, I can even choose the field order with the "--fields" flag.
>> Does this only work for the regular tab-delimited VEP output file, or does it
>> also work with VCF output?
>>
>> The only thing I don't like is that the delimiter, the "|" character, is also
>> used to fill empty fields. It would be great to be able to change the delimiter
>> to another special character so that parsing is much easier.
>>
>>
>> Thank you.
>>
>> Best regards,
>> Guillermo.
>>
>> This is from input:
>>
>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>> 21      26960070        rs116645811     G       A       .       .       .
>>
>> using the command line:
>>
>> perl variant_effect_predictor.pl -i test.txt -force -database
>> -everything -vcf -plugin Blosum62 -plugin Carol -plugin Conservation
>> -plugin LD -plugin ProteinDomains -plugin TSSDistance
>>
>> Hope this is a bit clearer!
>>
>> Will
>>
>> On 17 April 2013 11:25, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hello,
>>
>> I'm looking for an example *.vcf output with ALL the "Extra" parameters.
>> I've generated some with the VEP script, but some extras, such as HGNC, are
>> never generated.
>>
>> A few VCF lines with all values would be enough, since I'm planning to parse
>> the "Extra" column.
>>
>> It would also be great if it included most of the plugin outputs :)
>>
>> Thank you :)
>>
>> Best regards,
>> Guillermo.
>>
>>
>> On 04/16/13 18:00, Guillermo Marco Puche wrote:
>>
>> On 04/16/13 14:49, Will McLaren wrote:
>>
>> Hi Guillermo,
>>
>> There are two distinct ways you can add additional data to the output
>> from the VEP.
>>
>> 1) Custom annotations - here you simply provide the VEP with a
>> tabix-indexed position-based data file, and the VEP does the work of
>> finding overlaps with your variant input and the data from the file.
>>
>> 2) Plugins - you write the code to add to or manipulate the internal
>> data structures used by the VEP. In its simplest form, a plugin can be
>> simply looking up an attribute of some object and adding it to the
>> output.
>>
>> Writing a plugin requires a basic understanding of the Ensembl API,
>> but getting a basic plugin working requires only a very small amount
>> of code.
>>
>> Since the additional data is being obtained from multiple sources, APIs, files,
>> etc., I guess plugins are the only way to go for me.
>>
>> The documentation
>> (http://www.ensembl.org/info/docs/variation/vep/vep_script.html#plugins)
>> explains all of this, but the best way to see how plugins work is to
>> look at the existing plugins at
>> https://github.com/ensembl-variation/VEP_plugins. I'd suggest looking
>> at Conservation.pm and ProteinSeqs.pm as some relatively simple
>> examples of retrieving additional data from the API.
>>
>> Where do packages like "package Conservation;" come from?
>>
>> You should note that using VCF output you will see repeated elements
>> in the INFO field added, since the plugin gets run once for every
>> variant/transcript overlap; all data appear under the CSQ field in the
>> INFO column. Currently there is no way for the VEP via plugins to add
>> separate INFO fields, however this is something we are looking into,
>> and in fact would be relatively easy to "hack" in for someone
>> determined enough (see subroutine vf_list_to_cons in
>> Bio::EnsEMBL::Variation::Utils::VEP).
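
To illustrate the repeated elements described above, here is a small Python sketch of how a downstream parser might collapse a value that is repeated in every CSQ block. The function name distinct_values is hypothetical; the records mimic what a CSQ parser could produce from the example record in this thread, where the Conservation value 0.840 appears in both transcript blocks.

```python
def distinct_values(records, field):
    """Return the distinct non-empty values of one field, in first-seen order."""
    seen = []
    for record in records:
        value = record.get(field, "")
        if value and value not in seen:
            seen.append(value)
    return seen

# Illustrative per-block records, reduced to two fields each.
records = [
    {"Feature": "", "Conservation": ""},
    {"Feature": "ENST00000352957", "Conservation": "0.840"},
    {"Feature": "ENST00000307301", "Conservation": "0.840"},
]
print(distinct_values(records, "Conservation"))
```

This is only a post-processing step on the parsed output; it does not change what the VEP writes into the INFO column.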
>>
>> I'll look further into this tomorrow, since I have to go now.
>>
>> A workaround could be to simply generate a temp file with the extra columns
>> and, at the end, merge the original VCF from the VEP script with the output
>> from the plugins for the additional columns.
>>
>> Maybe I misunderstood you. Please correct me if I'm wrong.
>>
>> Hope this helps, and feel free to ask further questions!
>>
>> Will McLaren
>> Ensembl Variation
>>
>> Thank you so much.
>>
>> Best regards,
>> Guillermo.
>>
>> On 16 April 2013 12:58, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>
>> Hello,
>>
>> I need to develop some extra features for VEP.
>>
>> My input files are in VCF format, and so is my output.
>>
>> But I want to add several additional columns for extra data to the VCF output.
>>
>> For example, AA conservation score, Biobase description, Biobase link, MAF
>> populations, Flanking sequence, Gene description, InterPro_ID and more.
>>
>> I've been reading the documentation and I'm a bit confused about "Custom
>> annotations".
>> I think that since the data I want is extra in the output and not in the input,
>> what I should do is develop several plugins to obtain all the values I need.
>>
>> I think most of them can be obtained through the Ensembl API, even though I'm
>> new to this. Others will require more hard coding.
>>
>> I hope someone can clarify this matter a bit for me.
>>
>> Thank you.
>>
>> Best regards,
>> Guillermo.
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>