[ensembl-dev] VEP Extra output information

Guillermo Marco Puche guillermo.marco at sistemasgenomicos.com
Mon Apr 22 17:09:42 BST 2013


Hello Will,

After applying your fixes there's still no output.

Something must be wrong with TABIX file handler. $pos_string seems 
correct, i've been printing on screen the values and they seem correct :)

On the other hand "while TABIX" loop is not being executed. It never 
iterates through this loop. But there's also no error with file handler. 
So I don't know what is wrong.

Guillermo.

On 04/22/13 17:56, Will McLaren wrote:
> A couple of other bugs I've spotted - you've got some variables being
> declared in the wrong scope which means you won't get anything in your
> results. Your run method should look like:
>
> sub run {
> my ($self, $tva) = @_;
> my $vf = $tva->variation_feature;
> my $pos_string = sprintf("%s:%i-%i", $vf->{chr}, $vf->{start}, $vf->{end});
>
> #my $fichero = "test.vcf";
>
> my $info;
>
> open TABIX, sprintf("tabix %s %s |", $self->{file}, $pos_string);
> #open TABIX file handler for current position
> while(<TABIX>){
> chomp;
> if (my $line =~ /^#/){ next; } #skip header line
> my @split = split /\t/;
> $info = $split[7]; #store 8th column (INFO)
> }
> close TABIX; #close TABIX file handler
>
>      return {
>      "SAMPLES" => $info,
>      };
> }
>
> On 22 April 2013 16:38, Will McLaren <wm2 at ebi.ac.uk> wrote:
>> You need to store the file name on the object itself. In the new method, replace
>>
>> my $file = $self->params->[0];
>>
>> with
>>
>> $self->{file} = $self->params->[0];
>>
>> Will
>>
>> On 22 April 2013 16:33, Guillermo Marco Puche
>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>> Hello,
>>>
>>> Thank you for that information Will. It's so useful.
>>>
>>> Ok I've managed to almost finish the plugin:
>>> https://github.com/guillermomarco/vcf_input/blob/master/vcf_input.pm
>>>
>>> I'm still getting an error with sprintf when trying to run it with VEP.
>>>
>>> 2013-04-22 17:18:07 - Warning: plugin
>>> vcf_input,/home/likewise-open/SGNET/gmarco/scripts/genomics/global/annotation/ensembl71/VCF_Input/test.vcf.gz
>>> version (0.1) does not match the current VEP version (71)
>>> 2013-04-22 17:18:07 - You may experience unexpected behaviour with this
>>> plugin
>>> 2013-04-22 17:18:07 - Loaded plugin: vcf_input
>>> 2013-04-22 17:18:07 - Output fields redefined (26 defined)
>>> 2013-04-22 17:18:09 - INFO: Database will be accessed when using --hgvs
>>> 2013-04-22 17:18:09 - Starting...
>>> 2013-04-22 17:18:09 - Read 3 variants into buffer
>>> 2013-04-22 17:18:09 - Reading transcript data from cache and/or database
>>> [===============================================]  [ 100% ]
>>> 2013-04-22 17:18:09 - Retrieved 74 transcripts (0 mem, 74 cached, 0 DB, 0
>>> duplicates)
>>> 2013-04-22 17:18:09 - Reading regulatory data from cache and/or database
>>> [===============================================]  [ 100% ]
>>> 2013-04-22 17:18:09 - Retrieved 539 regulatory features (0 mem, 539 cached,
>>> 0 DB, 0 duplicates)
>>> 2013-04-22 17:18:09 - Calculating consequences
>>> [>                                              ]    [ 5% ]
>>> ERROR: Forked process failed
>>> Use of uninitialized value in sprintf at
>>> /home/likewise-open/SGNET/gmarco/.vep/Plugins/vcf_input.pm line 69.
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 04/22/13 14:34, Will McLaren wrote:
>>>
>>> Hello,
>>>
>>> The plugin is run once for each combination of variant and overlapped
>>> feature.
>>>
>>> Let's say your variant overlaps 4 transcripts (they may be splice
>>> variants of the same gene, or 4 different genes, the principal is the
>>> same). In this case, the plugin's "run" method will be executed 4
>>> times for that variant, once for each transcript. The $tva
>>> TranscriptVariationAllele object in each run will have a different
>>> transcript "attached" to it. I'd recommend stepping through how a
>>> plugin works using Perl's debugger - simply add the line:
>>>
>>> $DB::single = 1;
>>>
>>> somewhere at the start of the run method, then run the vep with perl's -d
>>> flag:
>>>
>>> perl -d variant_effect_predictor.pl [etc]
>>>
>>> I'm not sure, but it seems like you're preloading all of the VCF in
>>> your plugin at the beginning, and then trying to add the info one at a
>>> time. This is not an ideal way to do it, as for large VCF files you
>>> may run in to memory usage issues, and especially using an array you
>>> may not be able to reliably link the lines from your VCF to the lines
>>> of input going into the VEP (a hash keyed on position would be much
>>> better).
>>>
>>> A much better way to do it would be to prepare the VCF file with
>>> tabix. The tabix utility can then retrieve just the relevant line from
>>> the VCF on demand (this is very quick, and you can cache data on the
>>> plugin's hash structure between separate executions of the "run"
>>> method).
>>>
>>> In the dbNSFP.pm plugin on GitHub, I do something very very similar to
>>> this. First, I get the variation feature object being passed to the
>>> "run" method - this contains the genomic coordinates of the current
>>> variant:
>>>
>>> my $vf = $tva->variation_feature;
>>>
>>> I then create a string to pass to tabix, which is chr:start-end; this
>>> means tabix will retrieve the lines from your VCF in that range:
>>>
>>> my $pos_string = sprintf("%s:%i-%i", $vf->{chr}, $vf->{start}, $vf->{end});
>>>
>>> I then run tabix and open the output as a pipe:
>>>
>>> open TABIX, sprintf("tabix %s %s |", $self->{file}, $pos_string);
>>>
>>> I can then read lines of VCF from the <TABIX> filehandle, parse them,
>>> and finally add the data to the %return hash that gets sent back at
>>> the end of the plugin.
>>>
>>> This return hash must contain key/value pairs that will be printed in
>>> the output. For example, lets say I want to add the variant name from
>>> the VCF file I've just parsed:
>>>
>>> return {
>>>    "VAR_NAME" => $var_name,
>>> }
>>>
>>> where $var_name = 'rs123'. Then in the output you would see (normal,
>>> tab-delimited):
>>>
>>> VAR_NAME=rs123
>>>
>>> appear in the Extra column of your output file. You could add multiple
>>> values for VAR_NAME, but you'd have to write that as a string, for
>>> example:
>>>
>>> return {
>>>    "VAR_NAME" => join(",", ($var_name1, $var_name2)),
>>> }
>>>
>>> which would give you e.g.
>>>
>>> VAR_NAME=rs123,rs456
>>>
>>> I'm afraid also at this juncture I have to point out that I'm nearing
>>> the limit of support I'm meant to be giving out to one individual.
>>> While we are here to help and will answer any reasonable questions you
>>> have, we have to stop short of doing people's jobs for them! Anything
>>> more than a base level of help might have to be considered as a
>>> collaboration, and this would require communication between our
>>> respective supervisors.
>>>
>>> I hope that the documentation on the website and the example code
>>> (which we try to comment as thoroughly as we can) should be enough to
>>> keep you going, and of course I don't want to put you off using and
>>> helping us improve the VEP.
>>>
>>> Regards
>>>
>>> Will
>>>
>>> On 22 April 2013 12:59, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm starting to develop a simple plugin to write the INFO column from VCF
>>> input into VEP output.
>>> As far as I've been seen in Git VEP Plugin repo, VEP script will right the
>>> value returned by the plugin in run function.
>>>
>>> Suppose that I've an array of values with the INFO column. Something like
>>> this:
>>> my @info_column = ("info_row1","info_row2","info_row3")
>>>
>>> An array containing the content of INFO column for each line of VCF input.
>>> How do I associate each value to the corresponding VEP line output?
>>> I guess I cannot simply return the array as the result of my plugin.
>>>
>>> Is plugin executed for every line by VEP script?
>>>
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 04/22/13 12:07, Guillermo Marco Puche wrote:
>>>
>>> Hello,
>>>
>>> Me neither. So I've no clue. I hope someone else can help me.
>>>
>>> I've also been looking the plugin code you mentioned.
>>> I don't really see how to extract the columns from input VCF and intersect
>>> them with VEP output.
>>>
>>> Regards,
>>> Guillermo.
>>>
>>> On 04/22/13 12:01, Will McLaren wrote:
>>>
>>> Hello,
>>>
>>> I haven't used VCFannotate myself, perhaps I was wrong!
>>>
>>> I know of other VEP users who have used it though, maybe someone on
>>> the list will read this email and can give you some help.
>>>
>>> Cheers
>>>
>>> Will
>>>
>>> On 22 April 2013 10:58, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello Will,
>>>
>>> It seems VCFannotate is made for "Intersect the records in the VCF file with
>>> targets provided in a BED file.".
>>> How I'm supposed to intersect the output from vep script (VCF or VEP file)
>>> with my input file VCF?
>>>
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 04/19/13 11:31, Will McLaren wrote:
>>>
>>> Hi Guillermo,
>>>
>>> The --custom system doesn't quite work like that. Currently it is set
>>> up to either provide only the ID or the coordinates of any features it
>>> finds overlapping your variants in the custom file. It can't pull
>>> particular fields from a VCF in the way you describe here.
>>>
>>> To do so, you'd either have to write a plugin to do this (see the
>>> dbNSFP.pm plugin for an example of doing similar), or use VCFannotate,
>>> which I believe can do this sort of thing out of the box.
>>>
>>> Regards
>>>
>>> Will
>>>
>>> On 19 April 2013 07:42, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to get the following fields from the VCF input with the --custom
>>> flag.
>>> I want to add the following columns to the VEP output file:
>>>
>>> #CHROM        POS     ID      REF     ALT     QUA
>>>
>>>  From what I've been reading this is possible to achieve using custom flag
>>> and VCF input, since third column is used as identifier (ID, ie: rs6054257)
>>>
>>>
>>> I've been trying with the following command:
>>>
>>> ./variant_effect_predictor.pl -i myinput.vcf.gz -format vcf -o myoutput.vep
>>> --cache --everything --maf_1kg --force_overwrite --plugin
>>> Condel,/home/likewise-open/SGNET/gmarco/.vep/Plugins/config/Condel/config,b
>>> --custom myinput.vcf.gz,CHROM,vcf,exact,0 --fields
>>> CHROM,Existing_variation,AFR_MAF,AMR_MAF,ASN_MAF,EUR_MAF,GMAF,Feature,Feature_type,HGVSc,HGVSp,Consequence,Domains,MOTIF_NAME,MOTIF_POS,HIGH_INF_POS,Condel,SIFT,Polyphen,Cell_Type,Canonical,CCDS,Intron,Exon
>>>
>>> I got an output like this:
>>>
>>> #CHROM    Existing_variation    AFR_MAF    AMR_MAF    ASN_MAF    EUR_MAF
>>> GMAF    Feature    Feature_type    HGVSc    HGVSp    Consequence    Domains
>>> MOTIF_NAME    MOTIF_POS    HIGH_INF_POS    Condel    SIFT    Polyphen
>>> Cell_Type    Canonical    CCDS    Intron    Exon
>>>
>>> 1:6500735-6500735    -    -    -    -    -    -    NM_031475.2    Transcript
>>> NM_031475.2:c.725C>T    NP_113663.2:p.Thr242Ile    missense_variant    -
>>> -    -    -    deleterious(0.765)    deleterious(0.03)    -    -    -    -
>>> -    -
>>> 1:6501044-6501044    rs2311045    0.28    0.12    0.21    0.13    G:0.1822
>>> ENSR00000074413    RegulatoryFeature    -    -    regulatory_region_variant
>>> -    -    -    -    -    -    -    -    -    -    -    -
>>> 1:6501044-6501044    rs2311045    0.28    0.12    0.21    0.13    G:0.1822
>>> CCDS70.1    Transcript    CCDS70.1:c.909C>G    CCDS70.1:c.909C>G(p.=)
>>> synonymous_variant    -    -    -    -    -    -    -    -    -    CCDS70.1
>>> -    -
>>>
>>> Position being show in CHROM column makes no sense to me if it's the key
>>> identifier. If you're using the "exact" configuration in custom flag with no
>>> overlapping why it's an interval shown?
>>>
>>> I would like that POS being shown in a second column called POS like in
>>> original VCF and so on with the rest of custom missing fields. Output format
>>> would be:
>>>
>>> #CHROM        POS     ID      REF     ALT     QUA     Existing_variation    AFR_MAF    AMR_MAF
>>> ASN_MAF    EUR_MAF    GMAF    Feature    Feature_type    HGVSc    HGVSp
>>> Consequence    Domains    MOTIF_NAME    MOTIF_POS    HIGH_INF_POS    Condel
>>> SIFT    Polyphen    Cell_Type    Canonical &nbs
>>>   p;
>>> CCDS    Intron    Exon
>>> chr1  6501044 rs2311045 0.28    0.12    0.21    0.13    G:0.1822
>>> ENSR00000074413    RegulatoryFeature    -    -    regulatory_region_variant
>>> -    -    -    -    -    -    -    -    -    -    -    -
>>>
>>> I've been experiencing errors if I try with the following custom flag:
>>> --custom myinput.vcf.gz,CHROM,POS,ID,REF,ALT,QUA,vcf,exact,0
>>> I've no idea how to are more than one custom flag at a time, or not even if
>>> this is possible. What would be the correct way to do this?
>>>
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 04/18/13 13:55, Guillermo Marco Puche wrote:
>>>
>>> Hello,
>>>
>>> --fields command is working flawlessly ! I love it. It has saved me so much
>>> work.
>>>
>>> ./variant_effect_predictor.pl -i
>>> /home/likewise-open/SGNET/gmarco/VEP_71/in/Oto2_collect_not_annotated.vcf -o
>>> /home/likewise-open/SGNET/gmarco/VEP_71/out/output.fields -format vcf
>>> --cache --everything --maf_1kg --force_overwrite --fork 2 --plugin
>>> Condel,/home/likewise-open/SGNET/gmarco/.vep/Plugins/config/Condel/config,b
>>> --fields
>>> Existing_variation,AFR_MAF,AMR_MAF,ASN_MAF,EUR_MAF,GMAF,Feature,Feature_type,HGVSc,HGVSp,Consequence,Domains,MOTIF_NAME,MOTIF_POS,HIGH_INF_POS,Condel,SIFT,Polyphen,Cell_Type,Canonical,CCDS,Intron,Exon
>>>
>>>
>>> Now I need to figure out how to create a final output file which is the
>>> relation of VCF input (Chromosome, Position, Ref_Allele, Var_Allele) with
>>> the VEP output. To display all variants info for each chromosome.
>>>
>>> Guillermo.
>>>
>>> On 04/18/13 10:40, Will McLaren wrote:
>>>
>>> Hello,
>>>
>>> The only way to do this would be to specify each Extra column as a
>>> separate column using --fields.
>>>
>>> Will
>>>
>>> On 18 April 2013 08:29, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> Finally I'm not going to use VCF format as output.
>>>  From original input VFC I need to print into my output Chromosome, Position,
>>> Ref_Allele and Var_Allele columns.
>>>
>>> I prefer standard VEP column tabbed file for output, since it's much easier
>>> to parse "Extra" column because all extra parameters are delimited by ";".
>>> Is there any way to force VEP to print empty extra parameters?
>>>
>>> ie:
>>>
>>> 1_6508122_G/C    1:6508122    C    ENSESTG00000022320    ENSESTT00000056337
>>> Transcript    downstream_gene_variant    -    -    -    -    -    rs11808508
>>> AFR_MAF=;DISTANCE=2305;GMAF=;ASN_MAF=;EUR_MAF=;ENSP=ENSESTP00000056337;CANONICAL=YES;AMR_MAF=
>>>
>>> Or simply fill print empty extra empty fields with =EMPTY.
>>>
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 04/17/13 16:53, Guillermo Marco Puche wrote:
>>>
>>> Again, thank you so much !
>>>
>>> I'm looking further VCFTools, maybe it should be the easiest and standard
>>> way to parse VCF output from VEP.
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 04/17/13 16:50, Will McLaren wrote:
>>>
>>> Yes, you can customise the fields used and the order they appear in
>>> with --fields; this applies to both VCF and the normal tab-delimited
>>> output.
>>>
>>> The delimiter is hardcoded I'm afraid, but I'm not sure what you'd
>>> pick if you did decide to change it. ";" and "," are already used by
>>> the VCF spec, and ":" appears in HGVS notations and other fields.
>>>
>>> If you did want to change it, you'd just need to edit lines 1272 and
>>> 1275 of ensembl-variation/modules/Bio/EnsEMBL/Variation/Utils/VEP.pm.
>>>
>>> Will
>>>
>>>
>>>
>>> On 17 April 2013 15:32, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello Will,
>>>
>>>
>>> On 04/17/13 14:46, Will McLaren wrote:
>>>
>>> Hello,
>>>
>>> It's difficult (well, in fact impossible) to provide an example where
>>> every field is populated, since some field types are mutually
>>> exclusive dependent on the feature type overlapped (for example, you
>>> will never see the CELL_TYPE field populated for a variant/transcript
>>> combination).
>>>
>>> If you are interested in this for the purposes of how it looks for a
>>> parser, you really want to be looking at the header line added to the
>>> VCF by the VEP:
>>>
>>> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
>>> predicted by VEP. Format:
>>> Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
>>>
>>> This lists the fields that are added in order. Using this you should
>>> be able to parse what appears in the body of the file.
>>>
>>> Here's an example using a bunch of plugins and with the "--everything"
>>> flag switched on:
>>>
>>> ##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence type as
>>> predicted by VEP. Format:
>>> Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|EXON|INTRON|HGNC|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|DISTANCE|CLIN_SIG|CANONICAL|SIFT|PolyPhen|GMAF|ENSP|DOMAINS|CCDS|HGVSc|HGVSp|CELL_TYPE|BLOSUM62|CAROL|Conservation|LinkedVariants|INTERPRO|TSSDistance">
>>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>>> 21      26960070        rs116645811     G       A       .       .
>>>
>>> CSQ=|||||||||||||||||||||||||||||||||||,A|ENSG00000154719|ENST00000352957|Transcript|intron_variant||||||rs116645811||9/9|MRPL39||||||||||A:0.0005|ENSP00000284967||CCDS13573.1|ENST00000352957.4:c.969+1077C>T|||||0.840||ENSP00000284967|,A|ENSG00000154719|ENST00000307301|Transcript|missense_variant|1043|1001|334|T/M|aCg/aTg|rs116645811|10/11||MRPL39|||||||YES|tolerated(0.06)|benign(0.001)|A:0.0005|ENSP00000305682|Low_complexity_(Seg):Seg|CCDS33522.1|ENST00000307301.7:c.1001C>T|ENSP00000305682.7:p.Thr334Met||-1|Neutral(0.940)|0.840||ENSP00000305682|
>>>
>>> I like this. It won't be so hard to parse it.
>>>
>>> I've I'm not wrong I can even choose the field order with "--fields" flag.
>>> Is this only working for regular VEP column tabbed output file? Does it work
>>> with VCF output also?
>>>
>>> The only thing I don't like is that delimiter being "|" character is also
>>> used to fill empty fields. It would be great to change delimiter to another
>>> special character so parsing is much easier.
>>>
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> This is from input:
>>>
>>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
>>> 21      26960070        rs116645811     G       A       .       .       .
>>>
>>> using the command line:
>>>
>>> perl variant_effect_predictor.pl -i test.txt -force -database
>>> -everything -vcf -plugin Blosum62 -plugin Carol -plugin Conservation
>>> -plugin LD -plugin ProteinDomains -plugin TSSDistance
>>>
>>> Hope this is a bit clearer!
>>>
>>> Will
>>>
>>> On 17 April 2013 11:25, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm looking for an example *.vcf output with ALL the "Extra" parameters.
>>> I've generated some with VEP script but i'm missing some extras never being
>>> generated like HGNC.
>>>
>>> A few lines VCF with all values would be enough, since i'm planning to parse
>>> "Extra" column.
>>>
>>> It also would be great if it includes most of the plugins outputs also :)
>>>
>>> Thank you :)
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>>
>>> On 04/16/13 18:00, Guillermo Marco Puche wrote:
>>>
>>> On 04/16/13 14:49, Will McLaren wrote:
>>>
>>> Hi Guillermo,
>>>
>>> There's two distinct ways you can add additional data to the output
>>> from the VEP.
>>>
>>> 1) Custom annotations - here you simply provide the VEP with a
>>> tabix-indexed position-based data file, and the VEP does the work of
>>> finding overlaps with your variant input and the data from the file.
>>>
>>> 2) Plugins - you write the code to add to or manipulate the internal
>>> data structures used by the VEP. In its simplest form, a plugin can be
>>> simply looking up an attribute of some object and adding it to the
>>> output.
>>>
>>> Writing a plugin requires a basic understanding of the Ensembl API,
>>> but getting a basic plugin working requires only a very small amount
>>> of code.
>>>
>>> Since additional data is being obtained from multiple sources, APIs, files,
>>> etc.. I guess plugins are the only way to go for me.
>>>
>>> The documentation
>>> (http://www.ensembl.org/info/docs/variation/vep/vep_script.html#plugins)
>>> explains all of this, but the best way to see how plugins work is to
>>> look at the existing plugins at
>>> https://github.com/ensembl-variation/VEP_plugins. I'd suggest looking
>>> at Conservation.pm and ProteinSeqs.pm as some relatively simple
>>> examples of retrieving additional data from the API.
>>>
>>> Where are packages like package Conservation; comming from?
>>>
>>> You should note that using VCF output you will see repeated elements
>>> in the INFO field added, since the plugin gets run once for every
>>> variant/transcript overlap; all data appear under the CSQ field in the
>>> INFO column. Currently there is no way for the VEP via plugins to add
>>> separate INFO fields, however this is something we are looking into,
>>> and in fact would be relatively easy to "hack" in for someone
>>> determined enough (see subroutine vf_list_to_cons in
>>> Bio::EnsEMBL::Variation::Utils::VEP).
>>>
>>> I'll look further into this tomorrow since I've to go now.
>>>
>>> A workaround could be simply generating a temp file with extra columns and
>>> in the end merge original VCF from VEP script with the output from plugins
>>> for additional columns.
>>>
>>> Maybe I missunderstood you. Correct me if i'm wrong please.
>>>
>>> Hope this helps, and feel free to ask further questions!
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>> Thank you so much.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> On 16 April 2013 12:58, Guillermo Marco Puche
>>> <guillermo.marco at sistemasgenomicos.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm in need to develop some extra features for VEP.
>>>
>>> My input files are in VCF format and also my output.
>>>
>>> But I want to add several additional columns for extra data at the VCF out.
>>>
>>> For example,AA conservation score, Biobase description, Biobase link, MAF
>>> populations, Flanking sequence, Gene description, InterPro_ID and more..
>>>
>>> I've been reading the documents and I'm a bit confused about "Custom
>>> annotations".
>>> I think since the data I want is extra on the output and not in the input,
>>> what I should do is develop several Plugins to obtain all the values I need.
>>>
>>> I think most of them can be obtained through the Ensembl API even if I'm new
>>> to this. Other will require more hard coding.
>>>
>>> I hope someone can clarify me a bit on this matter.
>>>
>>> Thank you.
>>>
>>> Best regards,
>>> Guillermo.
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20130422/11b5ea2f/attachment.html>


More information about the Dev mailing list