[ensembl-dev] problem with VEP and bigwig
Will McLaren
wm2 at ebi.ac.uk
Tue Feb 26 16:33:14 GMT 2013
Hi Pierre,
Thanks for finding this - it looks like the way the VEP is parsing bigWig
files is a bit messed up. I'll take a look.
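In Pierre's log, the fixedStep declaration line itself reaches a numeric >= comparison at VEP.pm line 1915, which suggests it is being read as a data value. The sketch below shows the distinction a wiggle parser has to make. It is not the VEP code and not its fix. The shell helpers wig_to_intervals and overlapping_values are hypothetical. They use awk to treat fixedStep/variableStep lines as headers that only set parser state, and they compare value lines numerically. They assume the 1-based, inclusive coordinates of the UCSC wiggle format, where each value covers "span" bases.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; this is not VEP code.

# Expand wiggle text (fixedStep or variableStep) from stdin into 1-based
# inclusive intervals: chrom<TAB>start<TAB>end<TAB>value. Declaration lines
# only update the parser state; they are never compared as numbers.
wig_to_intervals() {
  awk '
    NF == 0 || /^(track|browser|#)/ { next }
    $1 == "fixedStep" || $1 == "variableStep" {
      mode = $1; step = 1; span = 1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "chrom") chrom = kv[2]
        else if (kv[1] == "start") pos = kv[2] + 0
        else if (kv[1] == "step") step = kv[2] + 0
        else if (kv[1] == "span") span = kv[2] + 0
      }
      next
    }
    mode == "fixedStep" {
      # each value line covers span bases, then the start advances by step
      print chrom "\t" pos "\t" (pos + span - 1) "\t" $1
      pos += step
      next
    }
    mode == "variableStep" {
      # variableStep value lines are "position value"
      print chrom "\t" $1 "\t" ($1 + span - 1) "\t" $2
    }'
}

# Print the value of every wiggle interval (stdin) that contains chrom:pos.
overlapping_values() {
  wig_to_intervals | awk -F'\t' -v c="$1" -v p="$2" \
    '$1 == c && $2 <= p && p <= $3 { print $4 }'
}
```

On the test.wig from Pierre's script, `overlapping_values 22 38823180 < test.wig` prints 99, which matches his bigWigSummary result. A position after 38823999 prints nothing.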
Regarding the VCF, there is currently no way to get anything other than the
identifier (the third column of the VCF) or the coordinates. Which of the two
you get is set by the last parameter of --custom (1 forces the coordinates to
be reported instead of the identifier), as described here:
http://www.ensembl.org/info/docs/variation/vep/vep_script.html#custom_options
A plugin might not be much help either. Plugins run after the custom
annotation has been added, and at that point the only data retained in memory
are the chrom, start, end and name/identifier. The rest of the data from the
VCF (or any other custom file) is not kept. The plugin would have to re-read
the data from the file, which is not too complex to do (see
cache_custom_annotation in VEP.pm for the code we use).
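A real plugin would be written in Perl against the VEP API. Purely to illustrate the re-read step, here is a command-line sketch. It assumes tabix is installed and that the VCF is bgzip-compressed and tabix-indexed. The helper name info_value is hypothetical. It prints the value of one key=value INFO entry for records at an exact chrom/pos, which mirrors the "exact" mode in Pierre's --custom example.

```shell
#!/bin/sh
# Hypothetical helper for illustration only; this is not part of VEP.
# Usage: info_value KEY CHROM POS   (VCF records on stdin)
# Prints the value of INFO entry KEY=value for records whose CHROM and POS
# match exactly; flag entries without "=" are ignored.
info_value() {
  key="$1"; chrom="$2"; pos="$3"
  awk -F'\t' -v key="$key" -v chrom="$chrom" -v pos="$pos" '
    /^#/ { next }                          # skip VCF header lines
    $1 == chrom && $2 == pos {
      n = split($8, fields, ";")           # INFO is the 8th column
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        if (eq > 0 && substr(fields[i], 1, eq - 1) == key)
          print substr(fields[i], eq + 1)
      }
    }'
}
```

For example, `tabix /path/to/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz 22:46136619-46136619 | info_value AF 22 46136619` would print the AF value of any record starting at that position, assuming the file defines AF in its INFO header. The POS filter matters because a tabix region query also returns records that merely overlap the region.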
VCFtools has some components that may be of use to you - I know others have
used vcf-annotate.
Hope this helps
Will McLaren
Ensembl Variation
On 26 February 2013 15:21, Pierre Lindenbaum <pierre.lindenbaum at univ-nantes.fr> wrote:
> Hi all,
>
> I cannot get VEP to work with a bigWig file.
>
> In the following script, I
>
> * create a VCF with one variation
> * create a wig file with one range overlapping the variation
> * convert the wig to bigwig
> * annotate the variation with VEP:
>
> echo "fixedStep chrom=22 start=38823000 step=1 span=1000" > test.wig
> echo "99" >> test.wig
> echo "22 48823000">> chrominfo.txt
> /path/to/ucsc/wigToBigWig test.wig chrominfo.txt test.bw
> /path/to/ucsc/bigWigSummary test.bw 22 38823170 38823190 1
> 99
> echo "##fileformat=VCFv4.1" > test.vep.vcf
> echo "#CHROM POS ID REF ALT QUAL FILTER INFO" >> test.vep.vcf
> echo "22 38823180 . G T 100.0 . ." >> test.vep.vcf
> /path/to/variant_effect_predictor/variant_effect_predictor.pl --write_cache --cache --dir cache \
> --fasta human_g1k_v37.fasta \
> --format vcf --force_overwrite \
> --custom test.bw,MYBIGWIG,bigwig,overlap,0 \
> -i test.vep.vcf -o test.vep.txt
>
> Here is the output:
>
>
> 2013-02-26 16:31:56 - Checking/creating FASTA index
> 2013-02-26 16:31:56 - Read existing cache info
> 2013-02-26 16:31:57 - Starting...
> 2013-02-26 16:31:57 - Read 1 variants into buffer
> 2013-02-26 16:31:57 - Reading transcript data from cache and/or database
> [===============================================] [ 100% ]
> 2013-02-26 16:31:57 - Retrieved 271 transcripts (0 mem, 271 cached, 0 DB, 0 duplicates)
> 2013-02-26 16:31:57 - Analyzing chromosome 22
> 2013-02-26 16:31:57 - Caching custom annotations
> [===============================================] [ 100% ]
> 2013-02-26 16:31:57 - Retrieved 2 custom annotations (2 MYBIGWIG)
> 2013-02-26 16:31:57 - Analyzing custom annotations
> [> ] [ 0% ]Argument "fixedStep chrom=22 start=38823000 step=1 span=1000" isn't numeric in numeric ge (>=) at /path/to/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line 1915.
> [===============================================] [ 100% ]
> 2013-02-26 16:31:57 - Analyzing variants
> [===============================================] [ 100% ]
> 2013-02-26 16:31:57 - Calculating consequences
> 2013-02-26 16:31:57 - Processed 1 total variants (1 vars/sec, 1 vars/sec total)
> 2013-02-26 16:31:57 - Finished!
>
>
> and the file test.vep.txt
>
> ## ENSEMBL VARIANT EFFECT PREDICTOR v2.8
> ## Output produced at 2013-02-26 16:31:57
> ## Connected to homo_sapiens_core_70_37 on ensembldb.ensembl.org
> ## Using cache in /commun/data/pubdb/ensembl/vep/cache/homo_sapiens/70
> ## Using API version 70, DB version 70
> ## Extra column keys:
> ## CELL_TYPE : List of cell types and classifications for regulatory feature
> ## DISTANCE : Shortest distance from variant to transcript
> ## MYBIGWIG : test.bw (overlap)
> #Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
> 22_38823180_G/T 22:38823180 T ENSG00000228620 ENST00000433230 Transcript non_coding_exon_variant,nc_transcript_variant 395 - - - --
> 22_38823180_G/T 22:38823180 T ENSG00000168135 ENST00000303592 Transcript missense_variant 1217 958 320 P/T Cct/Act -
>
>
> While I'm here :-) when I use --custom with a VCF indexed with tabix, VEP
> only shows the range where it found the data (e.g. "--custom
> /path/to/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz,1KGRel3,vcf,exact,1"):
>
>
> rs3887390 22:46136619 T - - - intergenic_variant - - - - - - 1KGRel3=22:46136619-461366
>
>
> Is it possible to display something else (e.g. a component of the INFO
> field), or should I write a plugin?
>
>
> Thank you,
>
> Pierre
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>