[ensembl-dev] problem with VEP and bigwig

Will McLaren wm2 at ebi.ac.uk
Mon Apr 8 17:13:41 BST 2013


Hi Pierre,

Sorry for the delay with this - I've fixed the VEP's bigWig parsing.

Regards

Will McLaren
Ensembl Variation

On 26 February 2013 16:33, Will McLaren <wm2 at ebi.ac.uk> wrote:
> Hi Pierre,
>
> Thanks for finding this - it looks like the way the VEP is parsing bigWig
> files is a bit messed up. I'll take a look.
>
> Regarding the VCF, currently there's no way to get anything but either the
> identifier (the third column of the VCF) or the coordinates - this is the
> last parameter as described here
> http://www.ensembl.org/info/docs/variation/vep/vep_script.html#custom_options
>
> A plugin might not be much help, as this jumps in after the custom
> annotation has been added, and the only data retained in memory is the
> chrom, start, end and name/identifier - the rest of the data from the VCF
> (or whatever custom file) is not kept. The plugin would have to re-read the
> data from file, which is not too complex to do (see cache_custom_annotation
> in VEP.pm for the code we use).
>
> VCFtools has some components that may be of use to you - I know others have
> used vcf-annotate.
>
> Hope this helps
>
> Will McLaren
> Ensembl Variation
>
>
> On 26 February 2013 15:21, Pierre Lindenbaum
> <pierre.lindenbaum at univ-nantes.fr> wrote:
>>
>> Hi all,
>>
>> I cannot make VEP working with a bigwig file.
>>
>> In the following script, I
>>
>> * create a VCF with one variation
>> * create a wig file with one range overlaping the variation
>> * convert the wig to bigwig
>> * annotate the variation with VEP:
>>
>>     echo "fixedStep chrom=22 start=38823000 step=1 span=1000" > test.wig
>>     echo "99" >> test.wig
>>     echo "22 48823000">> chrominfo.txt
>>     /path/to/ucsc/wigToBigWig test.wig chrominfo.txt test.bw
>>     /path/to/ucsc/bigWigSummary test.bw 22 38823170 38823190 1
>>     99
>>     echo "##fileformat=VCFv4.1" > test.vep.vcf
>>     echo "#CHROM    POS    ID    REF    ALT    QUAL    FILTER INFO" >>
>> test.vep.vcf
>>     echo "22    38823180    .    G    T    100.0    .    ." >>
>> test.vep.vcf
>>     /path/to/variant_effect_predictor/variant_effect_predictor.pl
>> --write_cache  --cache --dir cache  \
>>              --fasta human_g1k_v37.fasta \
>>              --format vcf --force_overwrite -\
>>              --custom  test.bw,MYBIGWIG,bigwig,overlap,0 \
>>              -i  test.vep.vcf -o test.vep.txt
>>
>> Here is the output:
>>
>>
>> 2013-02-26 16:31:56 - Checking/creating FASTA index
>> 2013-02-26 16:31:56 - Read existing cache info
>> 2013-02-26 16:31:57 - Starting...
>> 2013-02-26 16:31:57 - Read 1 variants into buffer
>> 2013-02-26 16:31:57 - Reading transcript data from cache and/or database
>> [===============================================]  [ 100% ]
>> 2013-02-26 16:31:57 - Retrieved 271 transcripts (0 mem, 271 cached, 0 DB,
>> 0 duplicates)
>> 2013-02-26 16:31:57 - Analyzing chromosome 22
>> 2013-02-26 16:31:57 - Caching custom annotations
>> [===============================================]  [ 100% ]
>> 2013-02-26 16:31:57 - Retrieved 2 custom annotations (2 MYBIGWIG)
>> 2013-02-26 16:31:57 - Analyzing custom annotations
>> [>                                              ]    [ 0% ]Argument
>> "fixedStep chrom=22 start=38823000 step=1 span=1000" isn't numeric in
>> numeric ge (>=) at
>> /path/to/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm line
>> 1915.
>> [===============================================]  [ 100% ]
>> 2013-02-26 16:31:57 - Analyzing variants
>> [===============================================]  [ 100% ]
>> 2013-02-26 16:31:57 - Calculating consequences
>> 2013-02-26 16:31:57 - Processed 1 total variants (1 vars/sec, 1 vars/sec
>> total)
>> 2013-02-26 16:31:57 - Finished!
>>
>>
>> and the file test.vep.txt
>>
>> ## ENSEMBL VARIANT EFFECT PREDICTOR v2.8
>> ## Output produced at 2013-02-26 16:31:57
>> ## Connected to homo_sapiens_core_70_37 on ensembldb.ensembl.org
>> ## Using cache in /commun/data/pubdb/ensembl/vep/cache/homo_sapiens/70
>> ## Using API version 70, DB version 70
>> ## Extra column keys:
>> ## CELL_TYPE : List of cell types and classifications for regulatory
>> feature
>> ## DISTANCE : Shortest distance from variant to transcript
>> ## MYBIGWIG    : test.bw (overlap)
>> #Uploaded_variation    Location    Allele    Gene    Feature Feature_type
>> Consequence    cDNA_position    CDS_position Protein_position    Amino_acids
>> Codons    Existing_variation Extra
>> 22_38823180_G/T    22:38823180    T    ENSG00000228620 ENST00000433230
>> Transcript non_coding_exon_variant,nc_transcript_variant    395    -    - -
>> --
>> 22_38823180_G/T    22:38823180    T    ENSG00000168135 ENST00000303592
>> Transcript    missense_variant    1217    958 320    P/T    Cct/Act    -
>>
>>
>> while I'm here :-) when I use --custom with a VCF indexed with tabix, VEP
>> only shows the range where here found the data (e.g:" --custom
>> /path/to/ALL.wgs.phase1_release_v3.20101123.snps_indels_sv.sites.vcf.gz,1KGRel3,vcf,exact,1
>> "  )
>>
>>
>> rs3887390    22:46136619    T    -    -    - intergenic_variant    -    -
>> -    -    -    - 1KGRel3=22:46136619-461366
>>
>>
>> Is it possible to display something else (e.g. a component of the INFO
>> field) or should I write a plugin ?
>>
>>
>> Thank you,
>>
>> Pierre
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>




More information about the Dev mailing list