[ensembl-dev] Could not find variation cache for

Schmucki, Roland roland.schmucki at roche.com
Mon Jun 8 15:35:17 BST 2015


Hi Will

Many thanks for your explanations.
However, the tool reports that it cannot find the --variant_class option:

perl variant_effect_predictor.pl --no_progress --variant_class --biotype \
  --numbers --offline --custom ../ref/pao1.gff.gz,pao1-genes,gff,overlap,0 \
  --format vcf -i ./test.vcf -o ./test.txt --species pao1 \
  --dir_cache ./variant_effect_predictor_version79/cache_files
Unknown option: variant_class
ERROR: Failed to parse command-line flags

I am using version 79, is this a version issue?
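One way to answer the version question locally is to ask the installed script which flags it knows about. The sketch below is a hypothetical helper, not part of VEP: it assumes that `perl variant_effect_predictor.pl --help` prints a usage text listing every supported `--option`, and simply searches that text for the flag name.

```python
import re
import subprocess


def supported_options(help_text: str) -> set:
    """Collect every long option name (--name) mentioned in a help text."""
    return set(re.findall(r"--([A-Za-z][A-Za-z0-9_]*)", help_text))


def vep_supports(option: str, script: str = "variant_effect_predictor.pl") -> bool:
    """Run the VEP script with --help and check whether `option` is listed.

    Assumption: the script prints its full usage text (all flags) on --help.
    """
    result = subprocess.run(
        ["perl", script, "--help"], capture_output=True, text=True, check=False
    )
    # The usage text may go to stdout or stderr, so search both.
    return option in supported_options(result.stdout + result.stderr)


# Example (requires perl and the VEP script in the current directory):
# print(vep_supports("variant_class"))
```

A False result is only as reliable as the help text itself; if the usage text omits some flags, the check cannot see them.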

Also, I could not find the GTF/GFF specifications at the second link you gave (it points to the same page as the first one).

Thanks for help!

Best,
R.

On Mon, Jun 8, 2015 at 10:38 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Roland,
>
> You can ignore that warning message: when you specify --everything, it
> switches on a few options that tell the VEP to expect cache files
> containing co-located variants. Since you generated your cache yourself,
> these files don't exist, which is why the code is complaining. You can
> either continue to ignore the warnings, or replace --everything with the
> list of flags specified here:
>
>
> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_everything
>
> In fact, in your case only the following will work with a user-generated
> cache anyway: --variant_class, --biotype, --numbers
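As a sketch of what replacing --everything could look like for a user-generated cache, the hypothetical helper below assembles the command from Roland's invocation while keeping only the three flags named above. The function name, its arguments and the example paths are illustrative assumptions. If the installed VEP rejects --variant_class (as in the "Unknown option" error at the top of this message), remove it from `CUSTOM_CACHE_FLAGS`.

```python
import shlex

# Per Will's reply, only these flags work with a user-built cache.
CUSTOM_CACHE_FLAGS = ["--variant_class", "--biotype", "--numbers"]


def build_vep_command(input_vcf, output_txt, species, cache_dir,
                      custom=None, script="variant_effect_predictor.pl"):
    """Assemble an argv list for an offline VEP run without --everything.

    Hypothetical helper; the arguments mirror the commands in this thread.
    """
    cmd = ["perl", script, "--offline", *CUSTOM_CACHE_FLAGS,
           "--format", "vcf", "-i", input_vcf, "-o", output_txt,
           "--species", species, "--dir_cache", cache_dir]
    if custom:
        cmd += ["--custom", custom]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_vep_command(
        "variants.vcf", "variants.txt", "pao1", "cache_files",
        custom="pao1.gff.gz,pao1-genes,gff,overlap,0")))
```

Building the command as a list rather than a single string avoids quoting problems when it is later passed to `subprocess.run`.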
>
> Regarding the lack of protein-changing results, there is every chance that
> the cache has not been generated correctly from the GTF. I notice you
> converted a GFF; it's worth noting that the requirements on the input GTF
> are quite strict, see
> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_everything
>
> It is on our to-do list to make this script compatible with a wider
> spectrum of GFF/GTF formatting.
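Because the GTF requirements are strict, a quick structural check of the gffread output before running gtf2vep.pl can rule out the most basic problems. The sketch below checks only generic GTF 2.2 structure (nine tab-separated columns, `gene_id` and `transcript_id` attributes) and counts feature types; it does not encode gtf2vep.pl's own requirements, which are an unknown here. A GTF with no CDS features would give the cache no coding sequence to work with, which is one possible, unconfirmed explanation for the missing protein-changing results.

```python
from collections import Counter


def parse_attributes(field):
    """Split a GTF attribute column like 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def check_gtf(lines):
    """Return (feature_type_counts, problems) for an iterable of GTF lines."""
    counts = Counter()
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 columns, got {len(cols)}")
            continue
        counts[cols[2]] += 1  # column 3 is the feature type (exon, CDS, ...)
        attrs = parse_attributes(cols[8])
        for key in ("gene_id", "transcript_id"):
            if key not in attrs:
                problems.append(f"line {n}: missing {key}")
    return counts, problems


# Example (hypothetical file name):
# with open("pao1.gtf") as fh:
#     counts, problems = check_gtf(fh)
# print(counts.get("CDS", 0), "CDS features;", len(problems), "problems")
```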
>
> Regards
>
> Will
>
> On 5 June 2015 at 13:52, Schmucki, Roland <roland.schmucki at roche.com>
> wrote:
>
>> Dear Will
>>
>> Thank you very much for the quick response.
>> I would like to post this issue to the public Ensembl mailing list.
>> Here is a brief description of the problem I encountered:
>>
>>
>> When running VEP with Ensembl annotation files I get errors of the form
>> "Could not find variation cache for Chromosome..."
>>
>> I downloaded a genome (pao1, $name.fa) and its annotation ($name.gff3)
>> from the Ensembl FTP site and then created the cache files according to
>> the VEP tutorial:
>>
>>
>> sort -k1,1 -k4,4n $name.gff | bgzip > $name.gff.gz
>> tabix -p gff $name.gff.gz
>> ./cufflinks/gffread $name.gff -T -o $name.gtf
>> perl gtf2vep.pl -i $name.gtf -f $name.fa -d 79 -s $name \
>>   --dir variant_effect_predictor_version79/cache_files_
>>
>> and then moved the cache files to the correct location manually.
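The `sort -k1,1 -k4,4n` step orders features by sequence name and numeric start, which tabix needs. A plain `sort` also moves `#` header lines according to the locale's collation, so the sketch below (a hypothetical helper, shown only to make the intended ordering explicit) keeps header lines first and sorts only the features. Its output would still have to be compressed with bgzip, since tabix cannot index ordinary gzip files.

```python
def sort_gff(lines):
    """Order GFF lines for tabix: '#' lines first, then features sorted
    by sequence name (column 1) and numeric start (column 4).

    Roughly what `sort -k1,1 -k4,4n` does, except that header lines are
    kept at the top instead of being sorted in with the features.
    Note: GFF3 '###' separator lines also end up at the top.
    """
    header = [ln for ln in lines if ln.startswith("#")]
    body = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(ln):
        cols = ln.split("\t")
        return cols[0], int(cols[3])  # numeric, not lexicographic, start

    return header + sorted(body, key=key)


# Example (hypothetical file names); the result must still go through
# bgzip and `tabix -p gff`:
# with open("pao1.gff") as src, open("pao1.sorted.gff", "w") as dst:
#     dst.writelines(sort_gff(src.readlines()))
```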
>>
>> This all seemed to work fine, without any error or warning messages.
>> Then I mapped the reads to the genome, ran FreeBayes (variants.vcf with
>> 2700 variants) and finally applied VEP with the following command:
>>
>>
>> perl variant_effect_predictor.pl --everything --offline \
>>   --custom $name.gff.gz,$name-genes,gff,overlap,0 \
>>   --format vcf -i variants.vcf -o variants.txt \
>>   --species $name --dir_cache $VEP_DATA
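The `--custom` argument packs several settings into one comma-separated string. My reading of the VEP documentation is that the five fields are the file name, a short name (used as the key in the Extra column; compare `pao1-genes=` in the output further down), the file type, the annotation type (`overlap` or `exact`) and a flag that forces coordinates to be reported. The hypothetical parser below only makes that split explicit for the five-field form used here; it does not reproduce VEP's own handling of omitted fields.

```python
from typing import NamedTuple


class CustomAnnotation(NamedTuple):
    filename: str
    short_name: str                 # key used in the Extra column
    file_type: str                  # e.g. "gff"
    annotation_type: str            # "overlap" or "exact"
    force_report_coordinates: bool


def parse_custom(spec: str) -> CustomAnnotation:
    """Split a five-field --custom value; no defaults for omitted fields."""
    parts = spec.split(",")
    if len(parts) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(parts)}")
    filename, short_name, file_type, ann_type, force = parts
    if ann_type not in ("overlap", "exact"):
        raise ValueError(f"unknown annotation type: {ann_type!r}")
    return CustomAnnotation(filename, short_name, file_type, ann_type, force == "1")


# Example, using the value from the command above with $name = pao1:
# parse_custom("pao1.gff.gz,pao1-genes,gff,overlap,0")
```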
>>
>>
>> The variable VEP_DATA points to the cache directory, which contains the
>> following files (file size, date and name):
>> $VEP_DATA/pao1/79/Chromosome/
>> 292135 Jun  5 09:10 3000001-4000000.gz
>> 294904 Jun  5 09:10 1000001-2000000.gz
>> 290186 Jun  5 09:10 1-1000000.gz
>> 290763 Jun  5 09:10 5000001-6000000.gz
>> 284789 Jun  5 09:10 2000001-3000000.gz
>> 292462 Jun  5 09:10 4000001-5000000.gz
>> 78483 Jun  5 09:10 6000001-7000000.gz
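The file names in this listing suggest the cache is split into 1 Mb windows, and the warnings quoted below name windows that are present here. That is consistent with Will's explanation: the transcript chunks exist, and what VEP reports as missing is the separate co-located variant data that a self-built cache does not contain. The sketch below reproduces the window naming inferred from the listing (the 1,000,000 window size is an assumption read off the file names, not taken from VEP's code) and reports windows for which no chunk file exists.

```python
REGION_SIZE = 1_000_000  # assumption: inferred from the file names above


def region_name(pos: int, size: int = REGION_SIZE) -> str:
    """Name (without .gz) of the window containing a 1-based position."""
    start = (pos - 1) // size * size + 1
    return f"{start}-{start + size - 1}"


def missing_regions(positions, filenames, size: int = REGION_SIZE):
    """Windows needed for the given positions that have no .gz file."""
    present = {f[:-3] for f in filenames if f.endswith(".gz")}
    needed = {region_name(p, size) for p in positions}
    return sorted(needed - present, key=lambda r: int(r.split("-")[0]))


# Example: the variant at Chromosome:2415 from the output below
# region_name(2415)  -> "1-1000000"
```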
>>
>>
>> When I run VEP I get the following errors and warnings (see the attached
>> log file for all details):
>> WARNING: Could not find variation cache for Chromosome:1-1000000
>> WARNING: Could not find variation cache for Chromosome:5000001-6000000
>> etc.
>>
>>
>> I don't understand why I get these errors/warnings.
>> Thanks a lot for any advice!
>>
>> Best,
>>
>> R.
>>
>>
>> PS: An output file is generated, with variant annotations of the
>> form:
>>
>> #Uploaded_variation	Location	Allele	Gene	Feature	Feature_type	Consequence	cDNA_position	CDS_position	Protein_position	Amino_acids	Codons	Existing_variation	Extra
>> Chromosome_2415_G/T	Chromosome:2415	T	gene:PA0005	transcript:AAG03395	Transcript	downstream_gene_variant	-	-	-	-	-	-	IMPACT=MODIFIER;pao1-genes=gene:PA0002,exon_Chromosome:2056-3159,CDS:AAG03392,transc
>>
>> However, no amino acid changes are reported at all, which seems unlikely.
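To confirm that no amino acid changes appear anywhere in variants.txt, rather than only in the lines shown, the output can be tallied by consequence. The sketch below assumes VEP's default tab-separated output: `##` meta lines, a `#Uploaded_variation` header row, and `-` for empty columns, as in the example above. If no row has an Amino_acids value and every consequence is intergenic, upstream or downstream, the transcripts in the cache may lack coding sequence, in line with Will's suspicion about the GTF conversion.

```python
from collections import Counter


def summarise_vep(lines):
    """Return (consequence_counts, rows_with_amino_acid_change) for VEP
    default tab-separated output."""
    header = None
    consequences = Counter()
    aa_rows = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##") or not line:
            continue  # meta lines and blank lines
        if line.startswith("#Uploaded_variation"):
            header = line[1:].split("\t")  # drop the leading '#'
            continue
        if header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        # A row can carry several comma-separated consequence terms.
        for term in row.get("Consequence", "").split(","):
            if term:
                consequences[term] += 1
        if row.get("Amino_acids", "-") != "-":
            aa_rows += 1
    return consequences, aa_rows


# Example:
# with open("variants.txt") as fh:
#     cons, aa = summarise_vep(fh)
# print(aa, "rows with amino acid changes"); print(cons.most_common())
```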
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>


-- 

Roland Schmucki, PhD
Computational Biologist, Pharmaceutical Sciences
Roche Pharma Research and Early Development


Roche Innovation Center Basel

F. Hoffmann-La Roche Ltd
Grenzacherstrasse 124
4070 Basel

Switzerland
Phone +41 61 687 13 30





