[ensembl-dev] Could not find variation cache for
Schmucki, Roland
roland.schmucki at roche.com
Mon Jun 8 15:35:17 BST 2015
Hi Will
Many thanks for your explanations.
However, the tool reports that it does not recognise the --variant_class option:
perl variant_effect_predictor.pl --no_progress --variant_class --biotype
--numbers --offline --custom ../ref/pao1.gff.gz,pao1-genes,gff,overlap,0
--format vcf -i ./test.vcf -o ./test.txt --species pao1 --dir_cache
./variant_effect_predictor_version79/cache_files
Unknown option: variant_class
ERROR: Failed to parse command-line flags
I am using version 79; is this a version issue?
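One quick way to narrow this down is to search the script itself for the option name, since an option that the script never mentions cannot be parsed. The following is only a minimal sketch: `has_option` is a hypothetical helper name, and a plain text match shows only that the string occurs somewhere in the file, not that the option is actually declared.

```shell
#!/bin/sh
# has_option SCRIPT OPTION
# Prints "found" if OPTION occurs anywhere in SCRIPT, otherwise "missing".
# Hypothetical helper: a fixed-string search, nothing VEP-specific.
has_option() {
    if grep -q -F -- "$2" "$1"; then
        echo "found"
    else
        echo "missing"
    fi
}
```

For example, `has_option variant_effect_predictor.pl variant_class` run in the script directory; "missing" would suggest that this release does not provide the flag.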
Also, I could not find the GTF/GFF requirements via the second link you gave.
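Independently of the exact requirements, a quick tally of what the converted GTF contains can show whether any coding features survived the conversion. The sketch below counts feature types (column 3) and the lines lacking a gene_id or transcript_id attribute. It assumes the standard nine-column, tab-separated GTF layout; `gtf_summary` is a hypothetical helper name, and it does not test the specific requirements of gtf2vep.pl.

```shell
#!/bin/sh
# gtf_summary FILE
# Prints one "feature<TAB>count" line per feature type, plus a
# "missing_ids" line counting records without gene_id or transcript_id.
gtf_summary() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }     # skip comments and short lines
        { count[$3]++ }             # tally the feature type in column 3
        $9 !~ /gene_id "[^"]+"/ || $9 !~ /transcript_id "[^"]+"/ { missing++ }
        END {
            for (f in count) printf "%s\t%d\n", f, count[f]
            printf "missing_ids\t%d\n", missing + 0
        }
    ' "$1" | LC_ALL=C sort
}
```

Running `gtf_summary $name.gtf` and seeing no CDS line at all would be one possible explanation for the absence of protein-changing consequences. Gene-level records, where present, may legitimately lack a transcript_id and will be counted under missing_ids.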
Thanks for your help!
Best,
R.
On Mon, Jun 8, 2015 at 10:38 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
> Hi Roland,
>
> You can ignore that warning message; when you specify --everything, it
> switches on a few options which tell the VEP to expect to find cache files
> containing co-located variants. Since you generated your cache yourself,
> these files don't exist, which is why the code is complaining. You can
> either continue to ignore the warnings, or replace --everything with the
> list of flags specified here:
>
>
> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_everything
>
> In fact in your case only the following will work with a user-generated
> cache anyway: --variant_class, --biotype, --numbers
>
> Regarding the lack of protein-changing results, there is every chance that
> the cache has not been generated correctly from the GTF. I notice you
> converted a GFF; it's worth checking the result, as the requirements on the
> input GTF are quite strict, see
> http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_everything
>
> It is on our to-do list to make this script compatible with a wider
> spectrum of GFF/GTF formatting.
>
> Regards
>
> Will
>
> On 5 June 2015 at 13:52, Schmucki, Roland <roland.schmucki at roche.com>
> wrote:
>
>> Dear Will
>>
>> Thank you very much for the quick response.
>> I would like to post this issue to the public Ensembl mailing list.
>> Here is a brief description of the problem I encountered:
>>
>>
>> When running VEP with Ensembl annotation files, I get warnings of the form
>> "Could not find variation cache for Chromosome..."
>>
>> I downloaded a genome (i.e. pao1, $name.fa) and annotation ($name.gff3)
>> from Ensembl ftp and then created the cache files according to the VEP
>> tutorial:
>>
>>
>> sort -k1,1 -k4,4n $name.gff | bgzip > $name.gff.gz
>> tabix -p gff $name.gff.gz
>> ./cufflinks/gffread $name.gff -T -o $name.gtf
>> perl gtf2vep.pl -i $name.gtf -f $name.fa -d 79 -s $name --dir
>> variant_effect_predictor_version79/cache_files_
>> and then moved the cache files to the correct location manually.
>>
>> This all seems to have worked fine without any error or warning messages.
>> Then I mapped the reads to the genome, ran Freebayes (variants.vcf with
>> 2700 variants) and at the very end applied VEP with the following command:
>>
>>
>> perl variant_effect_predictor.pl --everything --offline --custom
>> $name.gff.gz,$name-genes,gff,overlap,0 --format vcf -i variants.vcf -o
>> variants.txt --species $name --dir_cache $VEP_DATA
>>
>>
>> The variable VEP_DATA points to the cache directory, which contains the
>> following files (file size and creation date):
>> $VEP_DATA/pao1/79/Chromosome/
>> 292135 Jun 5 09:10 3000001-4000000.gz
>> 294904 Jun 5 09:10 1000001-2000000.gz
>> 290186 Jun 5 09:10 1-1000000.gz
>> 290763 Jun 5 09:10 5000001-6000000.gz
>> 284789 Jun 5 09:10 2000001-3000000.gz
>> 292462 Jun 5 09:10 4000001-5000000.gz
>> 78483 Jun 5 09:10 6000001-7000000.gz
>>
>>
>> When I run VEP I get the following errors and warnings (See attached log
>> file for all details):
>> WARNING: Could not find variation cache for Chromosome:1-1000000
>> WARNING: Could not find variation cache for Chromosome:5000001-6000000
>> etc.
>>
>>
>> I don't understand why I get these errors/warnings.
>> Thanks a lot for any advice!
>>
>> Best,
>>
>> R.
>>
>>
>> PS: there is an output file generated with variant annotations of the
>> form:
>>
>> #Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
>> Chromosome_2415_G/T Chromosome:2415 T gene:PA0005 transcript:AAG03395 Transcript downstream_gene_variant - - - - - - IMPACT=MODIFIER;pao1-genes=gene:PA0002,exon_Chromosome:2056-3159,CDS:AAG03392,transc
>>
>> However, no amino acid changes are reported, which seems unlikely.
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
--
Roland Schmucki, PhD
Computational Biologist, Pharmaceutical Sciences
Roche Pharma Research and Early Development
Roche Innovation Center Basel
F. Hoffmann-La Roche Ltd
Grenzacherstrasse 124
4070 Basel
Switzerland
Phone +41 61 687 13 30