[ensembl-dev] build cache from gtf file
Schmucki, Roland
roland.schmucki at roche.com
Wed Nov 11 09:22:10 GMT 2015
Hello!
I would like to build a VEP cache from a GTF file that I downloaded from
Ensembl (Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.2.29.gtf).
I used the following commands to create the cache and then applied it to a
test VCF file that includes all sorts of variants (missense and silent
SNPs, short indels, etc.):
set VEPDIR=variant_effect_predictor_version79
set REF=Escherichia_coli_str_k_12_substr_mg1655.fa
set species=Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.2.29
perl $VEPDIR/gtf2vep.pl -i $species.gtf -f Escherichia_coli_str_k_12_substr_mg1655.fa -d 79 -species $species --dir cache_files
rm -rf ${species}
mv cache_files/${species} ${species}
perl $VEPDIR/variant_effect_predictor.pl --force_overwrite -offline -i test.vcf -o test_${species}_vep.txt -species $species --dir .
It works well and without any warning:
Building the cache:
2015-11-11 10:09:08 - Checking/creating FASTA index
2015-11-11 10:09:08 - Processing chromosome Chromosome
2015-11-11 10:09:17 - All done!
Applying it to test.vcf gives:
2015-11-11 10:09:50 - Starting...
2015-11-11 10:09:50 - Detected format of input file as vcf
2015-11-11 10:09:50 - Read 387 variants into buffer
2015-11-11 10:09:50 - Reading transcript data from cache and/or database
[================================================================================================================================================================================================================================]
[ 100% ]
2015-11-11 10:09:51 - Retrieved 4497 transcripts (0 mem, 4497 cached, 0 DB, 0 duplicates)
2015-11-11 10:09:51 - Analyzing chromosome Chromosome
2015-11-11 10:09:51 - Analyzing variants
[================================================================================================================================================================================================================================]
[ 100% ]
2015-11-11 10:09:53 - Calculating consequences
[================================================================================================================================================================================================================================]
[ 100% ]
2015-11-11 10:09:55 - Processed 387 total variants (77 vars/sec, 77 vars/sec total)
2015-11-11 10:09:55 - Wrote stats summary to test_Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.2.29_vep.txt_summary.html
2015-11-11 10:09:55 - Finished!
However, the output contains neither the amino acid changes/codons nor the
position of the changes in the protein:
Chromosome_66528_T/C Chromosome:66528 C b0061 AAC73172 Transcript non_coding_transcript_exon_variant,non_coding_transcript_variant 23 - - - - - IMPACT=MODIFIER;STRAND=-1
On the other hand, I get all this information when I download the pre-built
cache (escherichia_coli_str_k_12_substr_mg1655, from
ftp://ftp.ensemblgenomes.org/pub/bacteria/current/) and run it on the
command line:
set species=escherichia_coli_str_k_12_substr_mg1655
perl $VEPDIR/variant_effect_predictor.pl --force_overwrite -offline -i test.vcf -o test_${species}_vep.txt -species $species --dir .
Chromosome_66528_T/C Chromosome:66528 C b0061 AAC73172 Transcript missense_variant 23 23 8 Q/R cAg/cGg - IMPACT=MODERATE;STRAND=-1
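To see how many of the 387 variants are affected, the two VEP output files can be compared record by record. The following is only a minimal sketch: it assumes VEP's default tab-delimited output (header lines starting with "#", with Uploaded_variation, Allele, Feature and Consequence in columns 1, 3, 5 and 7), and the function names and file arguments are illustrative, not part of VEP.

```python
import sys
from typing import Dict, Iterable, List, Tuple

# (Uploaded_variation, Allele, Feature) identifies one output row.
Key = Tuple[str, str, str]


def read_consequences(lines: Iterable[str]) -> Dict[Key, str]:
    """Map each (variant, allele, transcript) to its Consequence column."""
    result: Dict[Key, str] = {}
    for line in lines:
        # Skip "##" metadata, the "#Uploaded_variation" header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # not a complete VEP record
        result[(fields[0], fields[2], fields[4])] = fields[6]
    return result


def diff_consequences(
    a: Dict[Key, str], b: Dict[Key, str]
) -> List[Tuple[Key, str, str]]:
    """Return the rows whose consequence differs between the two runs."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        in_a = a.get(key, "<missing>")
        in_b = b.get(key, "<missing>")
        if in_a != in_b:
            diffs.append((key, in_a, in_b))
    return diffs


# Usage (both paths are placeholders):
#   python compare_vep.py own_cache_vep.txt prebuilt_cache_vep.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for key, own, ref in diff_consequences(
            read_consequences(fa), read_consequences(fb)
        ):
            print("\t".join(key), own, ref, sep="\t")
```

If every coding transcript shows up as non-coding in the first file, that would point to a general problem with how the cache was built rather than to individual GTF records.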
Does anyone know how to build/apply the cache from a GTF file so that I get
the same output as from the pre-built cache?
I want to compare the downloaded GTF file with the one that was used to
generate the pre-built cache files (in order to fully understand the
required format).
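As a starting point for such a comparison, a GTF file can be summarised by its source and feature columns and by the attribute keys it uses. This is a rough sketch, assuming the standard nine tab-separated GTF columns and quoted attribute values (key "value";). The function name is illustrative, and the sketch does not know which of these fields gtf2vep.pl actually relies on.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Tuple

# Matches attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def summarize_gtf(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count (source, feature) pairs and attribute keys in a GTF file."""
    source_feature: Counter = Counter()
    attr_keys: Counter = Counter()
    for line in lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        source_feature[(cols[1], cols[2])] += 1
        # Unquoted attribute values are not counted by this pattern.
        for key, _value in ATTR_RE.findall(cols[8]):
            attr_keys[key] += 1
    return source_feature, attr_keys


# Usage (the path is a placeholder): python summarize_gtf.py annotation.gtf
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        pairs, keys = summarize_gtf(handle)
    for (source, feature), n in sorted(pairs.items()):
        print(f"{source}\t{feature}\t{n}")
    for key, n in sorted(keys.items()):
        print(f"attribute\t{key}\t{n}")
```

Running this on two GTF files and comparing the summaries shows which feature types (for example CDS, start_codon, stop_codon) or attributes are present in one file but not the other.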
Moreover, I would like to understand how to make a valid GTF for other
genome assemblies and annotations (which are not in Ensembl) so that I can
create my own VEP cache files.
Thanks for any help and suggestions!
Roland
--
Roland Schmucki, PhD
Computational Biologist, Pharmaceutical Sciences
Roche Pharma Research and Early Development
Roche Innovation Center Basel
F. Hoffmann-La Roche Ltd
Grenzacherstrasse 124
4070 Basel
Switzerland
Phone +41 61 687 13 30