[ensembl-dev] Request to add one species to VEP pre-built cache

Christian Cole (Staff) C.Cole at dundee.ac.uk
Tue Jul 28 10:09:00 BST 2015

Hi Will,

Just a quick tip. Using Perl's -n switch avoids having to write 'while(<>) { }', and the -l switch chomps each input line and appends '\n' to every print, so you don't need to terminate print statements with '\n' yourself. Your code can therefore be tidied up a touch with:
gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa

Also, is the space in '> $id' intentional? A space after the '>' is not typical for FASTA header lines.


From: <dev-bounces at ensembl.org> on behalf of Will McLaren
Reply-To: Ensembl developers list
Date: Monday, 27 July 2015 17:27
To: Ensembl developers list
Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache

Hi Dan,

We have in fact just updated our GTF converter script to support GFF too (get the new release, 81, for this capability).

However, when I gave it a go just now with that file, I noticed that the supplied FASTA file doesn't play nicely with our indexer, so I tweaked its headers to get it to run. Long story short, here's the cache:


And here's the long story, i.e. what I did to generate it if you want to do the same:

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -e 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else {print}}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa -species zonotrichia_albicollis
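The header rewrite is presumably there so that the FASTA sequence names match the seqid column (column 1) of the GFF3, so a quick check before running gtf2vep.pl may save a failed run. Below is a minimal plain-sh sketch; the function names are hypothetical, it assumes the FASTA ID is the first word after '>' (a leading space, as in '> $id', is tolerated), and it assumes the GFF3 has no embedded ##FASTA section.

```shell
#!/bin/sh
# Hypothetical helpers: compare GFF3 seqids against FASTA IDs.

# FASTA IDs: first word of each header line, ignoring '>' and any spaces after it.
fasta_ids() {
    grep '^>' "$1" | sed -e 's/^>[[:space:]]*//' -e 's/[[:space:]].*$//' | sort -u
}

# GFF3 seqids: column 1 of every non-comment, non-empty line (gzipped input).
gff_seqids() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ && NF > 0 { print $1 }' | sort -u
}

# Print seqids used in the GFF3 ($2) that have no sequence in the FASTA file ($1).
missing_seqids() {
    tmp_fa=$(mktemp) && tmp_gff=$(mktemp) || return 1
    fasta_ids "$1" > "$tmp_fa"
    gff_seqids "$2" > "$tmp_gff"
    # -13: keep only lines unique to the second (GFF3) list
    comm -13 "$tmp_fa" "$tmp_gff"
    rm -f "$tmp_fa" "$tmp_gff"
}
```

Usage: `missing_seqids 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz`. Empty output means every seqid in the GFF3 has a matching FASTA entry.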

Then run the VEP as follows:

perl variant_effect_predictor.pl -offline -species zonotrichia_albicollis -i variants.vcf


Will McLaren
Ensembl Variation

On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:

I tried to build a cache from a GTF file for the white-throated sparrow myself, following the tutorial, but was not successful. If possible, could you please add this species to the download list? I would really appreciate it!

You can download the GFF3 annotation for this species from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz) and convert it to GTF.

Thank you very much!


Dev mailing list    Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/

The University of Dundee is a registered Scottish Charity, No: SC015096