[ensembl-dev] Request to add one species to VEP pre-built cache
Christian Cole (Staff)
C.Cole at dundee.ac.uk
Tue Jul 28 10:09:00 BST 2015
Hi Will,
Just a quick tip: perl's -n switch wraps your code in an implicit 'while(<>) { }' loop, and the -l switch strips the line ending on input and adds it back on output, so print statements don't need a trailing '\n'. Your code can be tidied up a touch with:
gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
Also, is the space in '> $id' intentional? That's not typical for FASTA files, where the identifier normally follows the '>' directly.
Cheers,
Chris
From: <dev-bounces at ensembl.org> on behalf of Will McLaren
Reply-To: Ensembl developers list
Date: Monday, 27 July 2015 17:27
To: Ensembl developers list
Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache
Hi Dan,
We have in fact just updated our GTF converter script to support GFF too (get the new release, 81, for this capability).
However, giving it a go just now with that file I noticed the FASTA file supplied doesn't play nicely with our indexer, so I tweaked the FASTA to get it to run. Long story short, here's the cache:
https://dl.dropboxusercontent.com/u/12936195/zonotrichia_albicollis.tar.gz
And here's the long story, i.e. what I did to generate it if you want to do the same:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -e 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else {print}}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa -species zonotrichia_albicollis
Then run the VEP as follows:
perl variant_effect_predictor.pl -offline -species zonotrichia_albicollis -i variants.vcf
Regards
Will McLaren
Ensembl Variation
On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:
Hi,
I was trying to build a cache from GTF for white-throated sparrow by myself following the tutorial, but was not successful. If possible, could you please add this species to the download list? I would really appreciate that!
You may download the GFF3 annotation for this species from NCBI ftp (ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz) and convert it to GTF.
Thank you very much!
--
Dan
_______________________________________________
Dev mailing list Dev at ensembl.org
Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
Ensembl Blog: http://www.ensembl.info/
The University of Dundee is a registered Scottish Charity, No: SC015096