[ensembl-dev] Request to add one species to VEP pre-built cache
Will McLaren
wm2 at ebi.ac.uk
Tue Jul 28 10:16:08 BST 2015
Thanks Chris - always good to shorten one-liners.
And you're correct, the space is not intentional; the command should be:
gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | \
  perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print ">$id";} else {print}' \
  > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
Regards
Will
On 28 July 2015 at 10:09, Christian Cole (Staff) <C.Cole at dundee.ac.uk>
wrote:
> Hi Will,
>
> Just a quick tip. Using the perl -n switch avoids 'while(<>) { }' and -l
> switch avoids having to terminate print statements with '\n'. So your code
> can be tidied up a touch with:
> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | \
>   perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' \
>   > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>
> Also, is the space in '> $id' intentional? That's not typical
> behaviour for fasta files.
> Cheers,
>
> Chris
>
> From: <dev-bounces at ensembl.org> on behalf of Will McLaren
> Reply-To: Ensembl developers list
> Date: Monday, 27 July 2015 17:27
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built
> cache
>
> Hi Dan,
>
> We have in fact just updated our GTF converter script to support GFF too
> (get the new release, 81, for this capability).
>
> However, giving it a go just now with that file I noticed the FASTA file
> supplied doesn't play nicely with our indexer, so I tweaked the FASTA to
> get it to run. Long story short, here's the cache:
>
>
> https://dl.dropboxusercontent.com/u/12936195/zonotrichia_albicollis.tar.gz
>
> And here's the long story, i.e. what I did to generate it if you want to
> do the same:
>
> wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
> wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | \
>   perl -e 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else {print}}' \
>   > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
> perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz \
>   -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa \
>   -species zonotrichia_albicollis
>
> Then run the VEP as follows:
>
> perl variant_effect_predictor.pl -offline -species zonotrichia_albicollis -i variants.vcf
>
> Regards
>
> Will McLaren
> Ensembl Variation
>
>
>
>
> On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:
>
>> Hi,
>>
>> I was trying to build a cache from GTF for white-throated sparrow by
>> myself following the tutorial, but was not successful. If possible, could
>> you please add this species to the download list? I would really appreciate
>> that!
>>
>> You may download the GFF3 annotation for this species from NCBI ftp (
>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz)
>> and convert it to GTF.
>>
>> Thank you very much!
>>
>> --
>> Dan
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>