[ensembl-dev] Request to add one species to VEP pre-built cache

Will McLaren wm2 at ebi.ac.uk
Tue Jul 28 10:16:08 BST 2015


Thanks Chris - always good to shorten one-liners.

And you're correct, the space is not intentional; the command should be:

gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
  | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print ">$id";} else {print}' \
  > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa

Regards

Will

On 28 July 2015 at 10:09, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:

>   Hi Will,
>
>  Just a quick tip. Perl's -n switch wraps your code in an implicit
> 'while(<>) { }' loop, and the -l switch chomps each input line and adds
> '\n' to every print, so you no longer have to terminate print statements
> with '\n' yourself. Your code can therefore be tidied up a touch:
> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
>   | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' \
>   > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>
>  Also, is the space in '> $id' intentional? That's not typical for FASTA
> headers.
> Cheers,
>
>  Chris
>
> From: <dev-bounces at ensembl.org> on behalf of Will McLaren
> Reply-To: Ensembl developers list
> Date: Monday, 27 July 2015 17:27
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache
>
>   Hi Dan,
>
>  We have in fact just updated our GTF converter script to support GFF too
> (get the new release, 81, for this capability).
>
>  However, when I tried it just now with that file, I noticed that the FASTA
> file supplied doesn't play nicely with our indexer, so I tweaked its headers
> to get it to run. Long story short, here's the cache:
>
>
> https://dl.dropboxusercontent.com/u/12936195/zonotrichia_albicollis.tar.gz
>
>  And here's the long story, i.e. what I did to generate it if you want to
> do the same:
>
>  wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
>  wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
>  gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
>    | perl -e 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else {print}}' \
>    > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>  perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz \
>    -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa \
>    -species zonotrichia_albicollis
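The header tweak above presumably exists so that the sequence names in the FASTA match the seqids in column 1 of the GFF3; that is an assumption, not something stated in the thread. A rough sanity check along those lines, run here on hypothetical miniature files (for real data, swap in the reformatted FASTA and 'gzip -dc' of the .gff3.gz), lists annotation seqids that have no FASTA sequence:

```shell
#!/bin/sh
set -eu

# Hypothetical miniature inputs standing in for the real files.
printf '%s\n' '>NW_000001.1' 'ACGTACGT' '>NW_000002.1' 'TTGACCA' > check.fa
printf '##gff-version 3\nNW_000001.1\tRefSeq\tgene\t1\t8\t.\t+\t.\tID=gene0\nNW_000003.1\tRefSeq\tgene\t1\t7\t.\t-\t.\tID=gene1\n' > check.gff3

# FASTA IDs: header text up to the first whitespace, without the leading '>'.
grep '^>' check.fa | sed 's/^>//; s/[[:space:]].*//' | sort -u > fasta_ids.txt

# GFF3 seqids: column 1 (tab-delimited) of every non-comment line.
grep -v '^#' check.gff3 | cut -f1 | sort -u > gff_ids.txt

# comm -13 keeps only lines unique to the second file: seqids with no FASTA entry.
comm -13 fasta_ids.txt gff_ids.txt > missing_ids.txt

if [ -s missing_ids.txt ]; then
  echo "GFF3 seqids with no FASTA sequence:"
  cat missing_ids.txt
else
  echo "all GFF3 seqids found in FASTA"
fi
```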
>
>  Then run the VEP as follows:
>
>  perl variant_effect_predictor.pl -offline -species zonotrichia_albicollis -i variants.vcf
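Because the offline cache is built against the FASTA sequence names, the CHROM column of variants.vcf presumably needs to use those same names (the accession IDs kept by the header rewrite). That requirement is an assumption here, and everything in this sketch is made up: the FASTA, the VCF records and the 'scaffold_9' name. It lists any VCF CHROM value with no matching FASTA sequence before you run VEP:

```shell
#!/bin/sh
set -eu

# Hypothetical inputs: a reformatted FASTA and a minimal two-record VCF.
printf '%s\n' '>NW_000001.1' 'ACGTACGT' > vcf_check.fa
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nNW_000001.1\t3\t.\tG\tA\t.\t.\t.\nscaffold_9\t5\t.\tC\tT\t.\t.\t.\n' > variants.vcf

# First file (NR == FNR): strip '>' from header lines and record the first word as an ID.
# Second file: for non-header VCF lines, print CHROM if it is not a known ID.
awk 'NR == FNR { if (sub(/^>/, "")) ids[$1] = 1; next }
     !/^#/ && !($1 in ids) { print $1 }' vcf_check.fa variants.vcf | sort -u > unknown_chroms.txt

cat unknown_chroms.txt
# If unknown_chroms.txt is empty, the VEP command from the email can be run as-is.
```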
>
>  Regards
>
>  Will McLaren
> Ensembl Variation
>
> On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:
>
>> Hi,
>>
>>  I tried to build a cache from GTF for the white-throated sparrow myself,
>> following the tutorial, but was not successful. If possible, could you
>> please add this species to the download list? I would really appreciate
>> that!
>>
>>  You can download the GFF3 annotation for this species from the NCBI FTP
>> site
>> (ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz)
>> and convert it to GTF.
>>
>>  Thank you very much!
>>
>>  --
>>  Dan
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
> The University of Dundee is a registered Scottish Charity, No: SC015096