[ensembl-dev] Request to add one species to VEP pre-built cache

Dan Sun meredithfy at gmail.com
Wed Jul 29 22:21:43 BST 2015


Hi Will and Christian,

Thank you both for your help.

I have an additional question. After annotating my VCF file with your
cache, I noticed that variants in non-coding exons are marked "intergenic
variant" instead of something like "non coding exon variant". For example,
NW_005081553.1:4008346G->T lies in an exon of the non-coding transcripts
XR_270793.1, XR_270792.1, XR_270795.1, XR_270797.1 and XR_270794.1 of
gene KHDRBS2. Do you have any ideas about how to improve the annotation of
SNPs in exons of non-coding genes for this species? You can find these
non-coding transcripts in the GFF3 file you downloaded from NCBI.
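One way to confirm what the GFF3 actually contains at that position is to list every feature whose interval covers it. The sketch below is a hypothetical shell helper (`features_at` is not part of the VEP tools); it assumes the standard nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes) and a gzipped input like the NCBI download.

```shell
# features_at: hypothetical helper, not part of VEP.
# Prints type, start, end and attributes of every feature in a gzipped
# GFF3 file that lies on SEQID and whose [start, end] interval covers POS.
features_at() {
  seqid=$1
  pos=$2
  gff=$3
  gzip -dc "$gff" |
    awk -F'\t' -v s="$seqid" -v p="$pos" '
      BEGIN { p += 0 }              # force a numeric position
      /^#/  { next }                # skip directives and comments
      $1 == s && $4 + 0 <= p && p <= $5 + 0 { print $3 "\t" $4 "\t" $5 "\t" $9 }'
}

# Example with the file and variant from this thread:
# features_at NW_005081553.1 4008346 ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
```

If exon features from the XR_ transcripts show up here but VEP still reports intergenic_variant, that suggests those transcripts did not make it into the cache.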

Thanks!

Best,
Dan

On Tue, Jul 28, 2015 at 5:52 AM, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:

>   Sorry, I couldn't leave this alone. I don't think I've done enough
> coding lately ;)
>
> You can shorten it a fair bit further with the magic -a (auto-split) and
> -p (auto-print) switches:
> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -F'/\|/' -lape 's/^>.*/>$F[3]/' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>
> -a splits each line on the pattern given by -F (whitespace by default)
> and puts the fields into @F.
> -p wraps your code in a while (<>) { ... } loop and prints $_ at the end
> of each iteration.
>
> Using a substitution rather than an if() simplifies the defline fix,
> although it's a lot less legible.
>
>  OK. I feel better now...
> Cheers,
>
>  Chris
>
> From: <dev-bounces at ensembl.org> on behalf of Will McLaren
> Reply-To: Ensembl developers list
> Date: Tuesday, 28 July 2015 10:16
> To: Ensembl developers list
> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache
>
>   Thanks Chris - always good to shorten one-liners.
>
>  And you're correct, the space is not intentional; the command should be:
>
> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print ">$id";} else {print}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
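Before handing the rewritten file to the indexer, a quick check can catch the stray space after ">" that Will confirms above was unintentional. This is a hypothetical helper (`fasta_check` is not part of the VEP tools) that counts deflines and flags any with whitespace straight after ">"; it is a sanity check, not a full FASTA validator.

```shell
# fasta_check: hypothetical helper, not part of VEP.
# Prints the number of deflines and how many have whitespace right after
# ">", and returns non-zero if any such defline is found.
fasta_check() {
  records=$(grep -c '^>' "$1")
  spaced=$(grep -c '^>[[:space:]]' "$1")
  echo "records=$records spaced=$spaced"
  [ "$spaced" -eq 0 ]
}

# Example:
# fasta_check 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
```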
>
>  Regards
>
> Will
>
> On 28 July 2015 at 10:09, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
>
>>   Hi Will,
>>
>> Just a quick tip. Using the perl -n switch avoids writing 'while(<>) { }',
>> and the -l switch chomps each input line and avoids having to terminate
>> print statements with '\n'. So your code can be tidied up a touch with:
>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>
>>    Also, is the space in '> $id' intentional? That's not typical
>> behaviour for fasta files.
>> Cheers,
>>
>>  Chris
>>
>> From: <dev-bounces at ensembl.org> on behalf of Will McLaren
>> Reply-To: Ensembl developers list
>> Date: Monday, 27 July 2015 17:27
>> To: Ensembl developers list
>> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache
>>
>>   Hi Dan,
>>
>>  We have in fact just updated our GTF converter script to support GFF
>> too (get the new release, 81, for this capability).
>>
>> However, when I tried it just now with that file, I noticed that the
>> supplied FASTA file doesn't play nicely with our indexer, so I tweaked the
>> FASTA to get it to run. Long story short, here's the cache:
>>
>>
>> https://dl.dropboxusercontent.com/u/12936195/zonotrichia_albicollis.tar.gz
>>
>>  And here's the long story, i.e. what I did to generate it if you want
>> to do the same:
>>
>> wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
>> wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -e 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else {print}}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>> perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa -species zonotrichia_albicollis
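The FASTA rewrite keeps only the fourth |-separated field of each defline, so each sequence ID becomes a bare accession such as NW_005081553.1, the same form Dan uses as the sequence name in his variants. Assuming the goal is for those IDs to match column 1 (seqid) of the GFF3, the hypothetical helper below lists GFF3 seqids that have no FASTA record with an identical ID (taken as the text between ">" and the first whitespace). It is a consistency check under that assumption, not a description of how gtf2vep.pl or its indexer parses IDs.

```shell
# missing_seqids: hypothetical helper, not part of VEP.
# Lists seqids (GFF3 column 1) from a gzipped GFF3 that do not appear as
# a FASTA ID, taking the ID as the text between ">" and the first
# whitespace (so a "> id" defline yields an empty ID).
missing_seqids() {
  work=$(mktemp -d)
  gzip -dc "$1" | awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' | sort -u > "$work/gff_ids"
  sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$2" | sort -u > "$work/fasta_ids"
  # comm -23 keeps lines found only in the first (GFF3) list.
  comm -23 "$work/gff_ids" "$work/fasta_ids"
  rm -r "$work"
}

# Example (empty output means every GFF3 seqid has a FASTA record):
# missing_seqids ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
```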
>>
>>  Then run the VEP as follows:
>>
>> perl variant_effect_predictor.pl -offline -species zonotrichia_albicollis -i variants.vcf
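To see how often intergenic_variant is assigned compared with other terms (the symptom Dan reports at the top of this thread), the consequences in the output can be tallied. The sketch assumes VEP's default tab-delimited output (written by default to variant_effect_output.txt), in which lines starting with "#" are headers, column 7 is Consequence and several terms in one row are comma-separated; `consequence_counts` is a hypothetical name.

```shell
# consequence_counts: hypothetical helper, not part of VEP.
# Tallies consequence terms in VEP's default tab-delimited output
# (column 7 = Consequence; multiple terms are comma-separated).
consequence_counts() {
  grep -v '^#' "$1" | cut -f7 | tr ',' '\n' | sort | uniq -c | sort -rn
}

# Example:
# consequence_counts variant_effect_output.txt
```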
>>
>>  Regards
>>
>>  Will McLaren
>> Ensembl Variation
>>
>>
>>
>>
>> On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried to build a cache from GTF for the white-throated sparrow myself,
>>> following the tutorial, but was not successful. If possible, could you
>>> please add this species to the download list? I would really appreciate
>>> it!
>>>
>>> You may download the GFF3 annotation for this species from the NCBI FTP site
>>> (ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz)
>>> and convert it to GTF.
>>>
>>>  Thank you very much!
>>>
>>>  --
>>>  Dan
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>
>> The University of Dundee is a registered Scottish Charity, No: SC015096
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>


-- 
Dan Sun
Graduate student of Bioinformatics
School of Biology
Georgia Institute of Technology