[ensembl-dev] Request to add one species to VEP pre-built cache

Dan Sun meredithfy at gmail.com
Thu Jul 30 15:21:08 BST 2015


Hi Will,

Thank you! It works like a charm.

Have a great day!

Dan


On Thu, Jul 30, 2015 at 5:49 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Dan,
>
> Thanks for the report, we are still working on ironing out some issues in
> the GFF parser.
>
> I've added some fixes to the release/81 version of gtf2vep.pl which
> should correct the problems you are seeing.
>
> Regards
>
> Will
>
> On 29 July 2015 at 22:21, Dan Sun <meredithfy at gmail.com> wrote:
>
>> Hi Will and Christian,
>>
>> Thank you both for your help.
>>
>> I have an additional question. After annotating my VCF file using your
>> cache, I noticed that non-coding variants are marked "intergenic variant"
>> instead of something like "non coding exon variant". For example,
>> NW_005081553.1:4008346G->T is a variant located in an exon of non-coding
>> transcripts of gene KHDRBS2 (XR_270793.1, XR_270792.1, XR_270795.1,
>> XR_270797.1, XR_270794.1). Do you have any ideas about how to improve the
>> annotation of SNPs in exons of non-coding genes for this species? You can
>> find these non-coding transcripts in the GFF3 file you downloaded from NCBI.
>>
>> Thanks!
>>
>> Best,
>> Dan
>>
>> On Tue, Jul 28, 2015 at 5:52 AM, Christian Cole (Staff) <
>> C.Cole at dundee.ac.uk> wrote:
>>
>>>   Sorry, I couldn't leave this alone. I don't think I've done enough
>>> coding lately ;)
>>>
>>>  You can shorten it a fair bit further with the magic -a (auto-split)
>>> and -p (auto-print) switches:
>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl
>>> -F'/\|/' -lape 's/^>.*/>$F[3]/' >
>>> 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>
>>>  -a splits each line on the pattern given by -F (whitespace by default)
>>> and puts the fields into @F
>>> -p wraps your code in while (<>) { ... } and prints $_ after each line
>>>
>>>  Using substitution rather than an if() simplifies the defline fix.
>>> Although, it's a lot less legible.
>>>
>>>  OK. I feel better now...
>>> Cheers,
>>>
>>>  Chris
>>>
>>>   From: <dev-bounces at ensembl.org> on behalf of Will McLaren
>>> Reply-To: Ensembl developers list
>>> Date: Tuesday, 28 July 2015 10:16
>>>
>>> To: Ensembl developers list
>>> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built
>>> cache
>>>
>>>   Thanks Chris - always good to shorten one-liners.
>>>
>>>  And you're correct, the space is not intentional; the command should
>>> be:
>>>
>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -lne
>>> 'if(/^\>/) { $id = (split /\|/, $_)[3]; print ">$id";} else {print}' >
>>> 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>
>>>  Regards
>>>
>>> Will
>>>
>>> On 28 July 2015 at 10:09, Christian Cole (Staff) <C.Cole at dundee.ac.uk>
>>> wrote:
>>>
>>>>   Hi Will,
>>>>
>>>>  Just a quick tip. Using the perl -n switch avoids 'while(<>) { }' and
>>>> -l switch avoids having to terminate print statements with '\n'. So your
>>>> code can be tidied up a touch with:
>>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -lne
>>>> 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' >
>>>> 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>>
>>>>    Also, is the space in '> $id' intentional? That's not typical
>>>> behaviour for fasta files.
>>>> Cheers,
>>>>
>>>>  Chris
>>>>
>>>>   From: <dev-bounces at ensembl.org> on behalf of Will McLaren
>>>> Reply-To: Ensembl developers list
>>>> Date: Monday, 27 July 2015 17:27
>>>> To: Ensembl developers list
>>>> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built
>>>> cache
>>>>
>>>>   Hi Dan,
>>>>
>>>>  We have in fact just updated our GTF converter script to support GFF
>>>> too (get the new release, 81, for this capability).
>>>>
>>>>  However, giving it a go just now with that file I noticed the FASTA
>>>> file supplied doesn't play nicely with our indexer, so I tweaked the FASTA
>>>> to get it to run. Long story short, here's the cache:
>>>>
>>>>
>>>> https://dl.dropboxusercontent.com/u/12936195/zonotrichia_albicollis.tar.gz
>>>>
>>>>  And here's the long story, i.e. what I did to generate it if you want
>>>> to do the same:
>>>>
>>>>  wget
>>>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
>>>>  wget
>>>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
>>>>  gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz | perl -e
>>>> 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else
>>>> {print}}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>> perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
>>>> -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa -species
>>>> zonotrichia_albicollis
>>>>
>>>>  Then run the VEP as follows:
>>>>
>>>>  perl variant_effect_predictor.pl -offline -species
>>>> zonotrichia_albicollis -i variants.vcf
>>>>
>>>>  Regards
>>>>
>>>>  Will McLaren
>>>> Ensembl Variation
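
The defline rewrite above presumably exists so that the sequence names in the FASTA match the names used in the annotation. A rough way to check this before building a cache is sketched below. It assumes that each GFF3 seqid (column 1) has to equal the first whitespace-delimited token of a FASTA defline; that matching rule is an assumption, not documented behaviour of gtf2vep.pl, and missing_seqids is a hypothetical helper that is not part of VEP.

```shell
# Hypothetical helper: print every GFF3 seqid (column 1) that has no
# FASTA record with the same name. Both inputs must be uncompressed.
# Usage: missing_seqids annotation.gff3 sequences.fa
missing_seqids() {
    gff="$1"
    fasta="$2"
    awk -F'\t' '
        # First file (FASTA): remember the first token of each defline.
        NR == FNR {
            if (/^>/) {
                sub(/^>/, "")
                split($0, parts, /[ \t]/)
                seen[parts[1]] = 1
            }
            next
        }
        # Second file (GFF3): skip comments and malformed lines.
        /^#/ || NF < 9 { next }
        # Report each unmatched seqid once.
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gff"
}
```

An empty result means every seqid in the annotation was found in the FASTA. A defline written as "> ID" (with the space) yields an empty first token under this rule, so that ID would be reported as missing.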
>>>>
>>>>
>>>>
>>>>
>>>> On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>  I was trying to build a cache from GTF for white-throated sparrow by
>>>>> myself following the tutorial, but was not successful. If possible, could
>>>>> you please add this species to the download list? I would really appreciate
>>>>> that!
>>>>>
>>>>>  You may download the GFF3 annotation for this species from NCBI ftp (
>>>>> ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz)
>>>>> and convert it to GTF.
>>>>>
>>>>>  Thank you very much!
>>>>>
>>>>>  --
>>>>>  Dan
>>>>>
>>>>> _______________________________________________
>>>>> Dev mailing list    Dev at ensembl.org
>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>
>>>>>
>>>>
>>>> The University of Dundee is a registered Scottish Charity, No: SC015096
>>>
>>
>>
>> --
>> Dan Sun
>> Graduate student of Bioinformatics
>> School of Biology
>> Georgia Institute of Technology


-- 
Dan Sun
Graduate student of Bioinformatics
School of Biology
Georgia Institute of Technology