[ensembl-dev] Request to add one species to VEP pre-built cache

Fiona Cunningham fiona at ebi.ac.uk
Tue Aug 4 15:37:52 BST 2015


Hi Dan,

Thanks for getting in touch. The VEP considers each variant separately,
even if they are in the same codon. This is because the variants may be on
different strands, i.e. on different copies of the chromosome, so two
changes in the same codon need not occur together. You can add information
on this using a plugin, e.g.
https://github.com/ensembl-variation/VEP_plugins/blob/master/SameCodon.pm

See more info here:
http://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
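As an illustration of why per-variant calls can mislead, the Python sketch below takes the reference codon GAT and the CDS positions and alleles from Dan's report quoted further down, applies both substitutions to the same codon and translates the result with the standard genetic code. It is a hypothetical helper, not part of the VEP or the SameCodon plugin, and it assumes the two variants are in cis and that the transcript is on the forward strand (STRAND=1), so genomic and coding alleles coincide.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_coordinates(cds_pos):
    """Map a 1-based CDS position to (1-based codon number, 0-based offset in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3


def apply_in_codon(ref_codon, changes):
    """Apply (cds_pos, alt_base) SNVs that all fall in one codon; return the new codon.

    Assumes the variants are in cis and on a forward-strand transcript.
    """
    if len({codon_coordinates(pos)[0] for pos, _ in changes}) != 1:
        raise ValueError("variants do not share a codon")
    bases = list(ref_codon.upper())
    for pos, alt in changes:
        bases[codon_coordinates(pos)[1]] = alt.upper()
    return "".join(bases)


def translate(codon):
    return CODON_TABLE[codon.upper()]


def vep_style(ref_codon, alt_codon):
    """Upper-case changed bases and lower-case unchanged ones, as in the Codons column."""
    pairs = list(zip(ref_codon, alt_codon))
    ref_out = "".join(r.upper() if r != a else r.lower() for r, a in pairs)
    alt_out = "".join(a.upper() if r != a else a.lower() for r, a in pairs)
    return f"{ref_out}/{alt_out}"


# The two variants from the report: CDS 6577 G>A and CDS 6578 A>G in codon 2193 (GAT).
ref = "GAT"
for change in [(6577, "A")], [(6578, "G")], [(6577, "A"), (6578, "G")]:
    alt = apply_in_codon(ref, change)
    print(change, vep_style(ref, alt), f"{translate(ref)}/{translate(alt)}")
```

Run as is, it prints Gat/Aat D/N and gAt/gGt D/G for the two variants taken separately, matching the two VEP lines, and GAt/AGt D/S for the combined change.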

Fiona
-----------------------------------------------------------------
Fiona Cunningham, PhD
Variation Annotation Coordinator,
European Bioinformatics Institute (EMBL-EBI)
www.ensembl.org || www.lrg-sequence.org

On 30 July 2015 at 23:00, Dan Sun <meredithfy at gmail.com> wrote:

> Hi Will,
>
> Thanks again! I have another minor bug to report.
>
> For mutations in the same codon, VEP annotates them separately. This could
> sometimes cause problems. The following is the VEP output for two mutations
> in the same codon:
>
> NW_005081561.1_649917_G/A NW_005081561.1:649917 A 102066196 XM_005485125.1 Transcript missense_variant 6577 6577 2193 D/N Gat/Aat - IMPACT=MODERATE;STRAND=1
> NW_005081561.1_649918_A/G NW_005081561.1:649918 G 102066196 XM_005485125.1 Transcript missense_variant 6578 6578 2193 D/G gAt/gGt - IMPACT=MODERATE;STRAND=1
>
> However, instead of GAT -> AAT or GAT -> GGT, the true mutation is GAT ->
> AGT. The amino acid changes from D to S, not to N or G. I think an output
> like this might make more sense:
>
> NW_005081561.1_649917_GA/AG NW_005081561.1:649917-649918 AG 102066196 XM_005485125.1 Transcript missense_variant 6577-6578 6577-6578 2193 D/S GAt/AGt - IMPACT=MODERATE;STRAND=1
>
> Thanks,
> Dan
>
> On Thu, Jul 30, 2015 at 10:21 AM, Dan Sun <meredithfy at gmail.com> wrote:
>
>> Hi Will,
>>
>> Thank you! It works like a charm.
>>
>> Have a great day!
>>
>> Dan
>>
>>
>> On Thu, Jul 30, 2015 at 5:49 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>>> Hi Dan,
>>>
>>> Thanks for the report, we are still working on ironing out some issues
>>> in the GFF parser.
>>>
>>> I've added some fixes to the release/81 version of gtf2vep.pl which
>>> should correct the problems you are seeing.
>>>
>>> Regards
>>>
>>> Will
>>>
>>> On 29 July 2015 at 22:21, Dan Sun <meredithfy at gmail.com> wrote:
>>>
>>>> Hi Will and Christian,
>>>>
>>>> Thank you both for your help.
>>>>
>>>> I have an additional question. After annotating my VCF file with your
>>>> cache, I noticed that variants in non-coding transcripts are marked
>>>> "intergenic_variant" instead of something like
>>>> "non_coding_transcript_exon_variant". For example, NW_005081553.1:4008346
>>>> G->T is located in an exon of several non-coding transcripts of the gene
>>>> KHDRBS2 (XR_270793.1, XR_270792.1, XR_270795.1, XR_270797.1, XR_270794.1).
>>>> Do you have any ideas about how to improve the annotation of SNPs in exons
>>>> of non-coding genes for this species? You can find these non-coding
>>>> transcripts in the GFF3 file you downloaded from NCBI.
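To check what the GFF3 itself says about a given position, a short scan of its exon rows is enough. The Python sketch below uses hypothetical helper names and is not part of gtf2vep.pl: it treats GFF3 start and end as 1-based and inclusive, keeps exon rows on the requested seqid that cover the position, and returns their column-9 attributes (percent-decoded) so that Parent or, where present, transcript_id can be inspected.

```python
import gzip
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def exons_overlapping(lines, seqid, pos):
    """Yield attribute dicts for exon rows on `seqid` whose span covers 1-based `pos`."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section ends the feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[2] != "exon":
            continue
        if int(cols[3]) <= pos <= int(cols[4]):  # start and end are inclusive
            yield parse_attributes(cols[8])


def exons_in_file(path, seqid, pos):
    """Run exons_overlapping over a plain or gzip-compressed GFF3 file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(exons_overlapping(handle, seqid, pos))
```

For example, exons_in_file("ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz", "NW_005081553.1", 4008346) lists the attributes of every exon row, if any, that covers the position in question.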
>>>>
>>>> Thanks!
>>>>
>>>> Best,
>>>> Dan
>>>>
>>>> On Tue, Jul 28, 2015 at 5:52 AM, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
>>>>
>>>>> Sorry, I couldn't leave this alone. I don't think I've done enough
>>>>> coding lately ;)
>>>>>
>>>>> You can shorten it a fair bit further with the magic -a (auto-split)
>>>>> and -p (auto-print) switches:
>>>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
>>>>>   | perl -F'/\|/' -lape 's/^>.*/>$F[3]/' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>>>
>>>>> -a splits each line on the pattern given by -F (whitespace by default)
>>>>> and puts the fields into @F
>>>>> -p wraps your code in a while (<>) { ... } loop and prints each line
>>>>> ($_) after your code has run
>>>>>
>>>>> Using a substitution rather than an if() simplifies the defline fix,
>>>>> although it's a lot less legible.
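For readers less used to Perl's command-line switches, here is a rough Python equivalent of the one-liner above, offered as a sketch rather than a drop-in replacement; the function name is hypothetical. Each line is chomped (-l), split on "|" (-F with -a), a defline is replaced by ">" plus the fourth field ($F[3], also index 3 in Python), and every line is passed on for printing (-p). The sample defline in the test is made up but follows the NCBI gi|number|ref|accession| layout that the thread's split relies on.

```python
def rewrite_deflines(lines):
    r"""Mimic perl -F'/\|/' -lape 's/^>.*/>$F[3]/' over an iterable of lines."""
    for line in lines:
        line = line.rstrip("\n")        # -l: chomp the input line
        fields = line.split("|")        # -F'/\|/' with -a: split into @F
        if line.startswith(">"):        # s/^>.*/.../ only matches deflines
            # Perl interpolates an empty string when $F[3] does not exist
            line = ">" + (fields[3] if len(fields) > 3 else "")
        yield line                      # -p prints every line; here the caller prints
```

Looping over sys.stdin with `for out in rewrite_deflines(sys.stdin): print(out)` would stand in for the perl step in the gzip pipeline.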
>>>>>
>>>>> OK. I feel better now...
>>>>> Cheers,
>>>>>
>>>>> Chris
>>>>>
>>>>> From: <dev-bounces at ensembl.org> on behalf of Will McLaren
>>>>> Reply-To: Ensembl developers list
>>>>> Date: Tuesday, 28 July 2015 10:16
>>>>>
>>>>> To: Ensembl developers list
>>>>> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache
>>>>>
>>>>> Thanks Chris - always good to shorten one-liners.
>>>>>
>>>>> And you're correct, the space is not intentional; the command should
>>>>> be:
>>>>>
>>>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
>>>>>   | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print ">$id";} else {print}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>>>
>>>>> Regards
>>>>>
>>>>> Will
>>>>>
>>>>> On 28 July 2015 at 10:09, Christian Cole (Staff) <C.Cole at dundee.ac.uk> wrote:
>>>>>
>>>>>> Hi Will,
>>>>>>
>>>>>> Just a quick tip. Using the perl -n switch avoids 'while(<>) { }', and
>>>>>> the -l switch avoids having to terminate print statements with '\n'. So
>>>>>> your code can be tidied up a touch with:
>>>>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
>>>>>>   | perl -lne 'if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id";} else {print}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>>>>
>>>>>> Also, is the space in '> $id' intentional? That's not typical
>>>>>> behaviour for fasta files.
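To make the question about the space concrete: FASTA tools differ in how they read the sequence name from a defline. The Python sketch below contrasts two hypothetical rules, a strict one that takes the non-whitespace run immediately after ">" and a lenient one that skips leading spaces; which rule any particular indexer follows is an assumption here, not something the thread states.

```python
import re

STRICT_NAME = re.compile(r">(\S*)")


def strict_name(defline):
    """Name = non-whitespace run immediately after '>'; empty if a space follows."""
    match = STRICT_NAME.match(defline)
    return match.group(1) if match else None


def lenient_name(defline):
    """Name = first whitespace-separated token after '>', ignoring leading spaces."""
    if not defline.startswith(">"):
        return None
    tokens = defline[1:].split()
    return tokens[0] if tokens else ""


for header in (">NW_005081553.1", "> NW_005081553.1"):
    print(repr(header), repr(strict_name(header)), repr(lenient_name(header)))
```

The two rules agree on ">NW_005081553.1" but disagree on "> NW_005081553.1", where the strict rule yields an empty name, which is why writing ">$id" without the space is the safer choice.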
>>>>>> Cheers,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> From: <dev-bounces at ensembl.org> on behalf of Will McLaren
>>>>>> Reply-To: Ensembl developers list
>>>>>> Date: Monday, 27 July 2015 17:27
>>>>>> To: Ensembl developers list
>>>>>> Subject: Re: [ensembl-dev] Request to add one species to VEP pre-built cache
>>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> We have in fact just updated our GTF converter script to support GFF
>>>>>> too (get the new release, 81, for this capability).
>>>>>>
>>>>>> However, giving it a go just now with that file, I noticed that the FASTA
>>>>>> file supplied doesn't play nicely with our indexer, so I tweaked the FASTA
>>>>>> to get it to run. Long story short, here's the cache:
>>>>>>
>>>>>>
>>>>>> https://dl.dropboxusercontent.com/u/12936195/zonotrichia_albicollis.tar.gz
>>>>>>
>>>>>> And here's the long story, i.e. what I did to generate it if you want
>>>>>> to do the same:
>>>>>>
>>>>>> wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz
>>>>>> wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/CHR_Un/44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz
>>>>>> gzip -dc 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa.gz \
>>>>>>   | perl -e 'while(<>) { if(/^\>/) { $id = (split /\|/, $_)[3]; print "> $id\n";} else {print}}' > 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa
>>>>>> perl gtf2vep.pl -i ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz \
>>>>>>   -fasta 44394_ref_Zonotrichia_albicollis-1.0.1_chrUn.fa -species zonotrichia_albicollis
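A further sanity check that may be worth running before gtf2vep.pl, sketched in Python with hypothetical function names: list the GFF3 seqids (column 1) that have no FASTA entry of the same name. Presumably transcripts on such scaffolds cannot have their sequence looked up; the thread does not say this is the indexing problem Will ran into, so treat it as a general check rather than a diagnosis.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_names(lines):
    """Set of FASTA sequence names: first whitespace-delimited token after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gff3_seqids(lines):
    """Set of seqids from column 1 of GFF3 feature lines (stops at ##FASTA)."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#") or not line.strip():
            continue
        seqids.add(line.split("\t", 1)[0])
    return seqids


def missing_seqids(gff3_path, fasta_path):
    """Sorted GFF3 seqids with no FASTA sequence of the same name."""
    with open_text(gff3_path) as gff, open_text(fasta_path) as fasta:
        return sorted(gff3_seqids(gff) - fasta_names(fasta))
```

Run against the original NCBI FASTA, each name would be a whole gi|number|ref|accession| string and no seqid would match; after the header rewrite, missing_seqids on the two files ideally returns an empty list.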
>>>>>>
>>>>>> Then run the VEP as follows:
>>>>>>
>>>>>> perl variant_effect_predictor.pl -offline -species zonotrichia_albicollis -i variants.vcf
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Will McLaren
>>>>>> Ensembl Variation
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 27 July 2015 at 16:49, Dan Sun <meredithfy at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was trying to build a cache from GTF for white-throated sparrow by
>>>>>>> myself following the tutorial, but was not successful. If possible, could
>>>>>>> you please add this species to the download list? I would really appreciate
>>>>>>> that!
>>>>>>>
>>>>>>> You may download the GFF3 annotation for this species from the NCBI FTP
>>>>>>> site (ftp://ftp.ncbi.nlm.nih.gov/genomes/Zonotrichia_albicollis/GFF/ref_Zonotrichia_albicollis-1.0.1_scaffolds.gff3.gz)
>>>>>>> and convert it to GTF.
>>>>>>>
>>>>>>> Thank you very much!
>>>>>>>
>>>>>>> --
>>>>>>> Dan
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Dev mailing list    Dev at ensembl.org
>>>>>>> Posting guidelines and subscribe/unsubscribe info:
>>>>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>> Ensembl Blog: http://www.ensembl.info/
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> The University of Dundee is a registered Scottish Charity, No:
>>>>>> SC015096
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dan Sun
>>>> Graduate student of Bioinformatics
>>>> School of Biology
>>>> Georgia Institute of Technology
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>
>

