[ensembl-dev] VEP 76 gtf2vep

Ksenia Krasileva krasileva at ucdavis.edu
Tue Sep 30 20:32:59 BST 2014


The files I am currently using for wheat as a benchmark are hosted here:

ftp://ftp.ensemblgenomes.org/pub/plants/release-22/gff3/triticum_aestivum/Triticum_aestivum.IWGSP1.22.gff3.gz
ftp://ftp.ensemblgenomes.org/pub/plants/release-23/gff3/triticum_aestivum/Triticum_aestivum.IWGSP1.23.gff3.gz

Is there another, differently formatted version somewhere else in the
database? Please let me know.

Best wishes,

Ksenia

Ksenia Krasileva, PhD
USDA NIFA Post Doctoral Scholar
Department of Plant Sciences
University of California, Davis
124 Robbins Hall
Davis, CA 95616

Email: krasileva at ucdavis.edu
Twitter: @kseniakrasileva <https://twitter.com/kseniakrasileva>
Web: http://dubcovskylab.ucdavis.edu/lab-member/ksenia-v-krasileva

On Wed, Sep 24, 2014 at 2:09 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:

> Hi Ksenia,
>
> These GFF files do not match the specification of GTF required for the
> gtf2vep.pl script to work.
>
> See http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf
> for specifications. Looking at your files, the following are not fulfilled:
>
> - the exon line does not have transcript_id, gene_id and exon_number
> defined
> - the CDS line does not have transcript_id and exon_number defined
> - the source column is set to "ensembl" rather than some biotype e.g.
> "protein_coding"
>
> The GTF files provided by the Ensembl project can give you an idea of what
> the format should be like:
>
> http://www.ensembl.org/info/data/ftp/index.html
>
> The following is an example from C. elegans, just showing the first exon:
>
> V       protein_coding  exon    7651    7822    .       -       .       gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; exon_number "1"; gene_name "ife-3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "B0348.6a.1"; transcript_source "ensembl"; exon_id "WBGene00002061.e1";
> V       protein_coding  CDS     7651    7818    .       -       0       gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; exon_number "1"; gene_name "ife-3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "B0348.6a.1"; transcript_source "ensembl"; protein_id "B0348.6a.1";
>
> HTH
>
> Will McLaren
> Ensembl Variation
>
> On 23 September 2014 21:00, Ksenia Krasileva <krasileva at ucdavis.edu>
> wrote:
>
>> Dear developers team,
>>
>> I am working towards using a custom annotation of wheat genes in variant
>> effect prediction with VEP.
>>
>> While building the cache with gtf2vep.pl, I see that my exon and
>> transcript features are cached, but the CDS features are most likely not
>> read correctly (the gene biotype gets reset to 'pseudogene' by
>> fix_transcript and there is no translation). VEP is able to use this
>> cache, but the predictions are not correct as there is no translation or
>> CDS.
>>
>> I tried to debug by running gtf2vep.pl with an example single-exon gene
>> from the Ensembl annotation for Triticum aestivum. I tried both the v22
>> and v23 annotations and both give me the same result as before: the
>> biotype gets reset to 'pseudogene' in the cache and there is no
>> CDS/translation. Attached are two test input GTFs that I am using, taken
>> from Triticum_aestivum.IWGSP1.22.gff3 and Triticum_aestivum.IWGSP1.23.gff3
>> respectively.
>>
>> The command line is below:
>>
>> perl gtf2vep.pl -i test.v23.gff -f IWGSC_CSS_AB-TGAC_v1.fa -d 23 -s wheat_custom
>>
>> I would appreciate any suggestions as to what might be going on.
>>
>> Thank you in advance,
>>
>> Ksenia
>>
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>
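
The three requirements listed in Will's reply can be checked mechanically
before running gtf2vep.pl. The following is a minimal Python sketch, not part
of VEP: the names parse_attributes, check_gtf_line and REQUIRED_ATTRS are
hypothetical, the set of source values treated as "not a biotype" is an
assumption, and the bad example line uses made-up coordinates and IDs. It
tests only the three points raised in the thread, not the full GTF
specification, and it assumes attribute values are double-quoted as in the
C. elegans example.

```python
import re

# Attributes that, per the thread, each feature type must define.
REQUIRED_ATTRS = {
    "exon": ("gene_id", "transcript_id", "exon_number"),
    "CDS": ("transcript_id", "exon_number"),
}

# Matches GTF-style pairs such as: gene_id "WBGene00002061";
# Assumption: values are double-quoted. GFF3-style key=value pairs do not match.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(column):
    """Parse a GTF attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(column))


def check_gtf_line(line, non_biotype_sources=("ensembl",)):
    """Return a list of problems found on one line (empty list if none).

    non_biotype_sources is an assumed, user-supplied set of source-column
    values that are known not to be biotypes.
    """
    if not line.strip() or line.startswith("#"):
        return []  # skip blank lines and comments
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(fields)]
    source, feature = fields[1], fields[2]
    attrs = parse_attributes(fields[8])
    problems = []
    for key in REQUIRED_ATTRS.get(feature, ()):
        if key not in attrs:
            problems.append("%s line lacks %s" % (feature, key))
    if feature in REQUIRED_ATTRS and source.lower() in non_biotype_sources:
        problems.append('source column is "%s", not a biotype' % source)
    return problems


# The exon line from the C. elegans example, trimmed to three attributes.
good = ('V\tprotein_coding\texon\t7651\t7822\t.\t-\t.\t'
        'gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; '
        'exon_number "1";')

# A made-up GFF3-style line, for illustration only.
bad = 'chr1\tensembl\texon\t1\t100\t.\t+\t.\tID=exon1;Parent=transcript1'

for label, line in (("good", good), ("bad", bad)):
    print(label, check_gtf_line(line))
```

Running such a check over the first few exon and CDS lines of an input file
is a quick way to see whether the file is GTF in the form gtf2vep.pl expects
or GFF3 that needs converting first.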