[ensembl-dev] VEP 76 gtf2vep

Will McLaren wm2 at ebi.ac.uk
Wed Sep 24 10:09:30 BST 2014


Hi Ksenia,

These GFF files do not match the GTF specification that the gtf2vep.pl
script requires.

See http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf
for the specification. Looking at your files, the following requirements
are not met:

- the exon line does not have transcript_id, gene_id and exon_number defined
- the CDS line does not have transcript_id and exon_number defined
- the source column is set to "ensembl" rather than a biotype, e.g.
"protein_coding"

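As a rough sanity check before running gtf2vep.pl, the three points above
could be tested with a few lines of Python. This is only a sketch and not
part of VEP: check_gtf_line and REQUIRED are made-up names, and it assumes
that attribute values are double-quoted, as they are in the Ensembl GTF files.

```python
import re

# Attribute pairs in GTF column 9 look like: key "value";
# (assumption: values are always double-quoted)
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')

# Attributes required per feature type, taken from the list above.
REQUIRED = {
    "exon": {"gene_id", "transcript_id", "exon_number"},
    "CDS": {"transcript_id", "exon_number"},
}


def check_gtf_line(line):
    """Return a list of problems found in one GTF line (empty if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(cols)]
    source, feature, attrs = cols[1], cols[2], cols[8]
    problems = []
    # The source column should hold a biotype, not the annotation source.
    if source.lower() == "ensembl":
        problems.append('source column is "ensembl", not a biotype')
    present = {key for key, _ in ATTR_RE.findall(attrs)}
    missing = REQUIRED.get(feature, set()) - present
    if missing:
        problems.append(
            "%s line is missing: %s" % (feature, ", ".join(sorted(missing)))
        )
    return problems
```

Running check_gtf_line over every non-comment line of the input and printing
any non-empty result should point at the lines that need fixing.
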
The GTF files provided by the Ensembl project can give you an idea of what
the format should be like:

http://www.ensembl.org/info/data/ftp/index.html

The following is an example from C. elegans, showing just the first exon:

V       protein_coding  exon    7651    7822    .       -       .       gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; exon_number "1"; gene_name "ife-3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "B0348.6a.1"; transcript_source "ensembl"; exon_id "WBGene00002061.e1";
V       protein_coding  CDS     7651    7818    .       -       0       gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; exon_number "1"; gene_name "ife-3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "B0348.6a.1"; transcript_source "ensembl"; protein_id "B0348.6a.1";

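In the file itself each record is a single line: eight tab-separated
columns followed by the attribute column. If you end up writing the lines
yourself, something like the following could produce that layout. It is an
illustrative sketch only; format_gtf_line is a hypothetical helper, not
anything shipped with VEP.

```python
def format_gtf_line(seqname, source, feature, start, end, strand, frame,
                    attributes, score="."):
    """Build one GTF line: eight tab-separated columns plus the attributes.

    `attributes` is a list of (key, value) pairs, written as: key "value";
    """
    attr_text = " ".join('%s "%s";' % (key, value) for key, value in attributes)
    cols = [seqname, source, feature, str(start), str(end),
            score, strand, frame, attr_text]
    return "\t".join(cols)


# Rebuild a cut-down version of the exon line from the example above.
line = format_gtf_line(
    "V", "protein_coding", "exon", 7651, 7822, "-", ".",
    [("gene_id", "WBGene00002061"),
     ("transcript_id", "B0348.6a.1"),
     ("exon_number", "1")],
)
print(line)
```
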
HTH

Will McLaren
Ensembl Variation

On 23 September 2014 21:00, Ksenia Krasileva <krasileva at ucdavis.edu> wrote:

> Dear developers team,
>
> I am working towards using a custom annotation of wheat genes in variant
> effect prediction with VEP.
>
> While building the cache with gtf2vep.pl, I see that my exon and transcript
> features are cached, but CDS features are most likely not read correctly
> (gene biotype gets reset to 'pseudogene' by fix_transcript and there is
> no translation). VEP is able to use this cache, but the prediction is not
> correct as there is no translation or CDS.
>
> I tried to debug by running gtf2vep.pl with an example single-exon gene
> from the Ensembl annotation for Triticum aestivum. I tried both the v22
> and v23 annotations and both give me the same result as before: the
> biotype gets reset to 'pseudogene' in the cache and there is no
> CDS/translation. Attached are the two test input files that I am using,
> taken from Triticum_aestivum.IWGSP1.22.gff3 and
> Triticum_aestivum.IWGSP1.23.gff3 respectively.
>
> The command line is below:
>
> perl gtf2vep.pl -i test.v23.gff -f IWGSC_CSS_AB-TGAC_v1.fa -d 23 -s
> wheat_custom
>
> I would appreciate any suggestions as to what might be going on.
>
> Thank you in advance,
>
> Ksenia
>
>
> Ksenia Krasileva, PhD
> USDA NIFA Post Doctoral Scholar
> Department of Plant Sciences
> University of California, Davis
> 124 Robbins Hall
> Davis, CA 95616
>
> Email: krasileva at ucdavis.edu
> Twitter: @kseniakrasileva <https://twitter.com/kseniakrasileva>
> Web: http://dubcovskylab.ucdavis.edu/lab-member/ksenia-v-krasileva