[ensembl-dev] VEP 76 gtf2vep

Arnaud Kerhornou arnaud at ebi.ac.uk
Tue Sep 30 20:51:10 BST 2014


Hi Ksenia,

We also provide the annotation in GTF format:
ftp://ftp.ensemblgenomes.org/pub/plants/release-23/gtf/triticum_aestivum/

Arnaud

> The files I am currently using for wheat as a benchmark are hosted here:
>
> ftp://ftp.ensemblgenomes.org/pub/plants/release-22/gff3/triticum_aestivum/Triticum_aestivum.IWGSP1.22.gff3.gz
> ftp://ftp.ensemblgenomes.org/pub/plants/release-23/gff3/triticum_aestivum/Triticum_aestivum.IWGSP1.23.gff3.gz
>
> Is there another version, formatted differently, somewhere else in the
> database? Please let me know.
>
> Best wishes,
>
> Ksenia
>
> Ksenia Krasileva, PhD
> USDA NIFA Post Doctoral Scholar
> Department of Plant Sciences
> University of California, Davis
> 124 Robbins Hall
> Davis, CA 95616
>
> Email: krasileva at ucdavis.edu
> Twitter: @kseniakrasileva <https://twitter.com/kseniakrasileva>
> Web: http://dubcovskylab.ucdavis.edu/lab-member/ksenia-v-krasileva
>
> On Wed, Sep 24, 2014 at 2:09 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
>> Hi Ksenia,
>>
>> These GFF files do not match the GTF specification required for the
>> gtf2vep.pl script to work.
>>
>> See http://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gtf
>> for the specification. Looking at your files, the following requirements
>> are not met:
>>
>> - the exon lines do not define transcript_id, gene_id and exon_number
>> - the CDS lines do not define transcript_id and exon_number
>> - the source column is set to "ensembl" rather than a biotype, e.g.
>>   "protein_coding"
>>
>> The GTF files provided by the Ensembl project show what the format
>> should look like:
>>
>> http://www.ensembl.org/info/data/ftp/index.html
>>
>> The following is an example from C. elegans, showing only the first exon:
>>
>> V       protein_coding  exon    7651    7822    .       -       .       gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; exon_number "1"; gene_name "ife-3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "B0348.6a.1"; transcript_source "ensembl"; exon_id "WBGene00002061.e1";
>> V       protein_coding  CDS     7651    7818    .       -       0       gene_id "WBGene00002061"; transcript_id "B0348.6a.1"; exon_number "1"; gene_name "ife-3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_name "B0348.6a.1"; transcript_source "ensembl"; protein_id "B0348.6a.1";
>>
>> HTH
>>
>> Will McLaren
>> Ensembl Variation
>>
>> On 23 September 2014 21:00, Ksenia Krasileva <krasileva at ucdavis.edu> wrote:
>>
>>> Dear developer team,
>>>
>>> I am working towards using a custom annotation of wheat genes for
>>> variant effect prediction with VEP.
>>>
>>> While building a cache with gtf2vep.pl, I see that my exon and
>>> transcript features are cached, but CDS features are most likely not
>>> read correctly (the gene biotype gets reset to 'pseudogene' by
>>> fix_transcript and there is no translation). VEP is able to use this
>>> cache, but the predictions are not correct as there is no translation
>>> or CDS.
>>>
>>> I tried to debug by running gtf2vep.pl with a single-exon example gene
>>> from the Ensembl annotation for Triticum aestivum. I tried both the v22
>>> and v23 annotations, and both give me the same result as before: the
>>> biotype gets reset to 'pseudogene' in the cache and there is no
>>> CDS/translation. Attached are the two test input GTFs I am using, taken
>>> from Triticum_aestivum.IWGSP1.22.gff3 and
>>> Triticum_aestivum.IWGSP1.23.gff3 respectively.
>>>
>>> The command line is below:
>>>
>>> perl gtf2vep.pl -i test.v23.gff -f IWGSC_CSS_AB-TGAC_v1.fa -d 23 -s wheat_custom
>>>
>>> I would appreciate any suggestions as to what might be going on.
>>>
>>> Thank you in advance,
>>>
>>> Ksenia
>>>
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>

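The three points in Will's reply (exon lines need gene_id, transcript_id and exon_number; CDS lines need transcript_id and exon_number; the source column should hold a biotype rather than "ensembl") can be checked before running gtf2vep.pl. What follows is a minimal Python sketch, not part of VEP, assuming tab-separated GTF input with attributes written as key "value"; pairs. It flags only those three problems; the full requirements are on the vep_cache.html#gtf page linked in the reply. Treating a source of "ensembl" as an error is a heuristic taken from the reply, not a check against a list of valid biotypes.

```python
import re
import sys

# GTF attribute pairs look like: gene_id "WBGene00002061";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')

# Attributes the quoted reply says each feature type must define.
# Assumption: these are the only checks; the full VEP spec may ask for more.
REQUIRED = {
    "exon": ("gene_id", "transcript_id", "exon_number"),
    "CDS": ("transcript_id", "exon_number"),
}


def parse_attributes(column):
    """Return the key "value"; pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(column))


def check_gtf_line(line):
    """Return a list of problems found on one GTF line (empty if none)."""
    if not line.strip() or line.startswith("#"):
        return []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]
    source, feature, attributes = fields[1], fields[2], fields[8]
    problems = []
    # Heuristic from the reply: this column should hold a biotype such as
    # "protein_coding", not the literal string "ensembl".
    if source == "ensembl":
        problems.append('source column is "ensembl" rather than a biotype')
    attrs = parse_attributes(attributes)
    for key in REQUIRED.get(feature, ()):
        if key not in attrs:
            problems.append(f"{feature} line lacks {key}")
    return problems


def check_gtf(path):
    """Yield (line_number, problem) for every problem in the file at path."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_gtf_line(line):
                yield number, problem


if __name__ == "__main__" and len(sys.argv) > 1:
    for number, problem in check_gtf(sys.argv[1]):
        print(f"line {number}: {problem}")
```

Saved under a hypothetical name such as check_gtf.py and run as "python check_gtf.py test.v23.gff", it prints one line per problem found. GFF3 files such as Triticum_aestivum.IWGSP1.23.gff3 use key=value attributes, so their exon and CDS lines would likely all be reported as missing the GTF attributes, consistent with the diagnosis in the reply.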