[ensembl-dev] Problem with gtf2vep.pl

Harris, Ronald Alan rharris1 at bcm.edu
Wed Jan 22 05:39:07 GMT 2014


Hi Will,

Thanks for your help with this. Yes, I am aware of the Ensembl rhesus gene annotations, but the RhesusBase annotations I am using are newer and include RNA-Seq data, so they are supposed to be better than the Ensembl predictions.

Your modified version of gtf2vep.pl does indeed build the cache files, but I am now hitting errors when running vep with them. The errors are listed below. I followed your approach of checking for undefined values and handling them; each error below appeared only after I had patched the previous one.

Can't call method "strand" on an undefined value at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Transcript.pm line 1141, <GEN0> line 5035.

Can't call method "strand" on an undefined value at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Transcript.pm line 1095, <GEN0> line 5035.

Can't call method "phase" on an undefined value at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Transcript.pm line 930, <GEN0> line 45035.

Can't call method "strand" on an undefined value at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Translation.pm line 399, <GEN0> line 45035

Can't call method "strand" on an undefined value at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Translation.pm line 428, <GEN0> line 45035.

With each successive fix, vep runs a bit farther, but I get a lot of the following warnings:

Use of uninitialized value in addition (+) at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 536, <GEN0> line 5035.
Use of uninitialized value in subtraction (-) at /home/rharris1/work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Variation/Utils/VariationEffect.pm line 519, <GEN0> line 5035.

The vep results I do get appear correct for intergenic, upstream, downstream, and intronic variants, but variants in exons are sometimes reported incorrectly. In several cases, a variant that should be identified as synonymous or missense is identified as "5_prime_UTR_variant,3_prime_UTR_variant".

I just noticed that in the Ensembl gtf file (which ran through both gtf2vep.pl and vep without problems) the negative strand genes are in reverse chromosome order. I changed my gtf to that order and it ran through the original gtf2vep.pl (before your fixes) without throwing an error, but I still get errors using vep. Should the negative strand genes be in reverse chromosome order?
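
For anyone wanting to reproduce the reordering, here is a minimal Python sketch. It assumes that "reverse chromosome order" means the rows of each negative strand transcript are listed by descending start coordinate, and that the attribute column carries transcript_id "..."; both are assumptions, and the function name reorder_minus_strand is made up for illustration. It does not settle whether vep needs this order.

```python
import re
from collections import OrderedDict

# Assumed attribute layout: transcript_id "SOME_ID";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def reorder_minus_strand(lines):
    """Group GTF rows by transcript; list minus-strand rows by descending start."""
    groups = OrderedDict()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        match = TRANSCRIPT_ID.search(cols[8])
        key = match.group(1) if match else line
        groups.setdefault(key, []).append(cols)

    out = []
    # Transcripts keep the order in which they first appear in the input.
    for rows in groups.values():
        minus = rows[0][6] == "-"
        rows.sort(key=lambda c: (int(c[3]), int(c[4])), reverse=minus)
        out.extend("\t".join(c) for c in rows)
    return out
```

Positive strand transcripts come out in ascending order, so only the negative strand rows change relative to a position-sorted file.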

Please let me know if you have any ideas about this.

Thanks,

Alan
________________________________
From: dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of Will McLaren [wm2 at ebi.ac.uk]
Sent: Wednesday, January 08, 2014 7:23 AM
To: Ensembl developers list
Subject: Re: [ensembl-dev] Problem with gtf2vep.pl

Hello Alan,

Thanks for the detailed report. There's an odd bug happening here which I can't get to the bottom of at the moment.

I've added a fix for now which stops the error happening, and the cache builds fine for me from your input.

Since we're in the process of switching our code hosting to Git, for now I have only pushed the fix to our GitHub - you can get the fixed script here:

https://github.com/Ensembl/ensembl-tools/blob/release/74/scripts/variant_effect_predictor/gtf2vep.pl

Let me know if this isn't convenient and I can get the fix pushed to our CVS tree too.

PS I assume you are aware we build a cache file for macaque already? ftp://ftp.ensembl.org/pub/release-74/variation/VEP/

Thanks again

Will McLaren
Ensembl Variation


On 8 January 2014 05:38, Harris, Ronald Alan <rharris1 at bcm.edu> wrote:
Hi,

I have been trying to use gtf2vep.pl to generate a cache file based on RhesusBase (http://www.rhesusbase.org/) gene annotations on the UCSC rheMac2/Ensembl MMUL_1 assembly. I downloaded their rb2 gene predictions as a gtf file through their UCSC mirror, changed the source column to "protein_coding", added "exon_number" and the appropriate number in the description field, and sorted the annotations by chromosome position. The gtf file can be downloaded from here:

https://bigfile.bcm.edu/download.php?claimID=tnwUAesf9rRRH3u5&claimPasscode=B8mm8RNVZG4Ub6Xy&fid=52811&emailAddr=rharris1@bcm.edu
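
The preprocessing described above can be sketched roughly as follows in Python. This is not the script that was actually used. It assumes a standard 9-column tab-separated gtf with transcript_id "..." in the last column, and it assumes that exons of negative strand transcripts are numbered from the highest coordinate down; the names prepare_gtf and TRANSCRIPT_ID are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed attribute layout: transcript_id "SOME_ID";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def prepare_gtf(lines, source="protein_coding"):
    """Set the source column, add exon_number, and sort by chromosome and start."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        cols[1] = source
        rows.append(cols)

    # Collect the exons of each transcript.
    exons = defaultdict(list)
    for cols in rows:
        match = TRANSCRIPT_ID.search(cols[8])
        if match and cols[2] == "exon":
            exons[match.group(1)].append((int(cols[3]), int(cols[4]), cols[6]))

    # Number exons in transcript direction (assumption: descending on "-").
    numbering = {}
    for tid, ex in exons.items():
        minus = ex[0][2] == "-"
        ordered = sorted(set(ex), reverse=minus)
        numbering[tid] = [(s, e, n) for n, (s, e, _) in enumerate(ordered, 1)]

    # Tag every feature that lies inside an exon with that exon's number.
    for cols in rows:
        match = TRANSCRIPT_ID.search(cols[8])
        if not match:
            continue
        start, end = int(cols[3]), int(cols[4])
        for s, e, n in numbering.get(match.group(1), []):
            if s <= start and end <= e:
                cols[8] = cols[8].rstrip().rstrip(";") + f'; exon_number "{n}";'
                break

    rows.sort(key=lambda c: (c[0], int(c[3])))
    return ["\t".join(c) for c in rows]
```

Features that span more than one exon (for example a stop codon split across a splice junction) are left without an exon_number in this sketch.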

When I run gtf2vep.pl I get this error:

Can't call method "start" on an undefined value at gtf2vep.pl line 376.

This error occurs after generating some of the cache files in the .vep directory. I tried to run gtf2vep.pl using gtf files with only a single chromosome and it looks like the error consistently occurs when trying to generate the 1-1000000 cache file. Oddly, if I just run gtf2vep.pl on the annotations from 1-1000000 on a single chromosome I do not get this error.
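
A test file like the single-chromosome, 1-1000000 subset mentioned above could be produced with something like the Python sketch below. It keeps every row that overlaps the window, which is an assumption (keeping only rows fully inside the window is the other obvious choice); the function name subset_gtf is made up for illustration.

```python
def subset_gtf(lines, chrom, start=1, end=1_000_000):
    """Keep GTF rows on `chrom` that overlap the closed interval [start, end]."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9 or cols[0] != chrom:
            continue
        feat_start, feat_end = int(cols[3]), int(cols[4])
        # Overlap test: the feature starts before the window ends
        # and ends after the window starts.
        if feat_start <= end and feat_end >= start:
            kept.append(line)
    return kept
```

Because rows are filtered one at a time, a transcript that straddles position 1000000 ends up only partly in the subset, which may matter when comparing against the full-chromosome run.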

I don't think this is due to the "chr" prefix in the chromosome names, because the fasta file I am using also has "chr" in its chromosome names.

I would appreciate any help you could give me with this.

Thanks,

Alan
