[ensembl-dev] Problem with gtf2vep.pl

Will McLaren wm2 at ebi.ac.uk
Thu Jan 23 12:09:51 GMT 2014


Hi Alan,

I got a chance to look at this again, and I've found a couple of other
issues with the script. I've fixed these and updated the script again on
GitHub.

There was also an issue with your GTF file, in that you had transcript
entries denoted as being protein_coding but without CDS entries to define
the coding sequence region.

I've added a fix to the script such that if these are found the transcript
is converted to the pseudogene biotype, but of course it would be best to
fix the input file.
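
For anyone wanting to check their own GTF for this before building a cache, here is a minimal sketch in Python. It is not part of gtf2vep.pl and does not reproduce what the script does internally. It assumes the biotype is held in the GTF source column (column 2), as in the file discussed in this thread, and that every feature line carries a transcript_id attribute. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the ninth GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def coding_transcripts_without_cds(gtf_lines):
    """Return sorted IDs of protein_coding transcripts that have no CDS line.

    Hypothetical helper: assumes the biotype is stored in the source
    column (column 2) of each GTF line.
    """
    features = defaultdict(set)  # transcript_id -> feature types seen
    biotypes = {}                # transcript_id -> first source value seen
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        match = _TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue  # no transcript_id attribute on this line
        tid = match.group(1)
        biotypes.setdefault(tid, cols[1])
        features[tid].add(cols[2])
    return sorted(
        tid
        for tid, seen in features.items()
        if biotypes[tid] == "protein_coding" and "CDS" not in seen
    )


# Tiny made-up example: t1 has a CDS line, t2 has only an exon line.
sample = [
    'chr1\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; exon_number "1";',
    'chr1\tprotein_coding\tCDS\t120\t180\t.\t+\t0\tgene_id "g1"; transcript_id "t1"; exon_number "1";',
    'chr1\tprotein_coding\texon\t500\t600\t.\t-\t.\tgene_id "g2"; transcript_id "t2"; exon_number "1";',
]
print(coding_transcripts_without_cds(sample))  # ['t2']
```

The function takes any iterable of lines, so an open file handle for the GTF can be passed in place of the sample list. Transcripts it reports would need either CDS lines added or a non-coding biotype in the source column.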

I've tested a cache built with the initial GTF you sent to me and it works
across a random set of 250k variants from dbSNP, so I think it's working OK
now.

Cheers

Will


On 22 January 2014 05:39, Harris, Ronald Alan <rharris1 at bcm.edu> wrote:

>  Hi Will,
>
> Thanks for your help with this. Yes, I am aware of the Ensembl rhesus
> gene annotations, but the RhesusBase annotations I am using are newer and
> include RNA-Seq data, so they are supposed to be better than the Ensembl
> predictions.
>
> Your modified version of gtf2vep.pl does indeed make the cache files, but
> I am now running into issues running vep using the cache files. I am
> getting the following errors. I took your approach to stopping the errors
> by checking for undefined values and dealing with them. Each successive
> error occurs after fixing an error.
>
> Can't call method "strand" on an undefined value at /home/rharris1
> /work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Transcript.pm line 1141, <GEN0>
> line 5035.
>
> Can't call method "strand" on an undefined value at /home/rharris1
> /work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Transcript.pm line 1095, <GEN0>
> line 5035.
>
> Can't call method "phase" on an undefined value at /home/rharris1
> /work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Transcript.pm line 930, <GEN0>
> line 45035.
>
> Can't call method "strand" on an undefined value at /home/rharris1
> /work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Translation.pm line 399, <GEN0>
> line 45035
>
> Can't call method "strand" on an undefined value at /home/rharris1
> /work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Translation.pm line 428, <GEN0>
> line 45035.
>
> With each successive fix, vep runs a bit farther, but I get a lot of the
> following warnings:
>
> Use of uninitialized value in addition (+) at /home/rharris1/work/ensembl
> /vep/variant_effect_predictor//Bio/EnsEMBL/Variation/Utils
> /VariationEffect.pm line 536, <GEN0> line 5035.
> Use of uninitialized value in subtraction (-) at /home/rharris1
> /work/ensembl/vep/variant_effect_predictor//Bio/EnsEMBL/Variation/Utils
> /VariationEffect.pm line 519, <GEN0> line 5035.
>
> The vep results I do get look to be correct for intergenic, upstream,
> downstream, and intronic variants, but variants in exons are sometimes
> not being reported correctly. In several cases, a variant that should be
> identified as synonymous or missense is identified as
> "5_prime_UTR_variant,3_prime_UTR_variant".
>
> I just noticed that in the Ensembl gtf file (which I ran through
> gtf2vep.pl and vep and it worked) the negative strand genes are in
> reverse chromosome order. I changed my gtf to that order and it ran
> through the original gtf2vep.pl (before your fixes) without throwing an
> error, but I still get errors using vep. Should the negative strand
> genes be in reverse chromosome order?
>
> Please let me know if you have any ideas about this.
>
> Thanks,
>
> Alan
>  ------------------------------
> *From:* dev-bounces at ensembl.org [dev-bounces at ensembl.org] On Behalf Of
> Will McLaren [wm2 at ebi.ac.uk]
> *Sent:* Wednesday, January 08, 2014 7:23 AM
> *To:* Ensembl developers list
> *Subject:* Re: [ensembl-dev] Problem with gtf2vep.pl
>
>   Hello Alan,
>
>  Thanks for the detailed report. There's an odd bug happening here which
> I can't get to the bottom of at the moment.
>
>  I've added a fix for now which stops the error happening, and the cache
> builds fine for me from your input.
>
>  Since we're in the process of switching our code hosting to Git, for now
> I have only pushed the fix to our GitHub - you can get the fixed script
> here:
>
>
> https://github.com/Ensembl/ensembl-tools/blob/release/74/scripts/variant_effect_predictor/gtf2vep.pl
>
>  Let me know if this isn't convenient and I can get the fix pushed to our
> CVS tree too.
>
>  PS I assume you are aware we build a cache file for macaque already?
> ftp://ftp.ensembl.org/pub/release-74/variation/VEP/
>
>  Thanks again
>
>  Will McLaren
> Ensembl Variation
>
>
> On 8 January 2014 05:38, Harris, Ronald Alan <rharris1 at bcm.edu> wrote:
>
>>   Hi,
>>
>> I have been trying to use gtf2vep.pl to generate a cache file based
>> on RhesusBase (http://www.rhesusbase.org/) gene annotations on the UCSC
>> rheMac2/Ensembl MMUL_1 assembly. I downloaded their rb2 gene predictions as
>> a gtf file through their UCSC mirror, changed the source column to
>> "protein_coding", added "exon_number" and the appropriate number in the
>> description field, and sorted the annotations by chromosome position.
>> The gtf file can be downloaded from here:
>>
>>
>> https://bigfile.bcm.edu/download.php?claimID=tnwUAesf9rRRH3u5&claimPasscode=B8mm8RNVZG4Ub6Xy&fid=52811&emailAddr=rharris1@bcm.edu
>>
>> When I run gtf2vep.pl I get this error:
>>
>> Can't call method "start" on an undefined value at gtf2vep.pl line 376.
>>
>> This error occurs after generating some of the cache files in the .vep
>> directory. I tried to run gtf2vep.pl using gtf files with only a single
>> chromosome and it looks like the error consistently occurs when trying
>> to generate the 1-1000000 cache file. Oddly, if I just run gtf2vep.pl on
>> the annotations from 1-1000000 on a single chromosome I do not get this
>> error.
>>
>> I don't think this is due to chr in chromosome names because the fasta
>> file I am using has chr in the chromosome names.
>>
>> I would appreciate any help you could give me with this.
>>
>> Thanks,
>>
>> Alan
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>
>>
>