[ensembl-dev] Codon called wrong in VEP when using custom build cache

Heidi Viitaniemi hmviit at utu.fi
Wed Mar 27 12:05:51 GMT 2013


Hi Will,

Just as a follow up, the problem was the old version of the bioperl 
which is in the INSTALL.pl.
Re-installing VEP with a link to pointing to a newer version of bioperl 
did result to the same output with my custom cache asyou got with your run.

Thanks for all your help,
Heidi

27.3.2013 8:53, Heidi Viitaniemi kirjoitti:
> Hi Will,
>
> And thank you very much for looking into this.
>
> The reason why I want to build my own cahce from gtf and fasta of the 
> Gasterosteus_aculeatus is that on groupXIX the two last (of the three 
> ) supercontigs are flipped in the Ensembl genome (Ross&Peichel 2008). 
> I also first thougt that reverse complementing the fasta and gtf for 
> groupXIX was the problem but ENSGACG00000003129 is located on a 
> reagion that I didn't touch in the gtf or the fasta.
>
> I'll check the versions of Bioperl in our server and try using only 
> the fasta and gtf for ENSGACG00000003129.
>
> Thanks,
> Heidi
>
>
> 26.3.2013 16:04, Will McLaren kirjoitti:
>> Hi Heidi,
>>
>> Thanks for your patience, I've had a chance to look at this now.
>>
>> If I build a cache file from the following files:
>>
>> ftp://ftp.ensembl.org/pub/release-70/gtf/gasterosteus_aculeatus/Gasterosteus_aculeatus.BROADS1.70.gtf.gz
>>
>> and
>>
>> ftp://ftp.ensembl.org/pub/release-70/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.70.dna.toplevel.fa.gz
>>
>> I get (I think!) the correct output from the VEP:
>>
>> perl gtf2vep.pl <http://gtf2vep.pl> -i 
>> Gasterosteus_aculeatus.BROADS1.70.gtf.gz -fasta 
>> Gasterosteus_aculeatus.BROADS1.70.dna.toplevel.fa -species 
>> gasterosteus_aculeatus -dir test/ -db 70
>> perl variant_effect_predictor.pl <http://variant_effect_predictor.pl> 
>> -i gastero_in.txt -species gasterosteus_aculeatus -force -off -dir 
>> test/ -db 70
>> grep -v # variant_effect_output.txt
>>
>> groupXIX_2822477_C/T    groupXIX:2822477        T ENSGACG00000003129 
>>      ENSGACT00000004109      Transcript      missense_variant  67     
>>   49      17      A/T Gcg/Acg -
>> groupXIX_2822500_T/C    groupXIX:2822500        C ENSGACG00000003129 
>>      ENSGACT00000004109      Transcript      missense_variant  44     
>>   26      9       D/G gAc/gGc -
>> groupXIX_2822523_C/T    groupXIX:2822523        T ENSGACG00000003129 
>>      ENSGACT00000004109      Transcript      initiator_codon_variant 
>>    21      3       1 M/I     atG/atA -
>> groupXIX_2822541_T/A    groupXIX:2822541        A ENSGACG00000003129 
>>      ENSGACT00000004109      Transcript      5_prime_UTR_variant
>>         3       -       -       -       -       -
>>
>> This works the same if I use the version 67 files as it appears you have.
>>
>> So I suspect there is something different about your FASTA file - you 
>> could check that the sequence of the groupXIX file matches that in 
>> the file I link to above (do an md5sum or some such thing).
>>
>> It is also possible that an issue with older versions of BioPerl is 
>> to blame - there was a known bug in the way BioPerl indexes large 
>> FASTA file. Normally for Ensembl we recommend using BioPerl 1.2.3 
>> (which contains the bug), but VEP works fine with the latest version. 
>> I'd try updating your BioPerl install to the latest version, remove 
>> the *.fa.index file that is generated next to your .fa file, and try 
>> re-running gtf2vep.pl <http://gtf2vep.pl>
>>
>> Beyond this it's hard to say what's happening without seeing the 
>> contents of your GTF and FASTA files. If the problem persists, 
>> perhaps you could just pull out the lines in the GTF for 
>> ENSGACT00000004109 and the sequence for groupXIX and if that still 
>> gives you the same problem, send them to me so I can debug.
>>
>> Hope this helps!
>>
>> Will
>>
>>
>> On 19 March 2013 12:07, Heidi Viitaniemi <hmviit at utu.fi 
>> <mailto:hmviit at utu.fi>> wrote:
>>
>>     Hi Will,
>>
>>     And thank you for your response. I'll wait for the solution. I
>>     like the idea that you can incorporate your own data to run VEP.
>>
>>     Thanks,
>>     Heidi Viitaniemi
>>
>>
>>
>>     19.3.2013 13:42, Will McLaren kirjoitti:
>>>     Hello Heidi,
>>>
>>>     Thanks for finding this - the causes of this bug are I believe
>>>     somewhat complex so may take a while to get to the bottom of it.
>>>
>>>     Just wanted to let you know that your mail is not being ignored!
>>>
>>>     Regards
>>>
>>>     Will McLaren
>>>     Ensembl Variation
>>>
>>>
>>>     On 18 March 2013 13:48, Heidi Viitaniemi <hmviit at utu.fi
>>>     <mailto:hmviit at utu.fi>> wrote:
>>>
>>>         Hi,
>>>
>>>         I'm running version 2.7 on a unix server. I want to create a
>>>         custom cache using my own gtf and fasta with gtf2vep.pl
>>>         <http://gtf2vep.pl>. This works without problem and also
>>>         running VEP seems to go fine. The problem is that, in the
>>>         output it seems that the cDNA_position, CDS_position and
>>>         Protein_position are correct given my input gtf file but the
>>>         calls for Amino_acids and Codons seem completely random. If
>>>         I run against the cache retrieved from ensembl these are all
>>>         correct. The version of the genome didn't have an effect on
>>>         the output, the gtf's haven't changed. The gtf and the fasta
>>>         that I'm using for the custom originate from the ensembl
>>>         reference so I don't see any reason why the custom cache
>>>         shouldn't perform the same way as the reference from ensembl
>>>         cache. Could there be bug that somehow messes up the link
>>>         between the custom gtf and fasta in my run? Below are the
>>>         commands I ran and a snippet of the output's I got.
>>>
>>>         Thanks,
>>>         Heidi Viitaniemi
>>>
>>>         For custom cache I'm running (wrong output for Amino_acids
>>>         and Codons)
>>>         perl gtf2vep.pl <http://gtf2vep.pl> -i
>>>         GasAcu1.67_group_xixflip.gtf -f
>>>         gasAcu_group_withoutbac_inv7.fa -d 67 -s
>>>         Gasterosteus_aculeatus_XIXflipped_18032013
>>>         perl variant_effect_predictor.pl
>>>         <http://variant_effect_predictor.pl> -offline 1 -dir
>>>         $HOME/.vep -i ens_realigned_AK_F.var.vcf -format vcf -fork 4
>>>         -db_version 67 -species
>>>         Gasterosteus_aculeatus_XIXflipped_18032013 -numbers
>>>         -per_gene -buffer_size 10000 -o
>>>         VEP_18032013_exon_pergene_AK_F.var.vcf.txt
>>>
>>>         groupXIX_2822477_C/T 	groupXIX:2822477 	T
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         missense_variant 	67 	49 	17 	G/R 	Gga/Aga 	- 	EXON=1/2
>>>         groupXIX_2822500_T/C 	groupXIX:2822500 	C
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         missense_variant 	44 	26 	9 	Y/C 	tAt/tGt 	- 	EXON=1/2
>>>         groupXIX_2822523_C/T 	groupXIX:2822523 	T
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         synonymous_variant 	21 	3 	1 	R 	cgG/cgA 	- 	EXON=1/2
>>>         groupXIX_2822541_T/A 	groupXIX:2822541 	A
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         5_prime_UTR_variant 	3 	- 	- 	- 	- 	- 	EXON=1/2
>>>
>>>
>>>
>>>         For ensembl cache I'm running (correct output for
>>>         Amino_acids and Codons)
>>>         perl variant_effect_predictor.pl
>>>         <http://variant_effect_predictor.pl> -offline -dir
>>>         $HOME/.vep -i ens_realigned_AK_F.var.vcf -format vcf -fork 4
>>>         -db_version 69 -species gasterosteus_aculeatus -numbers
>>>         -per_gene -buffer_size 10000 -o
>>>         ensVEP_18032013_exon_pergene_AK_F.var.vcf.txt
>>>
>>>         groupXIX_2822477_C/T 	groupXIX:2822477 	T
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         missense_variant 	67 	49 	17 	A/T 	Gcg/Acg 	- 	EXON=1/2
>>>         groupXIX_2822500_T/C 	groupXIX:2822500 	C
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         missense_variant 	44 	26 	9 	D/G 	gAc/gGc 	- 	EXON=1/2
>>>         groupXIX_2822523_C/T 	groupXIX:2822523 	T
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         initiator_codon_variant 	21 	3 	1 	M/I 	atG/atA 	- 	EXON=1/2
>>>         groupXIX_2822541_T/A 	groupXIX:2822541 	A
>>>         ENSGACG00000003129 	ENSGACT00000004109 	Transcript
>>>         5_prime_UTR_variant 	3 	- 	- 	- 	- 	- 	EXON=1/2
>>>
>>>
>>>
>>>         -- 
>>>         ______________________________________________
>>>
>>>         Heidi Viitaniemi
>>>         PhD student
>>>         Division of Genetics and Physiology
>>>         Department of Biology
>>>         Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
>>>         University of Turku
>>>         20520 Turku
>>>
>>>         FINLAND
>>>
>>>
>>>         _______________________________________________
>>>         Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>>         Posting guidelines and subscribe/unsubscribe info:
>>>         http://lists.ensembl.org/mailman/listinfo/dev
>>>         Ensembl Blog: http://www.ensembl.info/
>>>
>>>
>>>
>>>
>>>     _______________________________________________
>>>     Dev mailing listDev at ensembl.org  <mailto:Dev at ensembl.org>
>>>     Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>>>     Ensembl Blog:http://www.ensembl.info/
>>
>>     -- 
>>     ______________________________________________
>>
>>     Heidi Viitaniemi
>>     PhD student
>>     Division of Genetics and Physiology
>>     Department of Biology
>>     Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
>>     University of Turku
>>     20520 Turku
>>
>>     FINLAND
>>
>>
>>     _______________________________________________
>>     Dev mailing list Dev at ensembl.org <mailto:Dev at ensembl.org>
>>     Posting guidelines and subscribe/unsubscribe info:
>>     http://lists.ensembl.org/mailman/listinfo/dev
>>     Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>>
>> _______________________________________________
>> Dev mailing listDev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog:http://www.ensembl.info/
>
> -- 
> ______________________________________________
>
> Heidi Viitaniemi
> PhD student
> Division of Genetics and Physiology
> Department of Biology
> Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
> University of Turku
> 20520 Turku
>
> FINLAND
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/

-- 
______________________________________________

Heidi Viitaniemi
PhD student
Division of Genetics and Physiology
Department of Biology
Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
University of Turku
20520 Turku

FINLAND

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ensembl.org/pipermail/dev_ensembl.org/attachments/20130327/27cfee84/attachment.html>


More information about the Dev mailing list