[ensembl-dev] Codon called wrong in VEP when using custom build cache

Heidi Viitaniemi hmviit at utu.fi
Wed Mar 27 06:53:58 GMT 2013


Hi Will,

And thank you very much for looking into this.

The reason why I want to build my own cache from the gtf and fasta of 
Gasterosteus_aculeatus is that on groupXIX the last two (of the three) 
supercontigs are flipped in the Ensembl genome (Ross & Peichel 2008). I 
also first thought that reverse complementing the fasta and gtf for 
groupXIX was the problem, but ENSGACG00000003129 is located in a region 
that I didn't touch in the gtf or the fasta.
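
As a rough illustration of what flipping a region involves (a sketch only, not the scripts actually used for these files): the sequence inside the region is reverse-complemented, and every feature inside it gets mirrored coordinates and the opposite strand. The function names are hypothetical, and the sketch assumes 1-based inclusive coordinates as in GTF and features that lie entirely inside the flipped region.

```python
# Sketch: flip a 1-based inclusive region [region_start, region_end]
# of a sequence and remap a feature that lies inside it.
# All names here are illustrative assumptions, not part of any VEP tool.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_region(seq, region_start, region_end):
    """Return seq with [region_start, region_end] reverse-complemented."""
    left = seq[:region_start - 1]
    middle = seq[region_start - 1:region_end]
    right = seq[region_end:]
    return left + reverse_complement(middle) + right


def flip_feature(start, end, strand, region_start, region_end):
    """Remap a feature that lies fully inside the flipped region.

    Features not fully contained in the region are returned unchanged;
    a feature spanning a region boundary would need manual handling.
    """
    if start < region_start or end > region_end:
        return start, end, strand
    new_start = region_start + region_end - end
    new_end = region_start + region_end - start
    new_strand = {"+": "-", "-": "+"}.get(strand, strand)
    return new_start, new_end, new_strand
```

A feature outside the flipped region keeps its coordinates, which is why a gene in an untouched part of groupXIX should not be affected by the flip itself.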

I'll check the version of BioPerl on our server and try using only the 
fasta and gtf for ENSGACG00000003129.
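
A minimal sketch of how such a reduced test case could be pulled out, and how the groupXIX sequence could be checksummed for comparison with the Ensembl file. The helper names are hypothetical. The sketch assumes tab-separated GTF lines with nine columns and FASTA headers whose first word is the sequence name, and it uppercases the sequence before hashing so that soft-masking and line wrapping do not change the checksum.

```python
import hashlib


def gtf_lines_for(gtf_lines, attribute, value):
    """Yield GTF lines whose attribute column contains attribute "value"."""
    needle = f'{attribute} "{value}"'
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and needle in fields[8]:
            yield line


def fasta_record(fasta_lines, name):
    """Return the sequence of the record named `name`, or None if absent."""
    chunks = []
    found = False
    for line in fasta_lines:
        if line.startswith(">"):
            if found:
                break  # reached the header of the next record
            words = line[1:].split()
            found = bool(words) and words[0] == name
        elif found:
            chunks.append(line.strip())
    return "".join(chunks) if found else None


def sequence_md5(seq):
    """MD5 of the uppercased sequence, independent of line wrapping."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()
```

Both functions accept any iterable of lines, so an open file handle for the gtf or the fasta can be passed directly. Comparing `sequence_md5` of the groupXIX record from the two fasta files shows whether the sequences differ at all; it does not say where.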

Thanks,
Heidi


26.3.2013 16:04, Will McLaren wrote:
> Hi Heidi,
>
> Thanks for your patience, I've had a chance to look at this now.
>
> If I build a cache file from the following files:
>
> ftp://ftp.ensembl.org/pub/release-70/gtf/gasterosteus_aculeatus/Gasterosteus_aculeatus.BROADS1.70.gtf.gz
>
> and
>
> ftp://ftp.ensembl.org/pub/release-70/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.70.dna.toplevel.fa.gz
>
> I get (I think!) the correct output from the VEP:
>
> perl gtf2vep.pl -i Gasterosteus_aculeatus.BROADS1.70.gtf.gz \
>   -fasta Gasterosteus_aculeatus.BROADS1.70.dna.toplevel.fa \
>   -species gasterosteus_aculeatus -dir test/ -db 70
> perl variant_effect_predictor.pl -i gastero_in.txt \
>   -species gasterosteus_aculeatus -force -off -dir test/ -db 70
> grep -v '^#' variant_effect_output.txt
>
> groupXIX_2822477_C/T  groupXIX:2822477  T  ENSGACG00000003129  ENSGACT00000004109  Transcript  missense_variant         67  49  17  A/T  Gcg/Acg  -
> groupXIX_2822500_T/C  groupXIX:2822500  C  ENSGACG00000003129  ENSGACT00000004109  Transcript  missense_variant         44  26  9   D/G  gAc/gGc  -
> groupXIX_2822523_C/T  groupXIX:2822523  T  ENSGACG00000003129  ENSGACT00000004109  Transcript  initiator_codon_variant  21  3   1   M/I  atG/atA  -
> groupXIX_2822541_T/A  groupXIX:2822541  A  ENSGACG00000003129  ENSGACT00000004109  Transcript  5_prime_UTR_variant      3   -   -   -    -        -
>
> This works the same if I use the version 67 files as it appears you have.
>
> So I suspect there is something different about your FASTA file - you 
> could check that the sequence of the groupXIX file matches that in the 
> file I link to above (do an md5sum or some such thing).
>
> It is also possible that an issue with older versions of BioPerl is to 
> blame - there was a known bug in the way BioPerl indexes large FASTA 
> files. Normally for Ensembl we recommend using BioPerl 1.2.3 (which 
> contains the bug), but VEP works fine with the latest version. I'd try 
> updating your BioPerl install to the latest version, removing the 
> *.fa.index file that is generated next to your .fa file, and 
> re-running gtf2vep.pl.
>
> Beyond this it's hard to say what's happening without seeing the 
> contents of your GTF and FASTA files. If the problem persists, perhaps 
> you could pull out just the lines in the GTF for ENSGACT00000004109 
> and the sequence for groupXIX; if that still gives you the same 
> problem, send them to me so I can debug.
>
> Hope this helps!
>
> Will
>
>
> On 19 March 2013 12:07, Heidi Viitaniemi <hmviit at utu.fi> wrote:
>
>     Hi Will,
>
>     And thank you for your response. I'll wait for the solution. I
>     like the idea that you can incorporate your own data to run VEP.
>
>     Thanks,
>     Heidi Viitaniemi
>
>
>
>     19.3.2013 13:42, Will McLaren wrote:
>>     Hello Heidi,
>>
>>     Thanks for finding this - the causes of this bug are I believe
>>     somewhat complex so may take a while to get to the bottom of it.
>>
>>     Just wanted to let you know that your mail is not being ignored!
>>
>>     Regards
>>
>>     Will McLaren
>>     Ensembl Variation
>>
>>
>>     On 18 March 2013 13:48, Heidi Viitaniemi <hmviit at utu.fi> wrote:
>>
>>         Hi,
>>
>>         I'm running version 2.7 on a unix server. I want to create a
>>         custom cache using my own gtf and fasta with gtf2vep.pl.
>>         This works without problems, and running VEP also seems to
>>         go fine. The problem is that in the output the
>>         cDNA_position, CDS_position and Protein_position are correct
>>         given my input gtf file, but the calls for Amino_acids and
>>         Codons seem completely random. If I run against the cache
>>         retrieved from Ensembl, these are all correct. The version
>>         of the genome didn't have an effect on the output; the gtfs
>>         haven't changed. The gtf and the fasta that I'm using for
>>         the custom cache originate from the Ensembl reference, so I
>>         don't see any reason why the custom cache shouldn't perform
>>         the same way as the Ensembl cache. Could there be a bug that
>>         somehow messes up the link between the custom gtf and fasta
>>         in my run? Below are the commands I ran and a snippet of the
>>         outputs I got.
>>
>>         Thanks,
>>         Heidi Viitaniemi
>>
>>         For the custom cache I'm running (wrong output for Amino_acids
>>         and Codons):
>>         perl gtf2vep.pl -i GasAcu1.67_group_xixflip.gtf \
>>           -f gasAcu_group_withoutbac_inv7.fa -d 67 \
>>           -s Gasterosteus_aculeatus_XIXflipped_18032013
>>         perl variant_effect_predictor.pl -offline 1 -dir $HOME/.vep \
>>           -i ens_realigned_AK_F.var.vcf -format vcf -fork 4 \
>>           -db_version 67 \
>>           -species Gasterosteus_aculeatus_XIXflipped_18032013 \
>>           -numbers -per_gene -buffer_size 10000 \
>>           -o VEP_18032013_exon_pergene_AK_F.var.vcf.txt
>>
>>         groupXIX_2822477_C/T  groupXIX:2822477  T  ENSGACG00000003129  ENSGACT00000004109  Transcript  missense_variant     67  49  17  G/R  Gga/Aga  -  EXON=1/2
>>         groupXIX_2822500_T/C  groupXIX:2822500  C  ENSGACG00000003129  ENSGACT00000004109  Transcript  missense_variant     44  26  9   Y/C  tAt/tGt  -  EXON=1/2
>>         groupXIX_2822523_C/T  groupXIX:2822523  T  ENSGACG00000003129  ENSGACT00000004109  Transcript  synonymous_variant   21  3   1   R    cgG/cgA  -  EXON=1/2
>>         groupXIX_2822541_T/A  groupXIX:2822541  A  ENSGACG00000003129  ENSGACT00000004109  Transcript  5_prime_UTR_variant  3   -   -   -    -        -  EXON=1/2
>>
>>
>>
>>         For the Ensembl cache I'm running (correct output for
>>         Amino_acids and Codons):
>>         perl variant_effect_predictor.pl -offline -dir $HOME/.vep \
>>           -i ens_realigned_AK_F.var.vcf -format vcf -fork 4 \
>>           -db_version 69 -species gasterosteus_aculeatus \
>>           -numbers -per_gene -buffer_size 10000 \
>>           -o ensVEP_18032013_exon_pergene_AK_F.var.vcf.txt
>>
>>         groupXIX_2822477_C/T  groupXIX:2822477  T  ENSGACG00000003129  ENSGACT00000004109  Transcript  missense_variant         67  49  17  A/T  Gcg/Acg  -  EXON=1/2
>>         groupXIX_2822500_T/C  groupXIX:2822500  C  ENSGACG00000003129  ENSGACT00000004109  Transcript  missense_variant         44  26  9   D/G  gAc/gGc  -  EXON=1/2
>>         groupXIX_2822523_C/T  groupXIX:2822523  T  ENSGACG00000003129  ENSGACT00000004109  Transcript  initiator_codon_variant  21  3   1   M/I  atG/atA  -  EXON=1/2
>>         groupXIX_2822541_T/A  groupXIX:2822541  A  ENSGACG00000003129  ENSGACT00000004109  Transcript  5_prime_UTR_variant      3   -   -   -    -        -  EXON=1/2
>>
>>
>>
>>         -- 
>>         ______________________________________________
>>
>>         Heidi Viitaniemi
>>         PhD student
>>         Division of Genetics and Physiology
>>         Department of Biology
>>         Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
>>         University of Turku
>>         20520 Turku
>>
>>         FINLAND
>>
>>
>>         _______________________________________________
>>         Dev mailing list Dev at ensembl.org
>>         Posting guidelines and subscribe/unsubscribe info:
>>         http://lists.ensembl.org/mailman/listinfo/dev
>>         Ensembl Blog: http://www.ensembl.info/
>>
>>
>>
>>
>

-- 
______________________________________________

Heidi Viitaniemi
PhD student
Division of Genetics and Physiology
Department of Biology
Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
University of Turku
20520 Turku

FINLAND
