[ensembl-dev] Codon called wrong in VEP when using custom build cache
Heidi Viitaniemi
hmviit at utu.fi
Wed Mar 27 06:53:58 GMT 2013
Hi Will,
And thank you very much for looking into this.
The reason I want to build my own cache from the gtf and fasta of
Gasterosteus_aculeatus is that on groupXIX the last two (of the three)
supercontigs are flipped in the Ensembl genome (Ross & Peichel 2008). I
also first thought that reverse complementing the fasta and gtf for
groupXIX was the problem, but ENSGACG00000003129 is located in a region
that I didn't touch in the gtf or the fasta.
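For readers following along, here is a minimal Python sketch of what reverse complementing a region involves for both the sequence and the gtf coordinates. It is an illustration only, not the script used in this thread: the helper names `reverse_complement` and `flip_feature` are hypothetical, and it assumes 1-based inclusive gtf coordinates and a feature that lies entirely inside the flipped region.

```python
# Hypothetical helpers for illustration; not the script used in this thread.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_feature(start, end, strand, region_start, region_end):
    """Map a gtf feature onto a reverse-complemented region.

    Coordinates are assumed to be 1-based and inclusive, as in gtf.
    Features that are not fully inside [region_start, region_end]
    are returned unchanged.
    """
    if start < region_start or end > region_end:
        return start, end, strand
    # Mirror the interval around the centre of the flipped region.
    new_start = region_start + region_end - end
    new_end = region_start + region_end - start
    # A flipped feature ends up on the opposite strand.
    new_strand = {"+": "-", "-": "+"}.get(strand, strand)
    return new_start, new_end, new_strand
```

If the sequence is flipped but the coordinates or strand are not (or the other way round), the codons read from the fasta will no longer match the annotation, even though the reported positions still look right.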
I'll check the version of BioPerl on our server and try using only the
fasta and gtf for ENSGACG00000003129.
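A reduced test case like this can be cut out with a few lines of Python. The sketch below is one possible way to do it, not an Ensembl tool: the function names are hypothetical, it assumes the gtf attribute column contains `gene_id "<ID>"` as in the Ensembl gtf dumps, and it assumes the fasta header starts with the sequence name (for example `>groupXIX ...`). The checksum helper is a per-sequence variant of the md5sum comparison suggested in the quoted reply below.

```python
import hashlib


def gtf_lines_for_gene(gtf_lines, gene_id):
    """Yield the gtf lines whose attribute column names the given gene_id."""
    needle = 'gene_id "%s"' % gene_id
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Column 9 of a gtf line holds the attributes.
        if len(fields) >= 9 and needle in fields[8]:
            yield line


def fasta_record(fasta_lines, name):
    """Return the sequence of the record whose header's first word is `name`.

    Returns an empty string if no such record exists.
    """
    keep = False
    chunks = []
    for line in fasta_lines:
        if line.startswith(">"):
            if keep:
                break  # reached the next record, we are done
            header = line[1:].split()
            keep = bool(header) and header[0] == name
        elif keep:
            chunks.append(line.strip())
    return "".join(chunks)


def sequence_md5(seq):
    """MD5 of a sequence, ignoring case, so soft-masking does not matter."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()
```

Both extraction functions accept any iterable of lines, so an open file handle works, e.g. `fasta_record(open("my.fa"), "groupXIX")`. Comparing `sequence_md5` of the same sequence name taken from two fasta files shows whether that sequence differs, independent of line wrapping.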
Thanks,
Heidi
On 26.3.2013 16:04, Will McLaren wrote:
> Hi Heidi,
>
> Thanks for your patience, I've had a chance to look at this now.
>
> If I build a cache file from the following files:
>
> ftp://ftp.ensembl.org/pub/release-70/gtf/gasterosteus_aculeatus/Gasterosteus_aculeatus.BROADS1.70.gtf.gz
>
> and
>
> ftp://ftp.ensembl.org/pub/release-70/fasta/gasterosteus_aculeatus/dna/Gasterosteus_aculeatus.BROADS1.70.dna.toplevel.fa.gz
>
> I get (I think!) the correct output from the VEP:
>
> perl gtf2vep.pl -i Gasterosteus_aculeatus.BROADS1.70.gtf.gz -fasta Gasterosteus_aculeatus.BROADS1.70.dna.toplevel.fa -species gasterosteus_aculeatus -dir test/ -db 70
> perl variant_effect_predictor.pl -i gastero_in.txt -species gasterosteus_aculeatus -force -off -dir test/ -db 70
> grep -v '#' variant_effect_output.txt
>
> groupXIX_2822477_C/T groupXIX:2822477 T ENSGACG00000003129 ENSGACT00000004109 Transcript missense_variant 67 49 17 A/T Gcg/Acg -
> groupXIX_2822500_T/C groupXIX:2822500 C ENSGACG00000003129 ENSGACT00000004109 Transcript missense_variant 44 26 9 D/G gAc/gGc -
> groupXIX_2822523_C/T groupXIX:2822523 T ENSGACG00000003129 ENSGACT00000004109 Transcript initiator_codon_variant 21 3 1 M/I atG/atA -
> groupXIX_2822541_T/A groupXIX:2822541 A ENSGACG00000003129 ENSGACT00000004109 Transcript 5_prime_UTR_variant 3 - - - - -
>
> This works the same if I use the version 67 files as it appears you have.
>
> So I suspect there is something different about your FASTA file - you
> could check that the sequence of the groupXIX file matches that in the
> file I link to above (do an md5sum or some such thing).
>
> It is also possible that an issue with older versions of BioPerl is to
> blame - there was a known bug in the way BioPerl indexes large FASTA
> files. Normally for Ensembl we recommend using BioPerl 1.2.3 (which
> contains the bug), but VEP works fine with the latest version. I'd try
> updating your BioPerl install to the latest version, removing the
> *.fa.index file that is generated next to your .fa file, and
> re-running gtf2vep.pl.
>
> Beyond this it's hard to say what's happening without seeing the
> contents of your GTF and FASTA files. If the problem persists, perhaps
> you could just pull out the lines in the GTF for ENSGACT00000004109
> and the sequence for groupXIX and if that still gives you the same
> problem, send them to me so I can debug.
>
> Hope this helps!
>
> Will
>
>
> On 19 March 2013 12:07, Heidi Viitaniemi <hmviit at utu.fi> wrote:
>
> Hi Will,
>
> And thank you for your response. I'll wait for the solution. I
> like the idea that you can incorporate your own data to run VEP.
>
> Thanks,
> Heidi Viitaniemi
>
>
>
> On 19.3.2013 13:42, Will McLaren wrote:
>> Hello Heidi,
>>
>> Thanks for finding this - the causes of this bug are I believe
>> somewhat complex so may take a while to get to the bottom of it.
>>
>> Just wanted to let you know that your mail is not being ignored!
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>>
>> On 18 March 2013 13:48, Heidi Viitaniemi <hmviit at utu.fi> wrote:
>>
>> Hi,
>>
>> I'm running version 2.7 on a unix server. I want to create a
>> custom cache using my own gtf and fasta with gtf2vep.pl. This
>> works without problems, and running VEP also seems to go fine.
>> The problem is that in the output the cDNA_position,
>> CDS_position and Protein_position appear correct given my input
>> gtf file, but the calls for Amino_acids and Codons seem
>> completely random. If I run against the cache retrieved from
>> Ensembl, these are all correct. The version of the genome didn't
>> have an effect on the output; the gtfs haven't changed. The gtf
>> and the fasta that I'm using for the custom cache originate from
>> the Ensembl reference, so I don't see any reason why the custom
>> cache shouldn't perform the same way as the cache from Ensembl.
>> Could there be a bug that somehow messes up the link between the
>> custom gtf and fasta in my run? Below are the commands I ran and
>> a snippet of the outputs I got.
>>
>> Thanks,
>> Heidi Viitaniemi
>>
>> For custom cache I'm running (wrong output for Amino_acids
>> and Codons)
>> perl gtf2vep.pl -i GasAcu1.67_group_xixflip.gtf -f gasAcu_group_withoutbac_inv7.fa -d 67 -s Gasterosteus_aculeatus_XIXflipped_18032013
>> perl variant_effect_predictor.pl -offline 1 -dir $HOME/.vep -i ens_realigned_AK_F.var.vcf -format vcf -fork 4 -db_version 67 -species Gasterosteus_aculeatus_XIXflipped_18032013 -numbers -per_gene -buffer_size 10000 -o VEP_18032013_exon_pergene_AK_F.var.vcf.txt
>>
>> groupXIX_2822477_C/T groupXIX:2822477 T ENSGACG00000003129 ENSGACT00000004109 Transcript missense_variant 67 49 17 G/R Gga/Aga - EXON=1/2
>> groupXIX_2822500_T/C groupXIX:2822500 C ENSGACG00000003129 ENSGACT00000004109 Transcript missense_variant 44 26 9 Y/C tAt/tGt - EXON=1/2
>> groupXIX_2822523_C/T groupXIX:2822523 T ENSGACG00000003129 ENSGACT00000004109 Transcript synonymous_variant 21 3 1 R cgG/cgA - EXON=1/2
>> groupXIX_2822541_T/A groupXIX:2822541 A ENSGACG00000003129 ENSGACT00000004109 Transcript 5_prime_UTR_variant 3 - - - - - EXON=1/2
>>
>>
>>
>> For ensembl cache I'm running (correct output for Amino_acids
>> and Codons)
>> perl variant_effect_predictor.pl -offline -dir $HOME/.vep -i ens_realigned_AK_F.var.vcf -format vcf -fork 4 -db_version 69 -species gasterosteus_aculeatus -numbers -per_gene -buffer_size 10000 -o ensVEP_18032013_exon_pergene_AK_F.var.vcf.txt
>>
>> groupXIX_2822477_C/T groupXIX:2822477 T ENSGACG00000003129 ENSGACT00000004109 Transcript missense_variant 67 49 17 A/T Gcg/Acg - EXON=1/2
>> groupXIX_2822500_T/C groupXIX:2822500 C ENSGACG00000003129 ENSGACT00000004109 Transcript missense_variant 44 26 9 D/G gAc/gGc - EXON=1/2
>> groupXIX_2822523_C/T groupXIX:2822523 T ENSGACG00000003129 ENSGACT00000004109 Transcript initiator_codon_variant 21 3 1 M/I atG/atA - EXON=1/2
>> groupXIX_2822541_T/A groupXIX:2822541 A ENSGACG00000003129 ENSGACT00000004109 Transcript 5_prime_UTR_variant 3 - - - - - EXON=1/2
>>
>>
>>
>> --
>> ______________________________________________
>>
>> Heidi Viitaniemi
>> PhD student
>> Division of Genetics and Physiology
>> Department of Biology
>> Itäinen Pitkäkatu 4A, 7th floor (Pharmacity)
>> University of Turku
>> 20520 Turku
>>
>> FINLAND
>>
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>>