[ensembl-dev] Possible issues in calculating pSyntax by VEP 2.4
S Venkata Suresh Kumar
suresh.surampudi at india.semanticbits.com
Mon Mar 5 08:22:09 GMT 2012
The following VCF records were used to calculate cSyntax and pSyntax,
and the results appear to have issues:
1)
#CHROM POS ID REF ALT QUAL FILTER INFO
1 33147450 9 tcta t 29 Pass DP=154
For this record, the output from VEP is:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 33147452 10 taaa t 29 Pass DP=154
CSQ=-|CCDS44105.1|CCDS44105.1|Transcript|DOWNSTREAM||||||||,-|CCDS367.2|CCDS367.2|Transcript|STOP_LOST|1445-1447|1445-1447|482-483|V*/E|gTTTag/gag||CCDS367.2:c.1445_1447delTTT|CCDS367.2.1:p.Val482_alX483delinsGluextX,-|ENSESTG00000001282|ENSESTT00000003076|Transcript|DOWNSTREAM||||||||,-|5928|NM_001135255.1|Transcript|3PRIME_UTR|3581-3583||||||NM_001135255.1:c.*2147_*2149delAAA|,-|5928|NM_001135256.1|Transcript|3PRIME_UTR|3484-3486||||||NM_001135256.1:c.*2147_*2149delAAA|,-|ENSESTG00000001282|ENSESTT00000003082|Transcript|DOWNSTREAM||||||||,-|ENSESTG00000001282|ENSESTT00000003089|Transcript|DOWNSTREAM||||||||,-|5928|NM_005610.2|Transcript|3PRIME_UTR|3584-3586||||||NM_005610.2:c.*2147_*2149delAAA|,-|ENSESTG00000001967|ENSESTT00000004911|Transcript|DOWNSTREAM||||||||,-|CCDS53294.1|CCDS53294.1|Transcript|STOP_GAINED|1365-1367|1365-1367|455-456|CL/*|tgTTTa/tga||CCDS53294.1:c.1365_1367delTTT|CCDS53294.1.1:p.Cys455_Leu456delinsX,-|ENSESTG00000001967|ENSESTT00000004908|Transcript|STOP_LOST|620-622|452-454|151-152|V*/E|gTTTag/gag||ENSESTT00000004908.1:c.452_454delTTT|ENSESTP00000004908.1:p.Val151_alX152delinsGluextX16,-|81493|NM_001161708.1|Transcript|STOP_GAINED|1465-1467|1365-1367|455-456|CL/*|tgTTTa/tga||NM_001161708.1:c.1365_1367delTTT|NP_001155180.1.1:p.Cys455_Leu456delinsX,-|81493|NM_030786.2|Transcript|STOP_LOST|1545-1547|1445-1447|482-483|V*/E|gTTTag/gag||NM_030786.2:c.1445_1447delTTT|NP_110413.2.1:p.Val482_alX483delinsGluextX16,-|CCDS44106.1|CCDS44106.1|Transcript|DOWNSTREAM||||||||,-|CCDS366.1|CCDS366.1|Transcript|DOWNSTREAM||||||||
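To make the cSyntax/pSyntax pairs easier to pick out of the CSQ field, here is a minimal Python sketch. It assumes that entries are separated by commas, that fields are separated by "|", and that the last two fields of each entry hold the cSyntax and pSyntax; those positions are inferred from the values above, not from any VEP header. The function name hgvs_pairs is made up for illustration, and the sample string holds two entries copied from the output above.

```python
def hgvs_pairs(csq):
    """Return (feature, cSyntax, pSyntax) for every entry that has a pSyntax."""
    pairs = []
    for entry in csq.split(","):
        fields = entry.split("|")
        # Assumption: the last two fields are cSyntax and pSyntax,
        # and the third field is the transcript/feature identifier.
        c_syntax, p_syntax = fields[-2], fields[-1]
        if p_syntax:
            pairs.append((fields[2], c_syntax, p_syntax))
    return pairs


# Two entries taken from the CSQ field above.
csq = (
    "-|CCDS44105.1|CCDS44105.1|Transcript|DOWNSTREAM||||||||,"
    "-|81493|NM_030786.2|Transcript|STOP_LOST|1545-1547|1445-1447|482-483|"
    "V*/E|gTTTag/gag||NM_030786.2:c.1445_1447delTTT|"
    "NP_110413.2.1:p.Val482_alX483delinsGluextX16"
)

for feature, c_syntax, p_syntax in hgvs_pairs(csq):
    print(feature, c_syntax, p_syntax)
```

Entries without a pSyntax (such as the DOWNSTREAM ones) are skipped.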
For cSyntax NM_030786.2:c.1445_1447delTTT, the pSyntax is given as
"NP_110413.2.1:p.Val482_alX483delinsGluextX16". Should this be
"NP_110413.2.1:p.Val482_X483delinsGluextX16"? (Note that the stray
text "al" has crept into the pSyntax.)
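A check of this kind can be automated. The following Python sketch strips every token it recognises from the protein description (three-letter residue codes, the stop symbols X, * and Ter, digits, "_" and a small set of keywords) and reports whatever is left over. The token list is my own assumption and is not a full HGVS grammar; the function name stray_fragments is made up.

```python
import re

THREE_LETTER = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]
# "delins" is listed before "del" and "ins" so the longer keyword wins.
KNOWN = re.compile(
    r"delins|del|ins|dup|ext|fs|Ter|" + "|".join(THREE_LETTER) + r"|X|\*|_|\d+"
)


def stray_fragments(p_syntax):
    """Return the pieces of the p. description that are not recognised tokens."""
    description = p_syntax.split(":p.", 1)[1].strip("()")
    leftover = KNOWN.sub(" ", description)
    return leftover.split()


print(stray_fragments("NP_110413.2.1:p.Val482_alX483delinsGluextX16"))
print(stray_fragments("NP_110413.2.1:p.Val482_X483delinsGluextX16"))
```

For the string reported by VEP this returns ['al'], and for the expected string it returns an empty list.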
2) Differences between Mutalyzer and Ensembl pSyntax output
For cSyntax NM_001161708.1:c.1429_1430insCTA, Ensembl's pSyntax output
is NP_001155180.1.1:p.X477delinsSerLysextX56. This indicates that the
stop codon is deleted and two amino acids (SK) are inserted at
position 477.
For the same cSyntax, Mutalyzer's output is
NM_001161708.1(SYNC_i001):p.(*477Serext*57)
For cSyntax NM_001161708.1:c.1429_1430insCTA, should the pSyntax not be:
NP_001155180.1.1:p.X477SerextX57?
3) Possible error in calculating the new stop codon position (Ensembl vs. Mutalyzer)
For the deletion variant with cSyntax NM_030786.2:c.1446_1448delTTA, the
pSyntax output is NP_110413.2.1:p.X483delextX16. This means that the new
stop codon is 16 codons away from position 483 (excluding 483 and the
stop codon itself). According to Mutalyzer's output, however, the new
stop codon is 17 codons away (again excluding 483 and the stop codon).
Mutalyzer's output is: NM_030786.2(SYNC_i001):p.(*483Cysext*17)
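To compare the two tools side by side, the number after "ext" can be pulled out of each string. This is only a sketch: it assumes the number following extX or ext* is the extension count in both notations, and the function name extension_count is made up. The four strings are the ones quoted in issues 2 and 3.

```python
import re

# Matches "extX56", "ext*57" or "extTer57" and captures the number.
EXT = re.compile(r"ext(?:X|\*|Ter)(\d+)")


def extension_count(p_syntax):
    """Return the number after 'ext' as an int, or None if there is none."""
    match = EXT.search(p_syntax)
    return int(match.group(1)) if match else None


pairs = [
    ("NP_001155180.1.1:p.X477delinsSerLysextX56",
     "NM_001161708.1(SYNC_i001):p.(*477Serext*57)"),
    ("NP_110413.2.1:p.X483delextX16",
     "NM_030786.2(SYNC_i001):p.(*483Cysext*17)"),
]

for ensembl, mutalyzer in pairs:
    difference = extension_count(mutalyzer) - extension_count(ensembl)
    print(ensembl, mutalyzer, difference)
```

In both cases the Mutalyzer count is exactly one higher than the Ensembl count, which suggests the two tools may differ in where they start counting rather than in the position of the new stop codon.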
4) Non-uniform pSyntax calculation/output between insertions and deletions
For example:
a) For the deletion variant with cSyntax NM_030786.2:c.1446_1448delTTA,
the pSyntax output is NP_110413.2.1:p.X483delextX16. Here the new amino
acid that replaces the stop codon (C) is neither calculated nor displayed.
b) For the insertion variant with cSyntax NM_001161708.1:c.1429_1430insCTA,
the pSyntax output is NP_001155180.1.1:p.X477delinsSerLysextX56, in which
the new amino acid that replaces the stop codon is calculated and shown
as a delins.
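As an illustration of why I would expect both cases to be reported the same way, here is a small Python sketch that translates the altered sequence directly. Only the "GTTTAG" context (Val, stop) comes from the CSQ field above; the downstream bases and the "TAA" stop codon used for the insertion are hypothetical, chosen so that the toy result lines up with the residues reported above. It is not the real transcript sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in frame from the first base, stopping after the first stop."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Deletion: "GTTTAG" is from the CSQ field; "TGCAAATAA" is hypothetical.
ref_del = "GTTTAG" + "TGCAAATAA"
alt_del = ref_del[:2] + ref_del[5:]          # delete "TTA"

# Insertion: stop codon "TAA" and downstream "GGATGA" are hypothetical.
ref_ins = "TAA" + "GGATGA"
alt_ins = ref_ins[:1] + "CTA" + ref_ins[1:]  # insert after 1st base of the stop

print(translate(ref_del), "->", translate(alt_del))
print(translate(ref_ins), "->", translate(alt_ins))
```

With these toy sequences the deletion gives V* -> VCK* and the insertion gives * -> SKG*. In both cases the residue that replaces the stop codon (C or S) falls out of the same translation step, so it could be reported for deletions as well as insertions.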
===============
Environment:
VEP version: 2.4
Perl version: 5.14.2
BioPerl version: 1.6.901-2
Linux: 3.2.0-17-generic (Ubuntu 12.04)
=====================
--
Regards
vs