[ensembl-dev] Possible issues in calculating pSyntax by VEP 2.4

S Venkata Suresh Kumar suresh.surampudi at india.semanticbits.com
Mon Mar 5 08:22:09 GMT 2012


The following VCF records were used to calculate cSyntax and pSyntax,
and the results appear to have issues:

1)	

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
1	33147450	9	tcta	t	29	Pass	DP=154

For this record, the output from VEP is:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
1	33147452	10	taaa	t	29	Pass	DP=154
CSQ=-|CCDS44105.1|CCDS44105.1|Transcript|DOWNSTREAM||||||||,-|CCDS367.2|CCDS367.2|Transcript|STOP_LOST|1445-1447|1445-1447|482-483|V*/E|gTTTag/gag||CCDS367.2:c.1445_1447delTTT|CCDS367.2.1:p.Val482_alX483delinsGluextX,-|ENSESTG00000001282|ENSESTT00000003076|Transcript|DOWNSTREAM||||||||,-|5928|NM_001135255.1|Transcript|3PRIME_UTR|3581-3583||||||NM_001135255.1:c.*2147_*2149delAAA|,-|5928|NM_001135256.1|Transcript|3PRIME_UTR|3484-3486||||||NM_001135256.1:c.*2147_*2149delAAA|,-|ENSESTG00000001282|ENSESTT00000003082|Transcript|DOWNSTREAM||||||||,-|ENSESTG00000001282|ENSESTT00000003089|Transcript|DOWNSTREAM||||||||,-|5928|NM_005610.2|Transcript|3PRIME_UTR|3584-3586||||||NM_005610.2:c.*2147_*2149delAAA|,-|ENSESTG00000001967|ENSESTT00000004911|Transcript|DOWNSTREAM||||||||,-|CCDS53294.1|CCDS53294.1|Transcript|STOP_GAINED|1365-1367|1365-1367|455-456|CL/*|tgTTTa/tga||CCDS53294.1:c.1365_1367delTTT|CCDS53294.1.1:p.Cys455_Leu456delinsX,-|ENSESTG00000001967|ENSESTT00000004908|Transcript|STOP_LOST|620-622|452-454|151-152|V*/E|gTTTag/gag||ENSESTT00000004908.1:c.452_454delTTT|ENSESTP00000004908.1:p.Val151_alX152delinsGluextX16,-|81493|NM_001161708.1|Transcript|STOP_GAINED|1465-1467|1365-1367|455-456|CL/*|tgTTTa/tga||NM_001161708.1:c.1365_1367delTTT|NP_001155180.1.1:p.Cys455_Leu456delinsX,-|81493|NM_030786.2|Transcript|STOP_LOST|1545-1547|1445-1447|482-483|V*/E|gTTTag/gag||NM_030786.2:c.1445_1447delTTT|NP_110413.2.1:p.Val482_alX483delinsGluextX16,-|CCDS44106.1|CCDS44106.1|Transcript|DOWNSTREAM||||||||,-|CCDS366.1|CCDS366.1|Transcript|DOWNSTREAM||||||||


For cSyntax NM_030786.2:c.1445_1447delTTT, the pSyntax is given as
"NP_110413.2.1:p.Val482_alX483delinsGluextX16". Should this be
"NP_110413.2.1:p.Val482_X483delinsGluextX16"? (Note the stray "al"
that has crept into the pSyntax.)
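
A stray fragment like this can be caught mechanically. The following is
only a minimal sketch in Python, not anything taken from VEP:
malformed_residues is a hypothetical helper, and the keyword list
(delins, del, ins, dup, ext, fs) is an assumption that covers the names
in this report rather than the whole nomenclature. It blanks out the
keywords and then checks that every remaining run of letters is made up
of valid three-letter residue codes or stop symbols.

```python
import re

# Three-letter amino acid codes plus the stop symbols seen in this thread.
RESIDUES = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
    "Ter", "X", "*",
]
RESIDUE_RUN = re.compile("(?:" + "|".join(re.escape(r) for r in RESIDUES) + ")+")

# Assumed keyword list; "delins" must come before "del" and "ins".
KEYWORDS = re.compile("delins|del|ins|dup|ext|fs")


def malformed_residues(name):
    """Return the letter runs in a p. name that are not valid residue codes."""
    description = name.split(":p.", 1)[1].strip("()")
    without_keywords = KEYWORDS.sub(" ", description)
    runs = re.findall(r"[A-Za-z*]+", without_keywords)
    return [run for run in runs if not RESIDUE_RUN.fullmatch(run)]


print(malformed_residues("NP_110413.2.1:p.Val482_alX483delinsGluextX16"))  # ['alX']
print(malformed_residues("NP_110413.2.1:p.Val482_X483delinsGluextX16"))    # []
```

The check only says that "alX" is not a valid residue; it does not say
where in the calculation the extra letters come from.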


2) Differences between Mutalyzer and Ensembl output in pSyntax

For cSyntax NM_001161708.1:c.1429_1430insCTA, Ensembl's pSyntax output
is NP_001155180.1.1:p.X477delinsSerLysextX56. This indicates a deletion
of the stop codon and an insertion of two amino acids (Ser and Lys, SK)
at position 477.

For the same cSyntax, Mutalyzer's output is
NM_001161708.1(SYNC_i001):p.(*477Serext*57)

For cSyntax NM_001161708.1:c.1429_1430insCTA, should the pSyntax not be
NP_001155180.1.1:p.X477SerextX57?
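
To compare the two tools' names despite their different notation, both
can be reduced to a common form first. This is only a sketch under
stated assumptions: normalise_p_syntax is a hypothetical helper (it is
not part of VEP or Mutalyzer), and it assumes that the only notational
differences are the accession prefix, the surrounding parentheses and
the stop symbol (*, Ter or X).

```python
def normalise_p_syntax(name):
    """Reduce a protein-level name to a bare description with X as stop."""
    # Drop the accession (and any "(SYNC_i001)"-style label) before ":p."
    description = name.split(":p.", 1)[1]
    # Drop the parentheses that mark a predicted consequence
    description = description.strip("()")
    # Use X for the stop codon throughout
    return description.replace("Ter", "X").replace("*", "X")


ensembl = normalise_p_syntax("NP_001155180.1.1:p.X477delinsSerLysextX56")
mutalyzer = normalise_p_syntax("NM_001161708.1(SYNC_i001):p.(*477Serext*57)")

print(ensembl)               # X477delinsSerLysextX56
print(mutalyzer)             # X477SerextX57
print(ensembl == mutalyzer)  # False
```

After normalisation the two names still differ in the inserted residues
(SerLys versus Ser) and in the extension count (56 versus 57), so the
disagreement is not just a matter of notation.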

3) Possible error in calculating stop codon (between ensembl and mutalyzer)

For the deletion variant with cSyntax NM_030786.2:c.1446_1448delTTA, the
pSyntax output is NP_110413.2.1:p.X483delextX16. This means that,
counting from position 483, the new stop codon is 16 codons away
(excluding position 483 and the new stop codon itself).

According to Mutalyzer's output, however, the new stop codon is 17
codons away (again excluding position 483 and the stop codon).

Mutalyzer's output is: NM_030786.2(SYNC_i001):p.(*483Cysext*17)
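
Because the count depends on what is included, it may help to compute
it explicitly under each convention. The sketch below is in Python and
uses a made-up 18-base toy sequence, not the NM_030786.2 sequence; the
helper names (translate, delete_bases, extension_counts) are
hypothetical. It makes no claim about which convention either tool
actually uses; it only shows how the two counts differ by one.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate from the first base up to (not including) the first stop."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            return "".join(protein)
        protein.append(aa)
    raise ValueError("no in-frame stop codon found")


def delete_bases(cdna, start, end):
    """Delete bases start..end (1-based, inclusive)."""
    return cdna[:start - 1] + cdna[end:]


def extension_counts(ref_cdna, mut_cdna):
    """Describe a stop-loss extension under two counting conventions."""
    old_stop = len(translate(ref_cdna)) + 1  # codon number of original stop
    mut_protein = translate(mut_cdna)
    new_stop = len(mut_protein) + 1          # codon number of new stop
    residue = mut_protein[old_stop - 1] if new_stop > old_stop else None
    return {
        "old_stop": old_stop,
        "new_stop": new_stop,
        "residue_at_old_stop": residue,
        # Excludes both the old stop position and the new stop codon
        "between_old_and_new_stop": new_stop - old_stop - 1,
        # Distance from the old stop position to the new stop codon
        "offset_of_new_stop": new_stop - old_stop,
    }


# Toy transcript: ATG GTT TAG followed by a short 3' UTR (hypothetical).
ref = "ATGGTTTAG" + "TGCAAATGA"
mut = delete_bases(ref, 6, 8)  # removes TTA across codons 2 and 3
print(translate(ref), translate(mut))  # MV MVCK
print(extension_counts(ref, mut))
```

For the toy sequence the old stop is at codon 3 and the new stop at
codon 5, with Cys now at position 3: one residue lies strictly between
them, while the new stop is two codons from the old stop position.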

4) Non-uniform pSyntax calculation/output between insertions and deletions

For example:

a) For the deletion variant with cSyntax NM_030786.2:c.1446_1448delTTA,
the pSyntax output is NP_110413.2.1:p.X483delextX16. Here the new amino
acid that replaces the stop codon (C) is neither calculated nor
displayed.

b) For the insertion variant with cSyntax
NM_001161708.1:c.1429_1430insCTA, the pSyntax output is
NP_001155180.1.1:p.X477delinsSerLysextX56, in which the new amino acid
that replaces the stop codon is calculated and shown as a delins.

===============
Environment:
VEP version: 2.4
Perl version: 5.14.2
BioPerl version: 1.6.901-2
Linux: 3.2.0-17-generic (Ubuntu 12.04)
===============
-- 
Regards
vs



