[ensembl-dev] Possible issues in calculating pSyntax by VEP 2.4

Will McLaren wm2 at ebi.ac.uk
Mon Mar 5 10:12:25 GMT 2012


Hello,

Thanks for pointing these out.

The logic that goes into producing these HGVS notations is extremely
complex, and as you have found, there are sometimes corner cases where
the VEP (and indeed other programs) produces the wrong output.

We'll take a look at the code to see if we can sort these cases out -
it's always useful to have difficult cases to test the code on!

Thanks

Will McLaren
Ensembl Variation

On 5 March 2012 08:22, S Venkata Suresh Kumar
<suresh.surampudi at india.semanticbits.com> wrote:
> The following VCF records, which I used to calculate cSyntax and pSyntax,
> have issues:
>
> 1)
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
> 1       33147450        9       tcta    t       29      Pass    DP=154
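For reference, VCF writes deletions with a padding base: POS and the first REF base are the unchanged base before the event. So `tcta -> t` at 33147450 removes `cta` at 33147451-33147453. Below is a minimal Python sketch of that arithmetic. It assumes a simple single-ALT deletion and does not handle multi-allelic or symbolic alleles. The `Deletion` and `vcf_deletion` names are hypothetical and are not part of VEP.

```python
from typing import NamedTuple


class Deletion(NamedTuple):
    chrom: str
    start: int    # 1-based position of the first deleted base
    end: int      # 1-based position of the last deleted base
    deleted: str  # deleted bases, upper-cased


def vcf_deletion(line: str) -> Deletion:
    """Turn a whitespace-separated VCF data line holding a simple
    left-anchored deletion (ALT is a prefix of REF) into genomic coordinates."""
    chrom, pos, _id, ref, alt, *_ = line.split()
    ref, alt = ref.upper(), alt.upper()
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError("not a simple left-anchored deletion")
    # Skip the shared padding base(s); what remains of REF is what was deleted.
    start = int(pos) + len(alt)
    deleted = ref[len(alt):]
    return Deletion(chrom, start, start + len(deleted) - 1, deleted)


print(vcf_deletion("1 33147450 9 tcta t 29 Pass DP=154"))
```

Applied to the record VEP wrote back (POS 33147452, REF `taaa`), the same arithmetic gives `AAA` at 33147453-33147455.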
>
> For this record, the output from VEP is:
>
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
> 1       33147452        10      taaa    t       29      Pass    DP=154
> CSQ=-|CCDS44105.1|CCDS44105.1|Transcript|DOWNSTREAM||||||||,-|CCDS367.2|CCDS367.2|Transcript|STOP_LOST|1445-1447|1445-1447|482-483|V*/E|gTTTag/gag||CCDS367.2:c.1445_1447delTTT|CCDS367.2.1:p.Val482_alX483delinsGluextX,-|ENSESTG00000001282|ENSESTT00000003076|Transcript|DOWNSTREAM||||||||,-|5928|NM_001135255.1|Transcript|3PRIME_UTR|3581-3583||||||NM_001135255.1:c.*2147_*2149delAAA|,-|5928|NM_001135256.1|Transcript|3PRIME_UTR|3484-3486||||||NM_001135256.1:c.*2147_*2149delAAA|,-|ENSESTG00000001282|ENSESTT00000003082|Transcript|DOWNSTREAM||||||||,-|ENSESTG00000001282|ENSESTT00000003089|Transcript|DOWNSTREAM||||||||,-|5928|NM_005610.2|Transcript|3PRIME_UTR|3584-3586||||||NM_005610.2:c.*2147_*2149delAAA|,-|ENSESTG00000001967|ENSESTT00000004911|Transcript|DOWNSTREAM||||||||,-|CCDS53294.1|CCDS53294.1|Transcript|STOP_GAINED|1365-1367|1365-1367|455-456|CL/*|tgTTTa/tga||CCDS53294.1:c.1365_1367delTTT|CCDS53294.1.1:p.Cys455_Leu456delinsX,-|ENSESTG00000001967|ENSESTT00000004908|Transcript|STOP_LOST|620-622|452-454|151-152|V*/E|gTTTag/gag||ENSESTT00000004908.1:c.452_454delTTT|ENSESTP00000004908.1:p.Val151_alX152delinsGluextX16,-|81493|NM_001161708.1|Transcript|STOP_GAINED|1465-1467|1365-1367|455-456|CL/*|tgTTTa/tga||NM_001161708.1:c.1365_1367delTTT|NP_001155180.1.1:p.Cys455_Leu456delinsX,-|81493|NM_030786.2|Transcript|STOP_LOST|1545-1547|1445-1447|482-483|V*/E|gTTTag/gag||NM_030786.2:c.1445_1447delTTT|NP_110413.2.1:p.Val482_alX483delinsGluextX16,-|CCDS44106.1|CCDS44106.1|Transcript|DOWNSTREAM||||||||,-|CCDS366.1|CCDS366.1|Transcript|DOWNSTREAM||||||||
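The CSQ value packs one comma-separated entry per transcript. Each entry has pipe-separated sub-fields, and every entry above has 13 of them. Below is a Python sketch for pulling out the HGVS columns. The field labels are assumptions inferred from the values in this output, not taken from VEP 2.4 itself. If your file has a `##INFO=<ID=CSQ,...>` header describing the format, use that order instead.

```python
# Hypothetical labels for the 13 pipe-separated CSQ sub-fields seen in this
# output (assumed, not read from VEP's header).
CSQ_FIELDS = [
    "Allele", "Gene", "Feature", "Feature_type", "Consequence",
    "cDNA_position", "CDS_position", "Protein_position",
    "Amino_acids", "Codons", "Existing_variation", "HGVSc", "HGVSp",
]


def parse_csq(csq: str) -> list[dict[str, str]]:
    """Split a CSQ value into one dict per transcript entry."""
    if csq.startswith("CSQ="):
        csq = csq[len("CSQ="):]
    records = []
    for entry in csq.split(","):
        values = entry.split("|")
        if len(values) != len(CSQ_FIELDS):
            raise ValueError(
                f"expected {len(CSQ_FIELDS)} fields, got {len(values)}: {entry!r}"
            )
        records.append(dict(zip(CSQ_FIELDS, values)))
    return records


sample = (
    "CSQ=-|81493|NM_030786.2|Transcript|STOP_LOST|1545-1547|1445-1447|482-483"
    "|V*/E|gTTTag/gag||NM_030786.2:c.1445_1447delTTT"
    "|NP_110413.2.1:p.Val482_alX483delinsGluextX16"
    ",-|CCDS366.1|CCDS366.1|Transcript|DOWNSTREAM||||||||"
)
for rec in parse_csq(sample):
    print(rec["Feature"], rec["Consequence"], rec["HGVSc"], rec["HGVSp"])
```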
>
>
> For cSyntax NM_030786.2:c.1445_1447delTTT, the pSyntax is given as
> "NP_110413.2.1:p.Val482_alX483delinsGluextX16". Should this be
> "NP_110413.2.1:p.Val482_X483delinsGluextX16"? (Note that the extra
> text "al" has crept into the pSyntax.)
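One quick way to catch this kind of corruption is to check each residue token against the list of three-letter amino-acid codes. The Python sketch below is deliberately narrow. It only accepts the `p.<res><pos>_<res><pos>delins<residues>ext<stop><n>` shape seen here, and it treats `X`, `*` and `Ter` as stop. It is not a general HGVS parser, and `is_wellformed_delins_ext` is a hypothetical name.

```python
import re

# The 20 standard three-letter amino-acid codes.
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")
# Stop symbols seen in this thread (Ensembl uses X, Mutalyzer uses *).
STOP = r"Ter|\*|X"
# One residue token: a known code followed by a position number.
RESIDUE = rf"(?:{AA3}|{STOP})\d+"
DELINS_EXT = re.compile(
    rf"p\.{RESIDUE}(?:_{RESIDUE})?delins(?:{AA3})+ext(?:{STOP})\d*"
)


def is_wellformed_delins_ext(change: str) -> bool:
    """True if the whole protein change matches the narrow delins+ext shape."""
    return DELINS_EXT.fullmatch(change) is not None


for change in ("p.Val482_alX483delinsGluextX16", "p.Val482_X483delinsGluextX16"):
    print(change, is_wellformed_delins_ext(change))
```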
>
>
> 2) Differences between Mutalyzer and Ensembl output in pSyntax
>
> For cSyntax NM_001161708.1:c.1429_1430insCTA, Ensembl's pSyntax output
> is NP_001155180.1.1:p.X477delinsSerLysextX56. This indicates a deletion
> of the stop codon and an insertion of two amino acids (SK) at position
> 477.
>
> For the same cSyntax, Mutalyzer's output is
> NM_001161708.1(SYNC_i001):p.(*477Serext*57)
>
> For cSyntax NM_001161708.1:c.1429_1430insCTA, should the pSyntax not be
> NP_001155180.1.1:p.X477SerextX57?
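To compare the two tools' strings directly, it can help to drop the accession and the prediction parentheses and write stops the same way. Below is a small Python sketch. It assumes that an uppercase `X` immediately before a number, or at the very end of the string, always means stop, as in the Ensembl outputs quoted here. `protein_change` is a hypothetical helper.

```python
import re


def protein_change(hgvs: str) -> str:
    """Return the p. part of an HGVS string with 'X' stops written as '*'
    and any surrounding prediction parentheses removed."""
    _, sep, change = hgvs.partition(":p.")
    if not sep:
        raise ValueError(f"no protein description in {hgvs!r}")
    change = change.strip()
    if change.startswith("(") and change.endswith(")"):
        change = change[1:-1]
    # Assumption: uppercase X before a position (or at the end) denotes stop.
    change = re.sub(r"X(?=\d|$)", "*", change)
    return "p." + change


ensembl = protein_change("NP_001155180.1.1:p.X477delinsSerLysextX56")
mutalyzer = protein_change("NM_001161708.1(SYNC_i001):p.(*477Serext*57)")
print(ensembl)
print(mutalyzer)
```

With both outputs normalised, the remaining differences are the `delinsSerLys` vs `Ser` wording and the extension length (56 vs 57).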
>
> 3) Possible error in calculating the new stop codon position (Ensembl vs Mutalyzer)
>
> For the deletion variant with cSyntax NM_030786.2:c.1446_1448delTTA, the
> pSyntax output is NP_110413.2.1:p.X483delextX16. This means that the new
> stop codon lies 16 codons beyond position 483 (excluding codon 483 and
> the stop codon itself).
>
> However, according to Mutalyzer's output, the new stop codon is 17
> codons away (again excluding codon 483 and the stop codon).
>
> Mutalyzer's output is: NM_030786.2(SYNC_i001):p.(*483Cysext*17)
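A difference of exactly one may come down to how the distance is counted. The Python sketch below walks codons from the altered (former stop) codon to the next in-frame stop. It reports two numbers: the count of codons strictly in between, and the offset of the new stop from the altered codon. These two differ by one, and which of them a given notation uses is the thing to check. The helper name and the test sequence are made up for illustration and are not from either tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_to_new_stop(cds_from_altered_codon: str) -> tuple[int, int]:
    """Given coding sequence starting at the altered (former stop) codon,
    return (codons strictly between it and the next in-frame stop,
            offset of that stop counted from the altered codon as 0)."""
    seq = cds_from_altered_codon.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # The first codon is the altered one; it is never the new stop.
        if i > 0 and codon in STOP_CODONS:
            offset = i // 3
            return offset - 1, offset
    raise ValueError("no in-frame stop codon found")


# Made-up sequence: altered codon TGT (Cys), three GCT (Ala), then TAA.
print(codons_to_new_stop("TGT" + "GCT" * 3 + "TAA"))
```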
>
> 4) Inconsistent pSyntax calculation/output between insertions and deletions
>
> For example:
>
> a) For the deletion variant with cSyntax NM_030786.2:c.1446_1448delTTA,
> the pSyntax output is NP_110413.2.1:p.X483delextX16. Here the new amino
> acid that replaces the stop codon (C) is neither calculated nor displayed.
>
> b) For the insertion variant with cSyntax NM_001161708.1:c.1429_1430insCTA,
> the pSyntax output is NP_001155180.1.1:p.X477delinsSerLysextX56, where the
> new amino acids that replace the stop codon are calculated and shown as a
> delins.
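One way to make the two cases uniform is to always name the residue that replaces the stop, as both Mutalyzer outputs quoted above do. Below is a minimal Python sketch. `format_stop_loss` is a hypothetical helper, not an Ensembl or Mutalyzer function. The caller must supply the new-stop offset, counted however the chosen convention requires.

```python
# Standard one-letter to three-letter amino-acid codes.
AA1_TO_AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def format_stop_loss(stop_pos: int, new_aa: str, new_stop_offset: int,
                     stop_symbol: str = "*") -> str:
    """Render a stop-loss extension the same way for insertions and deletions,
    always naming the residue that now occupies the former stop position."""
    three = AA1_TO_AA3.get(new_aa.upper())
    if three is None:
        raise ValueError(f"unknown one-letter amino-acid code: {new_aa!r}")
    return f"p.{stop_symbol}{stop_pos}{three}ext{stop_symbol}{new_stop_offset}"


print(format_stop_loss(483, "C", 17))  # deletion case (a)
print(format_stop_loss(477, "S", 57))  # insertion case (b)
```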
>
> ===============
> Environment:
> VEP version: 2.4
> Perl version: 5.14.2
> BioPerl version: 1.6.901-2
> Linux: 3.2.0-17-generic (Ubuntu 12.04)
> =====================
> --
> Regards
> vs
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/



