[ensembl-dev] protein position column for indels

David Tamborero david.tamborero at gmail.com
Thu Jul 25 16:11:20 BST 2019

Hi Ensembl devs,

I'm struggling to fully understand how the 'protein position' column is
calculated when I compare it with the variant's HGVSp representation.

The discrepancy happens only for indels; some examples (left = HGVSp entry;
right = protein position entry):

ENSP00000277541.6:p.Gln2444ThrfsTer34     2444
ENSP00000256474.2:p.Lys159ArgfsTer14      158-159
ENSP00000324856.6:p.Tyr253SerfsTer32      252-254

inframe deletions:
ENSP00000356379.4:p.Tyr1373del            1373-1374
ENSP00000361824.3:p.Glu2207del            2207
ENSP00000339004.3:p.His57del              53-54
ENSP00000268125.5:p.Phe96_Phe99del        96-99
ENSP00000413720.3:p.Ala171_Ala174del      171-175
ENSP00000368332.4:p.Ala114_Ala115del      110-112

inframe insertions:
ENSP00000369497.3:p.Glu238_Ser239insArg   239
ENSP00000339867.2:p.Asp687_Gly688insPhe   687-688
ENSP00000445920.1:p.Val188_Ala192dup      188-192
ENSP00000361824.3:p.Arg2308_Met2309dup    2308-2310

I'm guessing that this may be related in part to right/left alignment
discrepancies in the coordinates reported in these two columns (e.g.
ENSP00000368332.4:p.Ala114_Ala115del --> 110-112 or
ENSP00000339004.3:p.His57del --> 53-54)?
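
To illustrate the alignment guess, here is a minimal Python sketch of how the
same deletion inside a run of identical residues can be written at several
positions. The toy sequence and the function name shift_deletion_right are
made up for illustration; this is not VEP's code, only my understanding of
the HGVS convention of reporting the most C-terminal equivalent position.

```python
def shift_deletion_right(seq, start, end):
    """Shift a deletion toward the C-terminus while the outcome is unchanged.

    seq   : protein sequence as a string (hypothetical toy input)
    start : 1-based first deleted residue
    end   : 1-based last deleted residue (inclusive)
    """
    # Moving the deleted window one residue to the right gives the same
    # product when the residue entering the window equals the one leaving it.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


# Toy sequence with a run of five His at positions 4-8 (not a real protein).
toy = "MKAHHHHHG"

# Deleting the first His of the run is equivalent to deleting the last one.
print(shift_deletion_right(toy, 4, 4))  # (8, 8)
```

If the two columns are anchored at opposite ends of such a run, a start
offset like the ones above (57 vs 53, 114 vs 110) would be expected, but that
is only my assumption.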

I also suspect there is some issue that makes the protein position column
span sometimes 'n' and sometimes 'n+1' positions, where n is the number of
affected residues according to the HGVSp (e.g.
ENSP00000277541.6:p.Gln2444ThrfsTer34 --> 2444 or
ENSP00000445920.1:p.Val188_Ala192dup --> 188-192 report 'n', whereas
ENSP00000413720.3:p.Ala171_Ala174del --> 171-175 or
ENSP00000368332.4:p.Ala114_Ala115del --> 110-112 report 'n+1')?
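
For reference, this is roughly how I compare the two columns: a small Python
sketch that reads the residue range from the HGVSp string and from the
protein position value, then reports the start shift and the difference in
span length. The function names are my own, and the regular expression only
covers the HGVSp shapes listed in this message; other forms the column may
take are not handled.

```python
import re

# Matches "p.Xxx123" optionally followed by "_Yyy456" (three-letter codes).
HGVSP_RE = re.compile(r"p\.[A-Za-z]{3}(\d+)(?:_[A-Za-z]{3}(\d+))?")


def hgvsp_range(hgvsp):
    """Return (start, end) residue numbers named in an HGVSp string."""
    match = HGVSP_RE.search(hgvsp)
    if match is None:
        raise ValueError(f"unsupported HGVSp string: {hgvsp}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return start, end


def column_range(value):
    """Return (start, end) from a protein position value such as '110-112'."""
    parts = value.split("-")
    return int(parts[0]), int(parts[-1])


def compare(hgvsp, column):
    """Return (start shift, span length difference), column minus HGVSp."""
    h_start, h_end = hgvsp_range(hgvsp)
    c_start, c_end = column_range(column)
    start_shift = c_start - h_start
    length_diff = (c_end - c_start + 1) - (h_end - h_start + 1)
    return start_shift, length_diff


print(compare("ENSP00000445920.1:p.Val188_Ala192dup", "188-192"))  # (0, 0)
print(compare("ENSP00000413720.3:p.Ala171_Ala174del", "171-175"))  # (0, 1)
print(compare("ENSP00000368332.4:p.Ala114_Ala115del", "110-112"))  # (-4, 1)
```

A length difference of 0 corresponds to what I call 'n' above and 1 to 'n+1';
a non-zero start shift is the alignment discrepancy from the previous
paragraph.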

Apologies if this is documented somewhere; I haven't been able to find the
details of that entry.

Thanks in advance!
