[ensembl-dev] protein position column for indels
david.tamborero at gmail.com
Fri Jul 26 11:48:05 BST 2019
Could not be clearer, thanks a lot!
(and +1 for having the option of 3'-shifted information)
Have a nice weekend
On Fri, 26 Jul 2019 at 12:43, Andrew Parton <aparton at ebi.ac.uk> wrote:
> Hi David,
> Thank you for your query. There are a couple of reasons for these
> discrepancies:
> 1) Insertions/deletions are always described at their most 3’ position in
> HGVS notation. So if, for example, you insert an A into a run of repeated
> As, the HGVS output will report it at the most 3’ position, whereas the
> protein position column will report the position as it was given to VEP. We
> are currently looking at shifting all variants 3’ by default, and will
> include this in a future release.
> 2) The protein position column will cover all input locations (including
> the reference), while the HGVS output will use only a minimal allele
> string. For example, given the sequence
> ATG CTG
> an insertion of a T between positions 3 and 4, in the standard VCF
> format of ‘chr 3 varName G GT’, would give a range of 1-2 for the protein
> position (as it is also considering the reference G that was given), while
> the HGVS would recognise the insertion as only being in position 2.
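The two effects described in the reply above can be illustrated with a short Python sketch. This is not VEP's code: the function names (`shift_insertion_3prime`, `codon_index`, `protein_range`) and the repeat sequence `GCAAAAT` are hypothetical, chosen only for illustration, and coordinates are simplified to a forward-strand coding sequence that starts at position 1.

```python
def shift_insertion_3prime(seq: str, index: int, inserted: str) -> tuple[int, str]:
    """Slide an insertion as far 3' as possible without changing the result.

    `index` is 0-based: `inserted` is placed immediately before seq[index].
    While the next reference base equals the first inserted base, the
    insertion can be rotated one base to the right.
    """
    while index < len(seq) and seq[index] == inserted[0]:
        inserted = inserted[1:] + seq[index]
        index += 1
    return index, inserted


def codon_index(cds_pos: int) -> int:
    """1-based codon (amino acid) number containing a 1-based CDS position."""
    return (cds_pos - 1) // 3 + 1


def protein_range(first_cds_pos: int, last_cds_pos: int) -> str:
    """Format the codons touched by a CDS span as 'n' or 'n-m'."""
    start, end = codon_index(first_cds_pos), codon_index(last_cds_pos)
    return str(start) if start == end else f"{start}-{end}"


# Point 1: an A inserted at the start of a run of As (hypothetical sequence)
# ends up at the 3' end of the run.
print(shift_insertion_3prime("GCAAAAT", 2, "A"))  # (6, 'A')

# Point 2: sequence ATG CTG, VCF record 'G GT' at position 3.
# The input span runs from the reference G (position 3) to the base after
# the insertion point (position 4), so it touches codons 1 and 2.
print(protein_range(3, 4))  # 1-2

# The minimal change starts at position 4, which lies in codon 2.
print(codon_index(4))  # 2
```

In the ATG CTG example, including the reference G pulls codon 1 into the reported range, while the first residue that actually changes is in codon 2, which is the position the HGVS string uses.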
> I think these two cases cover all of your examples. If you have any more
> questions, or any particular examples that you’d like us to take a closer
> look at, please let us know.
> Kind Regards,
> > On 25 Jul 2019, at 16:11, David Tamborero <david.tamborero at gmail.com> wrote:
> > Hi ensembl devs,
> > I'm struggling to fully understand how the 'protein position' column is
> calculated when I compare it with the variant's HGVSp representation.
> > The two disagree only for indels; some examples (left = HGVSp entry;
> right = protein position entry):
> > frameshift:
> > ENSP00000277541.6:p.Gln2444ThrfsTer34 2444
> > ENSP00000256474.2:p.Lys159ArgfsTer14 158-159
> > ENSP00000324856.6:p.Tyr253SerfsTer32 252-254
> > inframe deletions:
> > ENSP00000356379.4:p.Tyr1373del 1373-1374
> > ENSP00000361824.3:p.Glu2207del 2207
> > ENSP00000339004.3:p.His57del 53-54
> > ENSP00000268125.5:p.Phe96_Phe99del 96-99
> > ENSP00000413720.3:p.Ala171_Ala174del 171-175
> > ENSP00000368332.4:p.Ala114_Ala115del 110-112
> > inframe insertions:
> > ENSP00000369497.3:p.Glu238_Ser239insArg 239
> > ENSP00000339867.2:p.Asp687_Gly688insPhe 687-688
> > ENSP00000445920.1:p.Val188_Ala192dup 188-192
> > ENSP00000361824.3:p.Arg2308_Met2309dup 2308-2310
> > I'm guessing that this may be related in part to right/left alignment
> discrepancies in the reported coordinates between these two columns (e.g.
> ENSP00000368332.4:p.Ala114_Ala115del --> 110-112 or
> ENSP00000339004.3:p.His57del --> 53-54)?
> > I'm also guessing that something else sometimes makes the protein
> position column report 'n' and sometimes 'n+1' positions, where n is the
> number of affected residues according to the HGVSp (e.g.
> ENSP00000277541.6:p.Gln2444ThrfsTer34 --> 2444 or
> ENSP00000445920.1:p.Val188_Ala192dup --> 188-192 report 'n', whereas
> ENSP00000413720.3:p.Ala171_Ala174del --> 171-175 or
> ENSP00000368332.4:p.Ala114_Ala115del --> 110-112 report 'n+1')?
> > Apologies if this is documented somewhere; I have not been able to find
> the details of that entry.
> > thanks in advance!
> > d
> > _______________________________________________
> > Dev mailing list Dev at ensembl.org
> > Posting guidelines and subscribe/unsubscribe info:
> > Ensembl Blog: http://www.ensembl.info/