[ensembl-dev] Incorrect HGVSp start coordinate for certain delins variants

Reece Hart reece at harts.net
Tue May 14 05:48:06 BST 2013


Hi-

VEP 2.8 and VEP 71 appear to have a bug in which the start coordinate of
protein (HGVSp) consequences is repeated for certain delins variants.

For example, with this VCF line as input:
  3 10191482 CVID1003553 A ATTT 60 PASS
VEP 2.3 correctly returns
  HGVSp=ENSP00000256474.2:p.Lys159delinsIleX
  HGVSp=ENSP00000344757.2:p.Lys118delinsIleX
for two coding transcripts, whereas VEP 2.8 and VEP 71 each return
  HGVSp=ENSP00000256474.2:p.Lys159159delinsIleX
  HGVSp=ENSP00000344757.2:p.Lys118118delinsIleX

Notice that the start coordinates, 159 and 118, are repeated in the VEP 2.8
and VEP 71 output.
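
To find affected records in existing output, one option is to look for a
coordinate written twice in a row after the first residue. The Perl sketch
below does that with a regex back-reference. has_repeated_start is a
hypothetical helper, not part of the Ensembl API, and it is only a heuristic:
a genuine position such as 11 or 1111 would also match, so hits need a manual
check.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Ensembl): returns 1 when the first residue
# of a protein delins description is followed by the same digit string twice,
# e.g. "p.Lys159159delins". Heuristic only: a real position such as 11 or 1111
# also matches.
sub has_repeated_start {
    my ($hgvsp) = @_;
    # [A-Z][a-z]{2} = three-letter residue code; (\d+)\1 = a digit run repeated
    return $hgvsp =~ /:p\.[A-Z][a-z]{2}(\d+)\1delins/ ? 1 : 0;
}

for my $hgvsp (
    'ENSP00000256474.2:p.Lys159delinsIleX',       # VEP 2.3 output
    'ENSP00000256474.2:p.Lys159159delinsIleX',    # VEP 2.8 / VEP 71 output
) {
    printf "%s\t%s\n", $hgvsp, has_repeated_start($hgvsp) ? 'REPEATED' : 'ok';
}
```

Range descriptions such as p.Lys159_Asn160delins are not flagged, because the
underscore breaks the repeated digit run.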


More examples:
  5 112175315 CVID1010109 T TAAA 60 PASS
  9 98238379 CVID6007616 AT ACTGCTGC 60 PASS
  10 43610045 CVID4000412 AG ATTCT 60 PASS
  10 89711990 CVID4000640 TTCC TATAAAT 60 PASS
  17 7578202 CVID6007473 AC AACCA 60 PASS
  17 78063675 CVID6004403 A ATGT 60 PASS


The error appears to be in TranscriptVariationAllele.pm:794:

  $hgvs_notation->{'hgvs'} .= $ref_pep_first . $hgvs_notation->{start} .
      $hgvs_notation->{end} . $hgvs_notation->{type} . $hgvs_notation->{alt} ;

Having both $ref_pep_first and $hgvs_notation->{start} in this concatenation
repeats the start coordinate. Removing $hgvs_notation->{start} from the line
above fixes these cases, but I'm not sure I fully understand the logic
implemented in _get_hgvs_protein_format or how this change affects other
cases.
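
For comparison, HGVS protein nomenclature gives a single-residue delins one
coordinate (p.Lys159delinsIleX) and a multi-residue delins a range: first
residue and position, an underscore, then last residue and position. Below is
a minimal Perl sketch of that rule, assuming the caller already knows the
first and last reference residues and their positions. format_delins and its
argument names are hypothetical; this is not the logic of
_get_hgvs_protein_format.

```perl
use strict;
use warnings;

# Hypothetical formatter, not Ensembl's _get_hgvs_protein_format.
# Emits one coordinate when start == end, otherwise an HGVS range.
sub format_delins {
    my (%arg) = @_;   # keys: ref_first, ref_last, start, end, alt
    my $loc = $arg{ref_first} . $arg{start};
    if ($arg{end} != $arg{start}) {
        # Range form: first residue+position, "_", last residue+position
        $loc .= '_' . $arg{ref_last} . $arg{end};
    }
    return 'p.' . $loc . 'delins' . $arg{alt};
}

# Single residue, values from the example above
print format_delins(ref_first => 'Lys', ref_last => 'Lys',
                    start => 159, end => 159, alt => 'IleX'), "\n";
# prints p.Lys159delinsIleX

# Two residues (made-up values, for illustration only)
print format_delins(ref_first => 'Lys', ref_last => 'Asn',
                    start => 159, end => 160, alt => 'Ile'), "\n";
# prints p.Lys159_Asn160delinsIle
```

Making the end coordinate conditional on end != start is what keeps the
single-residue case from printing the position twice. Whether VEP's internal
variables map cleanly onto these arguments is untested here.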


-Reece