[ensembl-dev] Incorrect HGVSp start coordinate for certain delins variants
Reece Hart
reece at harts.net
Tue May 14 05:48:06 BST 2013
Hi-
VEP 2.8 and VEP 71 appear to have a bug in which the start coordinate in
the protein-level HGVS notation (HGVSp) is repeated for certain delins variants.
For example, with this VCF line as input:
3 10191482 CVID1003553 A ATTT 60 PASS
VEP 2.3 correctly returns
HGVSp=ENSP00000256474.2:p.Lys159delinsIleX
HGVSp=ENSP00000344757.2:p.Lys118delinsIleX
for two transcripts, whereas VEP 2.8 and VEP 71 each return
HGVSp=ENSP00000256474.2:p.Lys159159delinsIleX
HGVSp=ENSP00000344757.2:p.Lys118118delinsIleX
(two coding transcripts)
Notice that the start coordinates, 159 and 118, are repeated in the output of VEP 2.8 and 71.
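To screen a larger batch of VEP output for the same symptom, a rough check along the following lines may help. This is only a heuristic sketch in Python, not part of VEP: the function name has_repeated_start is made up for illustration, and the pattern assumes three-letter residue codes followed directly by the position and "delins", as in the strings above. A genuine position such as 1212 would be flagged as well, so any hits still need a manual look.

```python
import re

# Heuristic pattern: "p.", a three-letter residue code, then a run of digits
# that is immediately followed by an identical run of digits, then "delins".
# Assumption: HGVSp strings look like the ones quoted in this message.
_DOUBLED_START = re.compile(r"p\.[A-Z][a-z]{2}(\d+)\1delins")


def has_repeated_start(hgvsp: str) -> bool:
    """Return True if the HGVSp string looks like it has a doubled start coordinate.

    Caveat: real positions whose digits happen to repeat (e.g. 1212) also match.
    """
    return _DOUBLED_START.search(hgvsp) is not None


if __name__ == "__main__":
    for s in (
        "ENSP00000256474.2:p.Lys159delinsIleX",
        "ENSP00000256474.2:p.Lys159159delinsIleX",
    ):
        print(s, has_repeated_start(s))
```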
More examples:
5 112175315 CVID1010109 T TAAA 60 PASS
9 98238379 CVID6007616 AT ACTGCTGC 60 PASS
10 43610045 CVID4000412 AG ATTCT 60 PASS
10 89711990 CVID4000640 TTCC TATAAAT 60 PASS
17 7578202 CVID6007473 AC AACCA 60 PASS
17 78063675 CVID6004403 A ATGT 60 PASS
The error appears to be in TranscriptVariationAllele.pm:794:
$hgvs_notation->{'hgvs'} .= $ref_pep_first . $hgvs_notation->{start} .
$hgvs_notation->{end} . $hgvs_notation->{type} . $hgvs_notation->{alt} ;
Having both $ref_pep_first and $hgvs_notation->{start} has the effect of
repeating the starting coordinate. Removing $hgvs_notation->{start} from
the above line solves this problem for these cases, but I'm unsure that I
fully understand the logic that is implemented in _get_hgvs_protein_format
or the impact of this change on other cases.
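To make the suspected mechanism concrete, here is a small Python model of the concatenation. It is a sketch under assumptions, not the VEP code: I am assuming that $ref_pep_first holds only the residue name ("Lys") and that start and end both hold 159 for these variants, which would explain the observed output; the function names and the ref_last parameter are invented for illustration. The second function shows one possible alternative, following the HGVS convention of writing a single position once and a range as first_last (e.g. Cys28_Lys29delinsTrp); it is not necessarily what the fix in _get_hgvs_protein_format should be.

```python
def concat_as_reported(ref_first, start, end, kind, alt):
    """Mimic the quoted line: start and end are both appended unconditionally."""
    return f"{ref_first}{start}{end}{kind}{alt}"


def concat_single_or_range(ref_first, start, end, kind, alt, ref_last=None):
    """Hypothetical alternative: write the position once when start == end,
    otherwise write an HGVS-style range such as Cys28_Lys29."""
    if start == end:
        return f"{ref_first}{start}{kind}{alt}"
    if ref_last is None:
        raise ValueError("ref_last is required when start != end")
    return f"{ref_first}{start}_{ref_last}{end}{kind}{alt}"


if __name__ == "__main__":
    # Assumed values for the first example variant in this message.
    print(concat_as_reported("Lys", 159, 159, "delins", "IleX"))
    print(concat_single_or_range("Lys", 159, 159, "delins", "IleX"))
    print(concat_single_or_range("Cys", 28, 29, "delins", "Trp", ref_last="Lys"))
```

Under these assumptions the first call reproduces the doubled coordinate, while the second yields the form that VEP 2.3 returned.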
-Reece