[ensembl-dev] VEP creates bad hgvsc

Sarah Hunt seh at ebi.ac.uk
Thu Nov 1 13:43:19 GMT 2018


Hi Michael,

Apologies for the delay in getting back to you.

This does look like a problem caused by the mismatch between the 
reference genome sequence and the transcript sequence. We currently use 
NCBI's alignments to 'correct' the underlying reference to match the 
transcript sequence, and we use this corrected sequence in consequence 
calling and codon derivation, which is why the codons here are reported 
as '*C*GG/*C*GG'. Our HGVS writing code does not yet use this 
'corrected' sequence; we are currently updating it to do so.
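
To illustrate the difference, here is a minimal Python sketch. It is not 
VEP's actual code: hgvs_c is a hypothetical helper, and it assumes a 
plus-strand transcript so that genomic and transcript bases can be 
compared directly. The bases are the ones reported in this thread for 
rs6672356.

```python
def hgvs_c(transcript_id, cdna_pos, ref_base, alt_base):
    """Format a single-base substitution in HGVS coding-DNA notation.

    Hypothetical helper for illustration only; not part of VEP.
    """
    if ref_base == alt_base:
        # HGVS writes '=' when the allele is identical to the reference.
        return f"{transcript_id}:c.{cdna_pos}{ref_base}="
    return f"{transcript_id}:c.{cdna_pos}{ref_base}>{alt_base}"


# Values taken from the rs6672356 example in this thread.
genome_ref = "T"      # GRCh38 base at chr1:942451
transcript_ref = "C"  # base at NM_152486.2:c.1027
alt = "C"             # alternate allele from the input variant

# Reference taken from the genome (what VEP 93 reports).
uncorrected = hgvs_c("NM_152486.2", 1027, genome_ref, alt)

# Reference taken from the transcript sequence (agrees with dbSNP).
corrected = hgvs_c("NM_152486.2", 1027, transcript_ref, alt)

print(uncorrected)
print(corrected)
```

With the transcript base as the reference, the alternate allele is the 
same as the reference, which is consistent with the unchanged codon.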

We take transcript mapping information from NCBI, which is why we report 
the position as 1027, in agreement with dbSNP. Where are you seeing the 
mapping to 1099?

Best wishes,

Sarah


On 24/10/2018 20:32, Michael Yourshaw wrote:
>
> This question relates to VEP 93 Human.
>
>
> I've seen a few cases where the hgvsc computed by VEP seems to be wrong.
>
> For example, the genomic variant chr1:942451T>C (rs6672356) produces 
> these hgvsc values for the canonical transcripts in VEP:
>
>   - 'ENST00000342066.7:c.1027T>C'
>
>   - 'NM_152486.2:c.1027T>C'
>
>
> Looking at the cDNA sequence, the reference for NM_152486.2:c.1027 is 
> not T but C.
>
>
> As best I can tell, chr1:942451 does not map to cDNA NM_152486.2 
> at 1027, but rather to 1099, which contains the expected T as a reference.
>
>
> The Alamut annotator for this variant fails with the reason 
> "Transcript NM_152486.2: Genome/Transcript discrepancy: Alternate 
> genomic nucleotide (C) same as transcript nucleotide (Assembly: GRCh38)"
>
>
> dbSNP for rs6672356 contains
>
>   NM_152486.2:c.1027C=
>   NM_152486.2:c.1027C>T
>
>
> This particular issue has a discussion in BioStars 
> (https://www.biostars.org/p/239892/). But I do not think the 
> explanation suggested there applies: that it is just a difference 
> between RefSeq and the reference genome.
>
>
> These variants also seem to have a similar problem:
>
> rs10902758 NC_000004.12:g.654854G>A NM_000283.3:c.958G>A
> *** NM_000283.3:c.958G>A: Variant reference (G) does not agree with 
> reference sequence (A)
> rs10902758 NC_000004.12:g.654854G>A NM_001145291.1:c.958G>A
> *** NM_001145291.1:c.958G>A: Variant reference (G) does not agree with 
> reference sequence (A)
> rs10902758 NC_000004.12:g.654854G>A NM_001145292.1:c.121G>A
> *** NM_001145292.1:c.121G>A: Variant reference (G) does not agree with 
> reference sequence (A)
> rs10902758 NC_000004.12:g.654854G>A XM_011513473.2:c.1177G>A
>
> Michael Yourshaw
> myourshaw at gmail.com <mailto:myourshaw at ucla.edu>
>