[ensembl-dev] Probably incorrect HGVS on GRCh37 RefSeq

Andrew Parton aparton at ebi.ac.uk
Tue Jan 21 11:50:02 GMT 2020


Hi,

Yep, that’s correct.

One thing to be aware of, however, is that our HGVS code shifts variants in repeated regions in the 3’ direction by default, whereas our CDS position is not shifted. This shifting is the most common cause of mismatches between the CDS position and the HGVSc position, although such mismatches can also be caused by these RefSeq alignment mismatches.
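
As a rough illustration of the 3’ shifting described above, here is a minimal sketch for a deletion inside a repeat. It is not VEP's actual implementation; the function name, the 0-based coordinates, and the toy sequences are assumptions made for the example, and the sequence is taken to be in transcript orientation.

```python
def shift_deletion_3prime(seq: str, start: int, length: int) -> int:
    """Return the 0-based start of a deletion after shifting it as far 3' as possible.

    Hypothetical helper for illustration only. Deleting seq[start:start+length]
    and deleting seq[start+1:start+length+1] give the same result whenever the
    base leaving the window equals the base entering it, so the deletion can
    slide one position to the right each time that holds.
    """
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


# Deleting one T from the run of four Ts in GATTTTC (0-based index 2)
# is reported at the last T of the run after shifting.
homopolymer_start = shift_deletion_3prime("GATTTTC", 2, 1)

# Deleting one AC unit from ACACACG shifts to the last AC unit.
dinucleotide_start = shift_deletion_3prime("ACACACG", 0, 2)
```

The unshifted and shifted deletions produce the same sequence, but their reported positions differ, which is how a shifted HGVSc position can disagree with an unshifted CDS position.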

Kind Regards,
Andrew

> On 21 Jan 2020, at 11:08, Wallace Ko <myko at l3-bioinfo.com> wrote:
> 
> Hi Andrew,
> 
> Thanks for the prompt response.
> May I assume that this is a problem only with the HGVS calculation, and that the CDS position is already corrected using the RefSeq alignment in such cases?
> 
> Regards,
> Wallace Ko
> 
> 
> On Tue, Jan 21, 2020 at 6:30 PM Andrew Parton <aparton at ebi.ac.uk> wrote:
> Hi Wallace,
> 
> Thanks for this report, it is an issue we are aware of. As you identified, not all RefSeq transcripts completely match the reference genome. In cases where they don't, we are now using alignment files provided by NCBI to create a new reference, matching the transcript, and use this for consequence calling.
> 
> Our HGVS calculation does not currently use this reference modification, but it is something we are working on and aim to release later this year. VEP can report reference mismatches for GRCh38, but these data are not available for GRCh37.
> 
> More details on the differences to the reference genome and correcting transcript models using BAM can be found here: https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq
> 
> Let us know if there’s anything else we can do to help.
> 
> Kind Regards,
> Andrew
> 
>> On 21 Jan 2020, at 09:23, Wallace Ko <myko at l3-bioinfo.com> wrote:
>> 
>> Hi Ensembl Developers,
>> 
>> The variant NC_000012.11:g.103249104C>A is annotated by online VEP and offline cached VEP (99, RefSeq, GRCh37) as:
>> HGVSc: NM_000277.1:c.517G>T
>> HGVSp: NP_000268.1:p.Gln172His
>> CDS Position: 516
>> On the other hand, ClinVar <https://www.ncbi.nlm.nih.gov/clinvar/variation/664621/> reports the variant as NM_000277.3:c.516G>T (NP_000268.1:p.Gln172His). In addition, a BLAST alignment shows a 1-bp gap between c.303 and c.304 when NM_000277.1 is aligned to NC_000012.11. Even VEP itself reports the CDS position as 516.
>> 
>> All of this leads me to believe that the reported HGVSc should be c.516 rather than c.517.
>> 
>> Regards,
>> Wallace Ko
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
> 
