[ensembl-dev] Probably incorrect HGVS on GRCh37 RefSeq

Wallace Ko myko at l3-bioinfo.com
Mon Aug 31 09:19:06 BST 2020


Hi Andrew,

Since VEP 100 (GRCh37), the REFSEQ_MATCH column has been populated. Can I
rely on this column to tell whether an HGVS notation is likely incorrect
because of a RefSeq alignment mismatch?

Or should I simply use the BAM_EDIT column for this purpose?

Also, is my understanding of the BAM_EDIT values below correct (according to
this GitHub issue
<https://github.com/Ensembl/ensembl-vep/issues/265#issuecomment-415416679>)?

   - *-*: no mismatch is found. Annotations and HGVS code are both fine.
   - *OK*: mismatch is found and fix is applied. Annotations are fine. HGVS
   code is fixed too but could still be incorrect in some cases.
   - *FAILED*: mismatch is found and fix could not be applied. Both
   annotations and HGVS code could be incorrect.
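
A minimal Python sketch of the reading above, for filtering VEP output downstream. It assumes the three values listed are the only ones that occur and that an empty field means the same as "-"; the names EditStatus and interpret_bam_edit are hypothetical and not part of VEP, and the mapping itself is only the understanding being asked about here, not confirmed behaviour.

```python
from typing import NamedTuple


class EditStatus(NamedTuple):
    """Hypothetical summary of what a BAM_EDIT value implies."""
    annotation_reliable: bool
    hgvs_reliable: bool
    note: str


# Mapping taken from the list above; an unconfirmed interpretation.
_BAM_EDIT_MEANING = {
    "-": EditStatus(True, True, "no mismatch found"),
    "OK": EditStatus(True, False, "mismatch found, fix applied; HGVS may still be wrong"),
    "FAILED": EditStatus(False, False, "mismatch found, fix could not be applied"),
}


def interpret_bam_edit(value: str) -> EditStatus:
    """Translate a BAM_EDIT field into reliability flags."""
    # Assumption: an empty field is treated like "-".
    key = value.strip() or "-"
    try:
        return _BAM_EDIT_MEANING[key]
    except KeyError:
        # Fail loudly on values this sketch does not know about.
        raise ValueError(f"unexpected BAM_EDIT value: {value!r}") from None
```

With this mapping, interpret_bam_edit("OK").hgvs_reliable is False, so such rows would be flagged for manual HGVS review while their consequence annotations are kept.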


Regards,
Wallace Ko


On Tue, Jan 21, 2020 at 7:50 PM Andrew Parton <aparton at ebi.ac.uk> wrote:

> Hi,
>
> Yep, that’s correct.
>
> One thing to be aware of however is that our HGVS code shifts variants
> reported in repeated regions in the 3’ direction by default, while our CDS
> position is not shifted in such a way. This is the most common cause of CDS
> position and HGVSc position mismatch, although it can also be caused by
> these RefSeq alignment mismatches.
>
> Kind Regards,
> Andrew
>
> On 21 Jan 2020, at 11:08, Wallace Ko <myko at l3-bioinfo.com> wrote:
>
> Hi Andrew,
>
> Thanks for the prompt response.
> May I assume that this is only a problem with the HGVS calculation, and
> that the CDS position has already been corrected using the RefSeq
> alignment in such cases?
>
> Regards,
> Wallace Ko
>
>
> On Tue, Jan 21, 2020 at 6:30 PM Andrew Parton <aparton at ebi.ac.uk> wrote:
>
>> Hi Wallace,
>>
>> Thanks for this report, it is an issue we are aware of. As you
>> identified, not all RefSeq transcripts completely match the reference
>> genome. In cases where they don't, we are now using alignment files
>> provided by NCBI to create a new reference, matching the transcript, and
>> use this for consequence calling.
>>
>> Our HGVS calculation does not currently use this reference modification,
>> but it is something we are working on and aim to release later this year.
>> VEP can report reference mismatches for GRCh38, but these data are not
>> available for GRCh37.
>>
>> More details on the differences to the reference genome and correcting
>> transcript models using BAM can be found here:
>> https://www.ensembl.org/info/docs/tools/vep/script/vep_other.html#refseq
>>
>> Let us know if there’s anything else we can do to help.
>>
>> Kind Regards,
>> Andrew
>>
>> On 21 Jan 2020, at 09:23, Wallace Ko <myko at l3-bioinfo.com> wrote:
>>
>> Hi Ensembl Developers,
>>
>> The variant NC_000012.11:g.103249104C>A is annotated by online VEP and
>> offline cached VEP (99, RefSeq, GRCh37) as:
>>
>>    - HGVSc: NM_000277.1:*c.517*G>T
>>    - HGVSp: NP_000268.1:p.Gln172His
>>    - CDS Position: 516
>>
>> On the other hand, ClinVar
>> <https://www.ncbi.nlm.nih.gov/clinvar/variation/664621/> reports the
>> variant as NM_000277.3:*c.516*G>T (NP_000268.1:p.Gln172His). In addition,
>> a BLAST alignment shows a 1-bp gap between c.303 and c.304 when
>> NM_000277.1 is aligned to NC_000012.11. Even VEP itself reports the CDS
>> position as 516.
>>
>> All of this makes me believe that the reported HGVSc should be c.516
>> rather than c.517.
>>
>> Regards,
>> Wallace Ko
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info:
>> https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/