[ensembl-dev] Seemingly incorrect HGVS for RefSeq on GRCh37

Wallace Ko myko at l3-bioinfo.com
Tue Feb 19 11:08:57 GMT 2019


Hi Andrew,

Thank you very much. Your response is very clear and helpful.

Regards,
Wallace Ko


On Tue, Feb 19, 2019 at 6:30 PM Andrew Parton <aparton at ebi.ac.uk> wrote:

> Hi Wallace,
>
> Thank you for your query. I’ve had a look and it seems like there is an
> issue mapping these transcripts to the reference genome.
>
> To provide RefSeq annotations within VEP, we use NCBI-provided GFF files
> (to provide the transcript set) and BAM files (to align these transcripts
> to the reference genome). The FAILED value in BAM_EDIT shows that VEP is
> failing to map the transcript to the reference genome using this BAM file,
> so as a best guess VEP uses the reference genome as the transcript sequence
> in this location. This means that, depending on how much the RefSeq
> transcript differs from the reference genome, it is possible that the HGVS
> will be incorrect.
>
> For the second result, there’s no mapping within the BAM file for this
> transcript, so we can’t map it to the reference genome.
>
> Filtering transcripts like this is a little tricky - you could remove all
> transcripts with BAM_EDIT=FAILED attached, but that would not catch
> transcripts whose alignment is simply missing from the NCBI-provided BAM.
> We do have flags to identify these transcripts within our GRCh38 dataset,
> but I’m afraid we do not have a recommended practice for removing them
> within GRCh37. Additionally, we know that fewer transcripts fall into
> either of these cases within our GRCh38 dataset, although that doesn’t
> help your current position.
>
> Sorry that I couldn’t give you a more helpful response. If you have any
> further questions, please don’t hesitate to ask.
>
> Kind Regards,
> Andrew
>
> On 18 Feb 2019, at 14:57, Wallace Ko <myko at l3-bioinfo.com> wrote:
>
> Hi,
>
> When the variant '11 70742675 G/A' is annotated with VEP, there are 2
> results with high and moderate impact:
>
>    1. NM_012309.4:c.958C>T (cDNA position: 1070)
>    2. NM_012309.3:c.959C>T (cDNA position: 1037)
>
> For the first result, the actual nucleotide at NM_012309.4:n.1070 is T.
> May I assume that the FAILED value in the BAM_EDIT field indicates that
> the HGVS is incorrect?
>
> For the second result, the actual nucleotide at NM_012309.3:n.1037 is A,
> and a BLAT search against NC_000011.9 shows that NM_012309.3:n.1037 aligns
> nowhere. I wonder how VEP produced this result. Do you have any
> suggested practice for filtering such transcripts?
>
> Thanks,
> Wallace Ko
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info:
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>
>
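
As a rough illustration of the BAM_EDIT=FAILED filter suggested above, here is a minimal Python sketch. It assumes VEP's default tab-delimited output, where header lines start with "#" and the last column holds semicolon-separated KEY=VALUE pairs; the function names and the toy records in the usage note are hypothetical and not part of VEP.

```python
def parse_extra(extra):
    """Turn a string like 'A=1;B=2' into {'A': '1', 'B': '2'}."""
    fields = {}
    for item in extra.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def drop_failed_bam_edit(lines):
    """Yield header lines and records whose BAM_EDIT is not FAILED.

    Assumption: the last tab-separated column is the key=value column.
    Records without a BAM_EDIT key are kept unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header and metadata lines
            continue
        columns = line.rstrip("\n").split("\t")
        extra = parse_extra(columns[-1])
        if extra.get("BAM_EDIT") != "FAILED":
            yield line
```

Usage on made-up records (real output has more columns):

```python
records = [
    "#Uploaded_variation\tFeature\tExtra\n",
    "var1\tTRANSCRIPT_A\tIMPACT=HIGH;BAM_EDIT=FAILED\n",
    "var1\tTRANSCRIPT_B\tIMPACT=MODERATE\n",
]
for line in drop_failed_bam_edit(records):
    print(line, end="")
```

Note that a record with no BAM_EDIT key passes through untouched, so this does not address the second case described above, where the alignment is missing from the BAM altogether.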