[ensembl-dev] Seemingly incorrect HGVS for RefSeq on GRCh37

Tue Feb 19 10:29:59 GMT 2019

Hi Wallace,

Thank you for your query. I’ve had a look, and there seems to be an issue mapping these transcripts to the reference genome.

To provide RefSeq annotations within VEP, we use NCBI-provided GFF files (to provide the transcript set) and BAM files (to align these transcripts to the reference genome). The FAILED value in BAM_EDIT shows that VEP failed to map the transcript to the reference genome using this BAM file, so as a best guess VEP uses the reference genome sequence as the transcript sequence at this location. This means that, depending on how much the RefSeq transcript differs from the reference genome, the HGVS may be incorrect.
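
As a rough illustration of spotting this kind of mismatch, the Python sketch below compares the reference base in a simple HGVS c. substitution with the base actually found in the RefSeq transcript at the reported cDNA position. It is not part of VEP: the function name is hypothetical, it only handles single-base substitutions, and it assumes you have already fetched the transcript sequence yourself.

```python
import re

# Matches simple coding substitutions such as "NM_012309.4:c.958C>T".
# Anything else (indels, intronic offsets, UTR positions) is rejected.
_HGVS_SUB = re.compile(
    r"^(?P<tx>[^:]+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def hgvs_ref_matches_transcript(hgvs_c, cdna_position, transcript_seq):
    """Return True if the HGVS reference base equals the transcript base.

    hgvs_c         -- HGVS string as reported by VEP (hypothetical input name)
    cdna_position  -- 1-based position in the full transcript, i.e. the
                      value VEP reports as the cDNA position
    transcript_seq -- the RefSeq mRNA sequence, supplied by the caller
    """
    match = _HGVS_SUB.match(hgvs_c)
    if match is None:
        raise ValueError(f"not a simple c. substitution: {hgvs_c!r}")
    # Convert the 1-based cDNA position to a 0-based string index.
    actual = transcript_seq[cdna_position - 1].upper()
    return actual == match.group("ref")
```

For the first result in this thread, the call would be hgvs_ref_matches_transcript("NM_012309.4:c.958C>T", 1070, seq), with seq being the NM_012309.4 sequence. Since the base reported at n.1070 is T while the HGVS states C, a check like this should flag the record.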

For the second result, there’s no mapping within the BAM file for this transcript, so we can’t map it to the reference genome.

Filtering transcripts like this is a little tricky. You could remove all transcripts with BAM_EDIT=FAILED attached; however, this won’t catch transcripts whose alignment is simply missing from the NCBI-provided BAM. We do have flags to identify these transcripts within our GRCh38 dataset, but I’m afraid we do not have a recommended practice for removing them within GRCh37. Additionally, we know that fewer transcripts fall into either of these cases within our GRCh38 dataset, although that doesn’t help your current position.
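
A minimal Python sketch of the first option (dropping rows with BAM_EDIT=FAILED) is shown below. It assumes VEP's default tab-delimited output, where header lines start with "#" and the last column (Extra) holds KEY=VALUE pairs separated by ";". The function names are hypothetical, and, as noted above, this does not catch transcripts whose alignment is missing from the BAM altogether.

```python
def parse_extra(extra):
    """Split an Extra column ("KEY=VALUE;KEY=VALUE") into a dict."""
    fields = {}
    for item in extra.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def drop_failed_bam_edits(lines):
    """Yield VEP output lines, skipping rows with BAM_EDIT=FAILED.

    Header and comment lines (starting with "#") pass through unchanged.
    Assumes the Extra column is the last tab-separated field.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        extra = parse_extra(columns[-1])
        if extra.get("BAM_EDIT") == "FAILED":
            continue
        yield line
```

It could be applied to a file with something like: with open("vep_output.txt") as handle: kept = list(drop_failed_bam_edits(handle)), where the file name is only a placeholder.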

Sorry that I couldn’t give you a more helpful response. If you have any further questions, please don’t hesitate to ask.

Kind Regards,
Andrew

> On 18 Feb 2019, at 14:57, Wallace Ko <myko at l3-bioinfo.com> wrote:
> 
> Hi,
> 
> When the variant '11 70742675 G/A' is annotated with VEP, there are 2 results with high and moderate impact:
> NM_012309.4:c.958C>T (cDNA position: 1070)
> NM_012309.3:c.959C>T (cDNA position: 1037)
> For the first result, the actual nucleotide at NM_012309.4:n.1070 is T. May I assume that the FAILED value in the BAM_EDIT field indicates that the HGVS is incorrect?
> 
> For the second result, the actual nucleotide at NM_012309.3:n.1037 is A. The result of BLAT on NC_000011.9 shows that NM_012309.3:n.1037 does not align anywhere, so I wonder how VEP produces a result here. Do you have any suggested practice for filtering such results?
> 
> Thanks,
> Wallace Ko
