[ensembl-dev] Strange Ensembl/RefSeq VEP annotation for variant

Sarah Hunt seh at ebi.ac.uk
Wed Dec 2 13:43:30 GMT 2015


Hi Svein,

In Ensembl, our annotation is based on the reference genome, but RefSeq 
transcripts can differ from the reference, which occasionally causes 
problems like this.
In this instance, the reference has a single base deletion with respect 
to the RefSeq transcript. The absence of the base in the reference has 
caused the Ensembl transcript to have a 2 base intron within what is a 
contiguous exon in RefSeq - this accounts for the 3 base difference 
between the transcripts. The RefSeq/reference mismatch also causes the 
problems you observe for the RefSeq analysis.
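
As a rough illustration of the frame problem (a minimal sketch only, not 
how VEP computes anything; the helper name codon_position is hypothetical), 
the c. positions quoted in this thread can be mapped to a codon number and 
a base within that codon. It assumes standard c. numbering, where position 
1 is the A of the start codon:

```python
def codon_position(cds_pos):
    """Map a 1-based CDS (c.) position to (codon_number, base_in_codon).

    Both returned values are 1-based. Hypothetical helper for illustration.
    """
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    codon_number = (cds_pos - 1) // 3 + 1
    base_in_codon = (cds_pos - 1) % 3 + 1
    return codon_number, base_in_codon


# Positions taken from the messages in this thread.
positions = [
    ("ENST00000204005 c.4198 (VEP)", 4198),
    ("NM_003573.2 c.4200 (VEP)", 4200),
    ("NM_003573.2 c.4201 (dbSNP)", 4201),
]

for label, pos in positions:
    codon, base = codon_position(pos)
    print(f"{label}: codon {codon}, base {base} of the codon")
```

c.4198 and c.4201 both fall on the first base of a codon (1400 and 1401), 
so a 3 base offset keeps the two transcripts in the same frame. c.4200 
falls on the third base of codon 1400, which matches the ATG/ATA codon 
change reported for the RefSeq transcript rather than a first-base change.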

If you look at position 3124 in the alignment between the two 
transcripts, you can see the Ensembl transcript has a 3 base deletion 
with respect to the RefSeq transcript:

http://grch37.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=refseq_mrna;g=ENSG00000090006;r=19:41099072-41135725;sequence=NM_003573.2;t=ENST00000204005

If you look at Intron 24-25 on the Exon view you can see the extra 
intron this introduces:

http://grch37.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;extdb=refseq_mrna;g=ENSG00000090006;r=19:41099072-41135725;sequence=NM_003573.2;t=ENST00000204005
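
To make the arithmetic concrete, here is a toy model of the situation. 
Everything in it is hypothetical: the sequence is invented, the function 
names are not part of any Ensembl API, and it assumes the 2 base intron 
sits next to the missing base so that the transcript stays in frame. It 
only shows why one missing reference base plus a 2 base intron gives a 
3 base shift in downstream transcript coordinates:

```python
def reference_from_refseq(mrna, del_index):
    """Toy reference exon: the RefSeq sequence with one base missing."""
    return mrna[:del_index] + mrna[del_index + 1:]


def splice_out(genomic, intron_start, intron_len=2):
    """Toy transcript model: skip a short intron in the genomic sequence."""
    return genomic[:intron_start] + genomic[intron_start + intron_len:]


# Invented 18 base (6 codon) sequence standing in for a RefSeq exon.
refseq_exon = "ATGGCTGGAAAACCCTGA"

# The reference lacks one base relative to RefSeq (0-based index 6 here).
reference_exon = reference_from_refseq(refseq_exon, del_index=6)

# Assumed model: a 2 base intron at the same place restores the frame.
ensembl_like = splice_out(reference_exon, intron_start=6, intron_len=2)

shift = len(refseq_exon) - len(ensembl_like)
print("RefSeq-like length:  ", len(refseq_exon))
print("Reference length:    ", len(reference_exon))
print("Ensembl-like length: ", len(ensembl_like))
print("Downstream coordinate shift:", shift)
print("In frame:", len(ensembl_like) % 3 == 0)
```

In this toy case the Ensembl-like transcript is the RefSeq-like sequence 
with one whole codon removed, so every downstream base sits 3 positions 
earlier, in the same way that 4201 becomes 4198 above.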

The GRCh37 site uses an old gene set, but for more recent sets we hold 
information on when the RefSeq transcript differs from the Ensembl 
transcript and report such discrepancies with command line VEP. We are 
seeking to resolve this problem.

Best wishes,

Sarah


On 02/12/2015 11:55, Svein Tore Koksrud Seljebotn wrote:
> Hi,
>
> we encountered a variant that gives somewhat confusing annotation 
> output from VEP (GRCh37, release 82).
>
> The variant is: 19:41133005 G>A (rs200607327).
>
> If it's still available, an online VEP annotation can be found here: 
> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=1Vp8p7UidQSVCQfB-1297423 
> .
>
>
> We use RefSeq transcript output for NM_003573.2, and get the following:
>
> NM_003573.2:c.4200G>A | NP_003564.2:p.Met1400Ile | ATG/ATA
>
> For the corresponding Ensembl transcript ENST00000204005 [1], we get 
> the following:
>
> ENST00000204005.9:c.4198G>A | ENSP00000204005.9:p.Gly1400Arg | GGG/AGG
>
> In dbSNP and other databases, the correct cDNA position for the RefSeq 
> transcript for this variant is 4201, not 4200.
>
> So I have two questions:
>
> 1. Why is there a three base difference between the two transcripts 
> (4201 vs 4198)?
>
> 2. Is there something going wrong in the calculation of the RefSeq 
> data? Note the frameshift for the codons, resulting in wrong protein 
> as well.
>
>
> Best regards,
> Svein Tore Koksrud Seljebotn
>
>
> [1] 
> http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000090006;r=19:41099072-41135725;t=ENST00000204005;tl=1Vp8p7UidQSVCQfB-1297423
>
>
