[ensembl-dev] Strange Ensembl/RefSeq VEP annotation for variant
Sarah Hunt
seh at ebi.ac.uk
Wed Dec 2 13:43:30 GMT 2015
Hi Svein,
In Ensembl, our annotation is based on the reference genome, but RefSeq
transcripts can differ from the reference, which occasionally causes
problems like this.
In this instance, the reference has a single base deletion with respect
to the RefSeq transcript. The absence of the base in the reference has
caused the Ensembl transcript to have a 2 base intron within what is a
contiguous exon in RefSeq. The missing base plus the two intronic bases
account for the 3 base difference between the transcripts. The
RefSeq/reference mismatch also causes the problems you observe for the
RefSeq analysis.
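As a rough illustration of the arithmetic only (this is not how VEP
computes coordinates, and the function and variable names are
hypothetical), the sketch below assumes the c. positions are 1-based
coding-sequence coordinates and that the mismatch lies upstream of the
variant, so the offset applies to it:

```python
def codon_of(cds_pos):
    """Map a 1-based CDS position to (codon number, 1-based position in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


# Bases present in the RefSeq transcript but absent from the Ensembl one
# (assumption: both lie upstream of the variant in the coding sequence).
deleted_in_reference = 1   # single base missing from the reference genome
extra_intron_bases = 2     # 2 base intron in the Ensembl transcript
offset = deleted_in_reference + extra_intron_bases

refseq_expected = 4201                   # position quoted from dbSNP in the original message
ensembl_pos = refseq_expected - offset   # position on the Ensembl transcript

print(offset, ensembl_pos)
print(codon_of(ensembl_pos))      # Ensembl report, c.4198
print(codon_of(4200))             # RefSeq report from VEP, c.4200
print(codon_of(refseq_expected))  # RefSeq position expected from dbSNP, c.4201
```

Under these assumptions, c.4198 falls on the first base of codon 1400,
which fits the reported GGG/AGG change, while c.4200 falls on the third
base of the same codon, which fits the reported ATG/ATA change. The
dbSNP position c.4201 would instead be the first base of codon 1401.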
If you look at position 3124 in the alignment between the two
transcripts, you can see that the Ensembl transcript has a 3 base
deletion with respect to the RefSeq transcript:
http://grch37.ensembl.org/Homo_sapiens/Transcript/Similarity/Align?db=core;extdb=refseq_mrna;g=ENSG00000090006;r=19:41099072-41135725;sequence=NM_003573.2;t=ENST00000204005
If you look at Intron 24-25 on the Exon view you can see the extra
intron this introduces:
http://grch37.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;extdb=refseq_mrna;g=ENSG00000090006;r=19:41099072-41135725;sequence=NM_003573.2;t=ENST00000204005
The GRCh37 site uses an old gene set, but for more recent sets we hold
information on when the RefSeq transcript differs from the Ensembl
transcript and report such discrepancies with command line VEP. We are
seeking to resolve this problem.
Best wishes,
Sarah
On 02/12/2015 11:55, Svein Tore Koksrud Seljebotn wrote:
> Hi,
>
> we encountered one variant that gives somewhat confusing annotation
> output from VEP (GRCh37, release 82).
>
> The variant is: 19:41133005 G>A (rs200607327).
>
> If it's still available, an online VEP annotation can be found here:
> http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=1Vp8p7UidQSVCQfB-1297423
> .
>
>
> We use RefSeq transcript output for NM_003573.2, and got the following:
>
> NM_003573.2:c.4200G>A |NP_003564.2:p.Met1400Ile | ATG/ATA
>
> For the corresponding Ensembl transcript ENST00000204005 [1], we get
> the following:
>
> ENST00000204005.9:c.4198G>A | ENSP00000204005.9:p.Gly1400Arg | GGG/AGG
>
> In dbSNP and other databases, the correct cDNA position for the RefSeq
> transcript for this variant is 4201, not 4200.
>
> So I have two questions:
>
> 1. Why is there a three base difference between the two transcripts
> (4201 vs 4198)?
>
> 2. Is there something going wrong in the calculation of the RefSeq
> data? Note the frameshift for the codons, resulting in wrong protein
> as well.
>
>
> Best regards,
> Svein Tore Koksrud Seljebotn
>
>
> [1]
> http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000090006;r=19:41099072-41135725;t=ENST00000204005;tl=1Vp8p7UidQSVCQfB-1297423
>
>