[ensembl-dev] same CDS, different exons; was Re: VEP: reporting HGVS identifiers with RefSeq accessions

Matthew Astley mca at sanger.ac.uk
Wed Feb 15 12:28:09 GMT 2012


On Tue, Feb 14, 2012 at 10:39:09PM -0800, Reece Hart wrote:

> In the [previous] I excerpted 3 prominent cases.
[...]
> 3) NM_001006120.2 overlaps ENST00000382673 and has an identical
> translation *but has a different exon structure*. This is the case I
> alluded to in my previous email that might cause a coding variant to
> appear as non-coding or vice versa.

I was curious about 3), so I asked the Havana annotator next to me.  I'm
not sure how relevant the ensuing conversation is to VEP, sorry.


Here's a link to the region in Ensembl, enabling some tracks and
showing two of the four duplicated genes in the region
  http://v.gd/add5rs

The other two are around 23.7 Mb.

ENST00000382673 is a transcript which spans the two copies, taking its
last exon from the second copy.  Mark thinks this is probably an
incorrect gene prediction.

He also pointed out the sequence gap at 23.90 Mb, which suggests the
region is tricky to sequence.

The supporting sequences
  http://www.ncbi.nlm.nih.gov/nuccore/NM_001006121,NM_001006120,NM_005058,NM_001006118
are very similar, except at the ends.


I think the conclusion was that two transcripts can legitimately have
the same CDS with a different exon structure, but in this case it's a
bug.
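
A minimal sketch of how this can arise in a duplicated region, and why it
matters for variant calls.  The sequence, the coordinates and the helper
names (cds_sequence, is_coding) are all invented for illustration; this
is not the real annotation of ENST00000382673 or NM_001006120.2, and it
ignores strand and UTRs.

```python
# Toy model only: invented sequence and coordinates (0-based, half-open,
# forward strand). Not the real ENST00000382673 / NM_001006120.2 data.

def cds_sequence(genome, exons, cds_start, cds_end):
    """Concatenate the part of each exon that lies inside the CDS range."""
    parts = []
    for start, end in exons:
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:
            parts.append(genome[lo:hi])
    return "".join(parts)

def is_coding(pos, exons, cds_start, cds_end):
    """True if pos falls in an exon and inside the CDS range."""
    return cds_start <= pos < cds_end and any(s <= pos < e for s, e in exons)

# One gene copy: exon 1, intron, exon 2 (18 bases), duplicated in tandem.
copy = "ATGGCC" + "GTAAAG" + "TTTTAA"
genome = copy + "CCCC" + copy          # copy 1 at 0-18, copy 2 at 22-40

# Transcript A stays inside copy 1.
exons_a, cds_a_range = [(0, 6), (12, 18)], (0, 18)
# Transcript B spans both copies and takes its last exon from copy 2.
exons_b, cds_b_range = [(0, 6), (34, 40)], (0, 40)

cds_a = cds_sequence(genome, exons_a, *cds_a_range)
cds_b = cds_sequence(genome, exons_b, *cds_b_range)
print(cds_a == cds_b)                  # identical CDS, different exons

# The same genomic position is classified differently by each transcript.
pos = 13
coding_in_a = is_coding(pos, exons_a, *cds_a_range)
coding_in_b = is_coding(pos, exons_b, *cds_b_range)
print(pos, coding_in_a, coding_in_b)
```

In this toy case position 13 is coding according to transcript A but
intronic according to transcript B, even though both give the same
translation.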

I hope this is helpful.  Please let us know if you have questions,

-- 
Matthew
