[ensembl-dev] VEP: reporting HGVS identifiers with RefSeq accessions

Reece Hart reece at harts.net
Wed Feb 15 06:25:23 GMT 2012


Hi Will-

Thanks for the fast reply.

On Tue, Feb 14, 2012 at 2:05 AM, Will McLaren <wm2 at ebi.ac.uk> wrote:
> For those rows of results that have both RefSeq and CCDS identifiers,
> you know that the coding sequence is the same

Just to confirm: you're saying that if both a CCDS and a RefSeq tag are
provided, then the RefSeq transcript has the same exon structure (i.e.,
genomic coordinates) as the ENST? I already use --ccds and
--xref_refseq, but I wasn't confident 1) that the provided NM
necessarily corresponds to the CCDS transcript, or 2) that, even if it
does, CDS equivalence implies exon-structure equivalence between the
ENST and the NM.

> For those rows that don't have CCDS identifiers, you would have to
> compare only the protein sequence

Aren't there cases in which exon structures differ for an identical
CDS? I remember finding a few cases last year in which NCBI's splign
and Ensembl's gene build process appeared to produce slightly
different exon structures for the same CDS. Typically, a few
nucleotides at the 3' end of an exon in one alignment "move" to the 5'
end of the next exon in the other alignment, or vice versa. The
implication is that a genomic variant might be coding in one
transcript and non-coding in the other (again, even though the two
transcripts have identical CDS). Or am I missing something?
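
To make the concern concrete, here is a minimal sketch in Python. The
sequence, the coordinates, and the "ENST-like"/"NM-like" labels are all
invented for illustration; they are not real transcripts, the toy does
not try to model realistic splice sites, and nothing here is VEP or
splign output. It only shows that two exon structures can splice to the
same sequence while disagreeing about which genomic positions are
exonic.

```python
# Toy illustration only: the sequence and coordinates are invented.
GENOME = "ATGGCCAAG" + "GTAAGTTTTTCCCTTTCAG" + "GTGAAATAA"

# Exons as 0-based, half-open (start, end) intervals on GENOME.
EXONS_A = [(0, 9), (28, 37)]  # hypothetical "ENST-like" structure
EXONS_B = [(0, 7), (26, 37)]  # hypothetical "NM-like" structure: the last
                              # 2 nt of exon 1 moved to the start of exon 2


def spliced(genome, exons):
    """Concatenate the exon sequences in order."""
    return "".join(genome[start:end] for start, end in exons)


def is_exonic(pos, exons):
    """True if the 0-based genomic position falls inside any exon."""
    return any(start <= pos < end for start, end in exons)


def discordant_positions(exons_a, exons_b, length):
    """Genomic positions that are exonic in one structure but not the other."""
    return [
        pos
        for pos in range(length)
        if is_exonic(pos, exons_a) != is_exonic(pos, exons_b)
    ]


if __name__ == "__main__":
    print(spliced(GENOME, EXONS_A) == spliced(GENOME, EXONS_B))
    print(discordant_positions(EXONS_A, EXONS_B, len(GENOME)))
```

In this toy, both structures splice to ATGGCCAAGGTGAAATAA, yet positions
7, 8, 26 and 27 are exonic in only one of the two, so a variant at any
of those positions would be classified differently depending on which
transcript it is annotated against.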

-Reece



