[ensembl-dev] VEP: reporting HGVS identifiers with RefSeq accessions
reece at harts.net
Wed Feb 15 06:39:09 GMT 2012
Fast-forward 10 hours...
I wrote a reprehensible hack that loops over the NMs in otherfeatures and
then finds overlapping ENSTs on the same slice. The code is at
http://goo.gl/drlJX . Results look like this (chr Y):
    # 141 transcripts
    * NM_006883.2 6473 Y 1 1951 541633 569564 6 5
    NnLCEeS ENST00000334060 ENSG00000185960 CCDS14106.1,CCDS14107.1 NM_006883.2,NM_000451.3 Y 1 1951 541633 569564 6 5
    * NM_018390.3 55344 Y 1 5305 150855 166002 7 6
     n    S ENST00000399012 ENSG00000182378 CCDS14103.1 NM_018390.3 Y 1 5287 150855 166002 8 6
    * NM_001006120.2 378949 Y -1 1881 24026501 24038660 12 1
    NnLC  S ENST00000382673 ENSG00000242389 CCDS35481.1 NM_001006118.2 Y -1 1881 24026501 24062201 12 1
Lines beginning with * are the NMs from otherfeatures; beneath each are
zero or more overlapping ENSTs. Each ENST line starts with a 7-character
summary (a blank position means that check failed): N = exon counts
match, n = CDS-trimmed exon counts match, L = CDS lengths match, C = CDS
sequences match, E = exon boundaries match, e = CDS-trimmed exon
boundaries match, S = strands match. The remaining columns are display
id, gene_id, ccds, nm, chr, cds start, cds end, transcript start,
transcript end, exon count and CDS-trimmed exon count (e.g., one fewer
than the exon count when the CDS starts in the second exon). Not shown
are the exon arrays, which you'll get if you run the code.
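For anyone who wants the comparison logic without the Ensembl API, here
is a minimal Python sketch of the loop and the NnLCEeS summary. It is not
the Perl code at the link above. The Transcript record, its fields, the
1-based inclusive coordinates and the toy IDs are all hypothetical, and
the overlap test assumes only "same chromosome, intersecting spans"
(strand is compared separately by the S flag).

```python
from dataclasses import dataclass
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), assumed 1-based inclusive genomic coordinates


@dataclass
class Transcript:
    """Hypothetical minimal transcript record (not an Ensembl API class)."""
    stable_id: str
    chrom: str
    strand: int          # 1 or -1
    exons: List[Exon]    # sorted by genomic start
    cds_start: int       # genomic start of the coding region
    cds_end: int         # genomic end of the coding region
    cds_seq: str         # spliced coding sequence

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if a and b are on the same chromosome and their spans intersect."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def cds_trimmed_exons(t: Transcript) -> List[Exon]:
    """Clip each exon to the CDS and drop exons that are entirely UTR."""
    return [(max(s, t.cds_start), min(e, t.cds_end))
            for s, e in t.exons
            if e >= t.cds_start and s <= t.cds_end]


def summary(nm: Transcript, enst: Transcript) -> str:
    """Build the 7-character NnLCEeS string; a failed check becomes a space."""
    checks = [
        ("N", len(nm.exons) == len(enst.exons)),
        ("n", len(cds_trimmed_exons(nm)) == len(cds_trimmed_exons(enst))),
        ("L", len(nm.cds_seq) == len(enst.cds_seq)),
        ("C", nm.cds_seq == enst.cds_seq),
        ("E", nm.exons == enst.exons),
        ("e", cds_trimmed_exons(nm) == cds_trimmed_exons(enst)),
        ("S", nm.strand == enst.strand),
    ]
    return "".join(flag if ok else " " for flag, ok in checks)


def compare(nms: List[Transcript], ensts: List[Transcript]) -> None:
    """Print each NM followed by every overlapping ENST and its summary."""
    for nm in nms:
        print(f"* {nm.stable_id}")
        for enst in ensts:
            if overlaps(nm, enst):
                print(f"{summary(nm, enst)} {enst.stable_id}")


if __name__ == "__main__":
    # Toy records: ENST_TOY2 differs from NM_TOY.1 only in its 5' UTR exon start.
    nm = Transcript("NM_TOY.1", "Y", 1, [(100, 200), (300, 400)], 150, 350, "ATG" * 34)
    same = Transcript("ENST_TOY1", "Y", 1, [(100, 200), (300, 400)], 150, 350, "ATG" * 34)
    utr = Transcript("ENST_TOY2", "Y", 1, [(50, 200), (300, 400)], 150, 350, "ATG" * 34)
    compare([nm], [same, utr])
    # * NM_TOY.1
    # NnLCEeS ENST_TOY1
    # NnLC eS ENST_TOY2
```

Comparing exon boundaries both raw (E) and CDS-trimmed (e) separates
UTR-only differences, like ENST_TOY2 above, from differences that touch
the coding exons.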
In the output above I excerpted three prominent cases.
1) NM_006883.2 matches ENST00000334060 in all respects: exon count,
CDS length, CDS sequence, exon structure and strand.
2) NM_018390.3 overlaps ENST00000399012 but does not have the same
translation, *even though that ENST lists CCDS14103.1 and NM_018390.3
in its ccds and nm columns*.
3) NM_001006120.2 overlaps ENST00000382673 and has an identical
translation *but a different exon structure*. This is the case I
alluded to in my previous email: it might cause a coding variant to
appear non-coding, or vice versa.
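To make case 3 concrete, here is a toy Python sketch. The coordinates
are hypothetical and are not taken from NM_001006120.2 or
ENST00000382673. Two CDS-trimmed exon models with the same total coding
length but different boundaries disagree about which genomic positions
are coding, so a variant at any of the listed positions would be
annotated as coding against one transcript and intronic against the
other.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, already trimmed to the CDS


def is_coding(pos: int, coding_exons: List[Exon]) -> bool:
    """True if a genomic position falls inside any CDS-trimmed exon."""
    return any(start <= pos <= end for start, end in coding_exons)


def discordant_positions(a: List[Exon], b: List[Exon]) -> List[int]:
    """Positions that are coding under one transcript model but not the other."""
    lo = min(s for s, _ in a + b)
    hi = max(e for _, e in a + b)
    return [p for p in range(lo, hi + 1) if is_coding(p, a) != is_coding(p, b)]


# Hypothetical models: both have 12 coding bases, but the boundaries differ.
refseq_model = [(100, 105), (200, 205)]
ensembl_model = [(100, 103), (198, 205)]

print(discordant_positions(refseq_model, ensembl_model))  # [104, 105, 198, 199]
```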
Caveat: my code probably contains bugs or abuses of the API.
So, does your comment about using --ccds and --xref_refseq still hold?