[ensembl-dev] why would a snp have multiple consequences in the same transcript
Paul Flicek
flicek at ebi.ac.uk
Fri Nov 19 18:08:12 GMT 2010
On 19 Nov 2010, at 17:49, Robert Bradbury wrote:
> Andreas,
>
> Looking at your last (extensive) response, it generates some more questions.
>
> 1) I assume NMD means "Named", e.g. a genomic area named in the Gene Names
> database (which I think includes RNA which is non-coding). Is this correct?
No. NMD means that the transcript is subject to nonsense-mediated decay.
>
> 2) How many actual completely sequenced genomes are being used to provide
> these statistics? Or is the data simply derived from the current human SNP
> database (which could easily be lacking with respect to complex indels).
This data incorporates information from dbSNP 131 and other minor sources. dbSNP 131 includes extensive data from the 1000 Genomes project, but not yet the full data set.
Indels of all sizes are difficult to discover and are expected to be underrepresented in all current variation catalogs.
>
> 3) Is there documentation on how to download the human database from which
> the mysql results are derived and set it up locally? Also documentation on
> the table format(s)?
Extensive Ensembl documentation on setting up the database locally, or on using the Ensembl API to access the data, is available from the Documentation link at the top of www.ensembl.org.
You may also be interested in the following papers we published about Ensembl variation earlier this year:
http://www.biomedcentral.com/1471-2164/11/293
http://www.biomedcentral.com/1471-2105/11/238
>
> As the accumulation of indels in single cells (due to corruptive repair of
> DNA double strand breaks by WRN & DCLRE1C) is likely to be a primary cause
> of aging (IMO) [1]. Is there any attempt to determine at what point the
> noise level in sequences is suggestive of indels indicates "aged" cells (or
> a possibly pre-aged genome if the indels happened during early
> embryongenesis) are present? [2] The problem is that if indels are taking
> place in cells individually they may be impossible to detect in gross
> multi-cell genome sequencing or SNP studies. The only way they might be
> detected is through unusual noise levels indicating that some fraction of
> the cells have damaged genomes. My assumption here may be that indel
> mutations are somewhat more obvious than SNP mutations but this may depend
> on the methods used.
This is much more of a research question. The 1000 Genomes data (in its most recent release at www.1000genomes.org) has nearly 25 million SNPs derived from 628 individuals, whose DNA was sequenced from EBV-transformed lymphoblastoid cells. The age of each individual at the time of transformation is not known. Determining whether any sequence changes in the cells were already present in the individual or arose in the cell line is also extremely difficult.
I hope this helps,
Paul