[ensembl-dev] Understanding Transcripts.

Allan Kamau kamauallan at gmail.com
Wed Jun 6 09:33:38 BST 2012

Thank you Andy for your reply.
I am interested in human genes, specifically genes identified by
Entrez Gene IDs: I take the Entrez Gene IDs and then query Ensembl for
their transcripts using the Ensembl API.

If I understand correctly:

1 /
A gene is either a protein structure or an RNA structure that does
something in the cell. A gene is defined by its function and not
strictly by its structure/chemical composition. This means that several
independent protein structures or RNA sequence-based structures
performing very similar functions may each be termed the same gene.
For example, if gene "A" is observed to have some well-defined
phenotype characteristics, then all protein structures and/or RNA
sequence-based structures having these characteristics can each be
termed instances of gene "A".

2 /
For humans, several transcripts may each yield an independent instance
of a particular gene.

If the above understanding is correct, is it possible to have a
protein structure and an RNA-based structure each having the same
phenotype characteristics and therefore each yielding an instance of
the same gene? I am asking this because a manual query using gene
"ENSG00000133997" matched some transcripts based on exons (which
produce a protein product) and some transcripts based on introns (which
do not produce any protein product).
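
To make the observation about "ENSG00000133997" concrete, here is a minimal Python sketch of one gene record whose transcripts carry different annotations. It is only an illustration under stated assumptions: the transcript IDs (tx_1, tx_2, tx_3) are hypothetical placeholders rather than real Ensembl identifiers, the Transcript class and group_by_biotype helper are made up for this example, and nothing here calls the Ensembl API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str  # hypothetical placeholder, not a real Ensembl ID
    gene_id: str        # the single gene this transcript belongs to
    biotype: str        # e.g. "protein_coding" or "retained_intron"


def group_by_biotype(transcripts):
    """Map each biotype to the IDs of the transcripts annotated with it."""
    groups = defaultdict(list)
    for t in transcripts:
        groups[t.biotype].append(t.transcript_id)
    return dict(groups)


GENE_ID = "ENSG00000133997"

# Illustrative records only; the real gene lists 11 transcripts.
transcripts = [
    Transcript("tx_1", GENE_ID, "protein_coding"),
    Transcript("tx_2", GENE_ID, "protein_coding"),
    Transcript("tx_3", GENE_ID, "retained_intron"),
]

by_biotype = group_by_biotype(transcripts)
for biotype, ids in sorted(by_biotype.items()):
    print(biotype, ids)
```

The point of the sketch is that all three records share one gene_id even though only some of them are annotated as protein coding.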


On 6/5/12, LAW Andy <andy.law at roslin.ed.ac.uk> wrote:
> Allan,
> This all depends on what you mean by "gene".
> I think the common understanding is that a "gene" is a piece of the genome
> that does "stuff". There are (protein) coding genes and there are non-coding
> (RNA) genes. They are bits of the genome that get transcribed into RNA (into
> transcripts).
> There may be (and often are) more than one transcript from a given "gene".
> These different transcripts arise from either variations in splicing
> (transcript A may have exons 1, 2 and 3 whereas transcript B may have exons
> 1, 3 and 4), from differential use of multiple transcription start sites
> (often one start site is active in one set of cell types/tissues and a
> different start site is active in a separate set of tissues) or from
> different transcription end sites.
> A transcript is thus something that is transcribed from the "gene". One gene
> may have many transcripts. A given transcript only comes from one gene (this
> is a one-to-many (gene-to-transcript) relationship).
> In general, in Eukaryotes, a coding transcript will give rise to a single
> protein product. I believe that there are examples where this is not
> strictly true in that one transcript may have alternate translation start
> sites, but I'm not confident enough of this to be able to give you an
> example. In Prokaryotes, multiple protein products can be derived from a
> single transcript.
> Hope that helps some.
> On 5 Jun 2012, at 10:10am, Allan Kamau wrote:
>> I am still struggling to understand the transcript data in relation to
>> a gene. To simplify my preceding question, I would like to ask:
>> 1) Is a transcript strictly a pre-step to a variant of a gene?
>> 2) Is there a one-to-one relationship between a transcript and the
>> gene? That is, a single transcript gets processed (in the case of
>> protein genes) into a single gene, and for RNA-based genes the
>> transcript may undergo minimal processing to yield a single RNA-based
>> gene.
>> Allan.
>> On 6/4/12, Allan Kamau <kamauallan at gmail.com> wrote:
>>> I am a non-biologist and I would like to get a better understanding
>>> of a transcript in relation to a gene.
>>> According to a search on Ensembl, the gene "ENSG00000133997" reports
>>> that it has 11 transcripts, and a listing of the transcript_ids of
>>> these transcripts is provided. Some of the entries in this transcript
>>> listing have protein product ids and are annotated "A protein coding
>>> transcript is a spliced mRNA that leads to a protein product", and the
>>> other entries are described as "No protein product" and have the text
>>> "Retained intron: Noncoding transcript containing intronic sequence"
>>> alongside them.
>>> My initial understanding was as follows (which I now think is wrong).
>>> One transcript is one mature mRNA, which is composed of one or more
>>> exons observed to join together and have a polyA tail, and this
>>> single transcript would be translated into a protein-based gene. There
>>> could be several such transcripts (for different situations), all
>>> producing perhaps different protein products, but each of these protein
>>> products is independently a variant of the same gene that
>>> these transcripts are known to yield.
>>> Or one transcript could be composed of an intron (is it possible to
>>> have multiple introns joined together?) that would represent a single
>>> non-protein-based (or simply RNA-based) gene. And there could be
>>> multiple such transcripts yielding the same gene (they may have
>>> different structures but perform the same function).
>>> And a gene may be either enzyme-based or RNA-based, which means
>>> that for a given gene we may not have transcripts representing mRNA
>>> and transcripts representing RNA gene products at the same time.
>>> Kindly advise.
> Later,
> Andy
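
The one-to-many (gene-to-transcript) relationship and the splicing example from Andy's reply (transcript A with exons 1, 2 and 3; transcript B with exons 1, 3 and 4) can be sketched in a few lines of Python. This is only an illustrative data model: the names gene_X, transcript_A and transcript_B and both helper functions are assumptions made for this example, and it is not how the Ensembl API represents genes.

```python
# One gene maps to many transcripts; each transcript lists the exons it uses.
# All names are hypothetical and exist only for this illustration.
gene_to_transcripts = {
    "gene_X": {
        "transcript_A": [1, 2, 3],
        "transcript_B": [1, 3, 4],
    },
}


def transcript_to_gene(mapping):
    """Invert the mapping, insisting that each transcript has exactly one gene."""
    owner = {}
    for gene, transcripts in mapping.items():
        for tx in transcripts:
            if tx in owner:
                raise ValueError(f"{tx} is listed under more than one gene")
            owner[tx] = gene
    return owner


def shared_exons(mapping, gene):
    """Return the exons used by every transcript of the given gene."""
    exon_sets = [set(exons) for exons in mapping[gene].values()]
    return sorted(set.intersection(*exon_sets))


owner = transcript_to_gene(gene_to_transcripts)
common = shared_exons(gene_to_transcripts, "gene_X")

print(owner)   # each transcript points back to a single gene
print(common)  # exons present in both transcript_A and transcript_B
```

Looking up a transcript in owner always gives a single gene, while looking up a gene in gene_to_transcripts can give several transcripts, which is the asymmetry described in the reply.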
