[ensembl-dev] Understanding Transcripts.

William Spooner william.spooner at eaglegenomics.com
Wed Jun 6 21:44:17 BST 2012

Hi Allan,

Please see comments inline…

On 6 Jun 2012, at 09:33, Allan Kamau wrote:

> Thank you Andy for your reply.
> I am interested in human genes, specifically genes identified by
> Entrez Gene: I take Entrez Gene IDs and then query Ensembl for their
> transcripts using the Ensembl API.
> If I understand correctly:
> 1 /
> A gene is either a protein structure or an RNA structure that does
> something in the cell. A gene is defined by its function and not
> strictly by its structure/chemical composition.

In Ensembl terms, a gene is a genomic locus from which RNA is transcribed; no function is implied. In other words, genes are defined by genomic location, _not_ by function.
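To make the "location, not function" point concrete, here is a minimal Python sketch under a deliberately simplified assumption: a gene is nothing more than a chromosome, start, end and strand, and a transcript belongs to a gene if it lies inside that span. The `Locus` class, the `contains` function and the coordinates are hypothetical illustrations, not the Ensembl API, whose gene models carry far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A genomic interval; in this toy model it is all a gene is."""
    chromosome: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # +1 forward, -1 reverse


def contains(gene: Locus, transcript: Locus) -> bool:
    """True if the transcript lies entirely within the gene's locus.

    No notion of function appears anywhere: membership is decided
    purely by chromosome, strand and coordinates.
    """
    return (
        gene.chromosome == transcript.chromosome
        and gene.strand == transcript.strand
        and gene.start <= transcript.start
        and transcript.end <= gene.end
    )


# Hypothetical coordinates, for illustration only.
gene_x = Locus("1", 1_000, 5_000, 1)
print(contains(gene_x, Locus("1", 1_200, 4_800, 1)))  # True
print(contains(gene_x, Locus("2", 1_200, 4_800, 1)))  # False: other chromosome
```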

> This means that several
> independent protein structures or RNA sequence-based structures
> performing very similar functions may each be termed the same gene.
> For example, if gene "A" is observed to have some well-defined
> phenotype characteristics, then all protein structures and/or RNA
> sequence-based structures having these characteristics can each be
> termed instances of gene "A".
> 2 /
> For humans, several transcripts may each individually yield an
> independent instance of a particular gene.

No - several transcripts may yield independent instances of a particular protein, but if the transcripts are transcribed from different genomic locations (which is entirely possible), then they are from different genes.
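A sketch of the same point, assuming a toy model where genes are keyed only by (chromosome, start, end), with strand ignored for brevity: two transcripts that encode an identical protein still end up in different genes when they come from different loci. The class, the `group_by_locus` function, and all IDs, coordinates and the protein sequence are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    chromosome: str
    start: int
    end: int
    protein: str  # translated protein sequence ("" if non-coding)


def group_by_locus(transcripts, loci):
    """Assign each transcript to the first gene whose locus contains it.

    `loci` maps a gene ID to a (chromosome, start, end) tuple. Genes are
    keyed by location only, so two transcripts encoding an identical
    protein still land in different genes if their loci differ.
    """
    genes = defaultdict(list)
    for t in transcripts:
        for gene_id, (chrom, start, end) in loci.items():
            if t.chromosome == chrom and start <= t.start and t.end <= end:
                genes[gene_id].append(t.transcript_id)
                break
    return dict(genes)


# Hypothetical IDs, coordinates and sequence.
loci = {"GENE_A": ("1", 100, 900), "GENE_B": ("7", 100, 900)}
transcripts = [
    Transcript("TX_1", "1", 150, 850, "MKTAYIAK"),
    Transcript("TX_2", "7", 150, 850, "MKTAYIAK"),  # same protein, other locus
]
print(group_by_locus(transcripts, loci))
# {'GENE_A': ['TX_1'], 'GENE_B': ['TX_2']}
```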

> If the above understanding is correct, is it possible to have a
> protein structure and an RNA-based structure each having the same
> phenotype characteristics and therefore yielding instances of the
> same gene? I am asking this because a manual query using gene
> "ENSG00000133997" matched some transcripts based on exons (which
> produce a protein product) and some transcripts based on introns
> (which do not produce any protein product).

The gene in Ensembl is the locus. A single locus can result in multiple transcripts (RNAs) with different functions. 
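In Ensembl each transcript carries a biotype, such as "protein_coding" or "retained_intron" (the two kinds Allan saw in the listing for ENSG00000133997). The sketch below uses hypothetical class names and transcript IDs, not real Ensembl data or API calls, to show one locus holding both kinds side by side.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    transcript_id: str
    biotype: str  # e.g. "protein_coding" or "retained_intron"


@dataclass
class Gene:
    gene_id: str
    transcripts: list = field(default_factory=list)

    def split_by_coding(self):
        """Separate the coding from the non-coding transcripts of one locus."""
        coding = [t.transcript_id for t in self.transcripts
                  if t.biotype == "protein_coding"]
        noncoding = [t.transcript_id for t in self.transcripts
                     if t.biotype != "protein_coding"]
        return coding, noncoding


# Hypothetical gene and transcript IDs, for illustration only.
gene = Gene("GENE_X", [
    Transcript("TX_1", "protein_coding"),
    Transcript("TX_2", "retained_intron"),
    Transcript("TX_3", "protein_coding"),
])
print(gene.split_by_coding())  # (['TX_1', 'TX_3'], ['TX_2'])
```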

I hope that helps,


> Allan.
> On 6/5/12, LAW Andy <andy.law at roslin.ed.ac.uk> wrote:
>> Allan,
>> This all depends on what you mean by "gene".
>> I think the common understanding is that a "gene" is a piece of the genome
>> that does "stuff". There are (protein) coding genes and there are non-coding
>> (RNA) genes. They are bits of the genome that get transcribed into RNA (into
>> transcripts).
>> There may be (and often are) more than one transcript from a given "gene".
>> These different transcripts arise from either variations in splicing
>> (transcript A may have exons 1, 2 and 3 whereas transcript B may have exons
>> 1, 3 and 4), from differential use of multiple transcription start sites
>> (often one start site is active in one set of cell types/tissues and a
>> different start site is active in a separate set of tissues) or from
>> different transcription end sites.
>> A transcript is thus something that is transcribed from the "gene". One gene
>> may have many transcripts. A given transcript only comes from one gene (this
>> is a one-to-many (gene-to-transcript) relationship).
>> In general, in eukaryotes, a coding transcript will give rise to a single
>> protein product. I believe that there are examples where this is not
>> strictly true, in that one transcript may have alternate translation start
>> sites, but I'm not confident enough of this to be able to give you an
>> example. In prokaryotes, multiple protein products can be derived from a
>> single transcript.
>> Hope that helps some.
>> On 5 Jun 2012, at 10:10am, Allan Kamau wrote:
>>> I am still struggling to understand the transcript data in relation to
>>> a gene. To simplify my preceding question, I would like to ask:
>>> 1) Is a transcript strictly a pre-step to a variant of a gene?
>>> 2) Is there a one-to-one relationship between a transcript and the
>>> gene? A single transcript gets processed (in the case of protein genes)
>>> into a single gene, and for RNA-based genes the transcript may undergo
>>> minimal processing to yield a single RNA-based gene.
>>> Allan.
>>> On 6/4/12, Allan Kamau <kamauallan at gmail.com> wrote:
>>>> I am a non-biologist and I would like to get a better understanding of
>>>> a transcript in relation to a gene.
>>>> According to a search on Ensembl, the gene "ENSG00000133997" is
>>>> reported to have 11 transcripts, and a listing of their transcript IDs
>>>> is provided. Some of the entries in this listing have protein product
>>>> IDs and the note "A protein coding transcript is a spliced mRNA that
>>>> leads to a protein product"; the other entries are described as "No
>>>> protein product" and have the text "Retained intron: Noncoding
>>>> transcript containing intronic sequence" alongside them.
>>>> My initial understanding was as follows (which I now think is wrong):
>>>> one transcript is one mature mRNA, composed of one or more exons
>>>> observed to join together and have a polyA tail, and this single
>>>> transcript would be translated into a protein-based gene. There could
>>>> be several such transcripts (for different situations), all perhaps
>>>> producing different protein products, but each of these protein
>>>> products is independently a variant of the same gene that these
>>>> transcripts are known to yield.
>>>> Or one transcript could be composed of an intron (is it possible to
>>>> have multiple introns joined together?) that would represent a single
>>>> non-protein-based (or simply RNA-based) gene. And that there could be
>>>> multiple such transcripts yielding the same gene (they may have
>>>> different structures but perform the same function).
>>>> And that a gene may be either enzyme-based or RNA-based, which means
>>>> that for a given gene we would not have transcripts representing mRNA
>>>> and transcripts representing RNA gene products at the same time.
>>>> Kindly advise.
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> List admin (including subscribe/unsubscribe):
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/
>> Later,
>> Andy
>> --------
>> Yada, yada, yada...
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336
>> Disclaimer: This e-mail and any attachments are confidential and intended
>> solely for the use of the recipient(s) to whom they are addressed. If you
>> have received it in error, please destroy all copies and inform the sender.
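Andy's point in the quoted thread, that one gene may have many transcripts while each transcript comes from exactly one gene, can be sketched as a one-to-many index. This is a minimal illustration only: the class, field and function names and the IDs are hypothetical, not part of the Ensembl API; the exon patterns reuse Andy's transcript A (exons 1, 2, 3) and transcript B (exons 1, 3, 4) splicing example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    gene_id: str  # a single field: each transcript points at exactly one gene
    exons: tuple  # exon numbers used, in order


def transcripts_by_gene(transcripts):
    """Build the one-to-many gene -> transcripts index."""
    index = {}
    for t in transcripts:
        index.setdefault(t.gene_id, []).append(t.transcript_id)
    return index


# Hypothetical IDs; exon patterns follow Andy's splicing example.
transcripts = [
    Transcript("TX_A", "GENE_1", (1, 2, 3)),
    Transcript("TX_B", "GENE_1", (1, 3, 4)),
]
print(transcripts_by_gene(transcripts))  # {'GENE_1': ['TX_A', 'TX_B']}
# Exons shared by the two splice variants:
print(sorted(set(transcripts[0].exons) & set(transcripts[1].exons)))  # [1, 3]
```

Because the gene ID is stored as a single value on each transcript, the model cannot express one transcript belonging to two genes, which is exactly the one-to-many constraint Andy describes.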

William Spooner           http://eaglegenomics.com
M:07779-663045 E:william.spooner at eaglegenomics.com 
