[ensembl-dev] canonical transcript

Fri Apr 27 15:23:08 BST 2012

Hi Sung,

Just adding more information to Andy's description on how we choose a 
canonical transcript for a gene.

We take the longest CCDS model in each gene; if none is available, the
longest coding Ensembl-Havana merged transcript is chosen. If no merged
transcript is present, we take the longest coding transcript regardless
of its source; this can be either an Ensembl or a Havana transcript.
Finally, if there are no coding transcripts in the gene, the longest
non-coding transcript is selected.
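Read as an algorithm, the priority order above is a cascade of tiers: the longest transcript in the first non-empty tier wins. The Python sketch below is illustrative only. It assumes a hypothetical `Transcript` record with precomputed `in_ccds` and `is_merged` flags and a single `length` value. These field names are not taken from the Ensembl API, and the email does not say whether "length" means transcript length or coding length.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Transcript:
    stable_id: str
    length: int        # hypothetical: which length is used is not stated
    is_coding: bool
    in_ccds: bool      # hypothetical flag: transcript matches a CCDS model
    is_merged: bool    # hypothetical flag: Ensembl-Havana merged model


def choose_canonical(transcripts: Sequence[Transcript]) -> Optional[Transcript]:
    """Return the longest transcript from the first non-empty tier."""
    tiers = (
        lambda t: t.in_ccds,                    # 1. CCDS models
        lambda t: t.is_coding and t.is_merged,  # 2. coding Ensembl-Havana merged
        lambda t: t.is_coding,                  # 3. any coding transcript
        lambda t: not t.is_coding,              # 4. non-coding fallback
    )
    for in_tier in tiers:
        candidates = [t for t in transcripts if in_tier(t)]
        if candidates:
            # max() keeps the first of several equal-length candidates;
            # Ensembl's real tie-breaking is not described here.
            return max(candidates, key=lambda t: t.length)
    return None
```

For example, a gene with a 5,000 bp unmerged coding transcript and a 1,000 bp coding merged transcript (neither in CCDS) would get the merged one, because tier 2 is checked before tier 3.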

Hope this helps,
Amonida

-- 
Amonida Zadissa Ph.D.
Deputy team leader
EnsEMBL Genebuild team
Wellcome Trust Sanger Institute
England

On 27/04/2012 14:22, Andy Yates wrote:
> Hi Sung,
>
> 1).
>
> Canonical transcripts are defined by a number of rules which, for most species, boil down to "the longest transcript wins". However, some species, such as human, have a more complicated assignment method:
>
> - if a gene has protein-producing transcripts, divide them into groups:
> 	- Group A: protein_coding biotype transcripts in CCDS
> 	- Group B: protein_coding biotype transcripts in Havana
> 	- Group C: protein-producing biotype transcripts in Havana
> 	- Group D: all remaining protein-producing transcripts
>
> Order each group by length, then go through the groups in order and take the longest transcript from the first non-empty one, i.e. if group A had no transcripts but group B had 2 transcripts, we would use the longer of the two from B. A transcript counts as being in CCDS only if its coding exon model is identical to the CCDS model.
>
> - if a gene has no protein-producing transcripts:
> 	- Group A: transcripts in Havana
> 	- Group B: all other transcripts
>
> Apply the same rules, but using just groups A and B.
>
> 3). The canonical_translation_id field in the transcript table refers to an entry in the translation table. It indicates the canonical translation when a transcript produces more than one protein product due to alternative initiation.
>
> 4). Ensembl does not annotate canonical translations, since we maintain a 1:1 relationship between transcripts and translations. Ensembl Genomes does have this data and can better explain its rules for assignment.
>
> Hope this helps,
>
> Andy
>
> Andrew Yates                   Ensembl Core Software Project Leader
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensembl.org/
>
> On 27 Apr 2012, at 13:31, Sung Gong wrote:
>
>> Hi,
>>
>> The 'gene' table contains a column 'canonical_transcript_id' which is
>> a foreign key to the 'transcript' table.
>>
>> My questions are:
>> 1. How do you define whether a transcript is canonical or not? Any
>> documentation on the Ensembl web site?
>> 2. Within the 'gene' table, are all values in the 'canonical_annotation' column null?
>> 3. I could not find a 'translation_translation' table, although there is
>> a 'canonical_translation_id' column in the 'transcript' table.
>> 4. How do you define a canonical translation?
>>
>> Cheers,
>> Sung
>>
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
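The group-based assignment for human described in Andy's reply can be sketched in Python as below. This is a hedged illustration, not Ensembl's production code. The `Transcript` record, its `from_havana` flag, representing a CCDS model as a tuple of coding-exon coordinates, and the contents of `PROTEIN_PRODUCING` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Set, Tuple

# (start, end) of the coding part of one exon; coordinates are illustrative.
CodingExon = Tuple[int, int]

# Illustrative assumption: which biotypes count as "protein producing".
# The thread does not list them; substitute the real set for your release.
PROTEIN_PRODUCING: FrozenSet[str] = frozenset(
    {"protein_coding", "nonsense_mediated_decay", "polymorphic_pseudogene"}
)


@dataclass
class Transcript:
    stable_id: str
    biotype: str
    length: int
    from_havana: bool  # hypothetical flag: transcript is in Havana
    coding_exons: Tuple[CodingExon, ...] = ()  # empty for non-coding transcripts


def matches_ccds(t: Transcript, ccds_models: Set[Tuple[CodingExon, ...]]) -> bool:
    # Equality to CCDS: the coding exon model must be identical.
    return bool(t.coding_exons) and t.coding_exons in ccds_models


def choose_canonical_grouped(
    transcripts: Sequence[Transcript],
    ccds_models: Set[Tuple[CodingExon, ...]],
    protein_producing: FrozenSet[str] = PROTEIN_PRODUCING,
) -> Optional[Transcript]:
    coding = [t for t in transcripts if t.biotype in protein_producing]
    if coding:
        groups = [
            # Group A: protein_coding transcripts matching a CCDS model
            [t for t in coding
             if t.biotype == "protein_coding" and matches_ccds(t, ccds_models)],
            # Group B: protein_coding transcripts in Havana
            [t for t in coding if t.biotype == "protein_coding" and t.from_havana],
            # Group C: any protein-producing transcript in Havana
            [t for t in coding if t.from_havana],
            # Group D: remaining protein producers; using all of them is
            # equivalent, because D is only reached when A-C are empty
            coding,
        ]
    else:
        groups = [
            [t for t in transcripts if t.from_havana],  # Group A: Havana transcripts
            list(transcripts),  # Group B: all others (reached only if A is empty)
        ]
    # Take the longest transcript from the first non-empty group.
    for group in groups:
        if group:
            return max(group, key=lambda t: t.length)
    return None
```

Here `max` replaces "sort by length, take the first", which gives the same result. The thread does not say how ties between equal-length transcripts are broken. In this sketch the earliest one in input order wins.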