[ensembl-dev] canonical transcript
Amonida Zadissa
amonida at sanger.ac.uk
Fri Apr 27 15:23:08 BST 2012
Hi Sung,
Just adding some more information to Andy's description of how we choose
a canonical transcript for a gene.
We take the longest CCDS model in each gene. If none is available, the
longest coding Ensembl-Havana merged transcript is chosen. If no merged
transcript is present, we take the longest coding transcript regardless
of its source; this can be either an Ensembl or a Havana transcript.
Finally, if there are no coding transcripts in the gene, the longest
non-coding transcript is selected.
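The ordering above can be sketched as a list of tiers that are tried in turn, with the longest member of the first non-empty tier winning. This is a minimal Python sketch, not Ensembl code: the Transcript record and its field names are hypothetical, and "length" is assumed to be one precomputed number per transcript because the message does not say which length (genomic span, spliced length or CDS length) is compared.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Transcript:
    # Hypothetical, simplified record; not an Ensembl API class.
    stable_id: str
    length: int              # assumption: one precomputed length per transcript
    is_coding: bool
    in_ccds: bool = False    # transcript corresponds to a CCDS model
    is_merged: bool = False  # Ensembl-Havana merged transcript


def pick_canonical(transcripts: Sequence[Transcript]) -> Optional[Transcript]:
    """Return the longest transcript from the first non-empty tier, or None."""
    tiers = [
        lambda t: t.in_ccds,                    # 1. CCDS models
        lambda t: t.is_coding and t.is_merged,  # 2. coding Ensembl-Havana merged
        lambda t: t.is_coding,                  # 3. any coding transcript
        lambda t: not t.is_coding,              # 4. non-coding fallback
    ]
    for in_tier in tiers:
        candidates = [t for t in transcripts if in_tier(t)]
        if candidates:
            # max() keeps the first of several equally long candidates;
            # the message does not say how ties are broken.
            return max(candidates, key=lambda t: t.length)
    return None


# Example: no CCDS model, so the merged transcript wins over a longer
# unmerged coding transcript.
gene = [
    Transcript("T1", 2500, is_coding=True),
    Transcript("T2", 1800, is_coding=True, is_merged=True),
    Transcript("T3", 3000, is_coding=False),
]
print(pick_canonical(gene).stable_id)  # T2
```

Because each tier is just a predicate, the order of the list carries the whole policy; the example shows that a shorter transcript in an earlier tier beats a longer one in a later tier.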
Hope this helps,
Amonida
--
Amonida Zadissa Ph.D.
Deputy team leader
EnsEMBL Genebuild team
Wellcome Trust Sanger Institute
England
On 27/04/2012 14:22, Andy Yates wrote:
> Hi Sung,
>
> 1).
>
> Canonical transcripts are defined by a number of rules, which for most species boil down to "the longest transcript wins". However, some species, such as human, have a more complicated assignment method:
>
> - If a gene has protein-producing transcripts, divide them into groups:
> - Group A: protein_coding biotype transcripts in CCDS
> - Group B: protein_coding biotype transcripts in Havana
> - Group C: protein-producing biotypes in Havana
> - Group D: remaining protein-producing biotypes
>
> Order these sets by length and then work through the groups in order, taking the longest transcript from the first non-empty group; i.e. if group A had no transcripts but group B had 2 transcripts, then we would use the longest from B. Equality to CCDS is based on an identical coding exon model.
>
> - If a gene has no protein-producing transcripts:
> - Group A: transcripts in Havana
> - Group B: all other transcripts
>
> Apply the same rules, but using just groups A and B.
>
> 3). The canonical_translation_id field in the transcript table refers to an entry in the translation table. It indicates the canonical translation when we have transcripts producing more than one protein product due to alternative initiation.
>
> 4). Ensembl does not annotate canonical translations, since we maintain a 1:1 relationship between transcripts and translations. Ensembl Genomes do have this data and can better explain their rules for assignment.
>
> Hope this helps,
>
> Andy
>
> Andrew Yates Ensembl Core Software Project Leader
> EMBL-EBI Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK http://www.ensembl.org/
>
> On 27 Apr 2012, at 13:31, Sung Gong wrote:
>
>> Hi,
>>
>> The 'gene' table contains a column 'canonical_transcript_id' which is
>> a foreign key to the 'transcript' table.
>>
>> My questions are:
>> 1. How do you define whether a transcript is canonical or not? Is there
>> any documentation on the Ensembl web site?
>> 2. Why are all the values in the 'canonical_annotation' column of the
>> 'gene' table null?
>> 3. I could not find a 'translation_translation' table, whereas there is
>> a column 'canonical_translation_id' in the 'transcript' table.
>> 4. How do you define a canonical translation?
>>
>> Cheers,
>> Sung
>>
>> _______________________________________________
>> Dev mailing list Dev at ensembl.org
>> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
>> Ensembl Blog: http://www.ensembl.info/
>
>
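Andy's group-based rule for human can be sketched the same way, including the CCDS check by identical coding exons. This is a minimal Python sketch under stated assumptions, not the Ensembl implementation: the Transcript fields are hypothetical, whether a biotype counts as "protein producing" is passed in as a flag because the message does not list those biotypes, Havana membership is a simple boolean, and a CCDS model is represented as a tuple of (start, end) coding exon coordinates.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple

Exon = Tuple[int, int]  # (start, end) of one coding exon segment


@dataclass(frozen=True)
class Transcript:
    # Hypothetical, simplified record; field names are illustrative only.
    stable_id: str
    length: int
    biotype: str
    protein_producing: bool  # assumption: decided by the caller
    in_havana: bool
    coding_exons: Tuple[Exon, ...] = ()


def matches_ccds(t: Transcript, ccds_models: FrozenSet[Tuple[Exon, ...]]) -> bool:
    # CCDS equality modelled as an identical tuple of coding exons.
    return bool(t.coding_exons) and t.coding_exons in ccds_models


def pick_canonical(
    transcripts: Sequence[Transcript],
    ccds_models: FrozenSet[Tuple[Exon, ...]],
) -> Optional[Transcript]:
    producing = [t for t in transcripts if t.protein_producing]
    if producing:
        pool = producing
        groups = [
            lambda t: t.biotype == "protein_coding" and matches_ccds(t, ccds_models),  # A
            lambda t: t.biotype == "protein_coding" and t.in_havana,  # B
            lambda t: t.in_havana,  # C: any protein-producing biotype in Havana
            lambda t: True,         # D: remaining protein-producing transcripts
        ]
    else:
        pool = list(transcripts)
        groups = [
            lambda t: t.in_havana,  # A: Havana transcripts
            lambda t: True,         # B: all other transcripts
        ]
    # Walk the groups in order; the longest member of the first
    # non-empty group is returned.
    for in_group in groups:
        members = [t for t in pool if in_group(t)]
        if members:
            return max(members, key=lambda t: t.length)
    return None


ccds = frozenset({((100, 200), (300, 400))})
gene = [
    # Longest, but its coding exons differ from the CCDS model.
    Transcript("T1", 5000, "protein_coding", True, False, ((100, 200), (300, 450))),
    # Identical coding exons to the CCDS model, so it falls in group A.
    Transcript("T2", 1200, "protein_coding", True, False, ((100, 200), (300, 400))),
    Transcript("T3", 4000, "processed_transcript", False, True),
]
print(pick_canonical(gene, ccds).stable_id)  # T2
```

Note that the non-coding transcript T3 is never considered here: once a gene has any protein-producing transcript, only those are ranked.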