[ensembl-dev] how to calculate transcript length ?

Amonida Zadissa amonida at sanger.ac.uk
Tue Aug 28 09:53:13 BST 2012


Hi Enrico,

Please note that not all exons are necessarily coding, and some exons 
are only partially coding, as in the example you describe.

The transcript ENST00000419234 is indeed 3179 bases long, but that 
length includes the untranslated regions (UTRs) at both the 5' and 3' 
ends. The first exon (ENSE00002313575) has 276 bases but only 82 bases 
are coding. The last exon (ENSE00001355929) is 1570 bases long but only 
the first 364 bases are coding. Excluding the UTRs (194 bases at the 5' 
end and 1206 at the 3' end) leaves 1779 coding bases, including the 
final stop codon. That is 1779 / 3 = 593 codons, and since the stop 
codon is not translated, the translation has 592 functional amino acids.
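
As a quick check of the arithmetic above, here is a minimal Python sketch. 
The helper name coding_summary and its parameters are made up for 
illustration and are not part of the Ensembl API. It assumes the UTRs lie 
entirely within the first and last exons, as they do for this transcript, 
and that the coding region ends with a stop codon.

```python
def coding_summary(transcript_length, utr5_length, utr3_length):
    """Return (coding_bases, codons, amino_acids) for a spliced transcript.

    Hypothetical helper for illustration only. It assumes the coding
    region ends with a stop codon, which is counted as a codon but is
    not translated into an amino acid.
    """
    coding_bases = transcript_length - utr5_length - utr3_length
    codons, remainder = divmod(coding_bases, 3)
    if remainder != 0:
        raise ValueError("coding length is not a multiple of 3")
    return coding_bases, codons, codons - 1


# Figures quoted above for ENST00000419234.
utr5 = 276 - 82      # first exon length minus its coding bases -> 194
utr3 = 1570 - 364    # last exon length minus its coding bases -> 1206

coding_bases, codons, amino_acids = coding_summary(3179, utr5, utr3)
print(coding_bases, codons, amino_acids)  # 1779 593 592
```

Dividing the full transcript length by 3 overestimates the protein length 
because it counts the UTR bases as if they were translated.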

Hope this clarifies the scenario.

Best regards,
Amonida

On 27/08/2012 23:56, enrico1970 at yahoo.com wrote:
> Dear Thibaut and Jay,
> I really appreciate your suggestions to the list, but I have a query similar to Gang's.
> At the page http://asia.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000089234;r=12:112080797-112123790;t=ENST00000419234
>
> The transcript ENST00000419234 has a length of 3179 nucleotides, which corresponds to the sum of its exons, so the protein would be expected to have 3179/3 = 1059 amino acids,
> but it has only 592 amino acids. The same thing happens for other transcripts.
>
> What is the data definition of the length of the transcript and of the protein?
> Kind regards,
>
> Enrico Rubagotti
>
>
>
>
>
> Hi,
> I would recommend using the Perl API for all the information you
> want to retrieve from Ensembl.
> It's quite easy to use, and if there is a schema change you will not
> have to change all your SQL queries.
> Here is some documentation:
> http://www.ensembl.org/info/docs/api/core/core_tutorial.html
> http://www.ensembl.org/info/docs/Doxygen/core-api/index.html
> http://www.ensembl.org/info/docs/api/index.html
>
> Here is some code for your query:
>
> use Bio::EnsEMBL::Registry;
>
> my $registry = 'Bio::EnsEMBL::Registry';
>
> # Connect to the public Ensembl database server.
> $registry->load_registry_from_db(
>     -host => 'ensembldb.ensembl.org',  # alternatively 'useastdb.ensembl.org'
>     -user => 'anonymous'
> );
>
> my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );
> my $gene = $gene_adaptor->fetch_by_stable_id('ENSG00000089234');
>
> # Print the spliced length (sum of exon lengths) of each transcript.
> foreach my $transcript ( @{ $gene->get_all_Transcripts() } ) {
>     print STDOUT 'Length of ', $transcript->display_id, ': ', $transcript->length, "\n";
> }
>
> Regards
> Thibaut
>
> On 31/07/12 12:49, Jay Humphrey wrote:
> > Length is end - start + 1.
> > 1 2 3 [4 5 6 7 8] 9
> > start = 4, end = 8
> > 8 - 4 = 4, but there are actually 5 residues.
> >
> > On 31/07/2012 10:39, ?? wrote:
> >> Hi All
> >> I am wondering how to calculate transcript length within the Ensembl database.
> >> I tried to sum the exon lengths:
> >>
> >> SELECT tp.stable_id, SUM( e.seq_region_end ) - SUM( e.seq_region_start )
> >> FROM gene g
> >> JOIN transcript tp ON ( g.gene_id = tp.gene_id )
> >> JOIN exon_transcript et ON ( et.transcript_id = tp.transcript_id )
> >> JOIN exon e ON ( e.exon_id = et.exon_id )
> >> WHERE g.stable_id = 'ENSG00000089234'
> >> GROUP BY tp.stable_id
> >>
> >> But the result is inconsistent with the official Ensembl data:
> >> http://asia.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000089234;r=12:112080797-112123790
> >>
> >> If you know how to dig out the data on variation, orthologues,
> >> paralogues and regulation, please also tell me.
> >>
> >> Thanks a million
> >> --
> >> Gang Chen
> >> TILSI
> >> Taicang Institute For Life Science Information
> >> Address: A2/162, Renmin South Road, Taicang, 215400, Jiangsu Province, P.R.China
> >> Phone: (+86)512-82782588
> >>
> >> _______________________________________________
> >> Dev mailing list    Dev at ensembl.org
> >> List admin (including subscribe/unsubscribe): http://lists.ensembl.org/mailman/listinfo/dev
> >> Ensembl Blog: http://www.ensembl.info/
> >
> > --
> > Jay Humphrey                   Ensembl Genomes Web Developer
> > EMBL-EBI                       Tel: +44-(0)1223-492682
> > Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> > Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

-- 
Amonida Zadissa Ph.D.
Deputy team leader
EnsEMBL Genebuild team
Wellcome Trust Sanger Institute
England



