[ensembl-dev] perl API slow script

Nicolas Thierry-Mieg Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr
Tue Sep 17 15:49:30 BST 2019


Hi list,

I want to obtain the list of Ensembl human "canonical" transcripts.
As far as I can see, this list is not available in the GTF or GFF files
that can be downloaded from ftp.ensembl.org.

So I wrote the following small script, which uses the Perl API to
connect to Ensembl. The script works, but it is very slow: it took more
than 16 hours just to obtain 66832 ENST identifiers. I'd expect it to
take seconds or minutes, not hours. I must be doing something very wrong,
but I can't see what.
Please help: what is wrong with the code below?
Or, if the issue is that the Ensembl servers are permanently saturated,
is there some other way I could obtain the Ensembl canonical
transcripts? I tried the UCSC Table Browser, but there are discrepancies
between their "knownCanonical" table and the Ensembl canonical
transcripts. I also tried BioMart but couldn't find "canonical"
anywhere.


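use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server, loading human only.
my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
     -host => 'ensembldb.ensembl.org',
     -user => 'anonymous',
     -species => 'homo sapiens'
     );

# Fetch every human transcript from the core database.
my $transcripts_adaptor = $reg->get_adaptor('human', 'core', 'transcript');
my $transcripts = $transcripts_adaptor->fetch_all;

# Print the stable ID (ENST...) of each canonical transcript.
while (my $transcript = shift @{$transcripts}) {
     next unless $transcript->is_canonical;
     print $transcript->stable_id, "\n";
}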


Thanks!
Regards,
Nicolas




