[ensembl-dev] perl API slow script
Nicolas Thierry-Mieg
Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr
Tue Sep 17 15:49:30 BST 2019
Hi list,

I want to obtain the list of Ensembl human "canonical" transcripts.
As far as I can see, this information is not available in the GTF or GFF
files that can be downloaded from ftp.ensembl.org.

So I wrote the following small script, which uses the Perl API to connect
to Ensembl. My script works, but it is very slow: it took more than 16
hours just to obtain 66832 ENST identifiers. I'd expect it to take
seconds or minutes, not hours. I must be doing something very wrong, but
I can't see it.

Please help: what is wrong with the code below?
Or, if the issue is permanently saturated Ensembl servers, is there some
other way I could obtain the Ensembl canonical transcripts? I tried
using the UCSC Table Browser, but there are discrepancies between their
"knownCanonical" table and the Ensembl canonical transcripts. I also
tried BioMart but couldn't find "canonical" anywhere.
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server.
my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
    -host    => 'ensembldb.ensembl.org',
    -user    => 'anonymous',
    -species => 'homo sapiens'
);

# Fetch every human transcript, then keep only the canonical ones.
my $transcripts_adaptor = $reg->get_adaptor('human', 'core', 'transcript');
my $transcripts = $transcripts_adaptor->fetch_all;
while (my $transcript = shift @{$transcripts}) {
    next unless $transcript->is_canonical;
    print $transcript->stable_id . "\n";
}
Thanks!
Regards,
Nicolas