[ensembl-dev] perl API slow script

Olson, Andrew olson at cshl.edu
Tue Sep 17 16:15:00 BST 2019


Hi Nicolas,
For simple bulk operations like this one, I like to just query the database directly.

echo "select t.* from transcript t, gene g where t.transcript_id = g.canonical_transcript_id and g.is_current = 1" | mysql … > canonicalTranscripts.txt
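If only the ENST identifiers are needed, the file written by the command above can be reduced to its stable_id column. What follows is a minimal Perl sketch. It assumes mysql's default non-interactive output, which is a header row of column names followed by tab-separated rows. The subroutine name stable_ids_from_tsv is hypothetical and is not part of the Ensembl API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): read mysql batch output,
# i.e. one header row of column names followed by tab-separated data rows,
# and return the values found in the 'stable_id' column.
sub stable_ids_from_tsv {
    my ($fh) = @_;

    my $header = <$fh>;
    return () unless defined $header;
    chomp $header;

    # Locate the stable_id column by name rather than by position
    my @columns = split /\t/, $header;
    my ($idx) = grep { $columns[$_] eq 'stable_id' } 0 .. $#columns;
    die "No stable_id column found in header\n" unless defined $idx;

    my @ids;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';                    # skip blank lines
        my @fields = split /\t/, $line, -1;     # -1 keeps empty trailing fields
        push @ids, $fields[$idx];
    }
    return @ids;
}

# Run as a script only when a file name is given on the command line
if (!caller() && @ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    print "$_\n" for stable_ids_from_tsv($in);
    close $in;
}
```

Usage: perl extract_ids.pl canonicalTranscripts.txt > canonical_ENST.txt (the script name is arbitrary). The sketch looks up the column by name rather than by position, so it does not depend on the column order of the transcript table, which may differ between schema versions.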

Andrew

> On Sep 17, 2019, at 10:49 AM, Nicolas Thierry-Mieg <Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:
> 
> Hi list,
> 
> I want to obtain the list of Ensembl Human "canonical" transcripts.
> As far as I can see, this is not available in the GTF or GFF files that can be downloaded from ftp.ensembl.org.
> 
> So, I wrote the following small script that uses the Perl API to connect to Ensembl. My script works, but it's very slow: it took more than 16 hours just to obtain 66832 ENST identifiers... I'd expect it to take seconds or minutes, not hours. I must be doing something very wrong, but I can't see it.
> Please help: what is wrong with the code below?
> Or, if the issue is that the Ensembl servers are permanently saturated, is there some other way I could obtain the Ensembl canonical transcripts? I tried using the UCSC Table Browser, but there are discrepancies between its "knownCanonical" table and the Ensembl canonical transcripts. I also tried BioMart but couldn't find "canonical" anywhere.
> 
> 
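> use strict;
> use warnings;
> use Bio::EnsEMBL::Registry;
> 
> # Connect anonymously to the public Ensembl MySQL server (human only)
> my $reg = 'Bio::EnsEMBL::Registry';
> $reg->load_registry_from_db(
>    -host    => 'ensembldb.ensembl.org',
>    -user    => 'anonymous',
>    -species => 'homo sapiens',
> );
> 
> # Fetch every human transcript, then keep only the canonical ones
> my $transcripts_adaptor = $reg->get_adaptor('human', 'core', 'transcript');
> my $transcripts = $transcripts_adaptor->fetch_all;
> 
> # shift() removes each transcript from the array as it is processed
> while (my $transcript = shift @{$transcripts}) {
>    next unless $transcript->is_canonical;
>    print $transcript->stable_id, "\n";
> }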
> 
> 
> Thanks!
> Regards,
> Nicolas


