[ensembl-dev] perl API slow script

Thibaut Hourlier thibaut at ebi.ac.uk
Tue Sep 17 17:12:16 BST 2019


Hi Nicolas,
In the current release there are 248,916 transcripts in the human database, so the API fetched all of them before processing any of them. On top of that, the gene knows which transcript is canonical, but a transcript doesn't know whether it is canonical, which means more queries from the API.
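
To make that difference concrete, here is a toy model in plain Perl, with no Ensembl API or database involved. The hashes, the IDs and the lookup counter are all made up for illustration; the only thing borrowed from the real schema is that the gene row carries canonical_transcript_id, as in Andrew's query quoted below. It is a sketch of the idea, not of how the API is implemented.

```perl
use strict;
use warnings;

# Toy data (hypothetical IDs): each gene row stores its canonical transcript,
# each transcript row only stores which gene it belongs to.
my %gene = (
  G1 => { canonical_transcript_id => 'T2' },
  G2 => { canonical_transcript_id => 'T3' },
);
my %transcript = (
  T1 => { gene_id => 'G1' },
  T2 => { gene_id => 'G1' },
  T3 => { gene_id => 'G2' },
  T4 => { gene_id => 'G2' },
);

my $lookups = 0;   # stands in for extra round trips to the database

# Transcript-first: every transcript has to go and ask its gene.
sub is_canonical {
  my ($transcript_id) = @_;
  $lookups++;
  my $gene_id = $transcript{$transcript_id}{gene_id};
  return $gene{$gene_id}{canonical_transcript_id} eq $transcript_id;
}

my @by_transcript = grep { is_canonical($_) } sort keys %transcript;
my $transcript_first_lookups = $lookups;

# Gene-first: the answer is already on the gene, so no extra lookup is needed.
$lookups = 0;
my @by_gene = map { $gene{$_}{canonical_transcript_id} } sort keys %gene;
my $gene_first_lookups = $lookups;

print "transcript-first: @by_transcript ($transcript_first_lookups lookups)\n";
print "gene-first:       @by_gene ($gene_first_lookups lookups)\n";
```

Both approaches list the same transcripts, but the transcript-first one pays one lookup per transcript, canonical or not.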

Because of the way the API works it is usually faster to use a slice object to get your gene/transcripts or any other object.
Unless you are really restricted by memory, I would use a foreach loop instead of the while loop with shift.
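
As a small illustration of the two loop styles, here is a self-contained sketch in plain Perl using made-up IDs rather than Ensembl objects. Both loops visit the same elements; the difference is that shift empties the array as it goes, which is what allows memory to be released early, while foreach leaves the array untouched.

```perl
use strict;
use warnings;

my @ids = map { "ENST_example_$_" } 1 .. 5;   # hypothetical IDs

# foreach: iterates in place, the array is unchanged afterwards
my @seen_foreach;
foreach my $id (@ids) {
  push @seen_foreach, $id;
}
my $left_after_foreach = scalar @ids;

# while + shift: removes each element as it is processed
my @queue = @ids;
my @seen_shift;
while (defined(my $id = shift @queue)) {
  push @seen_shift, $id;
}
my $left_after_shift = scalar @queue;

print "foreach left $left_after_foreach elements, shift left $left_after_shift\n";
```

The defined() test in the while condition is there so the loop does not stop early on a false value such as 0 or an empty string.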

my $slice_adaptor = $reg->get_adaptor('human', 'core', 'slice');
foreach my $slice (@{$slice_adaptor->fetch_all('toplevel')}) {
  foreach my $gene (@{$slice->get_all_Genes}) {
    my $transcript = $gene->canonical_transcript;
    print $transcript->stable_id, "\n";
  }
}

We are close to a new release so the servers can also be a bit overloaded.

Thanks
Thibaut

> On 17 Sep 2019, at 16:15, Olson, Andrew <olson at cshl.edu> wrote:
> 
> Hi Nicolas,
> For bulk operations that are pretty easy, I like to just query the database directly.
> 
> echo "select t.* from transcript t, gene g where t.transcript_id = g.canonical_transcript_id and g.is_current = 1" | mysql … > canonicalTranscripts.txt
> 
> Andrew
> 
>> On Sep 17, 2019, at 10:49 AM, Nicolas Thierry-Mieg <Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:
>> 
>> Hi list,
>> 
>> I want to obtain the list of Ensembl Human "canonical" transcripts.
>> As far as I can see this is not available in the GTF or GFF files that can be downloaded from ftp.ensembl.org .
>> 
>> So, I wrote the following small script that uses the perl API to connect to ensembl. My script works, but it's very slow: it took more than 16 hours, just to obtain 66832 ENST identifiers... I'd expect it to take seconds or minutes, not hours. I must be doing something very wrong but I can't see it.
>> Please help, what is wrong with the code below?
>> Or if the issue is permanently saturated ensembl servers, is there some other way I could obtain the ensembl canonical transcripts? I tried using the UCSC Table Browser, but there are discrepancies between their "knownCanonical" table and the ensembl canonical transcripts. I also tried biomart but couldn't find "canonical" anywhere.
>> 
>> 
>> use Bio::EnsEMBL::Registry;
>> my $reg = "Bio::EnsEMBL::Registry";
>> $reg->load_registry_from_db(
>>   -host => 'ensembldb.ensembl.org',
>>   -user => 'anonymous',
>>   -species => 'homo sapiens'
>>   );
>> my $transcripts_adaptor = $reg->get_adaptor('human', 'core', 'transcript');
>> my $transcripts = $transcripts_adaptor->fetch_all;
>> 
>> while(my $transcript = shift @{$transcripts}) {
>>   ($transcript->is_canonical) || next;
>>   print $transcript->stable_id."\n" ;
>> }
>> 
>> 
>> Thanks!
>> Regards,
>> Nicolas
>> 
>> 
>> _______________________________________________
>> Dev mailing list    Dev at ensembl.org
>> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
>> Ensembl Blog: http://www.ensembl.info/
> 