[ensembl-dev] perl API slow script
Olson, Andrew
olson at cshl.edu
Tue Sep 17 16:15:00 BST 2019
Hi Nicolas,
For bulk operations that are pretty easy, I like to just query the database directly.
echo "select t.* from transcript t, gene g where t.transcript_id = g.canonical_transcript_id and g.is_current = 1" | mysql … > canonicalTranscripts.txt
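If only the ENST identifiers are needed, the same idea can be wrapped in a couple of shell functions. This is a rough sketch, not a tested recipe: the host and user are the ones from the script quoted below, the function names canonical_query and fetch_canonical are made up for illustration, and the core database name is an assumption that you pass in yourself.

```shell
#!/bin/sh
# Sketch only: host and user come from the quoted Perl script;
# the database name is NOT known here and must be supplied by the caller.

# Print the SQL: one stable ID per canonical transcript of a current gene.
canonical_query() {
    cat <<'SQL'
SELECT t.stable_id
FROM transcript t
JOIN gene g ON t.transcript_id = g.canonical_transcript_id
WHERE g.is_current = 1;
SQL
}

# Run the query on the public server; $1 is the core database name.
# --batch gives tab-separated output, --skip-column-names drops the header.
fetch_canonical() {
    canonical_query | mysql --host=ensembldb.ensembl.org --user=anonymous \
        --batch --skip-column-names "$1"
}
```

Usage would look something like: fetch_canonical homo_sapiens_core_97_38 > canonicalTranscripts.txt (the database name here is just an example, pick the release you actually want). This runs as a single query on the server rather than one lookup per transcript.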
Andrew
> On Sep 17, 2019, at 10:49 AM, Nicolas Thierry-Mieg <Nicolas.Thierry-Mieg at univ-grenoble-alpes.fr> wrote:
>
> Hi list,
>
> I want to obtain the list of Ensembl Human "canonical" transcripts.
> As far as I can see this is not available in the GTF or GFF files that can be downloaded from ftp.ensembl.org.
>
> So, I wrote the following small script that uses the Perl API to connect to Ensembl. My script works, but it's very slow: it took more than 16 hours just to obtain 66832 ENST identifiers... I'd expect it to take seconds or minutes, not hours. I must be doing something very wrong but I can't see it.
> Please help, what is wrong with the code below?
> Or if the issue is permanently saturated Ensembl servers, is there some other way I could obtain the Ensembl canonical transcripts? I tried using the UCSC Table Browser, but there are discrepancies between their "knownCanonical" table and the Ensembl canonical transcripts. I also tried BioMart but couldn't find "canonical" anywhere.
>
>
> use Bio::EnsEMBL::Registry;
> my $reg = "Bio::EnsEMBL::Registry";
> $reg->load_registry_from_db(
>     -host => 'ensembldb.ensembl.org',
>     -user => 'anonymous',
>     -species => 'homo sapiens'
> );
> my $transcripts_adaptor = $reg->get_adaptor('human', 'core', 'transcript');
> my $transcripts = $transcripts_adaptor->fetch_all;
>
> while(my $transcript = shift @{$transcripts}) {
>     ($transcript->is_canonical) || next;
>     print $transcript->stable_id."\n" ;
> }
>
>
> Thanks!
> Regards,
> Nicolas
>
>
> _______________________________________________
> Dev mailing list Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: https://lists.ensembl.org/mailman/listinfo/dev_ensembl.org
> Ensembl Blog: http://www.ensembl.info/