[ensembl-dev] Slow get_adaptor

mag mr6 at ebi.ac.uk
Wed Jan 8 12:02:59 GMT 2014


Hi Genomeo,

Unfortunately, if a gene id is no longer in the Ensembl database, it is 
because we were unable to find a gene in the newer release that would 
map accurately enough to the old model.
This can be because the gene was completely removed, or because its 
structure has changed so much (for example, a dozen added transcripts 
or a 5000bp UTR) that it no longer bears enough resemblance to the old 
model.
This is more common in human because of the continuous integration of 
manual annotation.

The REST server archive endpoint ( 
http://beta.rest.ensembl.org/archive/id/ENSG00000101321?content-type=application/json) 
can help you find out when a given ID was last seen.
If that ID was not mapped in the latest release, it sadly cannot 
provide you with much more information.
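
As a rough sketch of how that lookup could be scripted (an illustration 
only, not taken from the REST documentation): the Perl below builds the 
archive URL for each ID given on the command line, fetches it with the 
core HTTP::Tiny module and prints the decoded JSON exactly as returned. 
The helper names archive_url and fetch_archive are made up for this 
example, and the server name is simply the one from the URL above, so 
adjust it if the service has moved.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP;    # exports decode_json by default

# Assumption: server name copied from the URL quoted in this message.
my $SERVER = 'http://beta.rest.ensembl.org';

# Build the archive lookup URL for one stable ID (hypothetical helper).
sub archive_url {
    my ($stable_id) = @_;
    return "$SERVER/archive/id/$stable_id?content-type=application/json";
}

# Fetch and decode one archive record (hypothetical helper).
# Returns undef if the request fails or the body is not valid JSON.
sub fetch_archive {
    my ($stable_id) = @_;
    my $response = HTTP::Tiny->new->get( archive_url($stable_id) );
    return unless $response->{success};
    return eval { decode_json( $response->{content} ) };
}

# Print each record as returned, without assuming particular field names.
my $printer = JSON::PP->new->pretty->canonical;
for my $stable_id (@ARGV) {
    my $record = fetch_archive($stable_id);
    if ( ref $record ) {
        print "$stable_id\n", $printer->encode($record);
    }
    else {
        print "$stable_id\tno archive record retrieved\n";
    }
}
```

Run it as, for example: perl archive_lookup.pl ENSG00000101321 
(the script name is arbitrary). With no arguments it does nothing.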

To recognise old gene ids, you need to use the old database version in 
which these genes were last seen.
Given that your set was generated on release 73, you can use the 73 API 
version, which will automatically use the 73 data.
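
For the lookup itself, a minimal sketch along these lines might do, 
assuming the release 73 API is on your PERL5LIB and that the public 
server you connect to still hosts the 73 databases. The host and user 
are the ones suggested earlier in this thread, the -db_version argument 
only makes the release explicit, and missing_ids and annotate are 
hypothetical helper names. It prints the ID, region, start, end and 
external name of each gene, and warns about any ID that 
fetch_all_by_stable_id_list() did not return instead of skipping it 
silently.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the requested IDs that are not among the IDs actually found
# (hypothetical helper, pure Perl).
sub missing_ids {
    my ( $requested, $found ) = @_;
    my %seen = map { $_ => 1 } @{$found};
    return grep { !$seen{$_} } @{$requested};
}

# Look up the given stable IDs against release 73 (hypothetical helper).
sub annotate {
    my @stable_ids = @_;

    # Loaded here so the Ensembl API is only needed when a query is run.
    require Bio::EnsEMBL::Registry;
    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        -host       => 'useastdb.ensembl.org',    # assumption: from this thread
        -user       => 'anonymous',
        -db_version => 73,                        # release the set came from
    );

    my $gene_adaptor = $registry->get_adaptor( 'human', 'core', 'gene' );
    my $genes = $gene_adaptor->fetch_all_by_stable_id_list( \@stable_ids );

    for my $gene ( @{$genes} ) {
        print join( "\t",
            $gene->stable_id, $gene->seq_region_name,
            $gene->start,     $gene->end,
            $gene->external_name // '' ), "\n";
    }

    # Report IDs the adaptor did not return rather than dropping them.
    my @found = map { $_->stable_id } @{$genes};
    warn "Not found in release 73: $_\n"
        for missing_ids( \@stable_ids, \@found );
    return;
}

# Usage: perl annotate_73.pl ENSG00000249352 ENSG00000109576
annotate(@ARGV) if @ARGV;
```

Passing all the IDs in a single call keeps the number of round trips to 
the server low, which matters when the server is remote.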


Regards,
Magali

On 08/01/2014 11:31, Genomeo Dev wrote:
> Thanks very much Magali and Andy.
>
> With regard to the script, fetch_all_by_stable_id_list() quietly fails 
> on gene ids which are no longer in the Ensembl database. I have a set of 
> genes from Ensembl 73 which I obtained based on VEP 73. I still want 
> to annotate everything in this set with start, end and external names. 
> I looked up fetch_by* in the Doxygen Perl documentation but could 
> not find a function which recognises old gene ids. Do you know of any 
> function that solves that?
>
> G.
>
>
> On 8 January 2014 10:34, Andy Yates <ayates at ebi.ac.uk> wrote:
>
>     Hi there,
>
>     The blocking queries have been killed from the server so you
>     should see a marked improvement in your code's performance. Script
>     response time will be dependent on database load & your distance
>     from the MySQL server. The figures Mag quoted are a best case
>     where the database is located on the same network as the machine
>     running the API script.
>
>     Andy
>
>     ------------
>     Andrew Yates - Ensembl Support Coordinator
>     European Bioinformatics Institute (EMBL-EBI)
>     European Molecular Biology Laboratory
>     Wellcome Trust Genome Campus
>     Hinxton
>     Cambridge CB10 1SD
>     Tel: +44-(0)1223-492538
>     Fax: +44-(0)1223-494468
>     http://www.ensembl.org/
>
>     On 8 Jan 2014, at 10:27, mag <mr6 at ebi.ac.uk> wrote:
>
>     > Hi Genomeo,
>     >
>     > I am afraid we still have some heavy and long running queries
>     > slowing down our servers.
>     > We are looking into it as we speak.
>     >
>     > I have tried running your code on a local server, for all the
>     > genes on chromosome 1.
>     > This returns in less than a minute, with 5363 genes and 17531
>     > transcripts.
>     > It would be a little bit slower from a remote server, but this
>     > gives you an idea of how fast the script should be running in
>     > normal conditions.
>     >
>     > The useast server seems a lot quieter, so I would recommend you
>     > try using that instead.
>     > host => useastdb.ensembl.org, user => anonymous
>     > (http://www.ensembl.org/info/data/mysql.html)
>     >
>     > Alternatively, if you are planning on using the databases
>     > regularly and have mysql installed locally, you can create your
>     > own local server.
>     > The mysql dumps are available here:
>     > ftp://ftp.ensembl.org/pub/current_mysql/
>     > And instructions on how to install it can be found here:
>     > http://www.ensembl.org/info/docs/webcode/mirror/install/ensembl-data.html
>     >
>     >
>     > Hope that helps,
>     > Magali
>     >
>     > On 08/01/2014 09:25, Genomeo Dev wrote:
>     >> Hi,
>     >>
>     >> This morning it is getting even slower - takes minutes to just
>     >> run the code for two genes.
>     >>
>     >> Do you have any advice on how I can run it for 5000 genes
>     >> within a reasonable time?
>     >>
>     >> G.
>     >>
>     >>
>     >> On 7 January 2014 19:26, <mr6 at ebi.ac.uk> wrote:
>     >> Hi Genomeo,
>     >>
>     >> I don't think there is anything massively wrong with the code
>     >> you are using.
>     >>
>     >> Looking at our mysql server, it is currently under heavy load,
>     >> which would explain slow response time.
>     >>
>     >> Please let us know if the problem persists.
>     >>
>     >>
>     >> Regards,
>     >> Magali
>     >>
>     >> > Hi all,
>     >> >
>     >> > I am finding this code very slow. I am using Ensembl VM 74:
>     >> >
>     >> > $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor( "human", "core", "gene" );
>     >> >
>     >> > my $genes = $gene_adaptor->fetch_all_by_stable_id_list(["ENSG00000249352","ENSG00000109576"]);
>     >> >
>     >> > while ( my $gene = shift @{$genes} ) {
>     >> >     my $gstring = feature2string($gene);
>     >> >     print "$gstring\n";
>     >> >     my $transcripts = $gene->get_all_Transcripts();
>     >> >     while ( my $transcript = shift @{$transcripts} ) {
>     >> >         my $tstring = feature2string($transcript);
>     >> >         print "\t$tstring\n";
>     >> >     }
>     >> > }
>     >> >
>     >> > I suspect the first line is the problem. Any advice on how I
>     >> > can run this faster, especially for a large set of genes?
>     >> >
>     >> > Thanks,
>     >> >
>     >> > Genomeo
>     >> > _______________________________________________
>     >> > Dev mailing list Dev at ensembl.org
>     >> > Posting guidelines and subscribe/unsubscribe info:
>     >> > http://lists.ensembl.org/mailman/listinfo/dev
>     >> > Ensembl Blog: http://www.ensembl.info/
>     >> >
