[ensembl-dev] slow response and problems to receive mouse ancestral sequence. API 83

Christian Groß - EWI C.Gross at tudelft.nl
Thu Mar 10 13:26:35 GMT 2016


Dear Matthieu,

Thank you very much for your help. I will try these additional connection parameters soon and then let you know about the results.

Best regards,

Christian

-----Original Message-----
From: dev-bounces at ensembl.org [mailto:dev-bounces at ensembl.org] On Behalf Of Matthieu Muffato
Sent: Thursday 10 March 2016 8:55
To: Ensembl Dev
Subject: Re: [ensembl-dev] slow response and problems to receive mouse ancestral sequence. API 83

Dear Christian

First of all, we've made some changes to the script to make it easier to run in cases like yours. We'll push that soon to GitHub. Only the configuration options are different; the rest of the script is untouched.

I'm surprised it takes that much time to run the script from the Netherlands. We are aware that the lag when querying from America, Asia, etc. can cause some trouble, which is why we have mirrors there, but the connection to ensembldb.ensembl.org should be fast from your farm.
On our farm (which is obviously very close to the data) it takes ~3h30min to extract ~2.6Gb of ancestral sequences for mouse (using the EPO mammals alignment).

There are additional connection parameters that will make the code avoid sleeping connections ("disconnect_when_inactive") / constantly check that the connection is alive ("reconnect_when_lost"), but I fear they would both have a cost in your case.

Since I have already run the script, I've simply put the data on my FTP site: ftp://ftp.ebi.ac.uk/pub/databases/ensembl/muffato/

I would still like to know if calling set_disconnect_when_inactive(1) or
set_reconnect_when_lost(1) on the Registry helps.
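
For reference, a minimal sketch of how the two calls could be added to the modified script. It reuses the host, user and port from the snippet quoted below, and it assumes the setters act on the adaptors already held by the Registry, so they are called after the registry has been loaded. Enabling one option at a time is only a suggestion, so that the cost of each can be measured separately.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $reg = 'Bio::EnsEMBL::Registry';

# Connection settings copied from the modified script quoted below.
$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
    -port => 5306,
);

# Option 1: drop each connection as soon as it becomes idle,
# so no sleeping connection is left for the server to time out.
$reg->set_disconnect_when_inactive(1);

# Option 2 (uncomment to test instead): check that the connection is
# still alive before using it, and reconnect if it has been lost.
# $reg->set_reconnect_when_lost(1);
```

Both options add overhead to every query, so timing the script on a small region first would show whether either one helps.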

Matthieu

On 07/03/16 11:01, Christian Groß - EWI wrote:
> Dear Ensembl Developer Team,
>
> My name is Christian Groß and I am a PhD student at the TUDelft in
> the Netherlands. For a week I have been trying to download the
> ancestral sequence of the Mouse-Rat ancestor. To do that I use the
> "get_ancestral_sequence.pl" script, which is provided by the Perl
> Ensembl API. I modified that script slightly to make it work for my
> purposes.
>
> I changed this
>
>     my $species_name = "Homo sapiens";
>     my $alignment_set = "primates";
>
> to
>
>     my $species_name = "Mus musculus";
>     my $alignment_set = "mammals";
>
> and I replaced this
>
>     if ($registry_file) {
>       die "Registry file '$registry_file' doesn't exist\n" if (!-e $registry_file);
>       $reg->load_all($registry_file, 1);
>     } elsif ($url) {
>       $reg->load_registry_from_url($url, 1);
>     } else {
>       $reg->load_all();
>     }
>
> with this:
>
>     # Auto-configure the registry
>     Bio::EnsEMBL::Registry->load_registry_from_db(
>         -host=>"ensembldb.ensembl.org", -user=>"anonymous",
>         -port=>'5306');
>
> I started the modified script on one of our servers but it runs
> incredibly slowly; most of the time it is in a sleeping state,
> waiting for a response from the Ensembl servers. The first attempt was
> killed by our server because it exceeded a 1½-day limit. I therefore
> started it again on a different server, but the connection to the
> Ensembl MySQL server was lost after 2 days and 4 hours. Within these
> two days only 1017 MB of the ancestral sequence was downloaded.
>
> Is there any way to speed up the download or to resume the download
> from the point at which it stopped? Does my request have a low
> priority because I use the auto-configured registry?
>
> I would be really glad if you could help me in this case or point me 
> to a different method to extract the ancestral sequence from the server.
>
> Down below you will find the entire output message from starting that 
> script until the thrown exception.
>
> Many thanks in advance.
>
> Sincerely,
>
> Christian Groß
>
> (dato-env2)gross016 at assembly:/mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/scripts/ancestral_sequences$ perl get_ancestral_sequence_mouse.pl
>
> UNIVERSAL->import is deprecated and will be removed in a future perl at
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/BioPerl-1.6.1/Bio/Tree/TreeFunctionsI.pm
> line 94.
>
> Found MLSS mlss_id=780 name='17 eutherian mammals EPO'
>
> DBD::mysql::st execute failed: Lost connection to MySQL server during 
> query at 
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm
> line 170.
>
> -------------------- EXCEPTION --------------------
>
> MSG: Detected an error whilst executing SQL 'SELECT gat.node_id, 
> gat.parent_id, gat.root_id, gat.left_index, gat.right_index, 
> gat.distance_to_parent, gat.left_node_id, gat.right_node_id, 
> ga.genomic_align_id, ga.genomic_align_block_id, 
> ga.method_link_species_set_id, ga.dnafrag_id, ga.dnafrag_start, 
> ga.dnafrag_end, ga.dnafrag_strand, ga.cigar_line, ga.visible FROM ( 
> (genomic_align_tree gat)  LEFT JOIN genomic_align ga ON gat.node_id =
> ga.node_id) WHERE gat.node_id = ?  LIMIT 1': DBD::mysql::st execute
> failed: Lost connection to MySQL server during query at 
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm
> line 170.
>
> STACK Bio::EnsEMBL::Compara::DBSQL::BaseAdaptor::generic_fetch
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm:171
>
> STACK Bio::EnsEMBL::Compara::DBSQL::BaseAdaptor::generic_fetch_one
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm:270
>
> STACK Bio::EnsEMBL::Compara::DBSQL::NestedSetAdaptor::fetch_node_by_node_id
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/NestedSetAdaptor.pm:98
>
> STACK Bio::EnsEMBL::Compara::DBSQL::NestedSetAdaptor::fetch_parent_for_node
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/NestedSetAdaptor.pm:118
>
> STACK Bio::EnsEMBL::Compara::NestedSet::parent
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/NestedSet.pm:277
>
> STACK main::dump_ancestral_sequence get_ancestral_sequence_mouse.pl:290
>
> STACK toplevel get_ancestral_sequence_mouse.pl:251
>
> Date (localtime)    = Sat Mar  5 18:55:10 2016
>
> Ensembl API version = 83
>
>
>
> _______________________________________________
> Dev mailing list    Dev at ensembl.org
> Posting guidelines and subscribe/unsubscribe info: 
> http://lists.ensembl.org/mailman/listinfo/dev
> Ensembl Blog: http://www.ensembl.info/
>

--
Matthieu Muffato, Ph.D.
Ensembl Compara and TreeFam Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468
