[ensembl-dev] slow response and problems to receive mouse ancestral sequence. API 83

Matthieu Muffato muffato at ebi.ac.uk
Thu Mar 10 07:54:42 GMT 2016


Dear Christian

First of all, we've made some changes to the script to make it easier to 
run in cases like yours. We'll push that to GitHub soon. Only the 
configuration options are different; the rest of the script is untouched.

I'm surprised it takes that much time to run the script from the 
Netherlands. We are aware that the lag when querying from America, Asia, 
etc, can cause some trouble and it is the reason why we have mirrors 
there, but the connection to ensembldb.ensembl.org should be fast from 
your farm.
On our farm (which is obviously very close to the data) it takes 
~3h30min to extract ~2.6Gb of ancestral sequences for mouse (using the 
EPO mammals alignment).

There are additional connection parameters that will make the code avoid 
sleeping connections ("disconnect_when_inactive") / constantly check 
that the connection is alive ("reconnect_when_lost"), but I fear they 
would both have a cost in your case.

Since I have already run the script, I've simply put the resulting data 
on my FTP area:
ftp://ftp.ebi.ac.uk/pub/databases/ensembl/muffato/

I would still like to know if calling set_disconnect_when_inactive(1) or 
set_reconnect_when_lost(1) on the Registry helps.
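
For reference, here is a minimal sketch of where those calls would go. It
assumes the registry is loaded with load_registry_from_db, using the same
host, user and port as in your modified script quoted below. As far as I
know, both setters act on the database adaptors that are already
registered, so they are placed after the loading step. Enabling only one
of them at a time should make it easier to tell which one helps.

```perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# Load the registry from the public server
# (connection parameters copied from the quoted script).
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
    -port => 5306,
);

# Option 1: close idle connections and reopen them on demand.
Bio::EnsEMBL::Registry->set_disconnect_when_inactive(1);

# Option 2 (try separately): check that the connection is alive
# before each query and reconnect if it has been lost.
# Bio::EnsEMBL::Registry->set_reconnect_when_lost(1);
```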

Matthieu

On 07/03/16 11:01, Christian Groß - EWI wrote:
> Dear Ensembl Developer Team,
>
> My name is Christian Groß and I am a PhD student at TU Delft in the
> Netherlands. For a week I have been trying to download the ancestral
> sequence of the Mouse-Rat ancestor. To do that I use the
> “get_ancestral_sequence.pl” script, which is provided with the Ensembl
> Perl API. I modified that script slightly to make it work for my
> purposes.
>
> I changed this
>
>     my $species_name = "Homo sapiens";
>     my $alignment_set = "primates";
>
> to
>
>     my $species_name = "Mus musculus";
>     my $alignment_set = "mammals";
>
> and I replaced this
>
>     if ($registry_file) {
>       die "Registry file '$registry_file' doesn't exist\n" if (!-e $registry_file);
>       $reg->load_all($registry_file, 1);
>     } elsif ($url) {
>       $reg->load_registry_from_url($url, 1);
>     } else {
>       $reg->load_all();
>     }
>
> with this:
>
>     # Auto-configure the registry
>     Bio::EnsEMBL::Registry->load_registry_from_db(
>             -host=>"ensembldb.ensembl.org", -user=>"anonymous",
>             -port=>'5306');
>
> I started the modified script on one of our servers but it runs
> incredibly slowly; most of the time it is in a sleeping state, waiting
> for a response from the Ensembl servers. The first attempt was killed by
> our server because it exceeded a 1½-day limit. I therefore started it
> again on a different server, but the connection to the Ensembl MySQL
> server was lost after 2 days and 4 hours. Within these two days only
> 1017 MB of the ancestral sequence had been downloaded.
>
> Is there any way to speed up the download or to resume the download
> from the point at which it stopped? Does my request have a low priority
> because I use the auto-configured registry?
>
> I would be really glad if you could help me in this case or point me to
> a different method to extract the ancestral sequence from the server.
>
> Down below you will find the entire output message from starting that
> script until the thrown exception.
>
> Many thanks in advance.
>
> Sincerely,
>
> Christian Groß
>
> (dato-env2)gross016@assembly:/mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/scripts/ancestral_sequences$ perl get_ancestral_sequence_mouse.pl
>
> UNIVERSAL->import is deprecated and will be removed in a future perl at
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/BioPerl-1.6.1/Bio/Tree/TreeFunctionsI.pm
> line 94.
>
> Found MLSS mlss_id=780 name='17 eutherian mammals EPO'
>
> DBD::mysql::st execute failed: Lost connection to MySQL server during
> query at
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm
> line 170.
>
> -------------------- EXCEPTION --------------------
>
> MSG: Detected an error whilst executing SQL 'SELECT gat.node_id,
> gat.parent_id, gat.root_id, gat.left_index, gat.right_index,
> gat.distance_to_parent, gat.left_node_id, gat.right_node_id,
> ga.genomic_align_id, ga.genomic_align_block_id,
> ga.method_link_species_set_id, ga.dnafrag_id, ga.dnafrag_start,
> ga.dnafrag_end, ga.dnafrag_strand, ga.cigar_line, ga.visible FROM (
> (genomic_align_tree gat)  LEFT JOIN genomic_align ga ON gat.node_id =
> ga.node_id) WHERE gat.node_id = ?  LIMIT 1': DBD::mysql::st execute
> failed: Lost connection to MySQL server during query at
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm
> line 170.
>
> STACK Bio::EnsEMBL::Compara::DBSQL::BaseAdaptor::generic_fetch
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm:171
>
> STACK Bio::EnsEMBL::Compara::DBSQL::BaseAdaptor::generic_fetch_one
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm:270
>
> STACK
> Bio::EnsEMBL::Compara::DBSQL::NestedSetAdaptor::fetch_node_by_node_id
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/NestedSetAdaptor.pm:98
>
> STACK
> Bio::EnsEMBL::Compara::DBSQL::NestedSetAdaptor::fetch_parent_for_node
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/DBSQL/NestedSetAdaptor.pm:118
>
> STACK Bio::EnsEMBL::Compara::NestedSet::parent
> /mnt/scratch/gross016/bin/nobackup/ensemble_api/ensembl-compara/modules/Bio/EnsEMBL/Compara/NestedSet.pm:277
>
> STACK main::dump_ancestral_sequence get_ancestral_sequence_mouse.pl:290
>
> STACK toplevel get_ancestral_sequence_mouse.pl:251
>
> Date (localtime)    = Sat Mar  5 18:55:10 2016
>
> Ensembl API version = 83
>

-- 
Matthieu Muffato, Ph.D.
Ensembl Compara and TreeFam Project Leader
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton
Cambridge, CB10 1SD, United Kingdom
Room  A3-145
Phone + 44 (0) 1223 49 4631
Fax   + 44 (0) 1223 49 4468



