[ensembl-dev] Bulk download of Microarray Probe mapping via MySQL

Thomas Juettemann juettemann at ebi.ac.uk
Thu Jul 18 21:46:32 BST 2013


Hi Alex,

ensembl-hive is maintained by a different team and has historically used differently labelled "sticky tags" in CVS. There are two ways to find out which tags are assigned to a file.

1. Web

Go to 
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm?root=ensembl&view=log

There you will see a pull-down menu labelled "Sticky Tag", which lists the different tagged revisions.

2. Command line:

[farm2-head4]~: cvs status -v ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm 
Password: 
===================================================================
File: DBAdaptor.pm     	Status: Up-to-date

   Working revision:	1.20
   Repository revision:	1.20	/cvsroot/ensembl/ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm,v
   Commit Identifier:	EMyGmlbNsSzQyYFw
   Sticky Tag:		lg4_pre_rel72_20130423 (revision: 1.20)
   Sticky Date:		(none)
   Sticky Options:	(none)

   Existing Tags:
	lg4_post_rel72_2013_06_14_accu_pgsql	(revision: 1.21)
	lg4_pre_rel72_20130423   	(revision: 1.20)
	lg4_mid_rel71_20130228   	(revision: 1.20)
	lg4_post_rel70_20121212  	(revision: 1.19)
	lg4_post_rel70_20121127  	(revision: 1.19)
	lg4_pre_rel70_20121112   	(revision: 1.19)
	lg4_pre_rel70_20121102   	(revision: 1.19)
	lg4_post_rel69_20121023  	(revision: 1.19)
	lg4_pre_rel69_20120802   	(revision: 1.19)
	lg4_rel_68_20120626      	(revision: 1.19)
	lg4_pre_rel_68_20120528_multimeadow_aware	(revision: 1.18)
	lg4_post_rel67_20120518_pre_meadow_name	(revision: 1.18)
	lg4_post_rel_67_20120416 	(revision: 1.18)
	lg4_post_rel_66_20120210 	(revision: 1.18)
	projection_genebuild_2011_07	(branch: 1.9.2)
	lg4_post_rel63_20110712  	(revision: 1.16)
	lg4_post_rel_62_20110413_pre_rename	(revision: 1.14)
	lg4_post_rel_62_20110317 	(revision: 1.14)
	lg4_post_rel_61_20110106 	(revision: 1.14)
	lg4_mid_rel60_20100923   	(revision: 1.14)
	lg4_post_rel59_20100702  	(revision: 1.10)
	stable                   	(revision: 1.21)
	lg4_post_rel58_20100603  	(revision: 1.9)
	lg4_post_rel58_20100511  	(revision: 1.9)
	lg4_pre_rel58_20090322   	(revision: 1.9)
	lg4_after_merger_20090713	(revision: 1.7)
	lg4_pre_merger_20090713  	(revision: 1.7)
	lg4_branch_20090306      	(branch: 1.7.4)
	lg4_base_20090306        	(revision: 1.7)
	branch-ensembl-31        	(branch: 1.7.2)
	ensembl-31-branchpoint   	(revision: 1.7)
	branch-ensembl-29        	(revision: 1.7)
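
If the tag list is long, it can be filtered for the tags that point at a single revision. This is only a minimal sketch, assuming the "cvs status -v" layout shown above; list_tags_for_revision is a hypothetical helper name, not part of CVS.

```shell
# Hypothetical helper: read `cvs status -v` output on stdin and print
# every tag in the "Existing Tags:" section that points at revision $1.
# Branch tags ("(branch: ...)") are skipped.
list_tags_for_revision() {
  awk -v rev="$1" '
    /Existing Tags:/ { in_tags = 1; next }
    in_tags && $2 == "(revision:" && $3 == (rev ")") { print $1 }
  '
}
```

For example: cvs status -v ensembl-hive/modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm | list_tags_for_revision 1.20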

By removing the tag you checked out HEAD, i.e. the latest code, which in general carries the risk of not being coherent. In the case of the Hive code the risk is very small, though.
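
If you would rather pin ensembl-hive to one of the tags listed above than use HEAD, the checkout command can be built like this. It is a sketch only: hive_checkout_cmd is a hypothetical helper, and it assumes CVSROOT is already set and you are logged in as described in the Ensembl CVS instructions.

```shell
# Hypothetical helper: print the cvs command that checks out ensembl-hive,
# pinned to tag $1 if one is given, otherwise at HEAD.
# Assumes $CVSROOT already points at the Ensembl repository.
hive_checkout_cmd() {
  tag=$1
  if [ -n "$tag" ]; then
    echo "cvs checkout -r $tag ensembl-hive"
  else
    echo "cvs checkout ensembl-hive"
  fi
}
```

Running the output of "hive_checkout_cmd lg4_pre_rel72_20130423" would check out the tag that matches the sticky tag shown in the status output above.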

Nathan forgot to mention that you also need to specify the DNAdb options, otherwise the line

my $core_params    = ${&process_DB_options($db_opts, ['core'], 1, 'dnadb')}{core};

evaluates to undef, and dereferencing $core_params as a hash then fails with the error you see at line 282.

Please try this:

perl ../ensembl_scripts/dump_array_annotations.pl \
 -user anonymous \
 -port 3306 \
 -host ensembldb.ensembl.org \
 -dbname homo_sapiens_funcgen_72_37 \
 -dnadb_user anonymous \
 -dnadb_port 3306 \
 -dnadb_host useastdb.ensembl.org \
 -dnadb_name homo_sapiens_core_72_37 \
 -arrays HuGene-1_0-st-v1 \
 -features
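
To catch a missing dnadb option before the script dies with the hash reference error, the arguments can be checked first. This is a rough sketch: has_dnadb_opts is a hypothetical helper, and the set of options it checks is an assumption taken from the command above, not from the script's documentation.

```shell
# Hypothetical pre-flight check: succeed only if every -dnadb_* option
# used in the command above appears among the arguments.
# The list of required options is an assumption.
has_dnadb_opts() {
  missing=""
  for opt in -dnadb_user -dnadb_port -dnadb_host -dnadb_name; do
    found=0
    for arg in "$@"; do
      if [ "$arg" = "$opt" ]; then
        found=1
      fi
    done
    if [ "$found" -eq 0 ]; then
      missing="$missing $opt"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing options:$missing" >&2
    return 1
  fi
  return 0
}
```

It only checks that the option names are present, not that their values are valid.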

Please note that I changed -host to ensembldb.ensembl.org instead of useastdb.ensembl.org, as I got this error when using useastdb:
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
Child exited with value 1
Error:	Inappropriate ioctl for device

It might just be a timeout. If you get the same error with ensembldb.ensembl.org, please try useastdb.ensembl.org instead; we will follow up on the error above.
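
Trying the two servers in turn can be scripted. A minimal sketch, where try_hosts is a hypothetical helper and the command it runs is whatever wrapper you define around the dump script:

```shell
# Hypothetical helper: run command $1 once per remaining argument (a host
# name), passing the host as its only argument, and stop at the first success.
try_hosts() {
  cmd=$1
  shift
  for host in "$@"; do
    if "$cmd" "$host"; then
      echo "succeeded with $host"
      return 0
    fi
    echo "failed with $host, trying next" >&2
  done
  return 1
}
```

For example, define a function run_dump that calls dump_array_annotations.pl with -host "$1" and the other options from the command above, then run: try_hosts run_dump ensembldb.ensembl.org useastdb.ensembl.org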

Please let us know if you have any more questions.

Cheers,
Thomas


From: Alex Holman <aholman at jimmy.harvard.edu>
Date: 18 July 2013 18:49
Subject: Re: [ensembl-dev] Bulk download of Microarray Probe mapping via MySQL
To: dev at ensembl.org


Hi Nathan,
Thanks a lot for writing this, it's a big help. I'm having a couple of
problems getting it going.

I'm on v.72 of the api, and my command line for running the script is:
perl ../ensembl_scripts/dump_array_annotations.pl -user anonymous
-pass -port 3306 -host useastdb.ensembl.org -dbname
"homo_sapiens_funcgen_72_37" -arrays HuGene-1_0-st-v1 -features

First, a simple issue that I solved on my own, but I'm going to just
list here for posterity.
The script looks for a module DBAdaptor.pm located in the ensembl-hive
directory of the CVS.
Can't locate Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm
ensembl-hive doesn't appear to come down with the standard CVS
commands listed in the ensembl CVS instructions:
http://useast.ensembl.org/info/docs/api/api_cvs.html as it doesn't
appear to be associated with branch 72.  If you remove '-r
branch-ensembl-72' from the CVS command, ensembl-hive downloads
properly.

Second, something that I haven't solved yet.
When I try to run the script with the above command I get the error:
Can't use an undefined value as a HASH reference at
/Users/alex/Work/2013_07_01_get_Ensembl_annotations/ensembl_api/ensembl-functgenomics/modules/Bio/EnsEMBL/Funcgen/Utils/DBAdaptorHelper.pm
line 282.

Line 282 in that module is the %{$core_params} hash reference used in
calling create_DBAdaptor_from_params.

sub create_Funcgen_DBAdaptor_from_options {
 my $db_opts = $_[0];

 my $funcgen_params = ${&process_DB_options($db_opts, ['funcgen'])}{funcgen};
 #This simply allows dnadb_params to be optional
 my $core_params    = ${&process_DB_options($db_opts, ['core'], 1, 'dnadb')}{core};

 return create_DBAdaptor_from_params({%{$funcgen_params},
                                      %{$core_params}},   # <-- line 282
                                     'funcgen');
}

In addition, I tried using the virtual machine image provided from
Ensembl just to verify that the issue wasn't something with my
install.  After dealing with the ensembl-hive issue above I ran into
the same problem.
http://useast.ensembl.org/info/data/virtual_machine.html

Thanks again for working on this.
-Alex-


--
Thomas Juettemann, PhD

Senior Technical Officer
Ensembl DNA Regulation
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UK

http://www.ensembl.info/
http://www.facebook.com/Ensembl.org
http://twitter.com/#!/ensembl




