[ensembl-dev] Error messages when using the Variation API

john.samuel john.samuel at senecacollege.ca
Fri May 27 04:31:17 BST 2016


Hi Johanne,
I have also been having an intermittent problem where the code dies
while executing exactly the same line as in the error quoted below.
In my case I am trying to get all the homologues for a given gene.
Here is the stack trace:

DBD::mysql::st execute failed: Lost connection to MySQL server during 
query at 
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm 
line 482, <RE1_IN> line 205.

-------------------- EXCEPTION --------------------
MSG: Detected an error whilst executing SQL 'SELECT  g.gene_id, 
g.seq_region_id, g.seq_region_start, g.seq_region_end, 
g.seq_region_strand, g.analysis_id, g.biotype, g.display_xref_id, 
g.description, g.status, g.source, g.is_current, 
g.canonical_transcript_id, g.stable_id, g.version, 
UNIX_TIMESTAMP(g.created_date), UNIX_TIMESTAMP(g.modified_date), 
x.display_label, x.dbprimary_acc, x.description, x.version, 
exdb.db_name, exdb.status, exdb.db_release, exdb.db_display_name, 
x.info_type, x.info_text
FROM (( (gene g)
   LEFT JOIN xref x ON x.xref_id = g.display_xref_id )
   LEFT JOIN external_db exdb ON exdb.external_db_id = x.external_db_id )
  WHERE g.stable_id = ? AND g.is_current = 1
': DBD::mysql::st execute failed: Lost connection to MySQL server during 
query at 
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm 
line 482, <RE1_IN> line 205.

STACK Bio::EnsEMBL::DBSQL::BaseAdaptor::generic_fetch 
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm:483
STACK Bio::EnsEMBL::DBSQL::GeneAdaptor::fetch_by_stable_id 
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/GeneAdaptor.pm:249
STACK Bio::EnsEMBL::Compara::GeneMember::get_Gene 
/home/john.samuel/src/ensembl-compara/modules/Bio/EnsEMBL/Compara/GeneMember.pm:183
STACK main::get_homologues slartibartfast.pl:488
STACK toplevel slartibartfast.pl:348
Date (localtime)    = Mon May  9 17:05:48 2016
Ensembl API version = 80
---------------------------------------------------

Any ideas about the cause of this problem, and a possible fix? I
haven't had problems like this before, even with other programs that run
for many hours; it only happens when retrieving homologues.
This program also runs for a long time (more than 2 hours) before the
error occurs, and some of the genes (zebrafish) have many homologues.

Regards,
John
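For transient "Lost connection to MySQL server during query" errors in long-running jobs like this one, a common workaround is to catch the exception and retry the failed call. Below is a minimal Perl sketch that assumes the lookup is safe to repeat. `with_retry` is a hypothetical helper, not part of the Ensembl API, and the callback only simulates a failing adaptor call. In a real script the callback would wrap the call that died (here `GeneMember::get_Gene`), and a fresh database connection may also be needed before retrying.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): run $code up to
# max_tries times, retrying only when the error looks like a dropped
# MySQL connection. Any other error is re-thrown immediately.
sub with_retry {
    my ($code, %opt) = @_;
    my $max_tries = $opt{max_tries} // 3;
    my $delay     = $opt{delay}     // 0;    # seconds to sleep between attempts

    for my $attempt (1 .. $max_tries) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return wantarray ? @result : $result[0] if $ok;

        my $err = $@ || 'unknown error';
        die $err
            if $err !~ /Lost connection to MySQL server|MySQL server has gone away/
            || $attempt == $max_tries;
        warn "Attempt $attempt lost the MySQL connection; retrying\n";
        sleep $delay if $delay;
    }
    return;    # only reached if max_tries < 1
}

# Simulated lookup that loses the connection twice before succeeding; in a
# real script this closure would wrap the adaptor call that died.
my $calls = 0;
my $gene = with_retry(
    sub {
        $calls++;
        die "DBD::mysql::st execute failed: Lost connection to MySQL server during query\n"
            if $calls < 3;
        return 'gene-object-placeholder';
    },
    max_tries => 5,
);
print "Got $gene after $calls calls\n";    # prints "Got gene-object-placeholder after 3 calls"
```

Retrying only when the message matches a lost or dropped connection means genuine SQL or programming errors still fail on the first attempt instead of being silently repeated.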


On 16-05-24 07:48 AM, Anja Thormann wrote:
> Hello Johanne,
>
>> The only exception I get now from time to time, is:
>> DBD::mysql::st execute failed: Lost connection to MySQL server during 
>> query at 
>> /Users/Johanne/src/ensembl/modules//Bio/EnsEMBL/DBSQL/BaseAdaptor.pm 
>> line 482.
>>
>> Is it caused by something concerning my local MySQL installation or 
>> Internet connection?
>
> This is probably related to the long run time of the script. I will look
> into this.
>
>> As our thread contained two conversations, I wonder if you got the 
>> same results on the LD variant expansion script as I did with the two 
>> reference genomes?
>
> I still need to run the comparison, but a difference of 13 variants in
> the results doesn’t seem like a problem to me. With hg38 you are using
> an updated assembly, which affects the set of variants you are looking
> at. The blog post
> (http://genomeref.blogspot.co.uk/2013/12/announcing-grch38.html)
> explains this in more detail in the section 'General assembly updates'.
>
>> And does your database contain all information from dbSNP + 1000G? I 
>> have set the script to use both database and VCF data 
>> ($variation_adaptor->db->use_vcf(1);), but did not find info on where 
>> exactly the data in the database comes from.
>
> We store all 1000G variants in our database. The variants are imported 
> from dbSNP. The genotypes for 1000G variants are stored in VCF files. 
> To use the genotypes from VCF files you need to set use_vcf to 1.
>
>
> Best,
> Anja
>
>
>> Best,
>> Johanne
>>
>>> On 23 May 2016, at 16:45, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>> Hello,
>>>
>>> The files there are symbolic links to another directory - it's 
>>> possible your FTP client is not following these links.
>>>
>>> The "real" files are here:
>>>
>>> ftp://ftp.ensembl.org/pub/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/
>>>
>>> Try downloading from that path instead.
>>>
>>> Regards
>>>
>>> Will
>>>
>>> On 23 May 2016 at 15:31, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>
>>>     Hi,
>>>
>>>     I got the same error message with the full path.
>>>
>>>     I think the problem is with the hg38 .gz and .gz.tbi files. I
>>>     went back to the URL you gave me:
>>>     ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
>>>     But when I click any of the files, I get the error message "The
>>>     operation can’t be completed because the original item for <file
>>>     name> can’t be found". This is not a problem for the hg19 FTP
>>>     connection, whose files I can open just fine.
>>>     I have restarted my computer (Mac OS X 10.11.5) and remounted
>>>     the connection several times. I log on as guest.
>>>
>>>     So the files I downloaded did not actually exist, and the
>>>     error message was correct.
>>>
>>>     Do you have any suggestions as to how I can mount it correctly?
>>>
>>>     Best,
>>>     Johanne
>>>
>>>>     On 23 May 2016, at 15:31, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>
>>>>     I think possibly Perl doesn't like using "~" to represent your
>>>>     home directory - try replacing it with the full path, or
>>>>     possibly $ENV{HOME}
>>>>
>>>>     Will
>>>>
>>>>     On 23 May 2016 at 13:59, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>>
>>>>         Thank you for your wonderful support!
>>>>
>>>>         I have now tried the following JSON struct:
>>>>         {
>>>>             "id": "1000genomes_phase3",
>>>>             "species": "homo_sapiens",
>>>>             "assembly": "GRCh38",
>>>>             "type": "local",
>>>>             "strict_name_match": 1,
>>>>             "filename_template":
>>>>             "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>>>>             "chromosomes": [
>>>>                 "1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
>>>>                 "11", "12", "13", "14", "15", "16", "17", "18",
>>>>                 "19", "20", "21", "22", "X", "Y"
>>>>             ],
>>>>             "sample_prefix": "1000GENOMES:phase_3:"
>>>>         },
>>>>
>>>>         I got this error message:
>>>>         MSG: ERROR: VCF file
>>>>         ~/src/ensembl-vcf/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz
>>>>         not found
>>>>
>>>>         Should the tbi files be where I call the script? Or is it
>>>>         something else I am doing wrong?
>>>>
>>>>         Best,
>>>>         Johanne
>>>>
>>>>>         On 23 May 2016, at 14:04, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>
>>>>>         Hi Johanne,
>>>>>
>>>>>         You need the filename part of the template too, so:
>>>>>
>>>>>          "filename_template":
>>>>>         "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>>>>>
>>>>>         Regards
>>>>>
>>>>>         Will
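As Will's correction implies, filename_template names one file per chromosome rather than a directory: ###CHR### is replaced by each entry in the "chromosomes" list, so 24 chromosomes correspond to 24 .vcf.gz files plus their .tbi indexes, which would account for 48 files. The sketch below is a hypothetical pre-flight check, not anything in the Ensembl API. It performs the same substitution and lists every expected file that is missing locally, which could also expose downloads that turned out to be broken symbolic links.

```perl
use strict;
use warnings;

# Expand a filename_template by substituting each chromosome name for
# ###CHR###, giving one expected VCF path per chromosome.
sub expand_template {
    my ($template, @chromosomes) = @_;
    return map { my $f = $template; $f =~ s/###CHR###/$_/g; $f } @chromosomes;
}

# Return every expected .vcf.gz file, and its tabix .tbi index, that is not
# present as a regular file (-f follows symbolic links, so a dangling link
# also counts as missing).
sub missing_files {
    my ($template, @chromosomes) = @_;
    my @missing;
    for my $vcf (expand_template($template, @chromosomes)) {
        for my $file ($vcf, "$vcf.tbi") {
            push @missing, $file unless -f $file;
        }
    }
    return @missing;
}

# Example values: use the full directory path that vcf_config.json points at.
my $home        = $ENV{HOME} // '.';
my $template    = "$home/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz";
my @chromosomes = (1 .. 22, 'X', 'Y');

my @missing = missing_files($template, @chromosomes);
if (@missing) {
    print "Missing files:\n";
    print "  $_\n" for @missing;
}
else {
    printf "All %d files found\n", 2 * @chromosomes;
}
```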
>>>>>
>>>>>         On 23 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>>>
>>>>>             Hello again!
>>>>>
>>>>>             I tried setting the following in the JSON file:
>>>>>
>>>>>             {
>>>>>                 "id": "1000genomes_phase3",
>>>>>                 "species": "homo_sapiens",
>>>>>                 "assembly": "GRCh38",
>>>>>                 "type": "local",
>>>>>                 "strict_name_match": 1,
>>>>>                 "filename_template": "~/src/ensembl-vcf/",
>>>>>                 "chromosomes": [
>>>>>                     "1", "2", "3", "4", "5", "6", "7", "8", "9",
>>>>>                     "10", "11", "12", "13", "14", "15", "16", "17",
>>>>>                     "18", "19", "20", "21", "22", "X", "Y"
>>>>>                 ],
>>>>>                 "sample_prefix": "1000GENOMES:phase_3:"
>>>>>             },
>>>>>
>>>>>             But I get this error message:
>>>>>             MSG: ERROR: VCF file ~/src/ensembl-vcf/ not found
>>>>>
>>>>>             I downloaded all the hg38 files you linked to into the
>>>>>             folder ~/src/ensembl-vcf/. When you say that I need
>>>>>             to change filename_template to the path where the
>>>>>             files were downloaded, do you mean the full path of
>>>>>             each of the 48 files rather than the path to the
>>>>>             folder they are in?
>>>>>
>>>>>             Best,
>>>>>             Johanne
>>>>>
>>>>>>             On 23 May 2016, at 11:57, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>
>>>>>>             Hi Johanne,
>>>>>>
>>>>>>             It looks like the API is intermittently losing
>>>>>>             connection to the remote VCF files hosted on our FTP
>>>>>>             site.
>>>>>>
>>>>>>             You can bypass this connection by downloading the
>>>>>>             files to your local machine:
>>>>>>
>>>>>>             GRCh38:
>>>>>>             ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
>>>>>>             GRCh37:
>>>>>>             ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/
>>>>>>
>>>>>>             You will then need to edit
>>>>>>             [module_path]/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json,
>>>>>>             changing the "filename_template" entry to the path
>>>>>>             where you downloaded the files, and "type" from
>>>>>>             "remote" to "local".
>>>>>>
>>>>>>             Regarding the warning message, this should not affect
>>>>>>             your analyses in any way, but I have put in a fix on
>>>>>>             release/84 of ensembl-variation to suppress it.
>>>>>>
>>>>>>             Regards
>>>>>>
>>>>>>             Will McLaren
>>>>>>             Ensembl Variation
>>>>>>
>>>>>>             On 21 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>>>>
>>>>>>                 Dear ensembl dev team,
>>>>>>
>>>>>>                 I have been using your Variation API for some
>>>>>>                 time now, and intermittently get a range of
>>>>>>                 errors without knowing exactly why. I don't think
>>>>>>                 the scripts are the cause, as the same script
>>>>>>                 that produces an error can work just fine when
>>>>>>                 I run it again.
>>>>>>
>>>>>>                 The different error messages are:
>>>>>>                 /Parser/BaseVCF4.pm line 891, <IN> line 5.
>>>>>>                 Use of uninitialized value in list assignment at
>>>>>>                 /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>>>>>                 line 891, <IN> line 5.
>>>>>>                 connect: Operation timed out
>>>>>>                 [kftp_connect_file] 350 Restarting at 654385206.
>>>>>>                 Send STORE or RETRIEVE to initiate transfer
>>>>>>                 [kftp_connect_file] 227 Entering Passive Mode
>>>>>>                 (193,62,203,85,220,250).
>>>>>>                 Tabix::tabix_query: t is not of type tabix_tPtr
>>>>>>                 at
>>>>>>                 /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/TabixParser.pm
>>>>>>                 line 70.
>>>>>>                 [kftp_connect_file] 227 Entering Passive Mode
>>>>>>                 (193,62,203,85,157,134).
>>>>>>                 [main] fail to open the data file.
>>>>>>                 Can't use an undefined value as an ARRAY
>>>>>>                 reference at
>>>>>>                 /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>>>>>                 line 730.
>>>>>>
>>>>>>                 Usually only one of these errors occurs at a time.
>>>>>>                 I suspect it might have something to do with the
>>>>>>                 connection between my computer and the Ensembl
>>>>>>                 database, as at least the first error always shows
>>>>>>                 up repeatedly when I lose my Internet connection.
>>>>>>                 However, are all of them caused by Internet
>>>>>>                 trouble? I have checked that the MySQL instance
>>>>>>                 is up and running, and I can visit web pages
>>>>>>                 through a browser when some of the errors occur.
>>>>>>                 Could it be something on the server/database side?
>>>>>>
>>>>>>                 Also, if I use the GRCh37 database:
>>>>>>
>>>>>>                 use Bio::EnsEMBL::Registry;
>>>>>>                 my $registry = 'Bio::EnsEMBL::Registry';
>>>>>>
>>>>>>                 # Port 3337 serves the GRCh37 databases
>>>>>>                 $registry->load_registry_from_db(
>>>>>>                   -host => 'ensembldb.ensembl.org',
>>>>>>                   -user => 'anonymous',
>>>>>>                   -port => 3337,
>>>>>>                 );
>>>>>>
>>>>>>                 I get this warning/printout:
>>>>>>                 Use of uninitialized value $nums{"."} in numeric
>>>>>>                 comparison (<=>) at
>>>>>>                 /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VCFCollection.pm
>>>>>>                 line 770, <IN> line 6.
>>>>>>
>>>>>>                 I am using version 84 of the Ensembl API on OS X
>>>>>>                 10.11.5, and the script that produces all of these
>>>>>>                 errors is attached. Note that the attached script
>>>>>>                 uses hg38 by default, but produces the last
>>>>>>                 warning mentioned above when switched to GRCh37.
>>>>>>
>>>>>>                 And something different I have been wondering about:
>>>>>>                 The VCF files that are downloaded locally
>>>>>>                 (ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi,
>>>>>>                 for instance) - should they be deleted and
>>>>>>                 re-downloaded from time to time to get the latest
>>>>>>                 1000G data? And where exactly are the VCFs
>>>>>>                 downloaded from? Is it dbSNP, as indicated in the
>>>>>>                 file name?
>>>>>>
>>>>>>                 Best,
>>>>>>                 Johanne Håøy Horn
>>>>>>
>>>>>>                 _______________________________________________
>>>>>>                 Dev mailing list Dev at ensembl.org
>>>>>>                 <mailto:Dev at ensembl.org>
>>>>>>                 Posting guidelines and subscribe/unsubscribe
>>>>>>                 info: http://lists.ensembl.org/mailman/listinfo/dev
>>>>>>                 Ensembl Blog: http://www.ensembl.info/
