[ensembl-dev] Error messages when using the Variation API
john.samuel
john.samuel at senecacollege.ca
Fri May 27 04:31:17 BST 2016
Hi Johanne,
I have also been having an intermittent problem where the code dies
while executing exactly the same line as Anja had.
In my case I am trying to get all the homologues for a given gene.
Here is the stack trace:
DBD::mysql::st execute failed: Lost connection to MySQL server during
query at
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm
line 482, <RE1_IN> line 205.
-------------------- EXCEPTION --------------------
MSG: Detected an error whilst executing SQL 'SELECT g.gene_id,
g.seq_region_id, g.seq_region_start, g.seq_region_end,
g.seq_region_strand, g.analysis_id, g.biotype, g.display_xref_id,
g.description, g.status, g.source, g.is_current,
g.canonical_transcript_id, g.stable_id, g.version,
UNIX_TIMESTAMP(g.created_date), UNIX_TIMESTAMP(g.modified_date),
x.display_label, x.dbprimary_acc, x.description, x.version,
exdb.db_name, exdb.status, exdb.db_release, exdb.db_display_name,
x.info_type, x.info_text
FROM (( (gene g)
LEFT JOIN xref x ON x.xref_id = g.display_xref_id )
LEFT JOIN external_db exdb ON exdb.external_db_id = x.external_db_id )
WHERE g.stable_id = ? AND g.is_current = 1
': DBD::mysql::st execute failed: Lost connection to MySQL server during
query at
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm
line 482, <RE1_IN> line 205.
STACK Bio::EnsEMBL::DBSQL::BaseAdaptor::generic_fetch
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm:483
STACK Bio::EnsEMBL::DBSQL::GeneAdaptor::fetch_by_stable_id
/home/john.samuel/src/ensembl/modules/Bio/EnsEMBL/DBSQL/GeneAdaptor.pm:249
STACK Bio::EnsEMBL::Compara::GeneMember::get_Gene
/home/john.samuel/src/ensembl-compara/modules/Bio/EnsEMBL/Compara/GeneMember.pm:183
STACK main::get_homologues slartibartfast.pl:488
STACK toplevel slartibartfast.pl:348
Date (localtime) = Mon May 9 17:05:48 2016
Ensembl API version = 80
---------------------------------------------------
Any ideas about the cause of this problem, and a possible fix? I
haven't had problems like this before, even with other programs with run
times of many hours, only when trying to get homologues.
My program does run for a long time too (> 2 hours) before this occurs,
and some of the genes (zebrafish) have a lot of homologues.
Regards,
John
On 16-05-24 07:48 AM, Anja Thormann wrote:
> Hello Johanne,
>
>> The only exception I get now from time to time, is:
>> DBD::mysql::st execute failed: Lost connection to MySQL server during
>> query at
>> /Users/Johanne/src/ensembl/modules//Bio/EnsEMBL/DBSQL/BaseAdaptor.pm
>> line 482.
>>
>> Is it caused by something concerning my local MySQL installation or
>> Internet connection?
>
> This is probably related to a long run time of the script. I will look
> into this.
>
>> As our thread contained two conversations, I wonder if you got the
>> same results on the LD variant expansion script as I did with the two
>> reference genomes?
>
> I still need to run the comparison. But a difference of 13 variants in
> the results doesn’t seem to be a problem to me. You are using an
> updated assembly with hg38 which has consequences on the set of
> variants you are looking at. The blog post
> (http://genomeref.blogspot.co.uk/2013/12/announcing-grch38.html)
> explains this in more detail in the section 'General assembly updates'.
>
>> And does your database contain all information from dbSNP + 1000G? I
>> have set the script to use both database and VCF data
>> ($variation_adaptor->db->use_vcf(1);), but did not find info on where
>> exactly the data in the database comes from.
>
> We store all 1000G variants in our database. The variants are imported
> from dbSNP. The genotypes for 1000G variants are stored in VCF files.
> To use the genotypes from VCF files you need to set use_vcf to 1.
>
>
> Best,
> Anja
>
>
>> Best,
>> Johanne
>>
>>> On 23 May 2016 at 16:45, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>> Hello,
>>>
>>> The files there are symbolic links to another directory - it's
>>> possible your FTP client is not following these links.
>>>
>>> The "real" files are here:
>>>
>>> ftp://ftp.ensembl.org/pub/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/
>>>
>>> Try downloading from that path instead.
>>>
>>> Regards
>>>
>>> Will
>>>
>>> On 23 May 2016 at 15:31, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>
>>> Hi,
>>>
>>> I got the same error message with the full path.
>>>
>>> I think the problem is with the hg38 .gz and .gz.tbi files. I
>>> went back to the URL you gave me:
>>> ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
>>> But when I click any of the files, I get the error message "The
>>> operation can’t be completed because the original item for <file
>>> name> can’t be found". This is not a problem for the hg19 FTP
>>> connection, whose files I can open just fine.
>>> I have restarted my computer (Mac OS X 10.11.5) and remounted
>>> the connection several times. I log on as guest.
>>>
>>> So the files I downloaded did not actually exist, and the
>>> error message was correct.
>>>
>>> Do you have any suggestions as to how I can mount it correctly?
>>>
>>> Best,
>>> Johanne
>>>
>>>> On 23 May 2016 at 15:31, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>
>>>> I think possibly Perl doesn't like using "~" to represent your
>>>> home directory - try replacing it with the full path, or
>>>> possibly $ENV{HOME}
>>>>
>>>> Will
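
A minimal sketch of the fix Will describes, applied to a single entry of vcf_config.json. It is written in Python rather than Perl only so that it runs standalone; the function name `localise_entry`, the `home` parameter and the shortened file name in the example are illustrative assumptions, not part of the Ensembl API. The idea is that `~` is expanded by the shell, not by the code that reads the JSON file, so the template should be stored as an absolute path.

```python
import copy
import os


def localise_entry(entry, home=None):
    """Return a copy of one config entry that points at local files.

    A leading "~" in "filename_template" is replaced by an absolute home
    directory, and "type" is set to "local".
    """
    # Fall back to the current user's home directory if none is given.
    home = home or os.path.expanduser("~")
    fixed = copy.deepcopy(entry)
    template = fixed["filename_template"]
    if template == "~" or template.startswith("~/"):
        template = home + template[1:]
    fixed["filename_template"] = template
    fixed["type"] = "local"
    return fixed


if __name__ == "__main__":
    # Shortened, hypothetical file name; the real one is quoted in the thread.
    entry = {
        "id": "1000genomes_phase3",
        "type": "remote",
        "filename_template": "~/src/ensembl-vcf/ALL.chr###CHR###.vcf.gz",
    }
    print(localise_entry(entry)["filename_template"])
```

The `###CHR###` placeholder is left untouched, so the rewritten entry can be pasted back into the config file as it is.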
>>>>
>>>> On 23 May 2016 at 13:59, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>>
>>>> Thank you for your wonderful support!
>>>>
>>>> I tried now with the following JSON struct:
>>>> {
>>>> "id": "1000genomes_phase3",
>>>> "species": "homo_sapiens",
>>>> "assembly": "GRCh38",
>>>> "type": "local",
>>>> "strict_name_match": 1,
>>>> "filename_template":
>>>> "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>>>> "chromosomes": [
>>>> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
>>>> "11", "12", "13", "14",
>>>> "15", "16", "17", "18", "19", "20", "21", "22", "X", "Y"
>>>> ],
>>>> "sample_prefix": "1000GENOMES:phase_3:"
>>>> },
>>>>
>>>> I got this error message:
>>>> MSG: ERROR: VCF file
>>>> ~/src/ensembl-vcf/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz
>>>> not found
>>>>
>>>> Should the tbi files be where I call the script? Or is it
>>>> something else I am doing wrong?
>>>>
>>>> Best,
>>>> Johanne
>>>>
>>>>> On 23 May 2016 at 14:04, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>
>>>>> Hi Johanne,
>>>>>
>>>>> You need the filename part of the template too, so:
>>>>>
>>>>> "filename_template":
>>>>> "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>>>>>
>>>>> Regards
>>>>>
>>>>> Will
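
To illustrate why the template needs the file name: judging by the error messages in this thread, `###CHR###` is replaced by each chromosome name to build one path per file. The sketch below (Python, standalone; `expected_files` and `missing_files` are hypothetical helper names, and the shortened path in the example is made up) expands a template the same way and reports which VCF or tabix index files are absent. It assumes each `.vcf.gz` has its `.tbi` index next to it, which is the usual tabix convention.

```python
import os

# Chromosome names as listed in the config entry quoted in this thread.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def expected_files(template, chromosomes=CHROMOSOMES):
    """Expand ###CHR### per chromosome; return each VCF path and its .tbi."""
    paths = []
    for chrom in chromosomes:
        vcf = template.replace("###CHR###", chrom)
        paths.extend([vcf, vcf + ".tbi"])
    return paths


def missing_files(template, chromosomes=CHROMOSOMES):
    """Return the expected paths that do not exist as regular files."""
    return [p for p in expected_files(template, chromosomes)
            if not os.path.isfile(p)]


if __name__ == "__main__":
    # Hypothetical shortened template; substitute the real absolute path.
    template = "/path/to/ensembl-vcf/ALL.chr###CHR###.vcf.gz"
    for path in missing_files(template):
        print("missing:", path)
```

With 24 chromosomes this checks 48 paths, matching the 48 downloaded files mentioned in the thread. A template that is only a directory, such as `~/src/ensembl-vcf/`, contains no placeholder and so expands to the same non-file path every time.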
>>>>>
>>>>> On 23 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>>>
>>>>> Hello again!
>>>>>
>>>>> I tried setting the following in the JSON file:
>>>>>
>>>>> {
>>>>> "id": "1000genomes_phase3",
>>>>> "species": "homo_sapiens",
>>>>> "assembly": "GRCh38",
>>>>> "type": "local",
>>>>> "strict_name_match": 1,
>>>>> "filename_template": "~/src/ensembl-vcf/",
>>>>> "chromosomes": [
>>>>> "1", "2", "3", "4", "5", "6", "7", "8", "9",
>>>>> "10", "11", "12", "13", "14",
>>>>> "15", "16", "17", "18", "19", "20", "21", "22",
>>>>> "X", "Y"
>>>>> ],
>>>>> "sample_prefix": "1000GENOMES:phase_3:"
>>>>> },
>>>>>
>>>>> But I get this error message:
>>>>> MSG: ERROR: VCF file ~/src/ensembl-vcf/ not found
>>>>>
>>>>> I downloaded all the hg38 files you linked to in the
>>>>> folder ~/src/ensembl-vcf/. When you say that I need
>>>>> to change filename_template to the path where the
>>>>> files were downloaded, is it the full path of all the
>>>>> 48 files rather than the path to the folder they are in?
>>>>>
>>>>> Best,
>>>>> Johanne
>>>>>
>>>>>> On 23 May 2016 at 11:57, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>>>>
>>>>>> Hi Johanne,
>>>>>>
>>>>>> It looks like the API is intermittently losing
>>>>>> connection to the remote VCF files hosted on our FTP
>>>>>> site.
>>>>>>
>>>>>> You can bypass this connection by downloading the
>>>>>> files to your local machine:
>>>>>>
>>>>>> GRCh38:
>>>>>> ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
>>>>>> GRCh37:
>>>>>> ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/
>>>>>>
>>>>>> You will then need to edit
>>>>>> [module_path]/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json,
>>>>>> changing the "filename_template" entry to the path
>>>>>> where you downloaded the files, and "type" from
>>>>>> "remote" to "local".
>>>>>>
>>>>>> Regarding the warning message, this should not affect
>>>>>> your analyses in any way, but I have put in a fix on
>>>>>> release/84 of ensembl-variation to suppress it.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Will McLaren
>>>>>> Ensembl Variation
>>>>>>
>>>>>> On 21 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>>>>
>>>>>> Dear ensembl dev team,
>>>>>>
>>>>>> I have been using your variation API for some
>>>>>> time now, and get a range of errors from time to
>>>>>> time, without knowing exactly why. It is not
>>>>>> because of the scripts, I think, as the same
>>>>>> script producing the error can work just fine if
>>>>>> I run it again.
>>>>>>
>>>>>> The different error messages are:
>>>>>> Use of uninitialized value in list assignment at
>>>>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>>>>> line 891, <IN> line 5.
>>>>>> connect: Operation timed out
>>>>>> [kftp_connect_file] 350 Restarting at 654385206.
>>>>>> Send STORE or RETRIEVE to initiate transfer
>>>>>> [kftp_connect_file] 227 Entering Passive Mode
>>>>>> (193,62,203,85,220,250).
>>>>>> Tabix::tabix_query: t is not of type tabix_tPtr
>>>>>> at
>>>>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/TabixParser.pm
>>>>>> line 70.
>>>>>> [kftp_connect_file] 227 Entering Passive Mode
>>>>>> (193,62,203,85,157,134).
>>>>>> [main] fail to open the data file.
>>>>>> Can't use an undefined value as an ARRAY
>>>>>> reference at
>>>>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>>>>> line 730.
>>>>>>
>>>>>> Usually just one of these errors occurs at a time.
>>>>>> I suspect it might have something to do with the
>>>>>> connection between my computer and the ensembl
>>>>>> database, as the first error at least always shows
>>>>>> up in repeats when I lose my Internet connection.
>>>>>> However, are all of them caused by Internet
>>>>>> trouble? I have checked that the MySQL instance
>>>>>> is up and running, and can visit web pages
>>>>>> through a browser when some of the errors occur.
>>>>>> Could it be something on the server/database side?
>>>>>>
>>>>>> Also, if I use the GRCh37 database:
>>>>>>
>>>>>> $registry->load_registry_from_db(
>>>>>>     -host => 'ensembldb.ensembl.org',
>>>>>>     -user => 'anonymous',
>>>>>>     -port => 3337,
>>>>>> );
>>>>>>
>>>>>> I get this warning/printout:
>>>>>> Use of uninitialized value $nums{"."} in numeric
>>>>>> comparison (<=>) at
>>>>>> /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VCFCollection.pm
>>>>>> line 770, <IN> line 6.
>>>>>>
>>>>>> I use version 84 of the Ensembl API, OS X
>>>>>> 10.11.5, and the script I use when all of these
>>>>>> errors occur, is attached. Note that the attached
>>>>>> script by default uses hg38, but will produce the
>>>>>> last printout mentioned when switching to GRCh37 (hg19).
>>>>>>
>>>>>> And something different I have been wondering about:
>>>>>> The VCF files that are downloaded locally
>>>>>> (ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi,
>>>>>> for instance) - should they be deleted and
>>>>>> re-downloaded from time to time to get the latest
>>>>>> 1000G data? And where exactly are the VCFs
>>>>>> downloaded from? Is it dbSNP, as indicated in the
>>>>>> file name?
>>>>>>
>>>>>> Best,
>>>>>> Johanne Håøy Horn
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dev mailing list Dev at ensembl.org
>>>>>> Posting guidelines and subscribe/unsubscribe
>>>>>> info: http://lists.ensembl.org/mailman/listinfo/dev
>>>>>> Ensembl Blog: http://www.ensembl.info/