[ensembl-dev] Error messages when using the Variation API

Will McLaren wm2 at ebi.ac.uk
Mon May 23 14:31:17 BST 2016


I think the problem is that Perl doesn't expand "~" to your home directory
(that is something the shell does) - try replacing it with the full path in
the JSON file, or build the path with $ENV{HOME} if you set it from Perl code.
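
As a rough sketch of what I mean (the helper names expand_home and
vcf_path_for are made up for illustration and are not part of the Ensembl
API), something like this would expand a leading "~/" via $ENV{HOME},
substitute the ###CHR### placeholder, and report which of the per-chromosome
files from your template Perl can actually see:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): expand a leading "~"
# using $ENV{HOME}, because Perl's file tests do not perform shell-style
# tilde expansion.
sub expand_home {
    my ($path) = @_;
    $path =~ s{^~(?=/|\z)}{$ENV{HOME}} if defined $ENV{HOME};
    return $path;
}

# Hypothetical helper: fill in the ###CHR### placeholder used in the
# "filename_template" entry quoted in this thread.
sub vcf_path_for {
    my ($template, $chr) = @_;
    (my $path = expand_home($template)) =~ s/###CHR###/$chr/g;
    return $path;
}

my $template = '~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz';

# Check every chromosome listed in the JSON entry.
for my $chr (1 .. 22, 'X', 'Y') {
    my $path = vcf_path_for($template, $chr);
    printf "%-3s %s %s\n", $chr, (-e $path ? 'found  ' : 'MISSING'), $path;
}
```

Any path reported as MISSING is one that Perl's -e test cannot find after
expansion. This sketch only handles a leading "~" followed by "/"; a
"~user" form is left untouched.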

Will

On 23 May 2016 at 13:59, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:

> Thank you for your wonderful support!
>
> I tried now with the following JSON struct:
> {
>       "id": "1000genomes_phase3",
>       "species": "homo_sapiens",
>       "assembly": "GRCh38",
>       "type": "local",
>       "strict_name_match": 1,
>       "filename_template": "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>       "chromosomes": [
>         "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
>         "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "X", "Y"
>       ],
>       "sample_prefix": "1000GENOMES:phase_3:"
>     },
>
> I got this error message:
> MSG: ERROR: VCF file
> ~/src/ensembl-vcf/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz
> not found
>
> Should the tbi files be where I call the script? Or is it something else I
> am doing wrong?
>
> Best,
> Johanne
>
> On 23 May 2016 at 14:04, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> Hi Johanne,
>
> You need the filename part of the template too, so:
>
>  "filename_template":
> "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>
> Regards
>
> Will
>
> On 23 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>
>> Hello again!
>>
>> I tried setting the following in the JSON file:
>>
>>  {
>>       "id": "1000genomes_phase3",
>>       "species": "homo_sapiens",
>>       "assembly": "GRCh38",
>>       "type": "local",
>>       "strict_name_match": 1,
>>       "filename_template": "~/src/ensembl-vcf/",
>>       "chromosomes": [
>>         "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
>>         "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "X", "Y"
>>       ],
>>       "sample_prefix": "1000GENOMES:phase_3:"
>>     },
>>
>> But I get this error message:
>> MSG: ERROR: VCF file ~/src/ensembl-vcf/ not found
>>
>> I downloaded all the hg38 files you linked to into the folder
>> ~/src/ensembl-vcf/. When you say that I need to change filename_template
>> to the path where the files were downloaded, do you mean the full path of
>> each of the 48 files rather than the path to the folder they are in?
>>
>> Best,
>> Johanne
>>
>> On 23 May 2016 at 11:57, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>> Hi Johanne,
>>
>> It looks like the API is intermittently losing connection to the remote
>> VCF files hosted on our FTP site.
>>
>> You can bypass this connection by downloading the files to your local
>> machine:
>>
>> GRCh38: ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
>> GRCh37:
>> ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/
>>
>> You will then need to edit
>> [module_path]/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json,
>> changing the "filename_template" entry to the path where you downloaded the
>> files, and "type" from "remote" to "local".
>>
>> Regarding the warning message, this should not affect your analyses in
>> any way, but I have put in a fix on release/84 of ensembl-variation to
>> suppress it.
>>
>> Regards
>>
>> Will McLaren
>> Ensembl Variation
>>
>> On 21 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>
>>> Dear ensembl dev team,
>>>
>>> I have been using your variation API for some time now, and get a range
>>> of errors from time to time, without knowing exactly why. It is not because
>>> of the scripts, I think, as the same script producing the error can work
>>> just fine if I run it again.
>>>
>>> The different error messages are:
>>> /Parser/BaseVCF4.pm line 891, <IN> line 5.
>>> Use of uninitialized value in list assignment at
>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>> line 891, <IN> line 5.
>>> connect: Operation timed out
>>> [kftp_connect_file] 350 Restarting at 654385206. Send STORE or RETRIEVE
>>> to initiate transfer
>>>
>>> [kftp_connect_file] 227 Entering Passive Mode (193,62,203,85,220,250).
>>> Tabix::tabix_query: t is not of type tabix_tPtr at
>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/TabixParser.pm line 70.
>>>
>>> [kftp_connect_file] 227 Entering Passive Mode (193,62,203,85,157,134).
>>> [main] fail to open the data file.
>>> Can't use an undefined value as an ARRAY reference at
>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>> line 730.
>>>
>>> Usually just one of these errors occurs at a time. I suspect it might have
>>> something to do with the connection between my computer and the ensembl
>>> database, as the first error at least always shows up repeatedly when I
>>> lose my Internet connection. However, are all of them caused by Internet
>>> trouble? I have checked that the MySQL instance is up and running, and I
>>> can visit web pages through a browser when some of the errors occur. Could
>>> it be something on the server/database side?
>>>
>>> Also, if I use the GRCh37 database:
>>>
>>> $registry->load_registry_from_db(
>>>   -host => 'ensembldb.ensembl.org',
>>>   -user => 'anonymous',
>>>   -port => 3337,
>>> );
>>>
>>> I get this warning/printout:
>>> Use of uninitialized value $nums{"."} in numeric comparison (<=>) at
>>> /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VCFCollection.pm
>>> line 770, <IN> line 6.
>>>
>>> I use version 84 of the Ensembl API, OS X 10.11.5, and the script I use
>>> when all of these errors occur, is attached. Note that the attached script
>>> by default uses hg38, but will produce the last printout mentioned when
>>> switching to hg37.
>>>
>>> And something different I have been wondering about:
>>> The VCF files that are downloaded locally
>>> (ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi,
>>> for instance) - should they be deleted and re-downloaded from time to time
>>> to get the latest 1000G data? And where exactly are the VCFs downloaded
>>> from? Is it dbSNP, as indicated in the file name?
>>>
>>> Best,
>>> Johanne Håøy Horn
>>>
>>> _______________________________________________
>>> Dev mailing list    Dev at ensembl.org
>>> Posting guidelines and subscribe/unsubscribe info:
>>> http://lists.ensembl.org/mailman/listinfo/dev
>>> Ensembl Blog: http://www.ensembl.info/