[ensembl-dev] Error messages when using the Variation API

Will McLaren wm2 at ebi.ac.uk
Mon May 23 15:45:37 BST 2016


Hello,

The files there are symbolic links to another directory - it's possible
your FTP client is not following these links.

The "real" files are here:

ftp://ftp.ensembl.org/pub/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/

Try downloading from that path instead.
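
If it helps, the full list of files to fetch from that directory can be sketched in Python as below. This assumes the files there follow the filename template quoted further down this thread, and that each .vcf.gz has a matching .tbi index next to it. BASE_URL, FILE_TEMPLATE and build_urls are names made up for illustration. The sketch only builds URLs; it does not download anything.

```python
# Sketch: list the per-chromosome VCF and tabix index URLs under the "real" path.
# Assumption: the file names follow the template quoted later in this thread.

BASE_URL = (
    "ftp://ftp.ensembl.org/pub/release-82/variation/vcf/homo_sapiens/"
    "1000GENOMES-phase_3-genotypes/"
)
FILE_TEMPLATE = (
    "ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased"
    ".rsID.genotypes.GRCh38_dbSNP.vcf.gz"
)
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def build_urls(base_url=BASE_URL, template=FILE_TEMPLATE, chromosomes=CHROMOSOMES):
    """Return the URL of every VCF file and its .tbi index, in chromosome order."""
    urls = []
    for chrom in chromosomes:
        name = template.replace("###CHR###", chrom)
        urls.append(base_url + name)           # bgzipped VCF
        urls.append(base_url + name + ".tbi")  # tabix index
    return urls


if __name__ == "__main__":
    for url in build_urls():
        print(url)
```

Saving the printed list to a file and passing it to a downloader (for example wget -i urls.txt) avoids mounting the FTP site at all, so the symbolic links never come into play.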

Regards

Will

On 23 May 2016 at 15:31, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:

> Hi,
>
> I got the same error message with the full path.
>
> I think the problem is with the hg38 .gz and .gz.tbi files. I went back to
> the URL you gave me:
> ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
> But when I click any of the files, I get the error message "The operation
> can’t be completed because the original item for <file name> can’t be
> found". This is not a problem for the hg19 FTP connection, whose files I
> can open just fine.
> I have restarted my computer (Mac OS X 10.11.5) and remounted the
> connection several times. I log on as guest.
>
> So the files I downloaded did not exist after all, and the error
> message was correct.
>
> Do you have any suggestions as to how I can mount it correctly?
>
> Best,
> Johanne
>
> On 23 May 2016 at 15:31, Will McLaren <wm2 at ebi.ac.uk> wrote:
>
> I think possibly Perl doesn't like using "~" to represent your home
> directory - try replacing it with the full path, or possibly $ENV{HOME}
>
> Will
>
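
For what it's worth, the two config changes discussed in this thread ("type" set to "local", and a "filename_template" that does not start with "~") can be sketched in Python as below. localise_collection is a hypothetical helper, not part of the Ensembl API. It works on a single collection entry like the one quoted below; locating that entry inside vcf_config.json and writing the file back are left out.

```python
import os


def localise_collection(entry):
    """Return a copy of one collection entry set up for local VCF files.

    Perl's file tests and open() do not expand "~" (the shell does that),
    so the path stored in the config has to be absolute already.
    """
    updated = dict(entry)
    updated["type"] = "local"
    # Expand a leading "~" to the current user's home directory.
    updated["filename_template"] = os.path.expanduser(entry["filename_template"])
    return updated


if __name__ == "__main__":
    example = {
        "id": "1000genomes_phase3",
        "type": "remote",
        "filename_template": (
            "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_"
            "integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz"
        ),
    }
    print(localise_collection(example)["filename_template"])
```

The ###CHR### placeholder is left untouched, since the API substitutes the chromosome name itself.
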
> On 23 May 2016 at 13:59, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>
>> Thank you for your wonderful support!
>>
>> I tried now with the following JSON struct:
>> {
>>       "id": "1000genomes_phase3",
>>       "species": "homo_sapiens",
>>       "assembly": "GRCh38",
>>       "type": "local",
>>       "strict_name_match": 1,
>>       "filename_template":
>> "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>>       "chromosomes": [
>>         "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
>>         "13", "14", "15", "16", "17", "18", "19", "20", "21", "22",
>>         "X", "Y"
>>       ],
>>       "sample_prefix": "1000GENOMES:phase_3:"
>>     },
>>
>> I got this error message:
>> MSG: ERROR: VCF file
>> ~/src/ensembl-vcf/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz
>> not found
>>
>> Should the tbi files be where I call the script? Or is it something else
>> I am doing wrong?
>>
>> Best,
>> Johanne
>>
>> On 23 May 2016 at 14:04, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>
>> Hi Johanne,
>>
>> You need the filename part of the template too, so:
>>
>>  "filename_template":
>> "~/src/ensembl-vcf/ALL.chr###CHR###.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz",
>>
>> Regards
>>
>> Will
>>
>> On 23 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>
>>> Hello again!
>>>
>>> I tried setting the following in the JSON file:
>>>
>>>  {
>>>       "id": "1000genomes_phase3",
>>>       "species": "homo_sapiens",
>>>       "assembly": "GRCh38",
>>>       "type": "local",
>>>       "strict_name_match": 1,
>>>       "filename_template": "~/src/ensembl-vcf/",
>>>       "chromosomes": [
>>>         "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
>>>         "13", "14", "15", "16", "17", "18", "19", "20", "21", "22",
>>>         "X", "Y"
>>>       ],
>>>       "sample_prefix": "1000GENOMES:phase_3:"
>>>     },
>>>
>>> But I get this error message:
>>> MSG: ERROR: VCF file ~/src/ensembl-vcf/ not found
>>>
>>> I downloaded all the hg38 files you linked to into the
>>> folder ~/src/ensembl-vcf/. When you say that I need to change
>>> filename_template to the path where the files were downloaded, do you mean
>>> the full path of each of the 48 files rather than the path to the folder
>>> they are in?
>>>
>>> Best,
>>> Johanne
>>>
>>> On 23 May 2016 at 11:57, Will McLaren <wm2 at ebi.ac.uk> wrote:
>>>
>>> Hi Johanne,
>>>
>>> It looks like the API is intermittently losing connection to the remote
>>> VCF files hosted on our FTP site.
>>>
>>> You can bypass this connection by downloading the files to your local
>>> machine:
>>>
>>> GRCh38: ftp://ftp.ensembl.org/pub/variation_genotype/homo_sapiens/
>>> GRCh37:
>>> ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/
>>>
>>> You will then need to edit
>>> [module_path]/ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json,
>>> changing the "filename_template" entry to the path where you downloaded the
>>> files, and "type" from "remote" to "local".
>>>
>>> Regarding the warning message, this should not affect your analyses in
>>> any way, but I have put in a fix on release/84 of ensembl-variation to
>>> suppress it.
>>>
>>> Regards
>>>
>>> Will McLaren
>>> Ensembl Variation
>>>
>>> On 21 May 2016 at 12:57, Johanne Håøy Horn <johannhh at ifi.uio.no> wrote:
>>>
>>>> Dear ensembl dev team,
>>>>
>>>> I have been using your variation API for some time now, and get a range
>>>> of errors from time to time, without knowing exactly why. It is not because
>>>> of the scripts, I think, as the same script producing the error can work
>>>> just fine if I run it again.
>>>>
>>>> The different error messages are:
>>>> /Parser/BaseVCF4.pm line 891, <IN> line 5.
>>>> Use of uninitialized value in list assignment at
>>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>>> line 891, <IN> line 5.
>>>> connect: Operation timed out
>>>> [kftp_connect_file] 350 Restarting at 654385206. Send STORE or RETRIEVE
>>>> to initiate transfer
>>>>
>>>> [kftp_connect_file] 227 Entering Passive Mode (193,62,203,85,220,250).
>>>> Tabix::tabix_query: t is not of type tabix_tPtr at
>>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/TabixParser.pm line 70.
>>>>
>>>> [kftp_connect_file] 227 Entering Passive Mode (193,62,203,85,157,134).
>>>> [main] fail to open the data file.
>>>> Can't use an undefined value as an ARRAY reference at
>>>> /Users/Johanne/src/ensembl-io/modules/Bio/EnsEMBL/IO/Parser/BaseVCF4.pm
>>>> line 730.
>>>>
>>>> Usually just one of these errors occurs at a time. I suspect it might
>>>> have something to do with the connection between my computer and the
>>>> Ensembl database, as the first error at least always shows up repeatedly
>>>> when I lose my Internet connection. However, are all of them caused by
>>>> Internet trouble? I have checked that the MySQL instance is up and running,
>>>> and can visit web pages through a browser when some of the errors occur.
>>>> Could it be something on the server/database side?
>>>>
>>>> Also, if I use the GRCh37 database:
>>>>
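>>>> use Bio::EnsEMBL::Registry;
>>>> my $registry = 'Bio::EnsEMBL::Registry';
>>>>
>>>> # Port 3337 on ensembldb.ensembl.org serves the GRCh37 databases
>>>> $registry->load_registry_from_db(
>>>>   -host => 'ensembldb.ensembl.org',
>>>>   -user => 'anonymous',
>>>>   -port => 3337,
>>>> );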
>>>>
>>>> I get this warning/printout:
>>>> Use of uninitialized value $nums{"."} in numeric comparison (<=>) at
>>>> /Users/Johanne/src/ensembl-variation/modules/Bio/EnsEMBL/Variation/VCFCollection.pm
>>>> line 770, <IN> line 6.
>>>>
>>>> I use version 84 of the Ensembl API, OS X 10.11.5, and the script I use
>>>> when all of these errors occur is attached. Note that the attached script
>>>> by default uses hg38, but will produce the last printout mentioned when
>>>> switching to GRCh37 (hg19).
>>>>
>>>> And something different I have been wondering about:
>>>> The VCF files that are downloaded locally
>>>> (ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP.vcf.gz.tbi,
>>>> for instance) - should they be deleted and re-downloaded from time to time
>>>> to get the latest 1000G data? And where exactly are the VCFs downloaded
>>>> from? Is it dbSNP, as indicated in the file name?
>>>>
>>>> Best,
>>>> Johanne Håøy Horn
>>>>
>>>> _______________________________________________
>>>> Dev mailing list    Dev at ensembl.org
>>>> Posting guidelines and subscribe/unsubscribe info:
>>>> http://lists.ensembl.org/mailman/listinfo/dev
>>>> Ensembl Blog: http://www.ensembl.info/