[ensembl-dev] FTP download issues with Ensembl Rel 59
volker.weinberger at novartis.com
Thu Aug 26 17:27:48 BST 2010
Hi,
while data downloads of the Ensembl databases used to be a rather fast and
painless routine process up to Rel 58 for us, we're really facing
significant issues with Rel 59. In fact I still haven't managed to get all
files correctly; some of the very large ones I've transferred countless
times so far without success.
Of course this could be due to several factors, including internal ones.
I'm just curious if we're the only database users affected.
On the Ensembl end, the obvious change was that large files are no longer
provided in parts, but as complete files, which makes some of those quite
large, e.g. up to 14GB in compara. We typically don't have issues with
smaller files but those large files have a rather high potential of
causing trouble. Our connection to the Ensembl FTP server is not really
stable or fast, so interruptions or timeouts are rather common and more
likely with large files.
In the past, using lftp with its mirror command resulted in rather reliable
downloads, while now with the large files we typically get corrupt
results, and in some cases downloads that never end. I assume this is
some effect of the -c continuation flag. I did get the impression that
interrupted and continued downloads typically lead to corrupt results
(checksums not matching, file sizes exceeding the original file sizes,
sometimes several fold).
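One way to guard against the oversized-file symptom described above is to check the remote size before continuing a partial transfer, and to start over whenever the local copy is already larger than the remote file. The following is a minimal sketch in Python using the standard ftplib module, not a description of how lftp behaves. The function names, the anonymous login, and the host and path arguments are assumptions made for illustration.

```python
import os
from ftplib import FTP


def resume_offset(local_size, remote_size):
    """Return the byte offset to resume from, or 0 to restart from scratch.

    A local file larger than the remote one cannot be a valid prefix,
    so it is discarded rather than continued.
    """
    if local_size > remote_size:
        return 0
    return local_size


def fetch_with_resume(host, remote_path, local_path):
    """Download remote_path, continuing a partial local file when it is safe.

    Assumes the server allows anonymous login and supports SIZE and REST.
    """
    with FTP(host) as ftp:
        ftp.login()                # anonymous login (assumption)
        ftp.voidcmd("TYPE I")      # binary mode, needed for a reliable SIZE
        remote_size = ftp.size(remote_path)
        if remote_size is None:
            raise RuntimeError("server did not report a file size")

        local_size = os.path.getsize(local_path) if os.path.exists(local_path) else 0
        offset = resume_offset(local_size, remote_size)
        if offset == remote_size:
            return                 # local size already matches, nothing to fetch

        # Append when resuming, truncate when starting over.
        mode = "ab" if offset else "wb"
        with open(local_path, mode) as out:
            ftp.retrbinary("RETR " + remote_path, out.write, rest=offset or None)
```

A matching size does not prove the content is intact, so a checksum comparison after the transfer is still needed.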
Well, I then tried several other options that can go through our FTP
firewall, including wgetPrisma, ncftp, manual ftp, etc, basically all with
the same issues and results: in some exceptional lucky cases a big file
would download correctly (matching checksums) but typically it wouldn't.
(Plain wget through our HTTP proxy wasn't an option, as the proxy has a
2GB file size limit.)
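The check for a good copy (expected size and matching checksum) can be scripted so that a failed transfer is detected right away. The sketch below is in Python and makes some assumptions: the expected size and digest are obtained separately, and MD5 is only a placeholder algorithm, since the post does not say which checksum the published files use. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so multi-GB files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_digest, algorithm="md5"):
    """Return True only if both the size and the checksum match.

    The size is compared first because it is cheap and catches the
    oversized files produced by a bad continuation.
    """
    if os.path.getsize(path) != expected_size:
        return False
    return file_digest(path, algorithm) == expected_digest.lower()
```

Comparing the size first avoids hashing a 14GB file that is already known to be wrong.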
While now there's only a single file missing (genomic_align from compara)
and I have some hope to get a good copy of this some day, this is not a
sustainable download approach and has wasted a lot of bandwidth. I'm
looking for ways to improve it.
Best regards,
Volker
Volker Weinberger
Novartis Institutes for BioMedical Research
NIBR Information Technologies and Automation Services (NITAS)
Application Manager
Basel, Switzerland, WSJ-310.5.17
Phone: +41 61 32 41467
Email : volker.weinberger at novartis.com