[ensembl-dev] FTP download issues with Ensembl Rel 59

volker.weinberger at novartis.com
Thu Aug 26 17:27:48 BST 2010


Hi,

While data downloads of the Ensembl databases used to be a rather fast and 
painless routine process for us up to Rel 58, we're facing significant 
issues with Rel 59. In fact I still haven't managed to get all files 
correctly; some of the very large ones I've transferred countless times 
so far without success.

Of course this could be due to several factors, including internal ones. 
I'm just curious if we're the only database users affected.

On the Ensembl end, the obvious change was that large files are no longer 
provided in parts, but as complete files, which makes some of those quite 
large, e.g. up to 14GB in compara. We typically don't have issues with 
smaller files but those large files have a rather high potential of 
causing trouble. Our connection to the Ensembl FTP server is not really 
stable or fast, so interruptions or timeouts are rather common and more 
likely with large files.

In the past, using lftp with the -mirror option resulted in rather reliable 
downloads, while now with the large files we typically get corrupt 
results, and in some cases downloads that never end. I assume this is 
some effect of the -c continuation flag. I did get the impression that 
interrupted and continued downloads typically lead to corrupt results 
(checksums not matching, file sizes exceeding the original file sizes, 
sometimes several-fold).
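
To illustrate the kind of guard that would catch the "file grows past the 
original size" symptom, here is a minimal sketch in Python using the 
standard ftplib module. It is not what lftp does internally; the function 
names, host and paths are hypothetical, and it assumes the server answers 
the SIZE and REST commands and that anonymous login is allowed. The idea 
is only to resume when the partial file is a plausible prefix of the 
remote file, and to start over otherwise.

```python
import os
from ftplib import FTP


def resume_offset(local_size, remote_size):
    """Return the byte offset to resume from, or 0 to start over."""
    # A partial file larger than the remote one cannot be a valid prefix,
    # so it is discarded instead of being appended to.
    if local_size > remote_size:
        return 0
    return local_size


def fetch_with_resume(host, remote_path, local_path):
    """Download remote_path, resuming a partial local copy when it looks sane.

    Returns True when the local file ends up with the remote size.
    """
    ftp = FTP(host)
    ftp.login()                    # anonymous login (assumption)
    ftp.voidcmd("TYPE I")          # binary mode; many servers need it for SIZE
    remote_size = ftp.size(remote_path)
    if remote_size is None:
        ftp.quit()
        raise RuntimeError("server did not report a size for " + remote_path)

    local_size = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    if local_size == remote_size:
        ftp.quit()
        return True                # nothing left to transfer

    offset = resume_offset(local_size, remote_size)
    mode = "ab" if offset else "wb"    # append only when actually resuming
    with open(local_path, mode) as out:
        # rest=None means a plain RETR from the start of the file.
        ftp.retrbinary("RETR " + remote_path, out.write, rest=offset or None)
    ftp.quit()
    return os.path.getsize(local_path) == remote_size
```

A size match alone does not prove the content is intact, so a checksum 
comparison would still be needed after the transfer completes.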

Well, I then tried several other options that can go through our FTP 
firewall, including wgetPrisma, ncftp, manual ftp, etc., basically all with 
the same issues and results: in some exceptionally lucky cases a big file 
would download correctly (matching checksums) but typically it wouldn't. 
(Plain wget through our HTTP proxy wasn't an option, as the proxy has a 
2GB file size limit.)
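
For reference, the check applied to each transferred file can be sketched 
roughly as follows in Python. This is an illustration only: the function 
names are hypothetical, and MD5 is an assumed default, so substitute 
whichever algorithm the published checksum file actually uses. The size is 
compared first because it is cheap and immediately exposes files that have 
grown past the original.

```python
import hashlib
import os


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so multi-GB dumps never sit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_digest, algorithm="md5"):
    """Return 'ok', 'size mismatch' or 'checksum mismatch' for a local file."""
    # Cheap check first: a wrong size means hashing would be wasted effort.
    if os.path.getsize(path) != expected_size:
        return "size mismatch"
    if file_digest(path, algorithm) != expected_digest.lower():
        return "checksum mismatch"
    return "ok"
```

Called as verify_download(local_path, size, digest) after each transfer, 
anything other than "ok" means the file should be fetched again from 
scratch rather than continued.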

While now there's only a single file missing (genomic_align from compara) 
and I have some hope to get a good copy of this some day, this is not a 
sustainable download approach and has wasted a lot of bandwidth.  I'm 
looking for ways to improve it.

Best regards,

Volker


Volker Weinberger
Novartis Institutes for BioMedical Research
NIBR Information Technologies and Automation Services (NITAS)
Application Manager
Basel, Switzerland, WSJ-310.5.17
Phone: +41 61 32 41467
Email : volker.weinberger at novartis.com
