[ensembl-dev] Downloading lots of files from ensembl

Matthew Gerring Matthew.Gerring at jax.org
Wed Oct 26 15:03:54 BST 2022


Hello,

I am downloading lots of (wonderful) Ensembl data this morning (EDT), running commands like:

wget -r -l 10 -c --retry-connrefused --tries=0 --timeout=5 -A bed.gz ftp://ftp.ensembl.org/pub/release-$ENSEMBL/regulation/homo_sapiens/Peaks/ &

where $ENSEMBL is the Ensembl release number (107 in the log below).
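
As a minimal sketch of how the URL in that command is put together: the release number 107 and the two species directories are taken from the command and log in this message, while peaks_url is a hypothetical helper name used only for illustration.

```shell
# Release number; 107 is the value that appears in the log in this message.
ENSEMBL=107

# Hypothetical helper: print the regulation Peaks URL for one species directory.
peaks_url() {
  printf 'ftp://ftp.ensembl.org/pub/release-%s/regulation/%s/Peaks/\n' "$ENSEMBL" "$1"
}

# Print the URLs only; nothing is downloaded here.
for species in homo_sapiens mus_musculus; do
  peaks_url "$species"
done
```

Each printed URL can then be passed to wget in place of the hard-coded one.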

During this process I often get a few refused connections, I assume because Ensembl limits bandwidth. For example:

--2022-10-26 13:54:19--  ftp://ftp.ensembl.org/pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300/mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz
  (try: 4) => 'ftp.ensembl.org/pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300/mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz'
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300 ... done.
==> PASV ... couldn't connect to 193.62.193.139 port 37209: Connection refused
Retrying.

--2022-10-26 13:54:24--  ftp://ftp.ensembl.org/pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300/mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz
  (try: 5) => 'ftp.ensembl.org/pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300/mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz'
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300 ... done.
==> PASV ... couldn't connect to 193.62.193.139 port 64353: Connection refused
Retrying.

--2022-10-26 13:54:31--  ftp://ftp.ensembl.org/pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300/mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz
  (try: 6) => 'ftp.ensembl.org/pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300/mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz'
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.139|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-107/regulation/mus_musculus/Peaks/MEL_cell_line/EP300 ... done.
==> PASV ... done.    ==> RETR mus_musculus.GRCm39.MEL_cell_line.EP300.SWEmbl_R0005.peaks.20201021.bed.gz ... done.
Length: 262591 (256K)

ftp.ensembl.org/pub/release-107/regulati 100%[===============================================================================>] 256.44K   651KB/s    in 0.4s


Is my doing this unfriendly to the Ensembl FTP server(s)? Should I be reducing requests, and if so, how? Can I get the data across without all the “Connection refused” errors between requests? If this is okay and just how it works, then no action is needed on your side; I am only confirming that it is okay to transfer data this way, in this case regulation peak files.
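
For what "reducing requests" could look like on the client side, here is a sketch only, not something confirmed by the Ensembl team. It keeps the options from the command above and adds standard wget throttling flags (--wait, --random-wait, --waitretry) plus a finite --tries. The specific values, the function name polite_fetch and the DRY_RUN switch are assumptions chosen for illustration.

```shell
# Hypothetical wrapper around the original wget command with gentler settings.
polite_fetch() {
  # --tries=20      finite retry count instead of --tries=0 (retry forever)
  # --waitretry=30  back off between retries of a failed download, up to 30 s
  # --wait=1        pause 1 s between successive retrievals
  # --random-wait   vary that pause so requests are not evenly spaced
  set -- -r -l 10 -c --retry-connrefused --tries=20 --waitretry=30 \
         --wait=1 --random-wait --timeout=5 -A bed.gz "$1"
  if [ -n "${DRY_RUN:-}" ]; then
    echo wget "$@"   # only print the command
  else
    wget "$@"        # actually download
  fi
}

# Dry run: show the command without contacting the server.
ENSEMBL=107
DRY_RUN=1
polite_fetch "ftp://ftp.ensembl.org/pub/release-$ENSEMBL/regulation/homo_sapiens/Peaks/"
```

Unsetting DRY_RUN runs the download for real. Whether these settings avoid the refused connections depends on how the server is configured, which this sketch does not assume.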

Thanks,

Matt Gerring
Jackson Laboratory