Commit 5f67872

JostMigenda authored and Robadob committed

add section on network access, with code example for parallel download

1 parent 81dfe62

File tree

1 file changed

episodes/optimisation-memory.md

Lines changed: 62 additions & 0 deletions
@@ -156,6 +156,68 @@ Repeated runs show some noise to the timing, however the slowdown is consistentl
You might not even be reading 1000 different files. You could be reading the same file multiple times, rather than reading it once and retaining it in memory during execution.
An even greater overhead would apply.

## Accessing the network

When transferring files over a network, similar effects apply. There is a fixed overhead for every file transfer (no matter how big the file is), so downloading many small files will be slower than downloading a single large file of the same total size.
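
To get a feel for the size of this effect, here is a rough back-of-envelope model. The 50 ms per-transfer overhead and ~100 Mbit/s bandwidth are illustrative assumptions, not measured values:

```python
# Estimated total download time: n_files * overhead + total_size / bandwidth
overhead = 0.05      # seconds of fixed cost per transfer (assumed)
bandwidth = 12.5e6   # bytes per second, i.e. ~100 Mbit/s (assumed)
total_size = 100e6   # 100 MB of data in total

many_small = 100 * overhead + total_size / bandwidth  # 100 files: ~13 s
one_large = 1 * overhead + total_size / bandwidth     # 1 file: ~8 s
print(f"100 small files: {many_small:.2f} s, one large file: {one_large:.2f} s")
```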

Because of this overhead, downloading many small files often does not use all the available bandwidth. It may be possible to speed things up by parallelising downloads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from timeit import timeit

import requests  # install with `pip install requests`


def download_file(url, filename):
    # Download a single file and write it to disk.
    response = requests.get(url)
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename


downloaded_files = []


def sequentialDownload():
    # Download the ten files one after another.
    for mass in range(10, 20):
        url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
        f = download_file(url, f"seq_{mass}.h5")
        downloaded_files.append(f)


def parallelDownload():
    # Submit all ten downloads to a pool of worker threads,
    # then process the results in the order they finish.
    pool = ThreadPoolExecutor(max_workers=6)
    jobs = []
    for mass in range(10, 20):
        url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
        local_filename = f"par_{mass}.h5"
        jobs.append(pool.submit(download_file, url, local_filename))

    for result in as_completed(jobs):
        if result.exception() is None:
            # handle return values of the parallelised function
            f = result.result()
            downloaded_files.append(f)
        else:
            # handle errors
            print(result.exception())

    pool.shutdown(wait=False)


print(f"sequentialDownload: {timeit(sequentialDownload, globals=globals(), number=1):.3f} s")
print(downloaded_files)
downloaded_files = []
print(f"parallelDownload: {timeit(parallelDownload, globals=globals(), number=1):.3f} s")
print(downloaded_files)
```
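
If you do not need to handle errors or process results as they complete, `Executor.map` offers a more compact alternative. The sketch below reuses `download_file` from above; the `map_` filename prefix is a hypothetical choice, just to avoid overwriting the earlier downloads.

```python
def parallelDownloadMap():
    # Worker threads run download_file(url, filename) for each pair;
    # results come back in submission order, and the context manager
    # waits for all downloads to finish before returning.
    urls = [f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
            for mass in range(10, 20)]
    filenames = [f"map_{mass}.h5" for mass in range(10, 20)]
    with ThreadPoolExecutor(max_workers=6) as pool:
        return list(pool.map(download_file, urls, filenames))
```

Note that any exception raised in a worker is re-raised when `map` retrieves the corresponding result, so a single failed download aborts the whole loop.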

Depending on your internet connection, results may vary significantly, but the parallel download will usually be quite a bit faster. Note also that the order in which the parallel downloads finish will vary.

```output
sequentialDownload: 3.225 s
['seq_10.h5', 'seq_11.h5', 'seq_12.h5', 'seq_13.h5', 'seq_14.h5', 'seq_15.h5', 'seq_16.h5', 'seq_17.h5', 'seq_18.h5', 'seq_19.h5']
parallelDownload: 0.285 s
['par_11.h5', 'par_12.h5', 'par_15.h5', 'par_13.h5', 'par_10.h5', 'par_14.h5', 'par_16.h5', 'par_19.h5', 'par_17.h5', 'par_18.h5']
```

## Latency Overview

Latency can have a big impact on the speed at which a program executes, as the graph below demonstrates. Note the log scale!
