Commit 5f67872

JostMigenda authored and Robadob committed

add section on network access, with code example for parallel download

1 parent 81dfe62

File tree

1 file changed

episodes/optimisation-memory.md

Lines changed: 62 additions & 0 deletions
@@ -156,6 +156,68 @@ Repeated runs show some noise to the timing, however the slowdown is consistentl
You might not even be reading 1000 different files. You could be reading the same file multiple times, rather than reading it once and retaining it in memory during execution.
An even greater overhead would apply.

## Accessing the network

When transferring files over a network, similar effects apply. There is a fixed overhead for every file transfer (no matter how big the file is), so downloading many small files will be slower than downloading a single large file of the same total size.
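
To get a feel for the size of this effect, here is a rough back-of-envelope model. The 50 ms per-transfer overhead and ~100 Mbit/s bandwidth are illustrative assumptions, not measured values:

```python
# Estimated total download time: n_files * overhead + total_size / bandwidth
overhead = 0.05      # seconds of fixed cost per transfer (assumed)
bandwidth = 12.5e6   # bytes per second, i.e. ~100 Mbit/s (assumed)
total_size = 100e6   # 100 MB of data in total

many_small = 100 * overhead + total_size / bandwidth  # 100 files: ~13 s
one_large = 1 * overhead + total_size / bandwidth     # 1 file: ~8 s
print(f"100 small files: {many_small:.2f} s, one large file: {one_large:.2f} s")
```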

Because of this overhead, downloading many small files often does not use all the available bandwidth. It may be possible to speed things up by parallelising downloads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from timeit import timeit

import requests  # install with `pip install requests`


def download_file(url, filename):
    # Download a single file and write it to disk.
    response = requests.get(url)
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename


downloaded_files = []


def sequentialDownload():
    # Download the ten files one after another.
    for mass in range(10, 20):
        url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
        f = download_file(url, f"seq_{mass}.h5")
        downloaded_files.append(f)


def parallelDownload():
    # Submit all ten downloads to a pool of worker threads,
    # then process the results in the order they finish.
    pool = ThreadPoolExecutor(max_workers=6)
    jobs = []
    for mass in range(10, 20):
        url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
        local_filename = f"par_{mass}.h5"
        jobs.append(pool.submit(download_file, url, local_filename))

    for result in as_completed(jobs):
        if result.exception() is None:
            # handle return values of the parallelised function
            f = result.result()
            downloaded_files.append(f)
        else:
            # handle errors
            print(result.exception())

    pool.shutdown(wait=False)


print(f"sequentialDownload: {timeit(sequentialDownload, globals=globals(), number=1):.3f} s")
print(downloaded_files)
downloaded_files = []
print(f"parallelDownload: {timeit(parallelDownload, globals=globals(), number=1):.3f} s")
print(downloaded_files)
```
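
If you do not need to handle errors or process results as they complete, `Executor.map` offers a more compact alternative. The sketch below reuses `download_file` from above; the `map_` filename prefix is a hypothetical choice, just to avoid overwriting the earlier downloads.

```python
def parallelDownloadMap():
    # Worker threads run download_file(url, filename) for each pair;
    # results come back in submission order, and the context manager
    # waits for all downloads to finish before returning.
    urls = [f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
            for mass in range(10, 20)]
    filenames = [f"map_{mass}.h5" for mass in range(10, 20)]
    with ThreadPoolExecutor(max_workers=6) as pool:
        return list(pool.map(download_file, urls, filenames))
```

Note that any exception raised in a worker is re-raised when `map` retrieves the corresponding result, so a single failed download aborts the whole loop.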

Depending on your internet connection, results may vary significantly, but the parallel download will usually be quite a bit faster. Note also that the order in which the parallel downloads finish will vary.

```output
sequentialDownload: 3.225 s
['seq_10.h5', 'seq_11.h5', 'seq_12.h5', 'seq_13.h5', 'seq_14.h5', 'seq_15.h5', 'seq_16.h5', 'seq_17.h5', 'seq_18.h5', 'seq_19.h5']
parallelDownload: 0.285 s
['par_11.h5', 'par_12.h5', 'par_15.h5', 'par_13.h5', 'par_10.h5', 'par_14.h5', 'par_16.h5', 'par_19.h5', 'par_17.h5', 'par_18.h5']
```

## Latency Overview

Latency can have a big impact on the speed at which a program executes, as the graph below demonstrates. Note the log scale!
