2 changes: 1 addition & 1 deletion src/posts/gpu-pipeline/index.md

There are also a couple of broken links (found using lychee):

$ lychee src/posts/gpu-pipeline/index.md
  39/39 ━━━━━━━━━━━━━━━━━━━━ Finished extracting links
Issues found in 1 input. Find details below.

[src/posts/gpu-pipeline/index.md]:
     [404] https://github.com/pangeo-data/ncar-hackathon-xarray-on-gpus/blob/main/benchmark/era5_zarr_benchmark.py | Rejected status code (this depends on your "accept" configuration): Not Found
     [404] https://github.com/pangeo-data/ncar-hackathon-xarray-on-gpus/tree/main/zarr_DALI | Rejected status code (this depends on your "accept" configuration): Not Found
   [ERROR] https://www.openhackathons.org/s/ | Network error: error sending request for url (https://www.openhackathons.org/s/) Maybe a certificate error?
   [ERROR] https://www.openhackathons.org/s/siteevent/a0CUP00000rwYYZ2A2/se000355 | Network error: error sending request for url (https://www.openhackathons.org/s/siteevent/a0CUP00000rwYYZ2A2/se000355) Maybe a certificate error?

🔍 39 Total (in 5s) ✅ 35 OK 🚫 4 Errors
   [WARN ] There were issues with GitHub URLs. You could try setting a GitHub token and running lychee again.
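
For re-checking after the links are updated, a minimal sketch of pulling the failing URLs out of a saved report like the one above. The line shape `[STATUS] URL | message` is an assumption inferred only from this output, not from lychee's documented format, and `parse_failures` is a hypothetical helper name:

```python
from __future__ import annotations

import re

# Assumed line shape, inferred from the report above (not a documented format):
#   [404] <url> | <message>
#   [ERROR] <url> | <message>
FAILURE_LINE = re.compile(
    r"^\s*\[(?P<status>\d{3}|ERROR)\]\s+(?P<url>\S+)\s+\|\s+(?P<message>.*)$"
)


def parse_failures(report: str) -> list[tuple[str, str]]:
    """Return (status, url) pairs for each failing link line in the report."""
    failures = []
    for line in report.splitlines():
        match = FAILURE_LINE.match(line)
        if match:
            failures.append((match.group("status"), match.group("url")))
    return failures


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    sample = (
        "     [404] https://example.com/old-script.py | Not Found\n"
        "   [ERROR] https://example.com/event | Network error\n"
        "   [WARN ] There were issues with GitHub URLs.\n"
    )
    for status, url in parse_failures(sample):
        print(status, url)
```

Grouping by status this way separates the 404s, which need a link change, from the network errors, which may only be certificate or connectivity problems on the checking machine.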

The two 404s point to the old scripts. They should be changed as follows:

@@ -101,7 +101,7 @@ PyTorch’s `DataLoader` includes options like `num_workers`, `pin_memory`, and

## Hackathon: Strategies Explored!

-During the hackathon, we tested the following strategies to improve the data loading performance. In the end, we were able to achieve
+During the hackathon, we tested the following strategies to improve the data loading performance. In the end, we were able to achieve at least ~17x improvement on 1 GPU in training throughput by optimizing data loading and preprocessing steps.

### Step 1: Optimized Chunking & Compression
