
Profiling: Added I/O profiling tooling #242

Open
Labreo wants to merge 2 commits into PecanProject:master from
Labreo:perf/io-bottleneck-profiling

Conversation

@Labreo
Contributor

@Labreo Labreo commented Jan 29, 2026

Summary

  • Motivation: Profile SIPNET to determine the degree of the I/O bottleneck.
  • What: Added a new script, profile_sipnet.py, that generates a large climate file for testing the I/O bottleneck. It duplicates the data already present in tests/smoke/niwot/sipnet.clim until the number of years meets or exceeds the target (e.g., for a target of 10 years it duplicates the 8-year sipnet.clim twice, producing 16 years of data).
    Modified src/sipnet/sipnet.c: added manual clock() timers wrapped in #ifdef PROFILING guards to measure the Read, Compute, and Write phases without affecting production builds.
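The climate-file duplication step could be sketched roughly like this (a hypothetical helper; the actual profile_sipnet.py may differ):

```python
import math

def duplicate_climate(lines, years_in_file, target_years):
    """Repeat a climate record until it covers at least target_years.

    E.g. an 8-year sipnet.clim with a target of 10 years is duplicated
    twice, yielding 16 years. The real script would also need to rewrite
    the year column of each copy, which is omitted in this sketch.
    """
    copies = math.ceil(target_years / years_in_file)
    return lines * copies, years_in_file * copies

rows, years = duplicate_climate(["loc year day time ..."] * 8, 8, 10)
print(years)  # 16
```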

Note: I didn't use gprof as I initially proposed, since it was unreliable on my local development environment (Apple Silicon).

Results of profiling for 10 target years (16 generated years):


****************************************
MODEL PERFORMANCE REPORT
****************************************
READ PHASE    : 0.007377 sec
COMPUTE PHASE : 0.003970 sec
WRITE PHASE   : 0.025216 sec
TOTAL RUNTIME : 0.036563 sec
----------------------------------
****************************************

Conclusion: The program spends 89% of its time either reading or writing.
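For reference, the 89% figure is just the I/O share of the timings reported above:

```python
# Timings copied from the performance report above (seconds)
read, compute, write = 0.007377, 0.003970, 0.025216
total = 0.036563
io_share = (read + write) / total
print(f"{io_share:.1%}")  # 89.1%
```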

How to test

Steps to reproduce and verify the change locally:

python3 tools/profile_sipnet.py

Related issues

Checklist

  • Tests added for new features
  • Documentation updated (if applicable)
  • docs/CHANGELOG.md updated with noteworthy changes
  • Code formatted with clang-format (run git clang-format if needed)
  • Requested review from at least one CODEOWNER

For model structure changes:

  • Removed \fraktur font formatting from docs/model-structure.md for implemented features

Note: See CONTRIBUTING.md for additional guidance. This repository uses automated formatting checks; if the pre-commit hook blocks your commit, run git clang-format to format staged changes.

@mdietze
Member

mdietze commented Jan 29, 2026

This is interesting and useful information, in that it suggests that input is much less of a bottleneck than output (which makes sense). That said, I feel like #237 may have been misleading about what the underlying question is, which is more about how long job.sh spends on I/O than how much SIPNET does. In particular, right now SIPNET writes its output to a text file, which model2netcdf.SIPNET reads in and writes back out as netCDF. Informal observations have suggested that the latter step takes just as much time, if not more, than actually running the model. It was proposed to profile this overall script to assess how much benefit we'll get if we modify SIPNET to natively output netCDF. It's hard to estimate a priori how much it will change SIPNET itself (in theory the write should be quicker, but we'd probably have to implement and test to be sure), but it would completely eliminate the subsequent R-based text read and netCDF write.

@dlebauer
Member

This looks nice and provides the information I was looking for in 237. As far as what goes into this codebase and where it belongs i.e. reviewing your implementation, I'll defer to @Alomir.

@mdietze thanks for clarifying where the bottleneck in the pecan workflow is.

@Labreo with apologies for scope creep, I've updated #237 by adding another step:

  1. Profile a PEcAn managed SIPNET run, i.e. the run.sh script generated by https://pecanproject.github.io/pecan-documentation/develop/rendered-demo-notebooks/run_pecan.html

If you would prefer to defer that step that is understandable - I can close 237 and split this into a separate issue.

@Labreo
Contributor Author

Labreo commented Jan 30, 2026

Sure, @dlebauer, that is something I can definitely do and provide the results of.

@Labreo Labreo force-pushed the perf/io-bottleneck-profiling branch from 4000929 to ea3d75a Compare January 31, 2026 08:27
@Labreo
Contributor Author

Labreo commented Jan 31, 2026

> This looks nice and provides the information I was looking for in 237. As far as what goes into this codebase and where it belongs i.e. reviewing your implementation, I'll defer to @Alomir.

To be clear, I do not intend for this PR to be merged. I opened it primarily to share the methodology and reproduce the profiling results discussed in #237. (In hindsight, I should have opened this as a Draft PR; apologies for the confusion.)

I couldn't run the full run.sh stack locally, so I created a simplified wrapper of the PEcAn model2netcdf.R file. This wrapper replicates the exact I/O logic (parsing headers, mapping columns, and writing the full standard set of NetCDF variables) to measure conversion latency. Also, since I have an Apple Silicon environment and the Docker container would use emulation, which would distort the profiling results, I ran everything locally instead.
I scaled the Niwot Ridge test data to 50 target years (resulting in 56 years of simulation data) using profile_sipnet.py to generate a realistic production workload.

Result:
R wrapper I/O time profiling over 56 years:

[1] "======================================="
[1] "Total Rows Processed:  36659"
[1] "R Wrapper I/O Time:    1.5342 sec"
[1] "======================================="

Result of profiling the model over 56 years:


--- SIPNET PERFORMANCE PROFILE ---
READ PHASE    : 0.045883 sec
COMPUTE PHASE : 0.026414 sec
WRITE PHASE   : 0.167460 sec
TOTAL RUNTIME : 0.239757 sec
----------------------------------

@Labreo
Contributor Author

Labreo commented Feb 7, 2026

Any feedback on this?

@mdietze
Member

mdietze commented Feb 7, 2026

Since the two tests were of different lengths (16 vs 56 years) they're not directly comparable, but to a first approximation, I wanted to make sure I've got the following approximately right for 56-year costs:
Model
READ PHASE : 0.02582 sec 7%
COMPUTE PHASE : 0.013895 sec 4%
WRITE PHASE : 0.088256 sec 24%
Post-hoc Conversion
READ PHASE : 0.045883 sec 12%
COMPUTE PHASE : 0.026414 sec 7%
WRITE PHASE : 0.167460 sec 45%

All of this seems to support making #239 a higher priority -- post-hoc conversion seems to be ~2x as time consuming as the model itself, and the actual underlying model compute is a tiny fraction of the overall job.sh execution.
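For anyone checking the arithmetic: the model numbers above appear to be the 16-year timings scaled by 56/16 = 3.5, and each percentage is that phase's share of the combined model-plus-conversion total. A sketch of the assumed calculation:

```python
scale = 56 / 16  # assumed: 16-year model timings scaled up to 56 years
model = [0.007377 * scale, 0.003970 * scale, 0.025216 * scale]  # read, compute, write
conversion = [0.045883, 0.026414, 0.167460]                     # read, compute, write
total = sum(model) + sum(conversion)

shares = [round(100 * t / total) for t in model + conversion]
print(shares)                        # roughly [7, 4, 24, 12, 7, 46]
print(sum(conversion) / sum(model))  # ~1.9x: conversion vs model runtime
```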



Development

Successfully merging this pull request may close these issues.

Profile SIPNET to determine degree of I/O bottleneck
