feat(tutorials): ship tutorial data + downloadable example systems#706
Conversation
|
Warning Review limit reached
More reviews will be available in 17 minutes and 18 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (22)
📝 WalkthroughWalkthroughIntroduces Changesflixopt.tutorials API, build tooling, and notebook migrations
Sequence DiagramsequenceDiagram
participant Notebook
participant flixopt.tutorials
participant _tutorial_data
participant _examples
participant pooch
participant GitHubReleases
rect rgba(100, 149, 237, 0.5)
Note over Notebook,_tutorial_data: Synthetic dataset path (notebooks 02–07)
Notebook->>flixopt.tutorials: get_data('heat_system')
flixopt.tutorials->>_tutorial_data: get_data('heat_system')
_tutorial_data->>_tutorial_data: validate DataName, call _get_heat_system_data()
_tutorial_data-->>Notebook: dict{timesteps, heat_demand, gas_price}
end
rect rgba(60, 179, 113, 0.5)
Note over Notebook,GitHubReleases: Pre-built example path (notebooks 08a–09)
Notebook->>flixopt.tutorials: load_example('district_heating')
flixopt.tutorials->>_examples: load_example('district_heating')
_examples->>_examples: validate ExampleName
_examples->>pooch: fetch('district_heating.nc')
pooch->>GitHubReleases: GET district_heating.nc (if not cached)
GitHubReleases-->>pooch: district_heating.nc
pooch-->>_examples: local cache path
_examples->>_examples: FlowSystem.from_netcdf(path)
_examples-->>Notebook: FlowSystem
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Notebook tutorial data was only reachable by cloning the repo or copying files out of GitHub, so users hit dead imports on a plain `pip install`. Add a public `fx.tutorials` API that makes every notebook standalone: - `get_data(name)` returns the synthetic datasets for notebooks 01-07, generated from numpy/pandas with no files or network. - `load_example(name)` downloads a pre-built FlowSystem for notebooks 08-09 from the project's GitHub releases (cached + hash-verified via pooch), so the heavy demandlib/pvlib/holidays generation no longer runs at user runtime. Gated behind the new `flixopt[tutorials]` extra. The dataset/example names live once each in the `DataName`/`ExampleName` Literals; lists, validation and builder dispatch all derive from them. Add `scripts/build_tutorial_datasets.py` and a manual `tutorial-data` workflow to build and upload the example artefacts to the data release, and migrate all notebooks to the new API. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bd36609 to
df34a1a
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/tutorial-data.yaml:
- Around line 26-37: In the tutorial-data.yaml workflow file, replace the three
mutable action version tags with pinned commit SHAs for security hardening:
change actions/checkout@v6 to use its full commit SHA, astral-sh/setup-uv@v7 to
use its full commit SHA, and actions/setup-python@v6 to use its full commit SHA.
Additionally, add persist-credentials: false as a parameter to the
actions/checkout step to disable credential persistence and reduce token
exposure risk for this write-scoped job.
In `@docs/notebooks/08a-aggregation.ipynb`:
- Line 63: The notebook at line 63 calls
fx.tutorials.load_example('district_heating'), which depends on data files
(district_heating.nc, registry.txt, and other referenced tutorial files) that
must be available through the tutorial-data-v1 release. Currently, these files
are returning HTTP 404 errors, meaning the release has not been published.
Before merging this PR, you must publish the tutorial-data-v1 release containing
all required tutorial data files (registry.txt, district_heating.nc,
operational.nc, simple.nc, complex.nc, seasonal_storage.nc, and multiperiod.nc)
so that the load_example function can successfully retrieve the data at runtime.
In `@flixopt/tutorials/_examples.py`:
- Around line 31-35: The DATA_RELEASE variable is currently set to
'tutorial-data-v1', but this GitHub release tag does not have the required
registry.txt asset published, causing 404 errors when load_example() tries to
download from the _DEFAULT_BASE_URL. Either publish the missing assets to the
'tutorial-data-v1' release tag on GitHub before merging, or temporarily update
the DATA_RELEASE variable to point to an existing release tag that already has
all required assets (registry.txt and data files) published.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8c4729a7-e182-409b-8af3-c2076d795936
📒 Files selected for processing (22)
.github/workflows/tutorial-data.yamldocs/notebooks/02-heat-system.ipynbdocs/notebooks/03-investment-optimization.ipynbdocs/notebooks/04-operational-constraints.ipynbdocs/notebooks/05-multi-carrier-system.ipynbdocs/notebooks/06a-time-varying-parameters.ipynbdocs/notebooks/07-scenarios-and-periods.ipynbdocs/notebooks/08a-aggregation.ipynbdocs/notebooks/08b-rolling-horizon.ipynbdocs/notebooks/08c-clustering.ipynbdocs/notebooks/08c2-clustering-storage-modes.ipynbdocs/notebooks/08d-clustering-multiperiod.ipynbdocs/notebooks/08e-clustering-internals.ipynbdocs/notebooks/08f-clustering-segmentation.ipynbdocs/notebooks/09-plotting-and-data-access.ipynbflixopt/__init__.pyflixopt/tutorials/__init__.pyflixopt/tutorials/_examples.pyflixopt/tutorials/_tutorial_data.pypyproject.tomlscripts/build_tutorial_datasets.pytests/test_tutorials.py
| - uses: actions/checkout@v6 | ||
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - uses: astral-sh/setup-uv@v7 | ||
| with: | ||
| version: "0.10.9" | ||
| enable-cache: true | ||
|
|
||
| - uses: actions/setup-python@v6 | ||
| with: | ||
| python-version: ${{ env.PYTHON_VERSION }} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Mutable action refs (should be SHA-pinned):"
rg -n '^\s*-\s*uses:\s*.+@v[0-9]+' .github/workflows/tutorial-data.yaml || true
echo
echo "Checkout hardening block:"
rg -n -A5 -B1 'actions/checkout@' .github/workflows/tutorial-data.yaml
echo
echo "Expected: no `@v`* refs, and persist-credentials: false present under checkout."Repository: flixOpt/flixopt
Length of output: 482
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Get the full workflow file to check permissions
cat -n .github/workflows/tutorial-data.yaml | head -50Repository: flixOpt/flixopt
Length of output: 2057
Pin action revisions to commit SHAs and disable checkout credential persistence in this write-scoped job.
This workflow has permissions: contents: write at the job level. The three action references use mutable @v* version tags rather than pinned commit SHAs:
actions/checkout@v6(line 26)astral-sh/setup-uv@v7(line 30)actions/setup-python@v6(line 35)
Additionally, the checkout action does not set persist-credentials: false. For a write-scoped job, pin each action to a full commit SHA and disable credential persistence to reduce supply-chain and token exposure risk.
🧰 Tools
🪛 zizmor (1.25.2)
[warning] 26-28: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
[error] 26-26: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
[error] 30-30: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
[error] 35-35: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/tutorial-data.yaml around lines 26 - 37, In the
tutorial-data.yaml workflow file, replace the three mutable action version tags
with pinned commit SHAs for security hardening: change actions/checkout@v6 to
use its full commit SHA, astral-sh/setup-uv@v7 to use its full commit SHA, and
actions/setup-python@v6 to use its full commit SHA. Additionally, add
persist-credentials: false as a parameter to the actions/checkout step to
disable credential persistence and reduce token exposure risk for this
write-scoped job.
Source: Linters/SAST tools
| "from data.generate_example_systems import create_district_heating_system\n", | ||
| "\n", | ||
| "flow_system = create_district_heating_system()\n", | ||
| "flow_system = fx.tutorials.load_example('district_heating')\n", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
BASE_URL="https://github.com/flixOpt/flixopt/releases/download/tutorial-data-v1"
files=(
"registry.txt"
"district_heating.nc"
"operational.nc"
"simple.nc"
"complex.nc"
"seasonal_storage.nc"
"multiperiod.nc"
)
for f in "${files[@]}"; do
code=$(curl -s -o /dev/null -w "%{http_code}" -L "${BASE_URL}/${f}")
echo "${f} -> HTTP ${code}"
doneRepository: flixOpt/flixopt
Length of output: 247
Tutorial release assets are missing; publish tutorial-data-v1 release before merging.
Line 63 uses fx.tutorials.load_example('district_heating'), which requires district_heating.nc and registry.txt from the tutorial-data-v1 release. Verification confirms all tutorial data files (registry.txt, district_heating.nc, operational.nc, simple.nc, complex.nc, seasonal_storage.nc, multiperiod.nc) are currently unavailable (HTTP 404). The notebook will fail at runtime if this release is not published before merge.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/notebooks/08a-aggregation.ipynb` at line 63, The notebook at line 63
calls fx.tutorials.load_example('district_heating'), which depends on data files
(district_heating.nc, registry.txt, and other referenced tutorial files) that
must be available through the tutorial-data-v1 release. Currently, these files
are returning HTTP 404 errors, meaning the release has not been published.
Before merging this PR, you must publish the tutorial-data-v1 release containing
all required tutorial data files (registry.txt, district_heating.nc,
operational.nc, simple.nc, complex.nc, seasonal_storage.nc, and multiperiod.nc)
so that the load_example function can successfully retrieve the data at runtime.
| DATA_RELEASE = 'tutorial-data-v1' | ||
|
|
||
| _BASE_URL_ENV = 'FLIXOPT_DATA_BASE_URL' # override for testing / self-hosting | ||
| _DEFAULT_BASE_URL = f'https://github.com/flixOpt/flixopt/releases/download/{DATA_RELEASE}/' | ||
| _REGISTRY_FILENAME = 'registry.txt' |
There was a problem hiding this comment.
DATA_RELEASE currently resolves to missing assets (404), breaking notebook execution.
The docs pipeline is failing on registry.txt download for tutorial-data-v1, so load_example()-backed notebooks cannot run until that release is populated. Publish assets for this tag before merge (or temporarily point to an existing populated tag).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@flixopt/tutorials/_examples.py` around lines 31 - 35, The DATA_RELEASE
variable is currently set to 'tutorial-data-v1', but this GitHub release tag
does not have the required registry.txt asset published, causing 404 errors when
load_example() tries to download from the _DEFAULT_BASE_URL. Either publish the
missing assets to the 'tutorial-data-v1' release tag on GitHub before merging,
or temporarily update the DATA_RELEASE variable to point to an existing release
tag that already has all required assets (registry.txt and data files)
published.
Source: Pipeline failures
The tutorial-data job is `contents: write` and authenticates its release steps via an explicit GH_TOKEN, so the checkout action's persisted git credentials are unnecessary. Set persist-credentials: false to reduce token exposure (per CodeRabbit review on #706). Action version tags left as-is to match the repo-wide convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@PStange THis should make usage of tutorials easier. And also expose the pre buildt systems generally |
Problem
Students following the FlixOpt notebooks hit dead imports (
from data.tutorial_data import ...) on a plainpip install flixopt— the tutorial data lives underdocs/(pruned from the package), so they had to clone the repo or copy files out of GitHub by hand.Solution
A public
fx.tutorialsAPI that makes every notebook standalone, in two tiers:get_data(name)— the synthetic datasets for notebooks 01–07. Pure numpy/pandas, no files, no network, works out of a bare install.load_example(name)— the six realistic example systems for notebooks 08–09. Pre-built once, serialized withFlowSystem.to_netcdf, hosted on a GitHub release, and downloaded + cached + hash-verified viapooch. The heavydemandlib/pvlib/holidaysprofile generation no longer runs at user runtime. Gated behind a newflixopt[tutorials]extra.list_data()/list_examples()and validation all derive from theDataName/ExampleNameLiterals, which are the single source of truth for the names (each name written exactly once; builders resolved by convention).Contents
flixopt/tutorials/—get_data,load_example,list_data,list_examples,DataName,ExampleName.pyproject.toml—tutorialsextra (pooch); added tofull/dev.scripts/build_tutorial_datasets.py— builds the.ncartefacts +registry.txt..github/workflows/tutorial-data.yaml— manual workflow that builds and uploads the assets to the data release.docs/notebooks/data/tutorial_data.pyremoved.tests/test_tutorials.py.Verification
tests/test_tutorials.pypasses; ruff/pre-commit clean.tutorial-data-v1release: all six examples download, hash-verify, deserialize, and cache.Rollout — done ✅
The
tutorial-data-v1release has been published with all seven assets (registry.txt+ six.nc), so the 08–09 notebooks download successfully andBuild documentationis green.Review follow-ups (CodeRabbit)
tutorial-data-v1assets (Critical/Major) — resolved by publishing the release above; those comments reflected the pre-publish state.persist-credentials: falseto the checkout step (write-scoped job authenticates releases via explicitGH_TOKEN). Action version tags left as@v*to match the repo-wide convention; SHA-pinning is better handled as a separate repo-wide PR.CI note
The only remaining red check is the pre-existing flaky test
test_heatmap_reshape::test_with_irregular_data(module-levelnp.random+pytest-xdistordering, ~0.7% fail rate) — unrelated to this PR. Fixed in #707; once that merges, rebasing this branch turns CI green.Open question
get_data(in-memory) vsload_example(IO/network) is an intentional naming asymmetry. Happy to switch toload_data/load_examplesymmetry if preferred.🤖 Generated with Claude Code