Carry-over states between episodes, instead of resetting everything to 0. #5
base: master
Conversation
… for env. test_sanity_env.py is a standalone sanity/check harness that builds the env like train.py and runs invariants, gym compliance, and determinism checks with CLI options and optional external data. test_env.py is a minimal smoke test that instantiates the env with fixed weights and runs random actions, printing rewards; no invariants, determinism, or CLI control.
…raining. Add plotter.py with reusable plotting helpers for per-hour dashboards and cumulative savings analysis. Update train.py to import plotter utilities and add CLI flags to drive dashboard/cumulative plots. Hook plotting into evaluation and training flows, including saving figures to session plots. Addendum: Added plotter logic in train_iter.py
📝 Walkthrough

Adds episode-scoped metrics and separates them from cumulative metrics, introduces a carry_over_state mode for timeline continuity, adds a plotting module for per-hour dashboards and cumulative savings, extends training scripts to optionally emit dashboards, and adds a standalone environment sanity/test script.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner
    participant Env as ComputeClusterEnv
    participant JobMgmt as JobManagement
    participant Metrics as MetricsTracker
    participant Plotter
    Runner->>Env: step(action)
    Env->>JobMgmt: assign_jobs_to_available_nodes()
    JobMgmt->>Metrics: update completion/drop (cumulative + episode)
    JobMgmt-->>Env: job events (assigned/completed/dropped)
    Env->>Metrics: update episode cost, queue/price metrics
    Metrics-->>Env: metrics snapshot
    Env-->>Runner: obs, reward, done
    alt Episode end
        Runner->>Metrics: record_episode_completion()
        Metrics-->>Runner: episode_data (episode_* fields)
        Runner->>Plotter: plot_dashboard(env, episode_costs)
        Plotter-->>Runner: dashboard saved
        Runner->>Plotter: plot_cumulative_savings(env, episode_costs)
        Plotter-->>Runner: savings saved
    end
    alt Reset with carry_over_state
        Runner->>Env: reset()
        Env->>Env: reset_timeline_state() (preserve carried state)
        Env-->>Runner: obs (state preserved)
    else Reset without carry_over_state
        Runner->>Env: reset()
        Env->>Metrics: reset_episode_metrics()
        Env-->>Runner: obs (fresh episode state)
    end
```
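The diagram reads as a plain evaluation loop. A minimal sketch of that flow (names such as `record_episode_completion()` and the plotter calls are taken from the diagram above, not verified against train.py):

```python
# Sketch of the runner flow in the diagram; episode_costs bookkeeping is assumed.
episode_costs = []                       # per-episode agent costs (assumed)
obs, _ = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # stand-in for the trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

episode_data = env.metrics.record_episode_completion()  # episode_* fields
plot_dashboard(env, num_hours=336, max_nodes=335)       # per-hour dashboard
plot_cumulative_savings(env, episode_costs)             # cumulative savings
```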
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
296-301: Episode increment and termination logic breaks in carry-over mode due to unresolved `current_hour` state.

When `carry_over_state=True`, `reset()` calls only `reset_episode_metrics()`, which doesn't reset `current_hour`. Since `reset_timeline_state()` (which sets `current_hour = 0`) is called only once in `__init__`, subsequent episodes in carry-over mode start with `current_hour` at its final value from the previous episode (336). This causes two problems:
- After episode 1, `current_hour` remains at 336 and increments to 337, 338, etc. in episode 2, so the termination check at line 495 (`if self.metrics.current_hour == EPISODE_HOURS`) will never trigger again, causing episodes to never terminate after the first one.
- The episode increment check at line 299 (`if self.metrics.current_hour == 1`) also fails in subsequent episodes since `current_hour` never resets to 1.

In non-carry-over mode, `reset()` calls `reset_state_metrics()`, which correctly resets `current_hour = 0`, avoiding this issue.
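For illustration, a minimal sketch of a `reset()` that keeps carried state but still restarts the episode clock; this anticipates the fix proposed later in this review and is not the actual implementation:

```python
# Sketch only: restart the per-episode clock while preserving carried state.
def reset(self, seed=None, options=None):
    super().reset(seed=seed)
    if self.carry_over_state:
        self.metrics.current_hour = 0          # restart episode clock
        self.metrics.reset_episode_metrics()   # nodes/queue/jobs stay intact
    else:
        self.metrics.reset_state_metrics()     # full reset (current_hour = 0)
    ...  # build and return the observation as before
```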
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 90-93: The metric calculations currently read attributes directly
from env (completion_rate, baseline_completion_rate, avg_wait,
baseline_avg_wait) but those values live under env.metrics; update each
expression to use env.metrics.jobs_completed, env.metrics.jobs_submitted,
env.metrics.total_job_wait_time and the baseline counterparts
(env.metrics.baseline_jobs_completed, env.metrics.baseline_jobs_submitted,
env.metrics.baseline_total_job_wait_time) and keep the same guard logic (getattr
checks or existence checks) to avoid division by zero.
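A sketch of the rerouted, guarded reads (attribute names follow the prompt above; the real MetricsTracker fields may differ):

```python
# Hypothetical guarded metric reads via env.metrics, per the agent prompt.
m = env.metrics
completion_rate = (m.jobs_completed / m.jobs_submitted * 100) if m.jobs_submitted else 0.0
avg_wait = (m.total_job_wait_time / m.jobs_completed) if m.jobs_completed else 0.0
baseline_completion_rate = (
    (m.baseline_jobs_completed / m.baseline_jobs_submitted * 100)
    if m.baseline_jobs_submitted else 0.0
)
baseline_avg_wait = (
    (m.baseline_total_job_wait_time / m.baseline_jobs_completed)
    if m.baseline_jobs_completed else 0.0
)
```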
🧹 Nitpick comments (12)
src/environment.py (1)
218-265: Significant code duplication between `reset()` and `reset_timeline_state()`.

Lines 238-263 in `reset()` duplicate the state initialization logic from `reset_timeline_state()` (lines 270-294). This violates DRY and creates maintenance risk: if the state structure changes, both locations must be updated.

♻️ Suggested refactor to eliminate duplication
Consider extracting the common state initialization to a private helper method:
```diff
+    def _init_state_structures(self):
+        """Initialize state structures for nodes, queues, and job tracking."""
+        self.state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+            'predicted_prices': self.prices.predicted_prices.copy(),
+        }
+        self.baseline_state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+        }
+        self.cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.baseline_cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.running_jobs = {}
+        self.baseline_running_jobs = {}
+        self.next_job_id = 0
+        self.next_empty_slot = 0
+        self.baseline_next_empty_slot = 0

     def reset_timeline_state(self):
         self.metrics.current_hour = 0
-        self.state = {
-            # ... all the initialization code ...
-        }
-        # ... rest of initialization ...
+        self._init_state_structures()

     def reset(self, seed=None, options=None):
         # ... existing code ...
         if not self.carry_over_state:
             # ... price reset logic ...
-            self.state = {
-                # ... all the initialization code ...
-            }
-            # ... rest of initialization ...
+            self._init_state_structures()
```

src/plotter.py (3)
46-58: Potential off-by-one in monthly percentage calculation.

The monthly percentage is calculated every 2 episodes (assuming 2-week episodes = 1 month), but the values are duplicated twice (`extend([pct, pct])`). This creates a step-like visualization but may cause array length mismatches. The padding logic at lines 66-70 handles this, but it's fragile.

Consider documenting this behavior more clearly or simplifying by computing monthly percentages at the end rather than during iteration.
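As a sketch of the simpler alternative, the monthly values can be derived once after the loop (assuming two 2-week episodes per month, per the comment; `episode_savings_pct` is a hypothetical name with illustrative values):

```python
# Hypothetical per-episode savings percentages collected during evaluation.
episode_savings_pct = [4.2, 5.1, 3.8, 4.9]
# One monthly value per two episodes, computed once at the end.
monthly_pcts = [
    (episode_savings_pct[i] + episode_savings_pct[i + 1]) / 2
    for i in range(0, len(episode_savings_pct) - 1, 2)
]
```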
81-81: Unused parameter `episode_costs` should be removed.

The docstring at lines 84-85 acknowledges this is kept for backward compatibility. Consider removing it entirely if it's truly unused, or add `_ = episode_costs` to silence the warning if backward compatibility is required.

♻️ Proposed fix
```diff
-def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+def plot_dashboard(env, num_hours, max_nodes, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
-    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
-    Cumulative savings now lives in plot_cumulative_savings().
     """
```

Or if backward compatibility is truly needed:
```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+    del episode_costs  # Unused, kept for backward compatibility
```
290-290: Ambiguous variable name `l`.

The variable `l` can be confused with `1` (one) in certain fonts. Use a more descriptive name.

♻️ Proposed fix
```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```

train.py (4)
156-159: Remove extraneous f-prefix from plain string.

Line 156 uses an f-string without any placeholders.
Suggested fix
- print(f"Starting a new model training...") + print("Starting a new model training...")
259-275: Avoid hardcoding `max_nodes=335`; retrieve it from the environment.

The hardcoded value will break if the environment's node count changes. The commented-out code shows the right approach.
Suggested fix
```diff
     plot_dashboard(
         env,
         num_hours=args.dashboard_hours,
-        # max_nodes=env.max_nodes if hasattr(env, "max_nodes") else env.num_nodes if hasattr(env, "num_nodes") else 0,
-        max_nodes=335,
+        max_nodes=getattr(env, "max_nodes", getattr(env, "num_nodes", 335)),
```
287-295: Minor issues in training loop instrumentation.
- Lines 288-289: The progress log is confusingly worded: it says "at {iters + 1}" but `iters` was already incremented on line 286.
- Line 294: f-string without placeholders in `tb_log_name=f"PPO"`.

Suggested fixes
```diff
     iters += 1
     t0 = time.time()
-    if (iters+1)%10==0:
-        print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
+    if iters % 10 == 0:
+        print(f"Running iteration {iters}, cumulative steps: {STEPS_PER_ITERATION * iters}")
     if args.iter_limit > 0 and iters > args.iter_limit:
         print(f"iterations limit ({args.iter_limit}) reached: {iters}.")
         break
     try:
-        model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
+        model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())
```
152-159: Consider increasing `n_steps` or documenting the rationale for the small value.

The `n_steps=64` is significantly smaller than stable-baselines3's default of 2048. For this environment with long episodes (336 hours), SB3 best practices recommend a larger `n_steps` (1024-2048) to reduce correlated data and improve sample efficiency. The current configuration results in ~5 updates per episode, compared to ~1 per 6 episodes with the default, which increases variance and may impact learning stability. If the intent is faster iteration for development/testing, consider adding a comment explaining the choice or making it configurable via a command-line argument.
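A sketch of the configurable variant (`PPO` and its `n_steps` parameter are standard stable-baselines3 API; the `--n-steps` flag is hypothetical):

```python
import argparse
from stable_baselines3 import PPO

parser = argparse.ArgumentParser()
parser.add_argument("--n-steps", type=int, default=2048)  # hypothetical flag; 2048 is SB3's default
args = parser.parse_args()

# env and log_dir come from the existing training setup
model = PPO("MlpPolicy", env, n_steps=args.n_steps, verbose=1, tensorboard_log=log_dir)
```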
train_iter.py (1)
175-178: Consider improving readability of a long function call.

Line 178 is quite long and could benefit from line breaks for readability.
Suggested improvement
```diff
 for combo in combinations:
     efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight = combo
     print(f"Running with weights: efficiency={efficiency_weight}, price={price_weight}, idle={idle_weight}, job_age={job_age_weight}, drop={drop_weight}")
-    run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs,plot_dashboard=args.plot_dashboard,dashboard_hours=args.dashboard_hours)
+    run(
+        efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight,
+        args.iter_limit_per_step, args.session,
+        args.prices, args.job_durations, args.jobs, args.hourly_jobs,
+        plot_dashboard=args.plot_dashboard,
+        dashboard_hours=args.dashboard_hours,
+    )
```

test/test_sanity_env.py (3)
48-101: Consider using imported constants instead of hardcoded values.

The invariant checks use hardcoded values (170, 96, 168, 16) that should match environment constants. The comment at line 84 acknowledges this, but since you already import `MAX_JOB_DURATION`, `MAX_NODES_PER_JOB`, `CORES_PER_NODE`, etc., consider using them for consistency.

Example for one check
```diff
-    assert np.all((dur >= 0) & (dur <= 170)), f"duration out of bounds min={dur.min()} max={dur.max()}"
+    assert np.all((dur >= 0) & (dur <= MAX_JOB_DURATION)), f"duration out of bounds min={dur.min()} max={dur.max()}"
```
243-246: Fix indentation of nested function.

The `norm_path` function has inconsistent indentation (an extra leading space).

Suggested fix
```diff
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
     # To be safe: normalize "" -> None here.
     def norm_path(x):
         return None if (x is None or str(x).strip() == "") else x
```
336-359: Consider removing or documenting the inline debugging test.

Lines 336-359 appear to be inline debugging/verification code that duplicates the formal `determinism_test`. If this is intentional for quick sanity checks, consider adding a comment explaining its purpose, or removing it if it's leftover debugging code.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/baseline.py
- src/environment.py
- src/job_management.py
- src/metrics_tracker.py
- src/plotter.py
- test/test_sanity_env.py
- train.py
- train_iter.py
🧰 Additional context used
📓 Path-based instructions (2)
train.py
📄 CodeRabbit inference engine (CLAUDE.md)
train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Files:
train.py
{train.py,weights.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Files:
train.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/job_management.py
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Applied to files:
train_iter.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
train_iter.py
train.py
src/plotter.py
src/environment.py
src/metrics_tracker.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Applied to files:
train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
🧬 Code graph analysis (3)
train_iter.py (1)
src/plotter.py (1)
plot_dashboard(81-222)
train.py (2)
src/plotter.py (2)
plot_dashboard (81-222), plot_cumulative_savings (225-316)
src/plot.py (1)
plot_cumulative_savings(136-276)
src/environment.py (3)
src/metrics_tracker.py (2)
reset_episode_metrics (43-80), reset_state_metrics (38-41)
test/test_sanity_env.py (1)
reset (321-327)
src/prices_deterministic.py (1)
reset(52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
341-341: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
344-344: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
train_iter.py
119-119: subprocess call: check for execution of untrusted input
(S603)
train.py
156-156: f-string without any placeholders
Remove extraneous f prefix
(F541)
276-276: Do not catch blind exception: Exception
(BLE001)
294-294: f-string without any placeholders
Remove extraneous f prefix
(F541)
317-317: Do not catch blind exception: Exception
(BLE001)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
290-290: Ambiguous variable name: l
(E741)
🔇 Additional comments (23)
src/environment.py (3)
175-176: Initialization order may cause issues with `self.prices`.

`reset_timeline_state()` at line 175 accesses `self.prices.predicted_prices` (line 276), but `self.prices` is initialized at line 121, so this should work. However, the call order means the state is initialized twice: once in `reset_timeline_state()` during `__init__`, and again in `reset()` when the environment is first reset. This is intentional for the carry-over flow but worth noting.
331-333: LGTM - Episode-level metrics propagation is consistent.

The episode-level metrics (`episode_jobs_submitted`, `episode_jobs_rejected_queue_full`, `episode_jobs_completed`, `episode_total_job_wait_time`, `episode_jobs_dropped`) are correctly incremented alongside their cumulative counterparts, maintaining parallel tracking.

Also applies to: 377-379
401-402: LGTM - Baseline and cost episode metrics are properly tracked.

Episode-level baseline metrics and costs are correctly propagated from the `baseline_metrics` dict and accumulated into `episode_total_cost`, `episode_baseline_cost`, and `episode_baseline_cost_off`.

Also applies to: 437-448, 446-447, 468-470
src/metrics_tracker.py (3)
7-14: LGTM - Clean initialization pattern.

The constructor properly separates cumulative metrics initialization from state/episode metrics, establishing a clear hierarchy.
62-76: LGTM - Comprehensive episode-level metrics.

The episode-level metrics for both agent and baseline are well-structured and mirror the cumulative metrics, enabling accurate per-episode reporting.
93-101: LGTM - Correct episode-level calculations.

The calculations for `avg_wait_time`, `completion_rate`, and `drop_rate` properly use episode-level counters with appropriate zero-division guards.
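The guarded pattern, sketched (counter names from this review; `episode_summary` is a hypothetical method name):

```python
# Sketch of zero-division-guarded episode summaries inside MetricsTracker.
def episode_summary(self):
    avg_wait_time = (self.episode_total_job_wait_time / self.episode_jobs_completed
                     if self.episode_jobs_completed else 0.0)
    completion_rate = (self.episode_jobs_completed / self.episode_jobs_submitted * 100
                       if self.episode_jobs_submitted else 0.0)
    drop_rate = (self.episode_jobs_dropped / self.episode_jobs_submitted * 100
                 if self.episode_jobs_submitted else 0.0)
    return avg_wait_time, completion_rate, drop_rate
```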
src/job_management.py (2)
159-167: LGTM - Backward-compatible episode metrics updates.

The presence checks (`if 'episode_*' in metrics`) allow this function to work with both old and new metrics dictionaries, maintaining backward compatibility.
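The pattern in miniature (key names assumed from the review):

```python
# Cumulative counter always updated; episode counter only when present.
metrics['jobs_completed'] += 1
if 'episode_jobs_completed' in metrics:
    metrics['episode_jobs_completed'] += 1
```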
187-193: LGTM - Consistent pattern for drop metrics.

The episode-level drop tracking mirrors the completion tracking pattern, with proper presence checks for both baseline and non-baseline paths.
src/baseline.py (2)
42-46: LGTM - Episode metrics for baseline submissions.

Correctly increments `episode_baseline_jobs_submitted` and `episode_baseline_jobs_rejected_queue_full`, with presence checks for backward compatibility.
61-63: LGTM - Episode max queue size tracking.

Properly tracks the per-episode maximum queue size for the baseline, with the inner condition correctly checking against the episode-level threshold.
src/plotter.py (2)
8-16: LGTM - Robust series normalization helper.

The `_as_series` function correctly handles `None`, arrays of varying lengths, and padding with `NaN` values.
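For reference, a minimal sketch of such a helper (not the actual `_as_series` implementation):

```python
import numpy as np

def as_series(values, length):
    """Sketch: coerce an optional sequence to a fixed-length float array, NaN-padded."""
    out = np.full(length, np.nan, dtype=float)
    if values is not None:
        v = np.asarray(values, dtype=float)[:length]
        out[:len(v)] = v
    return out
```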
225-316: LGTM - Comprehensive cumulative savings visualization.

The function properly handles dual-axis plotting, combines legends, and provides a summary statistics box. The return value with computed statistics is useful for programmatic access.
train.py (4)
5-11: LGTM!

Import changes are appropriate: `plot_dashboard` and `plot_cumulative_savings` from the new plotter module, and `time` for performance instrumentation.
60-62: LGTM!

New CLI options for dashboard plotting are well-defined with sensible defaults (336 hours = 2 weeks).
298-318: LGTM on dashboard plotting integration.

The exception handling around `plot_dashboard` is appropriate for non-fatal plotting operations. Catching `Exception` here is acceptable since you want to catch any plotting failure without crashing training.
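The endorsed guard, as a sketch (argument values assumed):

```python
# Non-fatal plotting: a failure here should not abort training.
try:
    plot_dashboard(env, num_hours=args.dashboard_hours, max_nodes=335)
except Exception as e:
    print(f"Dashboard plotting failed: {e}")
```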
218-251: Good implementation of evaluation metrics.

The evaluation output correctly includes per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections), aligning with the coding guidelines.
train_iter.py (2)
84-98: LGTM!

The extended function signature with `plot_dashboard` and `dashboard_hours` parameters is clean and maintains backward compatibility with defaults.
114-122: LGTM on subprocess forwarding.

The subprocess call constructs commands from user-provided CLI arguments, which is appropriate for this local training orchestration tool. The Ruff S603 warning is a false positive in this context.
test/test_sanity_env.py (5)
1-11: Good documentation and usage examples.

The docstrings provide clear usage examples for different test scenarios (quick run, full belt-and-braces, with external data).
111-142: Well-designed determinism test.

The test correctly pins `price_start_index` to ensure external price variation doesn't affect determinism checking. The unused `obs` variable from unpacking (line 126) is a common pattern in RL code and can be safely ignored.
144-177: Excellent carry-over continuity test.

This test directly validates the PR's core objective: carrying state between episodes rather than resetting to zero. The comprehensive snapshot comparison ensures all critical state components (nodes, job_queue, predicted_prices, running_jobs, price_index, etc.) are preserved across resets.
377-377: Clarify why `check_obs_is_copy` is commented out.

If there's a known issue with observation aliasing, consider adding a TODO or comment explaining why this check is disabled.
398-406: Good test orchestration structure.

The conditional test execution (gym compliance → invariants → determinism → carry-over) provides flexibility for different testing scenarios.
```python
def reset_state_metrics(self):
    """Reset timeline-dependent metrics."""
    self.current_hour = 0
    self.reset_episode_metrics()
```
reset_state_metrics resets current_hour but reset_episode_metrics does not.
This is the expected behavior for the non-carry-over case, but confirms the issue noted in environment.py: in carry-over mode, calling only reset_episode_metrics() will not reset current_hour, causing episode boundary detection to fail.
If carry-over mode is intended to span multiple episodes without resetting time, then current_hour should be reset separately in reset() when carry_over_state=True:
🐛 Proposed fix in environment.py reset()
```diff
 def reset(self, seed=None, options=None):
     # ...
     # Reset metrics
     if self.carry_over_state:
+        self.metrics.current_hour = 0  # Reset hour for new episode
         self.metrics.reset_episode_metrics()
     else:
         self.metrics.reset_state_metrics()
```

… now checks for determinism and carry-over continuity. Add carry-over state handling in the refactored environment by initializing timeline state on construction and skipping resets when enabled, while separating cumulative from episode-scoped metrics. Update MetricsTracker to track episode_* counters, propagate those counters through job assignment and baseline steps, and ensure episode summaries use per-episode values. Extend sanity tests and metrics usage to align with the refactor (metrics.current_hour, carry-over continuity checks).
Force-pushed 2e15d30 to 2c9ade7
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 175-177: The constructor calls reset_timeline_state() before
Prices.reset(), so reset_timeline_state() tries to copy
self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().
In `@src/plotter.py`:
- Around line 81-86: The parameter episode_costs on plot_dashboard is
intentionally unused but triggers Ruff ARG001; silence the warning by explicitly
using it in the function body (e.g., assign it to a throwaway variable or delete
it) so the signature stays the same but the linter treats it as used; update the
body of plot_dashboard to include a line like "_ = episode_costs" (or "del
episode_costs") immediately after the docstring.
In `@test/test_sanity_env.py`:
- Around line 118-137: In the rollout() function and other places where you
unpack but don't use values (e.g., the env.reset and env.step returns and
variables like obs, r, info, nodes, prices), prefix those unused variable names
with an underscore (e.g., _obs, _r, _info, _nodes, _prices) so the linter RUF059
is satisfied; update every unpacking site (including the env.reset(...) return
and env.step(...) tuples in rollout and the other similar blocks referenced) to
use the underscore-prefixed names consistently.
- Around line 166-176: Update the carry-over test to expect metrics.current_hour
to be reset to 0 after calling ComputeClusterEnv.reset() instead of matching the
saved snapshot; specifically modify the assertion that references
env.metrics.current_hour (in test/test_sanity_env.py) so it asserts
env.metrics.current_hour == 0 (reflecting the reset behavior in
ComputeClusterEnv.reset) rather than comparing to snapshot["current_hour"].
♻️ Duplicate comments (1)
src/plotter.py (1)
89-179: Use `env.metrics` for costs, job counts, and time-series (current code can crash or show zeros).

`plot_dashboard()` still reads `env.jobs_completed`, `env.total_cost`, `env.price_stats`, etc., but these live on `env.metrics`. This will either raise `AttributeError` or render empty panels. Please route all metric reads through `metrics`.

🛠️ Proposed fix
```diff
-    base_cost = float(getattr(env, "baseline_cost", 0.0))
-    base_cost_off = float(getattr(env, "baseline_cost_off", 0.0))
-    agent_cost = float(getattr(env, "total_cost", 0.0))
+    base_cost = float(getattr(metrics, "baseline_cost", 0.0))
+    base_cost_off = float(getattr(metrics, "baseline_cost_off", 0.0))
+    agent_cost = float(getattr(metrics, "total_cost", 0.0))
@@
-        f"Jobs: {env.jobs_completed}/{env.jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={env.max_queue_size_reached}) | "
-        f"Base: {env.baseline_jobs_completed}/{env.baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={env.baseline_max_queue_size_reached})"
+        f"Jobs: {getattr(metrics, 'jobs_completed', 0)}/{getattr(metrics, 'jobs_submitted', 0)} "
+        f"({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={getattr(metrics, 'max_queue_size_reached', 0)}) | "
+        f"Base: {getattr(metrics, 'baseline_jobs_completed', 0)}/{getattr(metrics, 'baseline_jobs_submitted', 0)} "
+        f"({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={getattr(metrics, 'baseline_max_queue_size_reached', 0)})"
@@
-    add_panel("Electricity price", getattr(env, "price_stats", None), "€/MWh", None)
+    add_panel("Electricity price", getattr(metrics, "price_stats", None), "€/MWh", None)
@@
-    add_panel("Online nodes", getattr(env, "on_nodes", None), "count", (0, max_nodes * 1.1))
+    add_panel("Online nodes", getattr(metrics, "on_nodes", None), "count", (0, max_nodes * 1.1))
@@
-    add_panel("Used nodes", getattr(env, "used_nodes", None), "count", (0, max_nodes))
+    add_panel("Used nodes", getattr(metrics, "used_nodes", None), "count", (0, max_nodes))
@@
     add_panel("Job queue & running jobs",
-        getattr(env, "job_queue_sizes", None),
+        getattr(metrics, "job_queue_sizes", None),
         "jobs", None,
         overlay=("Running jobs", running_series),
     )
@@
-    add_panel("Efficiency reward (%)", getattr(env, "eff_rewards", None), "score", None)
+    add_panel("Efficiency reward (%)", getattr(metrics, "eff_rewards", None), "score", None)
@@
-    add_panel("Price reward (%)", getattr(env, "price_rewards", None), "score", None)
+    add_panel("Price reward (%)", getattr(metrics, "price_rewards", None), "score", None)
@@
-    add_panel("Idle penalty (%)", getattr(env, "idle_penalties", None), "score", None)
+    add_panel("Idle penalty (%)", getattr(metrics, "idle_penalties", None), "score", None)
@@
-    add_panel("Job-age penalty (%)", getattr(env, "job_age_penalties", None), "score", None)
+    add_panel("Job-age penalty (%)", getattr(metrics, "job_age_penalties", None), "score", None)
@@
-    add_panel("Total reward", getattr(env, "rewards", None), "reward", None)
+    add_panel("Total reward", getattr(metrics, "rewards", None), "reward", None)
```
🧹 Nitpick comments (1)
src/plotter.py (1)
305-308: Minor readability: avoid single-letter `l` in list comprehension.

Ruff flags this as ambiguous (E741).
✏️ Suggested tweak
```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- src/baseline.py
- src/environment.py
- src/job_management.py
- src/metrics_tracker.py
- src/plotter.py
- test/test_sanity_env.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/baseline.py
- src/job_management.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
src/environment.py
src/plotter.py
src/metrics_tracker.py
🧬 Code graph analysis (1)
src/environment.py (3)
src/metrics_tracker.py (2)
reset_episode_metrics (43-80), reset_state_metrics (38-41)
test/test_sanity_env.py (1)
reset (321-327)
src/prices_deterministic.py (1)
reset(52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
341-341: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
344-344: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
307-307: Ambiguous variable name: l
(E741)
```python
self.reset_timeline_state()
self.metrics.reset_episode_metrics()
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find the Prices class and check __init__
rg -n "class Prices" src/ -A 20 | head -50
```

Repository: FairRootGroup/powersched
Length of output: 2426
🏁 Script executed:
```shell
# Also check the environment.py file at the mentioned lines
sed -n '160,190p' src/environment.py
```

Repository: FairRootGroup/powersched
Length of output: 1410
🏁 Script executed:
```shell
# And check lines 268-277 as mentioned
sed -n '260,285p' src/environment.py
```

Repository: FairRootGroup/powersched
Length of output: 1050
🏁 Script executed:
```shell
# Now let's look at the reset_timeline_state implementation
rg -n "def reset_timeline_state" src/environment.py -A 10
```

Repository: FairRootGroup/powersched
Length of output: 577
🏁 Script executed:
```shell
# Search for reset method in Prices class
rg -n "def reset" src/prices_deterministic.py -A 15
```

Repository: FairRootGroup/powersched
Length of output: 796
🏁 Script executed:
```shell
# Check the full environment.__init__ to see the complete sequence
rg -n "def __init__" src/environment.py -A 80 | head -120
```

Repository: FairRootGroup/powersched
Length of output: 3707
🏁 Script executed:
```shell
# Also check if Prices.reset() is called anywhere in environment.__init__
rg -n "self.prices.reset" src/environment.py
```

Repository: FairRootGroup/powersched
Length of output: 183
🏁 Script executed:
```shell
# Get full context around these critical lines
sed -n '115,250p' src/environment.py | cat -n
```

Repository: FairRootGroup/powersched
Length of output: 6999
reset_timeline_state() attempts to copy self.prices.predicted_prices while it is still None.
In __init__ at line 175, reset_timeline_state() is called before Prices.reset() is invoked. However, Prices.__init__ initializes predicted_prices = None (line 19), and it only gets populated when Prices.reset() is called. This causes AttributeError: 'NoneType' object has no attribute 'copy' at line 277 when trying to execute self.prices.predicted_prices.copy().
Call self.prices.reset(start_index=0) before self.reset_timeline_state() in __init__, matching the pattern used in the reset() method (lines 120–123).
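A sketch of the corrected ordering inside `__init__` (surrounding code elided; `start_index=0` per the finding above):

```python
# Corrected ordering (sketch): populate prices before copying them.
self.prices.reset(start_index=0)      # fills self.prices.predicted_prices
self.reset_timeline_state()           # predicted_prices.copy() is now safe
self.metrics.reset_episode_metrics()
```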
🤖 Prompt for AI Agents
In `@src/environment.py` around lines 175 - 177, The constructor calls
reset_timeline_state() before Prices.reset(), so reset_timeline_state() tries to
copy self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().
```python
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
```
Silence Ruff’s unused-parameter warning for episode_costs.
The parameter is kept for backwards compatibility but is unused, so Ruff (ARG001) will complain. Mark it intentionally unused to keep lint clean without breaking the signature.
🧹 Proposed fix
```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
     Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
    _ = episode_costs  # backward-compat placeholder
```
🧰 Tools
🪛 Ruff (0.14.11)
81-81: Unused function argument: episode_costs
(ARG001)
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 81 - 86, The parameter episode_costs on
plot_dashboard is intentionally unused but triggers Ruff ARG001; silence the
warning by explicitly using it in the function body (e.g., assign it to a
throwaway variable or delete it) so the signature stays the same but the linter
treats it as used; update the body of plot_dashboard to include a line like "_ =
episode_costs" (or "del episode_costs") immediately after the docstring.
```python
def rollout():
    env = make_env()
    # Pin external price window so determinism doesn't vary by episode.
    obs, _ = env.reset(seed=seed, options={"price_start_index": 0})
    traj = []
    done = False
    i = 0
    while not done and i < n_steps:
        obs, r, term, trunc, info = env.step(actions[i])
        # record a small fingerprint
        traj.append((
            float(r),
            float(info.get("step_cost", 0.0)),
            int(info.get("num_unprocessed_jobs", -1)),
            int(info.get("num_on_nodes", -1)),
            int(info.get("dropped_this_episode", -1)),
        ))
        done = term or trunc
        i += 1
    env.close()
```
Ruff RUF059: unused unpacked variables.
Several unpacked values aren’t used (e.g., obs, r, info, nodes, prices). Prefix them with _ to keep lint clean.
🧹 Suggested clean‑up
```diff
-        obs, r, term, trunc, info = env.step(actions[i])
+        _obs, r, term, trunc, info = env.step(actions[i])
@@
-        obs, r, term, trunc, info = env.step(action)
+        _obs, _r, term, trunc, _info = env.step(action)
@@
-    nodes, q, prices = _extract(obs)
+    _nodes, q, _prices = _extract(obs)
@@
-    o1, _ = env.reset(seed=seed)
+    _o1, _ = env.reset(seed=seed)
-    o1s, r1, t1, tr1, i1 = env.step(action)
+    o1s, r1, _t1, _tr1, i1 = env.step(action)
@@
-    o2, _ = env.reset(seed=seed)
+    _o2, _ = env.reset(seed=seed)
-    o2s, r2, t2, tr2, i2 = env.step(action)
+    o2s, r2, _t2, _tr2, i2 = env.step(action)
@@
-        obs, r, term, trunc, info = env.step(action)
+        obs, _r, term, trunc, _info = env.step(action)
```

Also applies to: 145-152, 281-283, 341-346, 383-384
🧰 Tools
🪛 Ruff (0.14.11)
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 118 - 137, In the rollout() function
and other places where you unpack but don't use values (e.g., the env.reset and
env.step returns and variables like obs, r, info, nodes, prices), prefix those
unused variable names with an underscore (e.g., _obs, _r, _info, _nodes,
_prices) so the linter RUF059 is satisfied; update every unpacking site
(including the env.reset(...) return and env.step(...) tuples in rollout and the
other similar blocks referenced) to use the underscore-prefixed names
consistently.
```python
env.reset(seed=seed)

assert np.array_equal(env.state["nodes"], snapshot["nodes"]), "carry-over failed: nodes reset"
assert np.array_equal(env.state["job_queue"], snapshot["job_queue"]), "carry-over failed: job_queue reset"
assert np.array_equal(env.state["predicted_prices"], snapshot["predicted_prices"]), "carry-over failed: predicted_prices reset"
assert np.array_equal(env.cores_available, snapshot["cores_available"]), "carry-over failed: cores_available reset"
assert env.running_jobs == snapshot["running_jobs"], "carry-over failed: running_jobs reset"
assert env.prices.price_index == snapshot["price_index"], "carry-over failed: price_index reset"
assert env.next_job_id == snapshot["next_job_id"], "carry-over failed: next_job_id reset"
assert env.next_empty_slot == snapshot["next_empty_slot"], "carry-over failed: next_empty_slot reset"
assert env.metrics.current_hour == snapshot["current_hour"], "carry-over failed: current_hour reset"
```
Carry‑over test should expect current_hour to reset.
ComputeClusterEnv.reset() sets metrics.current_hour = 0 in carry‑over mode, so asserting it stays unchanged will fail. Update the assertion to reflect the intended per‑episode hour reset.
🧪 Proposed fix
```diff
-    assert env.metrics.current_hour == snapshot["current_hour"], "carry-over failed: current_hour reset"
+    assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
env.reset(seed=seed)

assert np.array_equal(env.state["nodes"], snapshot["nodes"]), "carry-over failed: nodes reset"
assert np.array_equal(env.state["job_queue"], snapshot["job_queue"]), "carry-over failed: job_queue reset"
assert np.array_equal(env.state["predicted_prices"], snapshot["predicted_prices"]), "carry-over failed: predicted_prices reset"
assert np.array_equal(env.cores_available, snapshot["cores_available"]), "carry-over failed: cores_available reset"
assert env.running_jobs == snapshot["running_jobs"], "carry-over failed: running_jobs reset"
assert env.prices.price_index == snapshot["price_index"], "carry-over failed: price_index reset"
assert env.next_job_id == snapshot["next_job_id"], "carry-over failed: next_job_id reset"
assert env.next_empty_slot == snapshot["next_empty_slot"], "carry-over failed: next_empty_slot reset"
assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"
```
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 166 - 176, Update the carry-over test
to expect metrics.current_hour to be reset to 0 after calling
ComputeClusterEnv.reset() instead of matching the saved snapshot; specifically
modify the assertion that references env.metrics.current_hour (in
test/test_sanity_env.py) so it asserts env.metrics.current_hour == 0 (reflecting
the reset behavior in ComputeClusterEnv.reset) rather than comparing to
snapshot["current_hour"].
Note: Not tested "in production" yet, but continuity checks between consecutive episodes have passed so far.