Carry-over states between episodes, instead of resetting everything to 0. #5
base: master
Conversation
… for env. test_sanity_env.py is a standalone sanity/check harness that builds the env like train.py and runs invariants, gym compliance, and determinism checks with CLI options and optional external data. test_env.py is a minimal smoke test that instantiates the env with fixed weights and runs random actions, printing rewards; no invariants, determinism, or CLI control.
…raining. Add plotter.py with reusable plotting helpers for per-hour dashboards and cumulative savings analysis. Update train.py to import plotter utilities and add CLI flags to drive dashboard/cumulative plots. Hook plotting into evaluation and training flows, including saving figures to session plots. Addendum: Added plotter logic in train_iter.py
📝 Walkthrough

Adds episode-scoped metrics and separates them from cumulative metrics, introduces a carry_over_state mode for timeline continuity, adds a plotting module for per-hour dashboards and cumulative savings, extends training scripts to optionally emit dashboards, and adds a standalone environment sanity/test script.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner
    participant Env as ComputeClusterEnv
    participant JobMgmt as JobManagement
    participant Metrics as MetricsTracker
    participant Plotter
    Runner->>Env: step(action)
    Env->>JobMgmt: assign_jobs_to_available_nodes()
    JobMgmt->>Metrics: update completion/drop (cumulative + episode)
    JobMgmt-->>Env: job events (assigned/completed/dropped)
    Env->>Metrics: update episode cost, queue/price metrics
    Metrics-->>Env: metrics snapshot
    Env-->>Runner: obs, reward, done
    alt Episode end
        Runner->>Metrics: record_episode_completion()
        Metrics-->>Runner: episode_data (episode_* fields)
        Runner->>Plotter: plot_dashboard(env, episode_costs)
        Plotter-->>Runner: dashboard saved
        Runner->>Plotter: plot_cumulative_savings(env, episode_costs)
        Plotter-->>Runner: savings saved
    end
    alt Reset with carry_over_state
        Runner->>Env: reset()
        Env->>Env: reset_timeline_state() (preserve carried state)
        Env-->>Runner: obs (state preserved)
    else Reset without carry_over_state
        Runner->>Env: reset()
        Env->>Metrics: reset_episode_metrics()
        Env-->>Runner: obs (fresh episode state)
    end
```
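The diagram reads as a plain evaluation loop. A minimal sketch of that flow (names such as `record_episode_completion()` and the plotter calls are taken from the diagram above, not verified against train.py):

```python
# Sketch of the runner flow in the diagram; episode_costs bookkeeping is assumed.
episode_costs = []                       # per-episode agent costs (assumed)
obs, _ = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # stand-in for the trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

episode_data = env.metrics.record_episode_completion()  # episode_* fields
plot_dashboard(env, num_hours=336, max_nodes=335)       # per-hour dashboard
plot_cumulative_savings(env, episode_costs)             # cumulative savings
```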
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
296-301: Episode increment and termination logic breaks in carry-over mode due to unresolved `current_hour` state.

When `carry_over_state=True`, `reset()` calls only `reset_episode_metrics()`, which doesn't reset `current_hour`. Since `reset_timeline_state()` (which sets `current_hour = 0`) is called only once in `__init__`, subsequent episodes in carry-over mode start with `current_hour` at its final value from the previous episode (336). This causes two problems:
- After episode 1, `current_hour` remains at 336 and increments to 337, 338, etc. in episode 2, so the termination check at line 495 (`if self.metrics.current_hour == EPISODE_HOURS`) will never trigger again, causing episodes to never terminate after the first one.
- The episode increment check at line 299 (`if self.metrics.current_hour == 1`) also fails in subsequent episodes since `current_hour` never resets to 1.

In non-carry-over mode, `reset()` calls `reset_state_metrics()`, which correctly resets `current_hour = 0`, avoiding this issue.
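For illustration, a minimal sketch of a `reset()` that keeps carried state but still restarts the episode clock; this anticipates the fix proposed later in this review and is not the actual implementation:

```python
# Sketch only: restart the per-episode clock while preserving carried state.
def reset(self, seed=None, options=None):
    super().reset(seed=seed)
    if self.carry_over_state:
        self.metrics.current_hour = 0          # restart episode clock
        self.metrics.reset_episode_metrics()   # nodes/queue/jobs stay intact
    else:
        self.metrics.reset_state_metrics()     # full reset (current_hour = 0)
    ...  # build and return the observation as before
```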
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Around line 90-93: The metric calculations currently read attributes directly
from env (completion_rate, baseline_completion_rate, avg_wait,
baseline_avg_wait) but those values live under env.metrics; update each
expression to use env.metrics.jobs_completed, env.metrics.jobs_submitted,
env.metrics.total_job_wait_time and the baseline counterparts
(env.metrics.baseline_jobs_completed, env.metrics.baseline_jobs_submitted,
env.metrics.baseline_total_job_wait_time) and keep the same guard logic (getattr
checks or existence checks) to avoid division by zero.
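A sketch of the rerouted, guarded reads (attribute names follow the prompt above; the real MetricsTracker fields may differ):

```python
# Hypothetical guarded metric reads via env.metrics, per the agent prompt.
m = env.metrics
completion_rate = (m.jobs_completed / m.jobs_submitted * 100) if m.jobs_submitted else 0.0
avg_wait = (m.total_job_wait_time / m.jobs_completed) if m.jobs_completed else 0.0
baseline_completion_rate = (
    (m.baseline_jobs_completed / m.baseline_jobs_submitted * 100)
    if m.baseline_jobs_submitted else 0.0
)
baseline_avg_wait = (
    (m.baseline_total_job_wait_time / m.baseline_jobs_completed)
    if m.baseline_jobs_completed else 0.0
)
```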
🧹 Nitpick comments (12)
src/environment.py (1)
218-265: Significant code duplication between `reset()` and `reset_timeline_state()`.

Lines 238-263 in `reset()` duplicate the state initialization logic from `reset_timeline_state()` (lines 270-294). This violates DRY and creates maintenance risk: if the state structure changes, both locations must be updated.

♻️ Suggested refactor to eliminate duplication
Consider extracting the common state initialization to a private helper method:
```diff
+    def _init_state_structures(self):
+        """Initialize state structures for nodes, queues, and job tracking."""
+        self.state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+            'predicted_prices': self.prices.predicted_prices.copy(),
+        }
+        self.baseline_state = {
+            'nodes': np.zeros(MAX_NODES, dtype=np.int32),
+            'job_queue': np.zeros((MAX_QUEUE_SIZE * 4), dtype=np.int32),
+        }
+        self.cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.baseline_cores_available = np.full(MAX_NODES, CORES_PER_NODE, dtype=np.int32)
+        self.running_jobs = {}
+        self.baseline_running_jobs = {}
+        self.next_job_id = 0
+        self.next_empty_slot = 0
+        self.baseline_next_empty_slot = 0

     def reset_timeline_state(self):
         self.metrics.current_hour = 0
-        self.state = {
-            # ... all the initialization code ...
-        }
-        # ... rest of initialization ...
+        self._init_state_structures()

     def reset(self, seed=None, options=None):
         # ... existing code ...
         if not self.carry_over_state:
             # ... price reset logic ...
-            self.state = {
-                # ... all the initialization code ...
-            }
-            # ... rest of initialization ...
+            self._init_state_structures()
```

src/plotter.py (3)
46-58: Potential off-by-one in monthly percentage calculation.

The monthly percentage is calculated every 2 episodes (assuming 2-week episodes = 1 month), but the values are duplicated twice (`extend([pct, pct])`). This creates a step-like visualization but may cause array length mismatches. The padding logic at lines 66-70 handles this, but it's fragile.

Consider documenting this behavior more clearly or simplifying by computing monthly percentages at the end rather than during iteration.
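As a sketch of the simpler alternative, the monthly values can be derived once after the loop (assuming two 2-week episodes per month, per the comment; `episode_savings_pct` is a hypothetical name with illustrative values):

```python
# Hypothetical per-episode savings percentages collected during evaluation.
episode_savings_pct = [4.2, 5.1, 3.8, 4.9]
# One monthly value per two episodes, computed once at the end.
monthly_pcts = [
    (episode_savings_pct[i] + episode_savings_pct[i + 1]) / 2
    for i in range(0, len(episode_savings_pct) - 1, 2)
]
```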
81-81: Unused parameter `episode_costs` should be removed.

The docstring at lines 84-85 acknowledges this is kept for backward compatibility. Consider removing it entirely if it's truly unused, or add `_ = episode_costs` to silence the warning if backward compatibility is required.

♻️ Proposed fix
```diff
-def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+def plot_dashboard(env, num_hours, max_nodes, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
-    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
-    Cumulative savings now lives in plot_cumulative_savings().
     """
```

Or if backward compatibility is truly needed:
```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
+    del episode_costs  # Unused, kept for backward compatibility
```
290-290: Ambiguous variable name `l`.

The variable `l` can be confused with `1` (one) in certain fonts. Use a more descriptive name.

♻️ Proposed fix
```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```

train.py (4)
156-159: Remove extraneous f-prefix from plain string.

Line 156 uses an f-string without any placeholders.
Suggested fix
- print(f"Starting a new model training...") + print("Starting a new model training...")
259-275: Avoid hardcoding `max_nodes=335`; retrieve it from the environment.

The hardcoded value will break if the environment's node count changes. The commented-out code shows the right approach.
Suggested fix
```diff
     plot_dashboard(
         env,
         num_hours=args.dashboard_hours,
-        # max_nodes=env.max_nodes if hasattr(env, "max_nodes") else env.num_nodes if hasattr(env, "num_nodes") else 0,
-        max_nodes=335,
+        max_nodes=getattr(env, "max_nodes", getattr(env, "num_nodes", 335)),
```
287-295: Minor issues in training loop instrumentation.
- Lines 288-289: The progress log is confusingly worded: it says "at {iters + 1}" but `iters` was already incremented on line 286.
- Line 294: f-string without placeholders in `tb_log_name=f"PPO"`.

Suggested fixes
```diff
     iters += 1
     t0 = time.time()
-    if (iters+1)%10==0:
-        print(f"Running... at {iters + 1} of {STEPS_PER_ITERATION * (iters + 1)} steps")
+    if iters % 10 == 0:
+        print(f"Running iteration {iters}, cumulative steps: {STEPS_PER_ITERATION * iters}")
     if args.iter_limit > 0 and iters > args.iter_limit:
         print(f"iterations limit ({args.iter_limit}) reached: {iters}.")
         break
     try:
-        model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name=f"PPO", callback=ComputeClusterCallback())
+        model.learn(total_timesteps=STEPS_PER_ITERATION, reset_num_timesteps=False, tb_log_name="PPO", callback=ComputeClusterCallback())
```
152-159: Consider increasing `n_steps` or documenting the rationale for the small value.

The `n_steps=64` is significantly smaller than stable-baselines3's default of 2048. For this environment with long episodes (336 hours), SB3 best practices recommend a larger `n_steps` (1024-2048) to reduce correlated data and improve sample efficiency. The current configuration results in ~5 updates per episode, compared to ~1 per 6 episodes with the default, which increases variance and may impact learning stability. If the intent is faster iteration for development/testing, consider adding a comment explaining the choice or making it configurable via a command-line argument.
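A sketch of the configurable variant (`PPO` and its `n_steps` parameter are standard stable-baselines3 API; the `--n-steps` flag is hypothetical):

```python
import argparse
from stable_baselines3 import PPO

parser = argparse.ArgumentParser()
parser.add_argument("--n-steps", type=int, default=2048)  # hypothetical flag; 2048 is SB3's default
args = parser.parse_args()

# env and log_dir come from the existing training setup
model = PPO("MlpPolicy", env, n_steps=args.n_steps, verbose=1, tensorboard_log=log_dir)
```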
train_iter.py (1)
175-178: Consider improving readability of a long function call.

Line 178 is quite long and could benefit from line breaks for readability.
Suggested improvement
```diff
 for combo in combinations:
     efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight = combo
     print(f"Running with weights: efficiency={efficiency_weight}, price={price_weight}, idle={idle_weight}, job_age={job_age_weight}, drop={drop_weight}")
-    run(efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight, args.iter_limit_per_step, args.session, args.prices, args.job_durations, args.jobs, args.hourly_jobs,plot_dashboard=args.plot_dashboard,dashboard_hours=args.dashboard_hours)
+    run(
+        efficiency_weight, price_weight, idle_weight, job_age_weight, drop_weight,
+        args.iter_limit_per_step, args.session,
+        args.prices, args.job_durations, args.jobs, args.hourly_jobs,
+        plot_dashboard=args.plot_dashboard,
+        dashboard_hours=args.dashboard_hours,
+    )
```

test/test_sanity_env.py (3)
48-101: Consider using imported constants instead of hardcoded values.

The invariant checks use hardcoded values (170, 96, 168, 16) that should match environment constants. The comment at line 84 acknowledges this, but since you already import `MAX_JOB_DURATION`, `MAX_NODES_PER_JOB`, `CORES_PER_NODE`, etc., consider using them for consistency.

Example for one check
```diff
-    assert np.all((dur >= 0) & (dur <= 170)), f"duration out of bounds min={dur.min()} max={dur.max()}"
+    assert np.all((dur >= 0) & (dur <= MAX_JOB_DURATION)), f"duration out of bounds min={dur.min()} max={dur.max()}"
```
243-246: Fix indentation of nested function.

The `norm_path` function has inconsistent indentation (an extra leading space).

Suggested fix
```diff
-     # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
+    # Train.py passes strings; the env treats "" as falsy in some places and truthy in others.
     # To be safe: normalize "" -> None here.
     def norm_path(x):
         return None if (x is None or str(x).strip() == "") else x
```
336-359: Consider removing or documenting the inline debugging test.

Lines 336-359 appear to be inline debugging/verification code that duplicates the formal `determinism_test`. If this is intentional for quick sanity checks, consider adding a comment explaining its purpose, or removing it if it's leftover debugging code.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/baseline.py
- src/environment.py
- src/job_management.py
- src/metrics_tracker.py
- src/plotter.py
- test/test_sanity_env.py
- train.py
- train_iter.py
🧰 Additional context used
📓 Path-based instructions (2)
train.py
📄 CodeRabbit inference engine (CLAUDE.md)
train.py: Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Files:
train.py
{train.py,weights.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Files:
train.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/job_management.py
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to {train.py,weights.py} : Use weighted reward components for training: efficiency-weight (default 0.7), price-weight (default 0.2), idle-weight (default 0.1), and job-age-weight (default 0.0)
Applied to files:
train_iter.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
train_iter.py
train.py
src/plotter.py
src/environment.py
src/metrics_tracker.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Implement stable-baselines3 PPO integration in the training script with tensorboard logging and model checkpointing every 100K steps
Applied to files:
train.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
🧬 Code graph analysis (3)
train_iter.py (1)
src/plotter.py (1)
plot_dashboard(81-222)
train.py (2)
src/plotter.py (2)
plot_dashboard (81-222), plot_cumulative_savings (225-316)
src/plot.py (1)
plot_cumulative_savings(136-276)
src/environment.py (3)
src/metrics_tracker.py (2)
reset_episode_metrics (43-80), reset_state_metrics (38-41)
test/test_sanity_env.py (1)
reset (321-327)
src/prices_deterministic.py (1)
reset(52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
341-341: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
344-344: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
train_iter.py
119-119: subprocess call: check for execution of untrusted input
(S603)
train.py
156-156: f-string without any placeholders
Remove extraneous f prefix
(F541)
276-276: Do not catch blind exception: Exception
(BLE001)
294-294: f-string without any placeholders
Remove extraneous f prefix
(F541)
317-317: Do not catch blind exception: Exception
(BLE001)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
290-290: Ambiguous variable name: l
(E741)
🔇 Additional comments (23)
src/environment.py (3)
175-176: Initialization order may cause issues with `self.prices`.

`reset_timeline_state()` at line 175 accesses `self.prices.predicted_prices` (line 276), but `self.prices` is initialized at line 121, so this should work. However, the call order means the state is initialized twice: once in `reset_timeline_state()` during `__init__`, and again in `reset()` when the environment is first reset. This is intentional for the carry-over flow but worth noting.
331-333: LGTM - Episode-level metrics propagation is consistent.

The episode-level metrics (`episode_jobs_submitted`, `episode_jobs_rejected_queue_full`, `episode_jobs_completed`, `episode_total_job_wait_time`, `episode_jobs_dropped`) are correctly incremented alongside their cumulative counterparts, maintaining parallel tracking.

Also applies to: 377-379
401-402: LGTM - Baseline and cost episode metrics are properly tracked.

Episode-level baseline metrics and costs are correctly propagated from the `baseline_metrics` dict and accumulated into `episode_total_cost`, `episode_baseline_cost`, and `episode_baseline_cost_off`.

Also applies to: 437-448, 446-447, 468-470
src/metrics_tracker.py (3)
7-14: LGTM - Clean initialization pattern.

The constructor properly separates cumulative metrics initialization from state/episode metrics, establishing a clear hierarchy.
62-76: LGTM - Comprehensive episode-level metrics.

The episode-level metrics for both agent and baseline are well-structured and mirror the cumulative metrics, enabling accurate per-episode reporting.
93-101: LGTM - Correct episode-level calculations.

The calculations for `avg_wait_time`, `completion_rate`, and `drop_rate` properly use episode-level counters with appropriate zero-division guards.
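The guarded pattern, sketched (counter names from this review; `episode_summary` is a hypothetical method name):

```python
# Sketch of zero-division-guarded episode summaries inside MetricsTracker.
def episode_summary(self):
    avg_wait_time = (self.episode_total_job_wait_time / self.episode_jobs_completed
                     if self.episode_jobs_completed else 0.0)
    completion_rate = (self.episode_jobs_completed / self.episode_jobs_submitted * 100
                       if self.episode_jobs_submitted else 0.0)
    drop_rate = (self.episode_jobs_dropped / self.episode_jobs_submitted * 100
                 if self.episode_jobs_submitted else 0.0)
    return avg_wait_time, completion_rate, drop_rate
```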
src/job_management.py (2)
159-167: LGTM - Backward-compatible episode metrics updates.

The presence checks (`if 'episode_*' in metrics`) allow this function to work with both old and new metrics dictionaries, maintaining backward compatibility.
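The pattern in miniature (key names assumed from the review):

```python
# Cumulative counter always updated; episode counter only when present.
metrics['jobs_completed'] += 1
if 'episode_jobs_completed' in metrics:
    metrics['episode_jobs_completed'] += 1
```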
187-193: LGTM - Consistent pattern for drop metrics.

The episode-level drop tracking mirrors the completion tracking pattern, with proper presence checks for both baseline and non-baseline paths.
src/baseline.py (2)
42-46: LGTM - Episode metrics for baseline submissions.

Correctly increments `episode_baseline_jobs_submitted` and `episode_baseline_jobs_rejected_queue_full`, with presence checks for backward compatibility.
61-63: LGTM - Episode max queue size tracking.

Properly tracks the per-episode maximum queue size for the baseline, with the inner condition correctly checking against the episode-level threshold.
src/plotter.py (2)
8-16: LGTM - Robust series normalization helper.

The `_as_series` function correctly handles `None`, arrays of varying lengths, and padding with `NaN` values.
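For reference, a minimal sketch of such a helper (not the actual `_as_series` implementation):

```python
import numpy as np

def as_series(values, length):
    """Sketch: coerce an optional sequence to a fixed-length float array, NaN-padded."""
    out = np.full(length, np.nan, dtype=float)
    if values is not None:
        v = np.asarray(values, dtype=float)[:length]
        out[:len(v)] = v
    return out
```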
225-316: LGTM - Comprehensive cumulative savings visualization.

The function properly handles dual-axis plotting, combines legends, and provides a summary statistics box. The return value with computed statistics is useful for programmatic access.
train.py (4)
5-11: LGTM!

Import changes are appropriate: `plot_dashboard` and `plot_cumulative_savings` from the new plotter module, and `time` for performance instrumentation.
60-62: LGTM!

New CLI options for dashboard plotting are well-defined with sensible defaults (336 hours = 2 weeks).
298-318: LGTM on dashboard plotting integration.

The exception handling around `plot_dashboard` is appropriate for non-fatal plotting operations. Catching `Exception` here is acceptable since you want to catch any plotting failure without crashing training.
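The endorsed guard, as a sketch (argument values assumed):

```python
# Non-fatal plotting: a failure here should not abort training.
try:
    plot_dashboard(env, num_hours=args.dashboard_hours, max_nodes=335)
except Exception as e:
    print(f"Dashboard plotting failed: {e}")
```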
218-251: Good implementation of evaluation metrics.

The evaluation output correctly includes per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections), aligning with the coding guidelines.
train_iter.py (2)
84-98: LGTM!

The extended function signature with `plot_dashboard` and `dashboard_hours` parameters is clean and maintains backward compatibility with defaults.
114-122: LGTM on subprocess forwarding.

The subprocess call constructs commands from user-provided CLI arguments, which is appropriate for this local training orchestration tool. The Ruff S603 warning is a false positive in this context.
test/test_sanity_env.py (5)
1-11: Good documentation and usage examples.

The docstrings provide clear usage examples for different test scenarios (quick run, full belt-and-braces, with external data).
111-142: Well-designed determinism test.

The test correctly pins `price_start_index` to ensure external price variation doesn't affect determinism checking. The unused `obs` variable from unpacking (line 126) is a common pattern in RL code and can be safely ignored.
144-177: Excellent carry-over continuity test.

This test directly validates the PR's core objective: carrying state between episodes rather than resetting to zero. The comprehensive snapshot comparison ensures all critical state components (nodes, job_queue, predicted_prices, running_jobs, price_index, etc.) are preserved across resets.
377-377: Clarify why `check_obs_is_copy` is commented out.

If there's a known issue with observation aliasing, consider adding a TODO or comment explaining why this check is disabled.
398-406: Good test orchestration structure.

The conditional test execution (gym compliance → invariants → determinism → carry-over) provides flexibility for different testing scenarios.
```python
def reset_state_metrics(self):
    """Reset timeline-dependent metrics."""
    self.current_hour = 0
    self.reset_episode_metrics()
```
reset_state_metrics resets current_hour but reset_episode_metrics does not.
This is the expected behavior for the non-carry-over case, but confirms the issue noted in environment.py: in carry-over mode, calling only reset_episode_metrics() will not reset current_hour, causing episode boundary detection to fail.
If carry-over mode is intended to span multiple episodes without resetting time, then current_hour should be reset separately in reset() when carry_over_state=True:
🐛 Proposed fix in environment.py reset()
```diff
 def reset(self, seed=None, options=None):
     # ...
     # Reset metrics
     if self.carry_over_state:
+        self.metrics.current_hour = 0  # Reset hour for new episode
         self.metrics.reset_episode_metrics()
     else:
         self.metrics.reset_state_metrics()
```

… now checks for determinism and carry-over continuity. Add carry-over state handling in the refactored environment by initializing timeline state on construction and skipping resets when enabled, while separating cumulative from episode-scoped metrics. Update MetricsTracker to track episode_* counters, propagate those counters through job assignment and baseline steps, and ensure episode summaries use per-episode values. Extend sanity tests and metrics usage to align with the refactor (metrics.current_hour, carry-over continuity checks).
Force-pushed 2e15d30 to 2c9ade7
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 175-177: The constructor calls reset_timeline_state() before
Prices.reset(), so reset_timeline_state() tries to copy
self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().
In `@src/plotter.py`:
- Around line 81-86: The parameter episode_costs on plot_dashboard is
intentionally unused but triggers Ruff ARG001; silence the warning by explicitly
using it in the function body (e.g., assign it to a throwaway variable or delete
it) so the signature stays the same but the linter treats it as used; update the
body of plot_dashboard to include a line like "_ = episode_costs" (or "del
episode_costs") immediately after the docstring.
In `@test/test_sanity_env.py`:
- Around line 118-137: In the rollout() function and other places where you
unpack but don't use values (e.g., the env.reset and env.step returns and
variables like obs, r, info, nodes, prices), prefix those unused variable names
with an underscore (e.g., _obs, _r, _info, _nodes, _prices) so the linter RUF059
is satisfied; update every unpacking site (including the env.reset(...) return
and env.step(...) tuples in rollout and the other similar blocks referenced) to
use the underscore-prefixed names consistently.
- Around line 166-176: Update the carry-over test to expect metrics.current_hour
to be reset to 0 after calling ComputeClusterEnv.reset() instead of matching the
saved snapshot; specifically modify the assertion that references
env.metrics.current_hour (in test/test_sanity_env.py) so it asserts
env.metrics.current_hour == 0 (reflecting the reset behavior in
ComputeClusterEnv.reset) rather than comparing to snapshot["current_hour"].
♻️ Duplicate comments (1)
src/plotter.py (1)
89-179: Use `env.metrics` for costs, job counts, and time-series (current code can crash or show zeros).

`plot_dashboard()` still reads `env.jobs_completed`, `env.total_cost`, `env.price_stats`, etc., but these live on `env.metrics`. This will either raise `AttributeError` or render empty panels. Please route all metric reads through `metrics`.

🛠️ Proposed fix
```diff
-    base_cost = float(getattr(env, "baseline_cost", 0.0))
-    base_cost_off = float(getattr(env, "baseline_cost_off", 0.0))
-    agent_cost = float(getattr(env, "total_cost", 0.0))
+    base_cost = float(getattr(metrics, "baseline_cost", 0.0))
+    base_cost_off = float(getattr(metrics, "baseline_cost_off", 0.0))
+    agent_cost = float(getattr(metrics, "total_cost", 0.0))
@@
-        f"Jobs: {env.jobs_completed}/{env.jobs_submitted} ({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={env.max_queue_size_reached}) | "
-        f"Base: {env.baseline_jobs_completed}/{env.baseline_jobs_submitted} ({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={env.baseline_max_queue_size_reached})"
+        f"Jobs: {getattr(metrics, 'jobs_completed', 0)}/{getattr(metrics, 'jobs_submitted', 0)} "
+        f"({completion_rate:.0f}%, wait={avg_wait:.1f}h, Q={getattr(metrics, 'max_queue_size_reached', 0)}) | "
+        f"Base: {getattr(metrics, 'baseline_jobs_completed', 0)}/{getattr(metrics, 'baseline_jobs_submitted', 0)} "
+        f"({baseline_completion_rate:.0f}%, wait={baseline_avg_wait:.1f}h, Q={getattr(metrics, 'baseline_max_queue_size_reached', 0)})"
@@
-    add_panel("Electricity price", getattr(env, "price_stats", None), "€/MWh", None)
+    add_panel("Electricity price", getattr(metrics, "price_stats", None), "€/MWh", None)
@@
-    add_panel("Online nodes", getattr(env, "on_nodes", None), "count", (0, max_nodes * 1.1))
+    add_panel("Online nodes", getattr(metrics, "on_nodes", None), "count", (0, max_nodes * 1.1))
@@
-    add_panel("Used nodes", getattr(env, "used_nodes", None), "count", (0, max_nodes))
+    add_panel("Used nodes", getattr(metrics, "used_nodes", None), "count", (0, max_nodes))
@@
     add_panel("Job queue & running jobs",
-        getattr(env, "job_queue_sizes", None),
+        getattr(metrics, "job_queue_sizes", None),
         "jobs", None,
         overlay=("Running jobs", running_series),
     )
@@
-    add_panel("Efficiency reward (%)", getattr(env, "eff_rewards", None), "score", None)
+    add_panel("Efficiency reward (%)", getattr(metrics, "eff_rewards", None), "score", None)
@@
-    add_panel("Price reward (%)", getattr(env, "price_rewards", None), "score", None)
+    add_panel("Price reward (%)", getattr(metrics, "price_rewards", None), "score", None)
@@
-    add_panel("Idle penalty (%)", getattr(env, "idle_penalties", None), "score", None)
+    add_panel("Idle penalty (%)", getattr(metrics, "idle_penalties", None), "score", None)
@@
-    add_panel("Job-age penalty (%)", getattr(env, "job_age_penalties", None), "score", None)
+    add_panel("Job-age penalty (%)", getattr(metrics, "job_age_penalties", None), "score", None)
@@
-    add_panel("Total reward", getattr(env, "rewards", None), "reward", None)
+    add_panel("Total reward", getattr(metrics, "rewards", None), "reward", None)
```
🧹 Nitpick comments (1)
src/plotter.py (1)
305-308: Minor readability: avoid single-letter `l` in list comprehension.

Ruff flags this as ambiguous (E741).
✏️ Suggested tweak
```diff
-    labels = [l.get_label() for l in lines]
+    labels = [line.get_label() for line in lines]
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- src/baseline.py
- src/environment.py
- src/job_management.py
- src/metrics_tracker.py
- src/plotter.py
- test/test_sanity_env.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/baseline.py
- src/job_management.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Set environment constraints as: 2-week episodes (336 hours), max 1000 jobs in queue, max 1500 new jobs per hour, max 170 hours job runtime
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to environment.py : Configure cluster parameters as: 335 nodes maximum, 96 cores per node, up to 16 nodes per job, 150W idle power, 450W active power per node
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to sampler_hourly.py : Sample job characteristics independently (job count, duration, nodes, cores-per-node) from hour-of-day distributions in the Hourly Sampler
Applied to files:
src/environment.py
📚 Learning: 2025-12-04T08:51:13.779Z
Learnt from: CR
Repo: FairRootGroup/powersched PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-04T08:51:13.779Z
Learning: Applies to train.py : Structure evaluation output to include per-episode metrics (cost, savings, completion rate, wait time, queue size) and cumulative analysis (total savings, monthly reduction, annual projections)
Applied to files:
src/environment.py
src/plotter.py
src/metrics_tracker.py
🧬 Code graph analysis (1)
src/environment.py (3)
src/metrics_tracker.py (2)
reset_episode_metrics (43-80), reset_state_metrics (38-41)
test/test_sanity_env.py (1)
reset (321-327)
src/prices_deterministic.py (1)
reset(52-76)
🪛 Ruff (0.14.11)
test/test_sanity_env.py
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
150-150: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable nodes is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
281-281: Unpacked variable prices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
341-341: Unpacked variable o1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable t1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
342-342: Unpacked variable tr1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
344-344: Unpacked variable o2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable t2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
345-345: Unpacked variable tr2 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable r is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
383-383: Unpacked variable info is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/plotter.py
81-81: Unused function argument: episode_costs
(ARG001)
307-307: Ambiguous variable name: l
(E741)
```python
self.reset_timeline_state()
self.metrics.reset_episode_metrics()
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find the Prices class and check __init__
rg -n "class Prices" src/ -A 20 | head -50
```

Repository: FairRootGroup/powersched
Length of output: 2426
🏁 Script executed:
```shell
# Also check the environment.py file at the mentioned lines
sed -n '160,190p' src/environment.py
```

Repository: FairRootGroup/powersched
Length of output: 1410
🏁 Script executed:
```shell
# And check lines 268-277 as mentioned
sed -n '260,285p' src/environment.py
```

Repository: FairRootGroup/powersched
Length of output: 1050
🏁 Script executed:
```shell
# Now let's look at the reset_timeline_state implementation
rg -n "def reset_timeline_state" src/environment.py -A 10
```

Repository: FairRootGroup/powersched
Length of output: 577
🏁 Script executed:
```shell
# Search for reset method in Prices class
rg -n "def reset" src/prices_deterministic.py -A 15
```

Repository: FairRootGroup/powersched
Length of output: 796
🏁 Script executed:
```shell
# Check the full environment.__init__ to see the complete sequence
rg -n "def __init__" src/environment.py -A 80 | head -120
```

Repository: FairRootGroup/powersched
Length of output: 3707
🏁 Script executed:
```shell
# Also check if Prices.reset() is called anywhere in environment.__init__
rg -n "self.prices.reset" src/environment.py
```

Repository: FairRootGroup/powersched
Length of output: 183
🏁 Script executed:
```shell
# Get full context around these critical lines
sed -n '115,250p' src/environment.py | cat -n
```

Repository: FairRootGroup/powersched
Length of output: 6999
reset_timeline_state() attempts to copy self.prices.predicted_prices while it is still None.
In __init__ at line 175, reset_timeline_state() is called before Prices.reset() is invoked. However, Prices.__init__ initializes predicted_prices = None (line 19), and it only gets populated when Prices.reset() is called. This causes AttributeError: 'NoneType' object has no attribute 'copy' at line 277 when trying to execute self.prices.predicted_prices.copy().
Call self.prices.reset(start_index=0) before self.reset_timeline_state() in __init__, matching the pattern used in the reset() method (lines 120–123).
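A sketch of the corrected ordering inside `__init__` (surrounding code elided; `start_index=0` per the finding above):

```python
# Corrected ordering (sketch): populate prices before copying them.
self.prices.reset(start_index=0)      # fills self.prices.predicted_prices
self.reset_timeline_state()           # predicted_prices.copy() is now safe
self.metrics.reset_episode_metrics()
```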
🤖 Prompt for AI Agents
In `@src/environment.py` around lines 175 - 177, The constructor calls
reset_timeline_state() before Prices.reset(), so reset_timeline_state() tries to
copy self.prices.predicted_prices while it's still None; fix by invoking
self.prices.reset(start_index=0) before calling self.reset_timeline_state() in
__init__ (same order used in reset()), ensuring predicted_prices is initialized
before reset_timeline_state() uses self.prices.predicted_prices.copy().
```python
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
```
Silence Ruff’s unused-parameter warning for episode_costs.
The parameter is kept for backwards compatibility but is unused, so Ruff (ARG001) will complain. Mark it intentionally unused to keep lint clean without breaking the signature.
🧹 Proposed fix
```diff
 def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
     """
     Per-hour dashboard: price, nodes, queue, reward components, etc.
     NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
     Cumulative savings now lives in plot_cumulative_savings().
     """
+    _ = episode_costs  # backward-compat placeholder
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def plot_dashboard(env, num_hours, max_nodes, episode_costs=None, save=True, show=True, suffix=""):
    """
    Per-hour dashboard: price, nodes, queue, reward components, etc.
    NOTE: episode_costs is accepted for backwards compatibility but NOT used here anymore.
    Cumulative savings now lives in plot_cumulative_savings().
    """
    _ = episode_costs  # backward-compat placeholder
```
🧰 Tools
🪛 Ruff (0.14.11)
81-81: Unused function argument: episode_costs
(ARG001)
🤖 Prompt for AI Agents
In `@src/plotter.py` around lines 81 - 86, The parameter episode_costs on
plot_dashboard is intentionally unused but triggers Ruff ARG001; silence the
warning by explicitly using it in the function body (e.g., assign it to a
throwaway variable or delete it) so the signature stays the same but the linter
treats it as used; update the body of plot_dashboard to include a line like "_ =
episode_costs" (or "del episode_costs") immediately after the docstring.
```python
def rollout():
    env = make_env()
    # Pin external price window so determinism doesn't vary by episode.
    obs, _ = env.reset(seed=seed, options={"price_start_index": 0})
    traj = []
    done = False
    i = 0
    while not done and i < n_steps:
        obs, r, term, trunc, info = env.step(actions[i])
        # record a small fingerprint
        traj.append((
            float(r),
            float(info.get("step_cost", 0.0)),
            int(info.get("num_unprocessed_jobs", -1)),
            int(info.get("num_on_nodes", -1)),
            int(info.get("dropped_this_episode", -1)),
        ))
        done = term or trunc
        i += 1
    env.close()
```
Ruff RUF059: unused unpacked variables.
Several unpacked values aren’t used (e.g., obs, r, info, nodes, prices). Prefix them with _ to keep lint clean.
🧹 Suggested clean‑up
```diff
-        obs, r, term, trunc, info = env.step(actions[i])
+        _obs, r, term, trunc, info = env.step(actions[i])
@@
-        obs, r, term, trunc, info = env.step(action)
+        _obs, _r, term, trunc, _info = env.step(action)
@@
-    nodes, q, prices = _extract(obs)
+    _nodes, q, _prices = _extract(obs)
@@
-    o1, _ = env.reset(seed=seed)
+    _o1, _ = env.reset(seed=seed)
-    o1s, r1, t1, tr1, i1 = env.step(action)
+    o1s, r1, _t1, _tr1, i1 = env.step(action)
@@
-    o2, _ = env.reset(seed=seed)
+    _o2, _ = env.reset(seed=seed)
-    o2s, r2, t2, tr2, i2 = env.step(action)
+    o2s, r2, _t2, _tr2, i2 = env.step(action)
@@
-        obs, r, term, trunc, info = env.step(action)
+        obs, _r, term, trunc, _info = env.step(action)
```

Also applies to: 145-152, 281-283, 341-346, 383-384
🧰 Tools
🪛 Ruff (0.14.11)
126-126: Unpacked variable obs is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 118 - 137, In the rollout() function
and other places where you unpack but don't use values (e.g., the env.reset and
env.step returns and variables like obs, r, info, nodes, prices), prefix those
unused variable names with an underscore (e.g., _obs, _r, _info, _nodes,
_prices) so the linter RUF059 is satisfied; update every unpacking site
(including the env.reset(...) return and env.step(...) tuples in rollout and the
other similar blocks referenced) to use the underscore-prefixed names
consistently.
```python
env.reset(seed=seed)

assert np.array_equal(env.state["nodes"], snapshot["nodes"]), "carry-over failed: nodes reset"
assert np.array_equal(env.state["job_queue"], snapshot["job_queue"]), "carry-over failed: job_queue reset"
assert np.array_equal(env.state["predicted_prices"], snapshot["predicted_prices"]), "carry-over failed: predicted_prices reset"
assert np.array_equal(env.cores_available, snapshot["cores_available"]), "carry-over failed: cores_available reset"
assert env.running_jobs == snapshot["running_jobs"], "carry-over failed: running_jobs reset"
assert env.prices.price_index == snapshot["price_index"], "carry-over failed: price_index reset"
assert env.next_job_id == snapshot["next_job_id"], "carry-over failed: next_job_id reset"
assert env.next_empty_slot == snapshot["next_empty_slot"], "carry-over failed: next_empty_slot reset"
assert env.metrics.current_hour == snapshot["current_hour"], "carry-over failed: current_hour reset"
```
Carry‑over test should expect current_hour to reset.
ComputeClusterEnv.reset() sets metrics.current_hour = 0 in carry‑over mode, so asserting it stays unchanged will fail. Update the assertion to reflect the intended per‑episode hour reset.
🧪 Proposed fix
```diff
-    assert env.metrics.current_hour == snapshot["current_hour"], "carry-over failed: current_hour reset"
+    assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
env.reset(seed=seed)

assert np.array_equal(env.state["nodes"], snapshot["nodes"]), "carry-over failed: nodes reset"
assert np.array_equal(env.state["job_queue"], snapshot["job_queue"]), "carry-over failed: job_queue reset"
assert np.array_equal(env.state["predicted_prices"], snapshot["predicted_prices"]), "carry-over failed: predicted_prices reset"
assert np.array_equal(env.cores_available, snapshot["cores_available"]), "carry-over failed: cores_available reset"
assert env.running_jobs == snapshot["running_jobs"], "carry-over failed: running_jobs reset"
assert env.prices.price_index == snapshot["price_index"], "carry-over failed: price_index reset"
assert env.next_job_id == snapshot["next_job_id"], "carry-over failed: next_job_id reset"
assert env.next_empty_slot == snapshot["next_empty_slot"], "carry-over failed: next_empty_slot reset"
assert env.metrics.current_hour == 0, "carry-over failed: current_hour should reset per episode"
```
🤖 Prompt for AI Agents
In `@test/test_sanity_env.py` around lines 166 - 176, Update the carry-over test
to expect metrics.current_hour to be reset to 0 after calling
ComputeClusterEnv.reset() instead of matching the saved snapshot; specifically
modify the assertion that references env.metrics.current_hour (in
test/test_sanity_env.py) so it asserts env.metrics.current_hour == 0 (reflecting
the reset behavior in ComputeClusterEnv.reset) rather than comparing to
snapshot["current_hour"].
Note: Not tested "in production" yet, but continuity checks between consecutive episodes have passed so far.