Skip to content

Initial plan#5

Open
stevensdavid wants to merge 7 commits intov1.0from
claude/scripting-engine-design-4SxXe
Open

Initial plan#5
stevensdavid wants to merge 7 commits intov1.0from
claude/scripting-engine-design-4SxXe

Conversation

@stevensdavid
Copy link
Copy Markdown
Owner

No description provided.

Design for embedding Rhai scripting engine to enable complex monitors
with custom logic, conditional flows, polling, and multi-step orchestration
beyond what YAML-based monitors support.

https://claude.ai/code/session_01DT5Gh9uynVDN2xDcFFXbQu
@stevensdavid stevensdavid changed the base branch from main to v1.0 March 11, 2026 07:24
claude and others added 4 commits March 11, 2026 16:26
Adds Section 7 covering integration with the v1.0 database-backed
config system: dual representation (script_path for YAML, script for
DB/API), CRUD API validation, seeding flow, MonitorManager hot-reload
compatibility, and updated architecture diagram.

https://claude.ai/code/session_01DT5Gh9uynVDN2xDcFFXbQu
Fix async bridging (Handle::block_on panics from spawn_blocking),
ScriptRunner borrow violations, wrong add_monitor_result signature,
incorrect Rhai API, and wrong ApiError discriminant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces a third monitor type alongside single-step and multi-step.
Scripted monitors run arbitrary Rhai scripts with access to HTTP, JSON
parsing, assertions, environment variables, and structured step recording.

- Add `script`, `script_path`, `script_timeout_seconds` fields to Monitor
- Validate three-way mutual exclusivity (url+method | steps | script)
- Resolve `script_path` to inline content at config load time
- New `src/scripting/` module: ScriptRunner, engine with resource limits,
  and host functions (http, assert, parse_json, env, uuid, step, logging)
- Branch monitor execution to ScriptRunner for scripted monitors
- Validate inline scripts on create/update via the CRUD API
- Add .cargo/config.toml to redirect target dir to native fs (virtiofs fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a Scripted Monitors section to README with YAML config examples,
full function reference table, and a step-chaining example that mirrors
the multi-step IP flow. Update project overview and feature roadmap.
Update AGENTS.md to reflect the new monitor type and scripting module.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@stevensdavid
Copy link
Copy Markdown
Owner Author

Code review

Found 1 issue:

  1. Duration metric for scripted monitors is computed incorrectly. After runner.execute(ctx).await completes, the code iterates over step results and computes Utc::now() - step.timestamp_started for each. Since Utc::now() is called well after all steps finished, every step's "duration" is inflated by the time of all subsequent steps. Then they're summed, producing a total that double/triple counts time. For a 3-step script where each step takes 100ms, this reports ~600ms instead of ~300ms. The non-scripted path correctly calls time_since(&step_started) immediately after each step completes (line 201, 240) and records total monitor duration once via time_since(&timestamp_started) (line 274).

let success = monitor_result.success;
app_state.metrics.duration.record(
monitor_result
.step_results
.iter()
.map(|s| {
Utc::now()
.signed_duration_since(s.timestamp_started)
.num_milliseconds() as u64
})
.sum::<u64>(),
&monitor_attributes,
);

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@stevensdavid
Copy link
Copy Markdown
Owner Author

Issue: .unwrap() on ScriptRunner::new() can panic the monitor task

The non-scripted execution path never panics — all errors are captured as failed MonitorResults. The scripted path uses .unwrap() which will panic the tokio task if compilation fails at runtime. The comment says "validated at load time" but API-created monitors are validated separately in the handler (not at "load time"), and a direct DB write or future code path could bypass validation. Should match the non-scripted pattern: return a failed MonitorResult instead of panicking.

// Script was validated at load time, so unwrap is safe here
let runner = ScriptRunner::new(script_content).unwrap();
let monitor_result = runner.execute(ctx).await;

@stevensdavid
Copy link
Copy Markdown
Owner Author

Issue: step() error message gets overwritten with Rhai-wrapped version

When an assertion fails inside a step() closure, two things happen: (1) the step records the clean error message (e.g., "health check failed") in step_results at line 24, and (2) the error is re-thrown to the Rhai engine at line 30, which wraps it as something like "Runtime error: health check failed (line 3, position 13)\nin closure call (line 2, position 9)". Then in execute() at lines 95-102, the code overwrites the step's already-correct error_message with this noisier Rhai-wrapped version. Users see implementation details instead of their clean assertion message.

Fix: in execute(), don't overwrite error_message if it's already set from within the step closure — only set it if error_message is None.

-> Result<Dynamic, Box<EvalAltResult>> {
let start = chrono::Utc::now();
let result = f.call_raw(&ncc, None, Vec::<rhai::Dynamic>::new());
let success = result.is_ok();
let error_msg = result.as_ref().err().map(|e| e.to_string());
let step_result = StepResult {
step_name: name,
timestamp_started: start,
success,
error_message: error_msg,
response: None,
trace_id: None,
span_id: None,
};
ctx_clone.step_results.lock().unwrap().push(step_result);
result
},

});
} else if !success {
// Mark the last step as failed if the overall script failed
if let Some(last) = step_results.last_mut() {
if last.success {
last.success = false;
last.error_message = error_message.clone();
}
}
}

@stevensdavid
Copy link
Copy Markdown
Owner Author

Issue: script_path serializes into DB records and API responses

script_path is documented as YAML-only (CLAUDE.md: "script_path (YAML-only) is resolved to inline script content at startup; the API only accepts inline script"), and the API handler correctly rejects it on input. But Monitor derives Serialize without #[serde(skip_serializing)] on this field, so every API GET response and DB record includes "script_path": null. This leaks an internal YAML-only concept into the public API surface. Consider adding #[serde(skip_serializing)] to the field.

// Fields for scripted monitors
pub script: Option<String>,
pub script_path: Option<String>,
pub script_timeout_seconds: Option<u64>,

@stevensdavid
Copy link
Copy Markdown
Owner Author

Issue: assertion_failures and metadata fields are dead code

ScriptContext allocates assertion_failures: Mutex<Vec<String>> and metadata: Mutex<HashMap<String, String>> on every script execution, both marked #[allow(dead_code)]. Neither field is ever read or written by any host function or execution logic. The assert functions raise Rhai runtime errors directly rather than accumulating into assertion_failures. These are speculative fields for features that don't exist yet, adding unnecessary allocations per execution.

pub step_results: Mutex<Vec<StepResult>>,
#[allow(dead_code)] // reserved for future soft_assert support
pub assertion_failures: Mutex<Vec<String>>,
#[allow(dead_code)] // reserved for future script metadata/context passing
pub metadata: Mutex<HashMap<String, String>>,
pub monitor_name: String,

@stevensdavid
Copy link
Copy Markdown
Owner Author

Issue: Test comment and logic don't account for scripted monitors

The test comment says "Verify one is single-step and one is multi-step" and uses !m.is_multi_step() as the filter for single-step. Now that the PR introduces a third monitor type (scripted), !is_multi_step() would also match scripted monitors. If a scripted monitor is added to the test YAML in the future, the count assertion will fail for a non-obvious reason. Consider updating the filter to explicitly check for single-step (e.g., !m.is_multi_step() && !m.is_scripted()) or adding a third category.

prodzilla/src/config.rs

Lines 113 to 123 in 9849a03

assert_eq!(2, config.monitors.len(), "Monitors length should be 2");
// Verify one is single-step and one is multi-step
let single_step_count = config
.monitors
.iter()
.filter(|m| !m.is_multi_step())
.count();
let multi_step_count = config.monitors.iter().filter(|m| m.is_multi_step()).count();
assert_eq!(1, single_step_count, "Should have 1 single-step monitor");
assert_eq!(1, multi_step_count, "Should have 1 multi-step monitor");
}

stevensdavid and others added 2 commits April 3, 2026 16:08
- Fix duration metric double-counting by using time_since(timestamp_started)
- Replace unwrap() on ScriptRunner::new() with failed MonitorResult
- Preserve clean assertion messages in step() by unwrapping Rhai error wrappers
- Add #[serde(skip_serializing)] to script_path (YAML-only field)
- Remove dead code: assertion_failures and metadata from ScriptContext
- Fix config test filter to exclude scripted monitors from single-step count
- Add regression tests for all fixed bugs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants