✨ Add kind provisioning and attempt behavior improvements #10
jmontleon wants to merge 22 commits into konveyor-ecosystem:main
Add a reusable sub-recipe that provisions a kind cluster with OLM and OpenShift Console, returning authentication coordinates as structured JSON. Integrate it into the migration workflow so that projects detected as OpenShift Console dynamic plugins are automatically deployed and tested inside an OpenShift Console during Phase 3 validation.
Detection checks for @openshift-console/dynamic-plugin-sdk in package.json, console-extensions.json, or ConsolePlugin CRs. When present, the cluster is provisioned, the plugin image is built, loaded into kind, and verified inside the console.
Changes propagated across all three runtime implementations per CLAUDE.md synchronization rules:
- goose/recipes (sub-recipe + migration.yaml)
- skills/code-migration (SKILL.md + agents/kind-cluster.md)
- skills/code-migration-inline (SKILL.md with inline instructions)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…start
The visual-fix sub-recipe previously started and stopped the dev server for every single page fix, causing repeated cold-start delays (30-120s each for webpack/vite compilation, HTTP polling, and post-ready buffer). For a project with 8 pages, this added 10-30 minutes of idle waiting.
Restructure to start the dev server once before the fix loop and stop it once after all pages are processed. Code changes are picked up automatically via hot module replacement (HMR) with a 3-5 second wait. A fallback restarts the server if it crashes mid-loop.
Propagated across all three runtime implementations per CLAUDE.md:
- goose/recipes/subrecipes/visual-fix.yaml
- agents/visual-fix.md
- skills/code-migration-inline/targets/patternfly.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address multiple issues causing Goose to fail when starting webpack dev servers during console plugin migrations:
- Accept HTTP 404 (curl exit code 22) when polling webpack on port 9001, since the server returns "Cannot GET /" before the console bridge connects
- Replace generic startup flow with explicit if/else branching for console plugin (multi-stage) vs standard app (single server) startup paths
- Add WRONG/RIGHT anti-pattern examples showing foreground vs background execution to prevent session hangs from running npm start without &
- Instruct agents to write multi-line dev commands to a shell script file ($WORK_DIR/start-dev.sh) instead of passing them to bash -c, which silently drops newlines and breaks backgrounding, loops, and conditionals
- Save console dev command as a .sh script file rather than a JSON string
- Add max_turns: 250 to the migration recipe settings
- Propagate all changes across recipes, skills, agents, and inline targets per CLAUDE.md sync requirements
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
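The 404-tolerant readiness check from the first bullet can be sketched in Python as below. This is only an illustration of the idea, not the script in the PR: the function names, the retry counts, and the choice of `urllib` instead of curl are all assumptions.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url, counting a 404 as ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass).
        # A 404 such as "Cannot GET /" still proves the server is listening.
        return err.code == 404
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: nothing is listening yet.
        return False


def wait_for_server(url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll url until it answers or the attempts run out."""
    for _ in range(attempts):
        if server_is_up(url):
            return True
        time.sleep(delay)
    return False
```

A caller would use something like `wait_for_server("http://localhost:9001/")` and treat a `False` result as a startup failure to report.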
goose/recipes/migration.yaml
    description: "Migrate applications between technology stacks using kantra static analysis and automated fixes."

    settings:
      max_turns: 250
I do not think this has an effect yet. There is some work that was merged a couple weeks ago and an open issue so it's in progress at any rate.
The console bridge start script may run its container in detached mode (`podman run -d`) or in foreground mode (blocking). Previously the instructions always backgrounded the script with `&`, which is wrong for detached scripts because it hides startup errors.
Now the agent is instructed to read the script first and choose the correct approach: background with `&` for blocking scripts, run inline for detached ones. Includes examples for both cases across all three format variants (Goose recipe, Claude Code skill, and inline skill).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
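The "read the script first, then choose" decision could look roughly like the sketch below. The regex, the function names, and the `console.log` file name are assumptions made for illustration; the heuristic does not handle combined short flags such as `-dit` or commands split across lines with backslashes.

```python
import re

# Heuristic: a `podman run` or `docker run` line carrying -d or --detach.
DETACHED_RUN = re.compile(
    r"\b(?:podman|docker)\s+run\b[^\n]*?\s(?:-d|--detach)(?=\s|$)",
    re.MULTILINE,
)


def runs_detached(script_text: str) -> bool:
    """Return True if the script appears to start its container detached."""
    return bool(DETACHED_RUN.search(script_text))


def launch_command(script_path: str, script_text: str) -> str:
    """Pick a launch line: inline for detached scripts, backgrounded otherwise."""
    if runs_detached(script_text):
        # The script returns on its own, so run it inline and keep its errors visible.
        return f"bash {script_path}"
    # The script blocks, so background it and capture output in a log file.
    return f"nohup bash {script_path} > console.log 2>&1 &"
```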
Kantra fails with "output dir already exists and --overwrite not set" when the workspace directory persists between runs or when the agent pre-creates the output directory with mkdir -p before running kantra. This was observed across multiple migration runs, forcing the agent to retry with --overwrite and wasting turns each time.
Add --overwrite to every kantra analyze command template and update the kantra-command-builder instructions to note that --overwrite is provided by the caller, not the builder.
Changes propagated across all three parallel systems: Goose recipes, Claude Code/Gemini skills, and the inline skill.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ipts
Goose logs showed dev server fragility from multiple root causes: shell timeouts killing backgrounded processes, EADDRINUSE from leaked processes, subagents re-implementing startup logic instead of reusing scripts, and no cleanup on failure.
Replace inline dev server examples with a mandatory start-dev.sh template that handles port cleanup, nohup+log redirection, PID files, and readiness polling. Add companion stop-dev.sh for clean shutdown.
Update all 7 files across all 3 parallel systems (goose recipes, skills+agents, inline skill) and update Guidelines sections to reference the scripts instead of inline commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The main recipe was hitting its 250-turn limit before completing full migrations. The visual-captures subrecipe had no explicit setting and defaulted to 25 turns, which was insufficient for capturing all screenshots in plugins with many pages: the agent would run out of turns mid-capture and leave incomplete screenshot sets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run 4 logs showed subagents exhausting their turn budget at 30 turns
(visual_discovery, visual_captures, visual_compare all failed). The
agent was choosing max_turns: 30 on its own because the recipe gave
no guidance. Setting max_turns in the subrecipe YAML (visual-captures)
didn't help because the tool call override has highest precedence.
Fix: update the subagent invocation example in the main recipe to
include settings: {max_turns: 300} and add a bold instruction to
always pass it. Remove the inert settings.max_turns from the
visual-captures subrecipe since it was being overridden. Update
README to document the new approach.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
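The precedence described in this commit (a tool-call value beats the subrecipe YAML, which beats the default) can be sketched as a small resolver. The function name is hypothetical, and the default of 25 is taken from the earlier commit message rather than from Goose's documentation.

```python
from typing import Optional

# Default reported in an earlier commit message; treated as an assumption here.
DEFAULT_MAX_TURNS = 25


def effective_max_turns(
    tool_call: Optional[int],
    subrecipe: Optional[int],
    default: int = DEFAULT_MAX_TURNS,
) -> int:
    """Resolve max_turns: the tool-call value wins, then the subrecipe, then the default."""
    if tool_call is not None:
        return tool_call
    if subrecipe is not None:
        return subrecipe
    return default
```

Under this model, a subrecipe setting of 300 is inert whenever the agent passes its own value in the tool call, which matches the behavior the commit describes.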
Run 4 logs showed the visual_captures subagent abandoning start-dev.sh after initial verification failed, then spending ~15 turns manually trying npx webpack serve and npm install, burning its entire turn budget with zero screenshots captured.
Add explicit guardrails to all visual subrecipe/agent instructions:
- Bold top-level warning: never run npm start, npx webpack serve, or any dev server command directly
- Strengthen verification failure steps: "report the error and stop. Do not attempt alternative startup commands."
- Simplify standard app startup to also use start-dev.sh/stop-dev.sh instead of inline nohup commands
Updated in both goose subrecipes and claude/gemini agents.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run 4 logs showed 3 subagents (visual_discovery, visual_captures, visual_compare) failing by exhausting turns. The main agent recovered each time by doing the work itself, but had no explicit guidance on how to handle these failures; it figured it out ad hoc.
Add explicit fallback instructions to the PatternFly target file for each visual subagent invocation:
- visual_discovery: retry once, then create manifest manually
- visual_captures: start dev server via start-dev.sh and capture missing screenshots manually with playwright-mcp
- visual_compare: do the comparison manually using Python/PIL
Updated in both goose recipe and skill target files. The inline skill doesn't use subagents so needs no change.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run 5 logs showed kantra timing out on first attempt (PT300S shell timeout) then the agent trying 4 different background strategies before succeeding, wasting turns. Kantra regularly takes 5-15 minutes for large projects.
Add explicit instructions to always run kantra with nohup, redirect to a log file, and poll for completion. Updated in all 3 parallel systems (goose recipe, skill, inline skill) for both initial analysis and per-round validation steps.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
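The same "detach, log to a file, poll for completion" pattern can be sketched in Python. The PR itself uses nohup in shell, so this is an analogous illustration with hypothetical function names and timings, not the recipe's implementation.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Optional


def run_in_background(cmd: List[str], log_path: Path) -> subprocess.Popen:
    """Start cmd detached from the caller's session, sending all output to log_path."""
    with log_path.open("wb") as log:
        # The child keeps its own copy of the log handle after this block closes ours.
        return subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX: survive the parent shell's timeout
        )


def wait_for_exit(
    proc: subprocess.Popen,
    poll_seconds: float = 10.0,
    max_seconds: float = 1800.0,
) -> Optional[int]:
    """Poll until the process exits; return its exit code, or None on timeout."""
    deadline = time.monotonic() + max_seconds
    while time.monotonic() < deadline:
        code = proc.poll()
        if code is not None:
            return code
        time.sleep(poll_seconds)
    return None
```

A call might look like `run_in_background(["kantra", "analyze", "--input", src, "--output", out, "--overwrite"], Path("kantra.log"))`, where `src` and `out` are placeholder paths.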
Runs 4 and 5 showed the agent rationalizing away the hardest fix groups (e.g. Modal deprecated→new API) by classifying them as "deprecated but still works" and then renumbering subsequent groups to hide the gap. In run 5, original Group 6 (Modal migration) was skipped and relabeled, then Group 7 was renumbered to Group 6.
Add two new guidelines to all 3 parallel systems:
- Never renumber, relabel, or remove groups: mark unfixable groups with [!] notation instead of deleting them
- Attempt every group: do not skip groups just because deprecated APIs still compile; only classify as unfixable after 2+ failed approaches that break the build or tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
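One way to picture the two guidelines is a tracker in which group numbers are fixed at creation and a group can only be marked `[!]` after at least two failed approaches. The data model, field names, and rendering format below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

MARKS = {"pending": "[ ]", "done": "[x]", "unfixable": "[!]"}
MIN_FAILED_APPROACHES = 2


@dataclass
class FixGroup:
    number: int  # assigned once; never shifted or reused
    title: str
    status: str = "pending"
    failed_approaches: int = 0
    note: str = ""


def mark_unfixable(group: FixGroup, reason: str) -> None:
    """Mark a group [!] only after enough distinct approaches have failed."""
    if group.failed_approaches < MIN_FAILED_APPROACHES:
        raise ValueError(
            f"Group {group.number} needs {MIN_FAILED_APPROACHES}+ failed approaches first"
        )
    group.status = "unfixable"
    group.note = reason


def render(groups: List[FixGroup]) -> str:
    """Render every group with its original number, including unfixable ones."""
    lines = []
    for g in groups:
        suffix = f" ({g.note})" if g.note else ""
        lines.append(f"{MARKS[g.status]} Group {g.number}: {g.title}{suffix}")
    return "\n".join(lines)
```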
Runs 4 and 5 both showed the agent blindly applying Kantra's alignRight→alignEnd rename on FlexItem, which broke the build because PF6's types still use alignRight/alignLeft. The agent correctly reverted each time, but wasted a full fix round.
Add instruction to verify the new name exists in the target framework's type definitions (check .d.ts files or run tsc --noEmit) before applying any prop or API rename suggested by Kantra. Updated in all 3 parallel systems.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
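The ".d.ts check before renaming" step might be approximated with a plain text search like the one below. The function name and the directory argument are hypothetical, and a word-boundary match is a weaker signal than running `tsc --noEmit`, since it cannot tell which component a name belongs to.

```python
import re
from pathlib import Path


def name_in_type_definitions(name: str, types_root: Path) -> bool:
    """Return True if name appears as a whole word in any .d.ts file under types_root."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    for dts in types_root.rglob("*.d.ts"):
        text = dts.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            return True
    return False
```

A rename would only be applied when the proposed new name is found; otherwise the suggestion is held back for closer inspection.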
JonahSussman left a comment
I'm starting to become an expert at reading markdown, man
    # Kind Cluster Provisioner

    You are provisioning a local Kubernetes environment. Follow every step in order. **Do not skip verification gates** — each step must succeed before proceeding to the next.
Do we want to tell it to return anything about failures?
agents/visual-captures.md
    fuser -k 9001/tcp 2>/dev/null || true
    fuser -k 9000/tcp 2>/dev/null || true
    podman stop migration-console okd-console 2>/dev/null || true
    sleep 1
Should it be a requirement that stop-dev.sh exists for the agent to run? I can foresee some issues if this agent executes on non-podman/OKD stuff.
And/or, since we are using fuser -k here, we could put something like "use fuser -k to kill the process associated with the server if stop-dev.sh does not exist. Note that it may be a containerized service that has things associated with it."
This cleanup is also present on lines 90 and 124.
agents/visual-captures.md
```bash
bash <work_dir>/start-dev.sh
```
If `start-dev.sh` does not exist, write the dev command to `<work_dir>/start-dev.sh` first, then run it.
At the top we say that
> The `start-dev.sh` script created by the main agent handles all startup logic

But here we give this agent the authority to start up a dev server on its own.
Similar comments as visual-captures.md
Similar concerns as above
Similar concerns as above
Key changes based on log analysis of forklift-console-plugin migration:
- Add ESLint config backup/restore around pf-codemods (prevents constructor serialization corruption that wasted 5-10 turns/run)
- Add known Kantra false positives table for PF6 (eliminates 30-50 turns/run spent verifying rules against type definitions)
- Add test baseline recording before migration starts (pre-existing failures no longer block exit criteria)
- Ban sed for import modifications (prevents broken multi-line imports)
- Add CSS custom property migration guidance (catches silent pf-v5-chart failures)
- Update exit criteria to allow documented false positives and baseline comparison for test results
- Add post-codemods cleanup steps (prettier fix + import consolidation)
All changes propagated to goose/recipes, skills/code-migration, and skills/code-migration-inline per CLAUDE.md sync requirements.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
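The baseline comparison in the exit criteria amounts to a set difference: only tests that fail now but did not fail before the migration count against it. The sketch below assumes failures are tracked by test name; the function names are hypothetical.

```python
from typing import Set


def new_failures(baseline_failed: Set[str], current_failed: Set[str]) -> Set[str]:
    """Return tests that fail now but were passing (or absent) in the baseline."""
    return current_failed - baseline_failed


def exit_criteria_met(baseline_failed: Set[str], current_failed: Set[str]) -> bool:
    """Pre-existing failures do not block; any newly failing test does."""
    return not new_failures(baseline_failed, current_failed)
```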
1. kind-cluster: Add error reporting. Agents/subrecipes now return failure details (step, error message, diagnostics) instead of silently continuing past failed steps.
2. visual-captures, visual-fix: Remove podman-specific assumptions from fallback cleanup. Container stop commands now try both podman and docker. Cleanup comments explain that the dev server may be containerized so the agent checks for running containers too.
3. visual-captures, visual-fix: Remove contradiction where agents were told "Do NOT start dev servers manually" but then instructed to "write the dev command to start-dev.sh if it doesn't exist." Now agents consistently require start-dev.sh to exist (created by the main agent in Phase 1) and report an error if it's missing.
All changes propagated across agents/, goose/recipes/subrecipes/, skills/code-migration/, and skills/code-migration-inline/ per CLAUDE.md sync requirements.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on knowledge
- Expand Kantra false positives table from 10 to 16 entries (spacer→gap, ButtonVariant.control, Modal title→titleText, ErrorState prop renames, CardHeader selectableActions, ToolbarFilter chips→labels)
- Strengthen skip language to "do not re-verify against type definitions" to eliminate the ~21% of run time spent re-checking known false positives
- Add hasNoBodyWrapper guidance for deprecated Modal with composable children to catch ~60px layout shifts during Phase 2 instead of visual fix phase
- Extend pf-v5 CSS grep to include .ts/.tsx files for test file selectors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Visual compare:
- Add efficiency guidance to load each image only once and batch reads, eliminating redundant 4x image loading observed in goose-run-7
- Add anti-aliasing noise threshold (<0.5% pixels, ≤15 max channel diff) to filter out unfixable subpixel differences before the fix phase; this eliminates ~20 unfixable items from the report
Visual fix:
- Require verification screenshot before marking any issue [x]; this prevents bulk-marking 30+ issues as "not a regression" without actually verifying fixes, as observed in goose-run-7
- Leave unfixable issues as [ ] with documented reason instead of falsely marking them resolved
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
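The anti-aliasing threshold can be made concrete with a small classifier over two equally sized pixel lists. This sketch assumes both conditions must hold together (fewer than 0.5% of pixels changed and no channel differing by more than 15) and uses plain RGB tuples rather than an imaging library; the function name and return labels are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def classify_diff(
    before: List[Pixel],
    after: List[Pixel],
    max_fraction: float = 0.005,
    max_channel_diff: int = 15,
) -> str:
    """Classify a screenshot pair as 'identical', 'anti-aliasing', or 'different'."""
    if not before or len(before) != len(after):
        raise ValueError("images must be non-empty and the same size")
    changed = 0
    worst = 0
    for a, b in zip(before, after):
        # Largest per-channel difference for this pixel.
        delta = max(abs(x - y) for x, y in zip(a, b))
        if delta:
            changed += 1
            worst = max(worst, delta)
    if changed == 0:
        return "identical"
    if changed / len(before) < max_fraction and worst <= max_channel_diff:
        return "anti-aliasing"
    return "different"
```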
Instead of instructing the agent to skip known false positives (which it ignores), filter them out of kantra output before the agent sees them. The agent can't re-verify what it never receives.
- New script: filter_kantra_false_positives.py (all 3 copies)
- Recipe updated to run filter before kantra_output_helper analysis
- patternfly.md updated to reference the filter script
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
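The filtering idea can be illustrated as follows. This is not the contents of filter_kantra_false_positives.py: the `rule_id` key and the flat list-of-dicts shape are assumptions, and Kantra's real output format may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple

Violation = Dict[str, str]


def split_false_positives(
    violations: Iterable[Violation],
    known_false_positive_rules: Set[str],
) -> Tuple[List[Violation], List[Violation]]:
    """Split violations into (kept, dropped) by a hypothetical 'rule_id' key."""
    kept: List[Violation] = []
    dropped: List[Violation] = []
    for violation in violations:
        if violation.get("rule_id") in known_false_positive_rules:
            dropped.append(violation)
        else:
            kept.append(violation)
    return kept, dropped
```

Only the `kept` list would be handed to the agent; the `dropped` list can be logged so the filtering stays auditable.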
Instead of instructing the agent to check every Modal file (which it inconsistently does), run a deterministic script after pf-codemods. The script finds all deprecated Modal imports with composable children and adds hasNoBodyWrapper automatically.
- New script: fix_deprecated_modal_wrapper.py (all 3 copies)
- patternfly.md: replaced prose guidance with numbered step 5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of the agent loading each screenshot 10+ times and writing ad-hoc PIL scripts, a deterministic comparison script runs first. The agent only visually inspects screenshots flagged as "different", skipping identical and anti-aliasing-only pairs entirely.
- New script: compare_screenshots.py (all 3 copies)
- visual-compare subrecipe: use script output to skip identical images
- agents/visual-compare.md: same changes for Claude/Gemini
- inline skill: same changes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Subagents ignored start-dev.sh and ran npx webpack serve manually. Fix: remove all server startup/shutdown logic from subagents. The main agent starts the server, verifies it's responsive, passes the verified dev_url, and stops the server after the subagent returns.
Subagents now only verify the URL responds and report error if not. This makes it impossible for them to improvise startup commands.
- visual-captures: removed startup/shutdown, receives dev_url (required)
- visual-fix: removed startup/shutdown, receives dev_url (required)
- patternfly.md (all 3): main agent wraps subagent calls with start/stop
- agents/*.md: same changes for Claude/Gemini
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verbose ground rules diluted critical instructions. Condensed to 4-5 bullet points each. Added explicit prohibitions against two anti-patterns observed in logs:
- Creating CSS override files (pf6-overrides.css)
- Writing PIL/pixel analysis scripts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No description provided.