feat: implement agentic power-steering analysis by rysweet · Pull Request #2365 · rysweet/amplihack

rysweet · 2026-02-16T04:40:59Z

Summary

Replaces regex-based power-steering validation with intelligent Claude SDK analysis that understands context and intent.

Problem

Power-steering produced excessive false positives because regex patterns couldn't understand:

Implicit workflow following (step-by-step execution)
Async completion patterns (PR created for review)
Contextual intent vs. literal text matching

Solution

Fully Agentic Analysis: Use Claude Agent SDK for ALL validation

Changes

Enhanced claude_power_steering.py
- Added analyze_workflow_invocation() with context-aware prompts
- Understands explicit Skill/Read tool invocation
- Understands implicit workflow following
- Understands async workflows (PR for review, CI running)
- Fail-open behavior maintained
Updated power_steering_checker.py
- Removed regex validator dependency
- Uses SDK analysis directly
- Intelligent context understanding
Deleted obsolete files
- Removed workflow_invocation_validator.py (regex-based)
- Removed associated tests
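
The fail-open behavior listed above could look roughly like the sketch below. It is an illustration only, not the code in claude_power_steering.py: `analyze_fail_open` and the injected `analyzer` callable are hypothetical names standing in for the SDK-backed check, and the `(True, None)` pass value mirrors the tuple shape used in the review snippets on this page.

```python
from typing import Callable, Optional, Tuple

Result = Tuple[bool, Optional[str]]


def analyze_fail_open(analyzer: Callable[[], Result]) -> Result:
    """Run an analysis callable and treat any failure as a pass (fail-open).

    `analyzer` is a hypothetical stand-in for the SDK-backed check.
    """
    try:
        return analyzer()
    except Exception:
        # Fail open: an SDK error or timeout should not block the session.
        return (True, None)
```

The trade-off is that an outage silently disables the check, so a real implementation would probably also log the swallowed exception.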

Benefits

✅ Context-aware: Understands intent, not just text patterns
✅ Async-aware: Recognizes PR reviews happen later
✅ Fewer false positives: No more pattern mismatches
✅ More maintainable: AI-powered vs brittle regex

Testing

✅ All pre-commit hooks passing
✅ Type checking passing
✅ Syntax validation passing
📝 Integration testing needed with real sessions

Files Changed

.claude/tools/amplihack/hooks/claude_power_steering.py (+2981, -966)
.claude/tools/amplihack/hooks/power_steering_checker.py (updated)
.claude/tools/amplihack/hooks/workflow_invocation_validator.py (deleted)

Fixes #2355

Replace regex-based validation with Claude SDK intelligent analysis.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

github-actions · 2026-02-16T04:41:33Z

🤖 Auto-fixed version bump

The version in pyproject.toml has been automatically bumped to the next patch version.

If you need a minor or major version bump instead, please update pyproject.toml manually and push the change.

github-actions · 2026-02-16T04:45:13Z

Repo Guardian - Passed

All changed files in this PR are legitimate production code, configuration, or test files. No ephemeral content detected.

Analysis Summary:

✅ Modified files: Production Python modules (claude_power_steering.py, power_steering_checker.py) and configuration (pyproject.toml)
✅ Removed files: Test/validation modules no longer needed
✅ No temporal filenames (dates, "temp", "hack", "one-off", etc.)
✅ No point-in-time documents (notes, status updates, investigation logs)
✅ No temporary scripts or one-off utilities

This PR implements a feature (agentic power-steering analysis) with proper production code structure.

AI generated by Repo Guardian

Copilot

Pull request overview

Implements a more context-aware (“agentic”) power-steering workflow invocation analysis using Claude SDK, replacing the prior regex-based workflow invocation validator and extending evidence/state-based completion verification to reduce false positives (Fixes #2355).

Changes:

Add Claude-SDK-powered analyze_workflow_invocation*() and wire it into workflow invocation checks.
Expand power-steering checking with evidence/state verification paths (PR merged/user confirmation/compaction-aware verification) and updated session-type heuristics/timeouts.
Remove the obsolete regex-based workflow_invocation_validator.py and its pytest suite; bump project version to 0.5.29.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

File	Description
pyproject.toml	Patch version bump to 0.5.29.
.claude/tools/amplihack/hooks/claude_power_steering.py	Adds SDK workflow-invocation analysis (sync/async) and integrates with existing SDK wrappers.
.claude/tools/amplihack/hooks/power_steering_checker.py	Switches workflow invocation check to SDK-based analysis (no regex validator).
amplifier-bundle/tools/amplihack/hooks/claude_power_steering.py	Mirrors SDK workflow invocation analysis in bundle distribution.
amplifier-bundle/tools/amplihack/hooks/power_steering_checker.py	Mirrors checker updates in bundle distribution.
docs/claude/tools/amplihack/hooks/claude_power_steering.py	Documents/mirrors SDK workflow invocation analysis and prompt formatting changes.
docs/claude/tools/amplihack/hooks/power_steering_checker.py	Documents/mirrors checker updates including retry write helper and next-steps heuristics.
.claude/tools/amplihack/hooks/workflow_invocation_validator.py	Deletes obsolete regex-based validator.
.claude/tools/amplihack/hooks/tests/test_workflow_invocation_validator.py	Deletes tests for removed validator.

Copilot · 2026-02-16T06:53:05Z

amplifier-bundle/tools/amplihack/hooks/claude_power_steering.py

+def _format_conversation_summary(conversation: list[dict], max_length: int | None = None) -> str:
    """Format conversation summary for analysis.

    Args:
        conversation: List of message dicts


_format_conversation_summary() defaults max_length=None (unbounded), which can generate very large prompts for long transcripts and cause SDK calls to be slow/expensive or exceed context limits (leading to timeouts and fail-open). Consider a bounded default and deterministic truncation strategy.

Copilot · 2026-02-16T06:53:06Z

amplifier-bundle/tools/amplihack/hooks/power_steering_checker.py

                                    "INFO",
                                )
-                                return False  # Work is INCOMPLETE
+                                # Continue checking other messages (don't return immediately)
+                                # Only STRUCTURED next steps should fail the check
+                                break


In _check_next_steps(), a negation match only breaks out of the negation loop and the structured-next-steps regex still runs for the same message. This can incorrectly fail on completion statements with list formatting. Consider skipping structured detection for that message when negation matches.

Copilot · 2026-02-16T06:53:06Z

docs/claude/tools/amplihack/hooks/power_steering_checker.py

                                    "INFO",
                                )
-                                return False  # Work is INCOMPLETE
+                                # Continue checking other messages (don't return immediately)
+                                # Only STRUCTURED next steps should fail the check
+                                break


In _check_next_steps(), a negation match only breaks out of the negation loop, but the structured-next-steps regex is still evaluated for the same message immediately afterward. This can reintroduce false failures on completion statements that include list formatting. Consider short-circuiting (e.g., continue to next message) when a negation pattern matches.
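
A minimal sketch of the short-circuit described in this comment follows. The two regex patterns and the function name `check_next_steps` are hypothetical and chosen only for illustration; the real pattern lists live in power_steering_checker.py and are not shown on this page. The return convention (False means work is incomplete) follows the removed `return False  # Work is INCOMPLETE` line in the diff.

```python
import re

# Hypothetical patterns for illustration; the real lists live in the checker.
NEGATION_PATTERNS = [
    re.compile(r"\bno (?:further |remaining )?next steps\b", re.I),
]
STRUCTURED_NEXT_STEPS = re.compile(
    r"next steps:?\s*\n\s*(?:[-*]|\d+\.)\s+\S", re.I
)


def check_next_steps(messages: list[str]) -> bool:
    """Return False if any message lists structured next steps, else True."""
    for text in messages:
        # A negation ("no next steps") marks this message as a completion
        # statement, so skip structured detection for it entirely.
        if any(p.search(text) for p in NEGATION_PATTERNS):
            continue
        if STRUCTURED_NEXT_STEPS.search(text):
            return False  # work is incomplete
    return True
```

With `break` instead of `continue`, the second `if` would still run for the negated message, which is the false failure the comment describes.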

Copilot · 2026-02-16T06:53:06Z

docs/claude/tools/amplihack/hooks/claude_power_steering.py

+def _format_conversation_summary(conversation: list[dict], max_length: int | None = None) -> str:
    """Format conversation summary for analysis.

    Args:
        conversation: List of message dicts


_format_conversation_summary() defaults max_length=None (unbounded), which can build extremely large prompts for long transcripts and risk excessive latency/cost or exceeding the model context window. Consider a bounded default (chars/messages) and deterministic truncation strategy (e.g., most recent N messages + a brief header).
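
One way to realize the "most recent N messages + a brief header" strategy is sketched below. This is an assumption-laden illustration, not the project's implementation: the constant name, the 50,000-character value, and the `role`/`content` message keys are all hypothetical.

```python
DEFAULT_MAX_LENGTH = 50_000  # illustrative bound in characters (assumption)


def format_conversation_summary(
    conversation: list[dict], max_length: int = DEFAULT_MAX_LENGTH
) -> str:
    """Format the most recent messages that fit within max_length characters."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for msg in reversed(conversation):
        line = f"{msg.get('role', 'unknown')}: {msg.get('content', '')}"
        cost = len(line) + 1  # +1 for the joining newline
        if used + cost > max_length:
            break
        kept.append(line)
        used += cost
    omitted = len(conversation) - len(kept)
    # The short header is not counted against max_length.
    header = f"[{omitted} earlier message(s) omitted]\n" if omitted else ""
    return header + "\n".join(reversed(kept))
```

Dropping whole messages from the oldest end keeps the truncation deterministic and avoids cutting a message mid-sentence.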

Copilot · 2026-02-16T06:53:06Z

docs/claude/tools/amplihack/hooks/claude_power_steering.py

+
+        # Check for INVOKED indicator
+        if "invoked:" in response_lower or "invoked" in response_lower[:50]:
+            return (True, None)
+
+        # Check for NOT INVOKED indicator
+        if "not invoked:" in response_lower or "not invoked" in response_lower[:50]:
+            # Extract reason from response
+            idx = response.lower().find("not invoked:")
+            if idx != -1:
+                reason = response[idx + 12 :].strip()
+                # Clean up and truncate
+                if reason and len(reason) > 10:
+                    return (False, reason[:200])
+            return (False, "Workflow not properly invoked")
+


In analyze_workflow_invocation(), the INVOKED check will also match "NOT INVOKED" responses because it searches for the substring "invoked" before checking the negative case. This makes a clear "NOT INVOKED: ..." response incorrectly return valid. Fix by checking NOT INVOKED first and/or using strict prefix matching.

Suggested change

-        # Check for INVOKED indicator
-        if "invoked:" in response_lower or "invoked" in response_lower[:50]:
-            return (True, None)
-
-        # Check for NOT INVOKED indicator
-        if "not invoked:" in response_lower or "not invoked" in response_lower[:50]:
-            # Extract reason from response
-            idx = response.lower().find("not invoked:")
-            if idx != -1:
-                reason = response[idx + 12 :].strip()
-                # Clean up and truncate
-                if reason and len(reason) > 10:
-                    return (False, reason[:200])
-            return (False, "Workflow not properly invoked")
+        cleaned = response_lower.lstrip()
+        # Check for NOT INVOKED indicator first to avoid matching "invoked" inside "not invoked"
+        if cleaned.startswith("not invoked"):
+            # Extract reason from response
+            idx = response_lower.find("not invoked:")
+            if idx != -1:
+                reason = response[idx + len("not invoked:") :].strip()
+                # Clean up and truncate
+                if reason and len(reason) > 10:
+                    return (False, reason[:200])
+            return (False, "Workflow not properly invoked")
+
+        # Check for INVOKED indicator
+        if cleaned.startswith("invoked"):
+            return (True, None)

Copilot · 2026-02-16T06:53:07Z

docs/claude/tools/amplihack/hooks/power_steering_checker.py

+                filepath.write_text(data)
+            else:  # append mode
+                with open(filepath, mode) as f:


_write_with_retry() writes text without specifying an encoding (Path.write_text default and open(...)). In non-UTF-8 locales this can yield inconsistent log/summary output or failures. Use an explicit encoding (e.g., utf-8) for both the write and append paths.

Suggested change

-                filepath.write_text(data)
-            else:  # append mode
-                with open(filepath, mode) as f:
+                filepath.write_text(data, encoding="utf-8")
+            else:  # append mode
+                with open(filepath, mode, encoding="utf-8") as f:
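
For context, a self-contained sketch of a retrying writer with explicit UTF-8 encoding on both paths is shown below. Only the two encoded calls come from the suggestion; the function signature, the retry count, the delay, and the choice to retry on `OSError` are assumptions made for illustration.

```python
import time
from pathlib import Path


def write_with_retry(
    filepath: Path, data: str, mode: str = "w", retries: int = 3, delay: float = 0.1
) -> None:
    """Write or append text as UTF-8, retrying on OSError (hypothetical policy)."""
    attempts = max(1, retries)
    for attempt in range(attempts):
        try:
            if mode == "w":
                filepath.write_text(data, encoding="utf-8")
            else:  # append mode
                with open(filepath, mode, encoding="utf-8") as f:
                    f.write(data)
            return
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(delay)
```

Passing the encoding explicitly makes the output independent of the process locale, which is the inconsistency the review comment points at.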

Copilot · 2026-02-16T06:53:07Z

docs/claude/tools/amplihack/hooks/claude_power_steering.py

+        response: Full SDK response text
+
+    Returns:
+        Extracted reason string (truncated to 200 chars), or generic fallback


The _extract_reason_from_response() docstring says the returned reason is truncated to 200 chars, but the implementation now returns the full extracted reason. Update the docstring to match behavior, or reintroduce truncation if downstream output/logging expects a limit.

Suggested change

-        Extracted reason string (truncated to 200 chars), or generic fallback
+        Full extracted reason string, or generic fallback

Copilot · 2026-02-16T06:53:07Z

.claude/tools/amplihack/hooks/claude_power_steering.py

+        response_lower = response.lower()
+
+        # Check for INVOKED indicator
+        if "invoked:" in response_lower or "invoked" in response_lower[:50]:
+            return (True, None)
+
+        # Check for NOT INVOKED indicator
+        if "not invoked:" in response_lower or "not invoked" in response_lower[:50]:
+            # Extract reason from response
+            idx = response.lower().find("not invoked:")
+            if idx != -1:
+                reason = response[idx + 12 :].strip()
+                # Clean up and truncate
+                if reason and len(reason) > 10:
+                    return (False, reason[:200])
+            return (False, "Workflow not properly invoked")
+


In analyze_workflow_invocation(), the INVOKED check will also match responses that start with "NOT INVOKED" because it looks for the substring "invoked" before checking the negative case. This makes a clear "NOT INVOKED: ..." response incorrectly return valid. Fix by checking for NOT INVOKED first and/or using a strict prefix match (e.g., anchored regex for ^INVOKED: vs ^NOT INVOKED:).

Suggested change

-        response_lower = response.lower()
-
-        # Check for INVOKED indicator
-        if "invoked:" in response_lower or "invoked" in response_lower[:50]:
-            return (True, None)
-
-        # Check for NOT INVOKED indicator
-        if "not invoked:" in response_lower or "not invoked" in response_lower[:50]:
-            # Extract reason from response
-            idx = response.lower().find("not invoked:")
-            if idx != -1:
-                reason = response[idx + 12 :].strip()
-                # Clean up and truncate
-                if reason and len(reason) > 10:
-                    return (False, reason[:200])
-            return (False, "Workflow not properly invoked")
+        # Normalize response for analysis
+        response_stripped = response.lstrip()
+        response_lower = response_stripped.lower()
+
+        # Check for NOT INVOKED indicator first to avoid matching "invoked" inside "not invoked"
+        if response_lower.startswith("not invoked:") or response_lower.startswith("not invoked"):
+            # Extract reason from response
+            idx = response_lower.find("not invoked:")
+            if idx != -1:
+                # 12 = len("not invoked:")
+                reason = response_stripped[idx + 12 :].strip()
+                # Clean up and truncate
+                if reason and len(reason) > 10:
+                    return (False, reason[:200])
+            return (False, "Workflow not properly invoked")
+
+        # Check for INVOKED indicator with strict prefix match
+        if response_lower.startswith("invoked:") or response_lower.startswith("invoked"):
+            return (True, None)
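
The anchored-regex alternative mentioned in this comment could be sketched as follows. `parse_verdict` and `_VERDICT` are hypothetical names, and returning a pass for an unrecognized response is an assumption based on the fail-open behavior described in the PR summary. The 10-character minimum and 200-character cap are copied from the reviewed snippet.

```python
import re
from typing import Optional, Tuple

# Anchored at the start so "invoked" inside "not invoked" can never match.
_VERDICT = re.compile(r"^\s*(not\s+invoked|invoked)\b:?\s*(.*)", re.I | re.S)


def parse_verdict(response: str) -> Tuple[bool, Optional[str]]:
    """Map an SDK response to (valid, reason) using a strict prefix match."""
    m = _VERDICT.match(response)
    if m is None:
        # Unrecognized response: fail open (assumption).
        return (True, None)
    verdict, rest = m.group(1).lower(), m.group(2).strip()
    if verdict == "invoked":
        return (True, None)
    # Any "not invoked" variant lands here.
    if len(rest) > 10:
        return (False, rest[:200])
    return (False, "Workflow not properly invoked")
```

Because the alternation tries `not\s+invoked` first and the pattern is anchored, the order of the two checks no longer matters.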

Copilot · 2026-02-16T06:53:07Z

amplifier-bundle/tools/amplihack/hooks/claude_power_steering.py

+        # Check for INVOKED indicator
+        if "invoked:" in response_lower or "invoked" in response_lower[:50]:
+            return (True, None)
+
+        # Check for NOT INVOKED indicator


In analyze_workflow_invocation(), the INVOKED check will also match "NOT INVOKED" responses because it searches for the substring "invoked" before checking the negative case. This makes a clear "NOT INVOKED: ..." response incorrectly return valid. Fix by checking NOT INVOKED first and/or using strict prefix matching.

- Fix analyze_workflow_invocation(): check NOT INVOKED before INVOKED to prevent substring match false positives on "not invoked" responses
- Fix _format_conversation_summary(): add bounded default max_length=50000 to prevent oversized SDK prompts for long transcripts
- Fix _check_next_steps(): use negation_matched flag with continue to skip structured next-steps detection when negation pattern already matched, preventing false failures on completion statements with list formatting
- Fix _write_with_retry(): add encoding="utf-8" to both write paths for consistent behavior across locales
- Fix _extract_reason_from_response() docstring to match implementation (returns full reason string, not truncated to 200 chars)
- Remove test_workflow_invocation_validator_simple.py (tests deleted module)
- Remove test_validator_import() from checker unit tests (references deleted workflow_invocation_validator module)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

25 behavioral tests verifying the 5 Copilot review issues are fixed:
- TestWorkflowInvocationNotInvokedPriority: NOT INVOKED priority over INVOKED
- TestFormatConversationSummaryBoundedLength: max_length=50000 bounded default
- TestCheckNextStepsNegationLogic: negation prevents false next-steps failures
- TestWriteWithRetryEncoding: UTF-8 encoding for cross-locale consistency
- TestExtractReasonDocstringAccuracy: docstring matches actual behavior

Also includes YAML scenario file for gadugi-agentic-test framework.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…encoding

- Add MAX_CONVERSATION_SUMMARY_LENGTH = 512_000 as named constant (previously bare 50000 magic number; 512K is appropriate for 1M context window models)
- Use named constant in _format_conversation_summary() default parameter
- Fix log_file.write_text() missing encoding="utf-8" (same class as Fix 4, missed in previous commit)
- Update behavioral test to validate 512K lower bound matches model reality

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-02-18T21:49:31Z

🤖 Auto-fixed version bump

The version in pyproject.toml has been automatically bumped to the next patch version.

If you need a minor or major version bump instead, please update pyproject.toml manually and push the change.

github-actions · 2026-02-18T21:52:50Z

Repo Guardian - Passed

All changed files in this PR are legitimate production code, tests, or configuration files. No ephemeral content detected.

Analysis Summary:

✅ Modified files: Production Python modules and configuration files
✅ Removed files: Deprecated test modules
✅ Added files: test_pr2365_behavioral.py and test_pr2365_power_steering_fixes.yaml are regression tests, not point-in-time documents
- While they reference PR #2365 in their names, they provide durable test coverage to prevent bugs from recurring
- Properly structured with documentation and located in tests/ directory
- Similar to test_issue_NNNN.py pattern used in many projects
✅ No temporal filenames indicating temporary content
✅ No point-in-time documents (notes, status updates, investigation logs)
✅ No temporary scripts or one-off utilities

This PR implements agentic power-steering analysis with proper production code structure and regression test coverage.

AI generated by Repo Guardian

Ubuntu and others added 2 commits February 16, 2026 04:40

feat: implement agentic power-steering analysis

090cd8a

Fixes #2355

Replace regex-based validation with Claude SDK intelligent analysis.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

[skip ci] chore: Auto-bump patch version

fa01bb0

rysweet requested a review from Copilot February 16, 2026 06:43

Copilot started reviewing on behalf of rysweet February 16, 2026 06:43

Copilot AI reviewed Feb 16, 2026

Ubuntu and others added 2 commits February 18, 2026 05:39

rysweet marked this pull request as ready for review February 18, 2026 05:41

Ubuntu and others added 3 commits February 18, 2026 21:43

chore: Resolve merge conflict with main - bump version to 0.5.37

93b2ebf

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

[skip ci] chore: Auto-bump patch version

4b82577

rysweet merged commit 7fe27dc into main Feb 18, 2026
1 check passed
