Merge pull request #6 from smartwatermelon/claude/adversarial-reviewer-v1.3.0

smartwatermelon · web-flow · commit 5bf079985256 · 2026-02-11T12:26:22.000-08:00
feat(code-critic): adversarial-reviewer v1.3.0 refinements
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -15,7 +15,7 @@
       "name": "code-critic",
       "source": "./plugins/code-critic",
       "description": "Skeptical code review agent that assumes code is wrong until proven otherwise. Challenges architectural decisions, identifies failure modes systematically, and prioritizes long-term maintainability over validation.",
-      "version": "1.2.0",
+      "version": "1.3.0",
       "author": {
         "name": "Andrew Rich",
         "url": "https://github.com/smartwatermelon"
diff --git a/README.md b/README.md
@@ -33,7 +33,7 @@ claude plugin install --all smartwatermelon-marketplace
 
 ### Code Critic
 
-**Status**: Stable v1.2.0
+**Status**: Stable v1.3.0
 **Category**: Quality / Code Review
 
 Skeptical code review agent that assumes code is wrong until proven otherwise. Unlike validation-focused reviewers, Code Critic challenges architectural decisions, identifies failure modes, and prioritizes long-term maintainability.
@@ -155,5 +155,5 @@ Inspired by the philosophy that good tools challenge us to be better developers.
 
 ---
 
-**Latest Version**: 1.2.0
+**Latest Version**: 1.3.0
 **Last Updated**: 2026-02-11
diff --git a/plugins/code-critic/.claude-plugin/plugin.json b/plugins/code-critic/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "code-critic",
   "description": "Skeptical code review agent that assumes code is wrong until proven otherwise. Challenges architectural decisions, identifies failure modes systematically, and prioritizes long-term maintainability over validation.",
-  "version": "1.2.0",
+  "version": "1.3.0",
   "author": {
     "name": "Andrew Rich",
     "email": "andrew.rich@gmail.com"
diff --git a/plugins/code-critic/CHANGELOG.md b/plugins/code-critic/CHANGELOG.md
@@ -0,0 +1,67 @@
+# Changelog
+
+## v1.3.0
+
+- **"Lead with verdict" resolved**: Automated pipelines state
+  verdict in Summary line; interactive reviews keep verdict at end
+- **Sections consolidated**: Review Context Awareness merged into
+  Context and Scope
+- **Hook over-trigger guard**: Insufficient Context verdict reserved
+  for interactive reviews; automated pipelines use partial
+  assessment
+- **Skip-critic clarified**: Application-level skips distinguished
+  from git-level bypass; skip logging added to examples
+- **Pre-commit hook fix**: Dead code from `set -e` + `$?` replaced
+  with proper exit code capture
+- **USAGE.md Example 2**: Alternative Approaches section added for
+  presigned URL pattern
+- **Changelogs extracted**: Version history moved to this file
+
+## v1.2.0
+
+- **Observability checklist**: New failure mode category — can you
+  tell when this is broken in production?
+- **Insufficient Context verdict**: Fourth verdict option for partial
+  reviews where context is missing
+- **Diff-awareness**: Guidance for adapting review depth to diffs vs.
+  full files vs. snippets
+- **Output length calibration**: Concise for git hooks, thorough for
+  interactive review
+- **Confidence signaling**: Distinguishes confirmed findings from
+  pattern-based suspicions
+- **Domain Awareness repositioned**: Now sets the review lens before
+  the hierarchy is applied
+- **Configuration/prompts/IaC domain**: New domain-specific guidance
+  for reviewing config and prompt files
+- **Honest calibration restored**: "Genuinely good code is rare"
+  qualifier reinstated
+- **Documentation fixes**: Examples updated to match defined response
+  format, severity/verdict duplication removed, `--no-verify`
+  references replaced with safe alternatives
+
+## v1.1.0
+
+- **Failure mode checklist**: Systematic review across concurrency,
+  resource management, distributed systems, error handling, security,
+  and data integrity — not just general skepticism
+- **Data model review**: Schema design elevated to second priority in
+  the review hierarchy
+- **Severity calibration**: Clear definitions for Critical (blocks
+  merge), Concern (should fix), and Question (needs justification)
+- **Verdict**: Every review ends with a clear disposition — Block,
+  Revise, or Accept
+- **Follow-up protocol**: Guidance for iterative review, handling
+  pushback, and updating assessments
+- **Domain awareness**: Adjusts review lens for backend, frontend,
+  data pipelines, APIs, and tests
+- **Context and scope**: Explicit instructions to read surrounding
+  code and trace data flow before forming opinions
+- **Honest calibration**: Good code gets recognized with the same
+  rigor as bad code — no manufactured criticism
+- **Activation Criteria removed**: Usage guidance now lives in README
+  and USAGE.md, not in the agent prompt
+
+## v1.0.0
+
+- Initial release: adversarial code review agent with review
+  hierarchy, behavioral rules, and response format
diff --git a/plugins/code-critic/README.md b/plugins/code-critic/README.md
@@ -121,29 +121,17 @@ Code Critic addresses concerns in this order:
 **Standard Review:** "Nice use of modern JS!"
 **Code Critic:** "This will be unreadable in 6 months. Break it into explicit steps. Cleverness is not a virtue in production code."
 
-## What's New in v1.1.0
-
-- **Failure mode checklist**: Systematic review across concurrency, resource management, distributed systems, error handling, security, and data integrity — not just general skepticism
-- **Data model review**: Schema design elevated to second priority in the review hierarchy
-- **Severity calibration**: Clear definitions for Critical (blocks merge), Concern (should fix), and Question (needs justification)
-- **Verdict**: Every review ends with a clear disposition — Block, Revise, or Accept
-- **Follow-up protocol**: Guidance for iterative review, handling pushback, and updating assessments
-- **Domain awareness**: Adjusts review lens for backend, frontend, data pipelines, APIs, and tests
-- **Context and scope**: Explicit instructions to read surrounding code and trace data flow before forming opinions
-- **Honest calibration**: Good code gets recognized with the same rigor as bad code — no manufactured criticism
-- **Activation Criteria removed**: Usage guidance now lives in README and USAGE.md, not in the agent prompt
-
-## What's New in v1.2.0
-
-- **Observability checklist**: New failure mode category — can you tell when this is broken in production?
-- **Insufficient Context verdict**: Fourth verdict option for partial reviews where context is missing
-- **Diff-awareness**: Guidance for adapting review depth to diffs vs. full files vs. snippets
-- **Output length calibration**: Concise for git hooks, thorough for interactive review
-- **Confidence signaling**: Distinguishes confirmed findings from pattern-based suspicions
-- **Domain Awareness repositioned**: Now sets the review lens before the hierarchy is applied
-- **Configuration/prompts/IaC domain**: New domain-specific guidance for reviewing config and prompt files
-- **Honest calibration restored**: "Genuinely good code is rare" qualifier reinstated
-- **Documentation fixes**: Examples updated to match defined response format, severity/verdict duplication removed, `--no-verify` references replaced with safe alternatives
+## Changelog
+
+See [CHANGELOG.md](CHANGELOG.md) for version history.
+
+### Latest: v1.3.0
+
+- Resolved "lead with verdict" vs. verdict-at-end ambiguity
+- Merged Review Context Awareness into Context and Scope
+- Fixed pre-commit hook dead code in GIT_HOOKS.md examples
+- Extracted changelogs to CHANGELOG.md
+- Clarified skip-critic vs bypass distinction in GIT_HOOKS.md
 
 ## What Code Critic is NOT
 
@@ -165,6 +153,8 @@ model: opus
 
 To use a different model, edit `agents/adversarial-reviewer.md` and change the model field.
 
+Note: The agent's Configuration/prompts/IaC domain covers prompt and config files, which includes its own prompt file — useful for self-review workflows where the agent evaluates changes to its own definition.
+
 ## Contributing
 
 Contributions are welcome! Please:
diff --git a/plugins/code-critic/agents/adversarial-reviewer.md b/plugins/code-critic/agents/adversarial-reviewer.md
@@ -116,11 +116,6 @@ Before forming opinions, understand the context:
 - **Identify the system boundary.** Is this code internal plumbing or a public contract? Internal code can change; public contracts cannot. Review accordingly.
 - **Consider the data flow.** Trace where data comes from, what transforms it, and where it goes. Most bugs live at transformation boundaries.
 - **Ask about what you can't see.** If the review context is insufficient to form a judgment, say so. "I can't evaluate this without seeing how X is handled" is a valid and useful review comment.
-
-## Review Context Awareness
-
-You may receive code as a full file, a diff, or a partial snippet. Adapt accordingly:
-
 - **If reviewing a diff**: You see only changed lines with context. State what you can and cannot evaluate. Do not assume unchanged surrounding code is correct — it may be the source of the problem. Flag when a diff-only review is insufficient for safety judgment.
 - **If reviewing a full file**: You have more context but may lack knowledge of callers and system integration. State this.
 - **If context is insufficient**: Use the Insufficient Context verdict rather than guessing.
@@ -135,6 +130,7 @@ You may receive code as a full file, a diff, or a partial snippet. Adapt accordi
 - Calibrate honestly. If the code is good, say so and explain *why* it's good — this is just as valuable as identifying problems. Do not manufacture criticism to fill a quota. But genuinely good code is rare — most code has real problems worth discussing.
 - Signal confidence on findings. "This is a race condition" and "this pattern sometimes causes race conditions but I cannot confirm without seeing the thread model" are meaningfully different statements. State which one you mean.
 - Never say "looks good to me" or "LGTM" unless you would mass-refactor the codebase to match this pattern.
+- In automated pipeline context (git hooks, CI), prefer a partial assessment with stated limitations over the Insufficient Context verdict. Reserve Insufficient Context for interactive reviews where the developer can provide the missing information.
 
 ## Handling Follow-Up
 
@@ -156,7 +152,7 @@ When the developer responds to your review:
 
 Calibrate review length to the context:
 
-- **Automated pipeline (git hooks)**: Lead with the verdict. Keep the review focused on blocking and high-priority issues. Aim for concise.
+- **Automated pipeline (git hooks)**: State the verdict in the Summary line (e.g., "**Block.** Summary text...") so the hook consumer sees the disposition immediately. Keep the full Verdict section at the end. Focus on blocking and high-priority issues. Aim for concise.
 - **Interactive review**: Provide full analysis across all hierarchy levels. Depth over brevity.
 - **Always**: Every section that appears should contain substance. Omit sections with nothing to say rather than writing "None."
 
diff --git a/plugins/code-critic/docs/GIT_HOOKS.md b/plugins/code-critic/docs/GIT_HOOKS.md
@@ -73,7 +73,8 @@ if [ -z "$DIFF" ]; then
   exit 0
 fi
 
-# Run review
+# Run review (temporarily allow non-zero exit to capture result)
+set +e
 echo "$DIFF" | claude --agent adversarial-reviewer \
   --no-session-persistence \
   -p "Review these changes before push. Focus on:
@@ -83,8 +84,8 @@ echo "$DIFF" | claude --agent adversarial-reviewer \
 - Maintenance burden
 
 Provide specific, actionable feedback."
-
 REVIEW_EXIT=$?
+set -e
 
 if [ $REVIEW_EXIT -ne 0 ]; then
   echo "${RED}❌ Code Critic found issues${NC}"
@@ -125,12 +126,15 @@ echo "🔍 Reviewing staged changes..."
 # Get staged diff
 DIFF=$(git diff --cached)
 
-# Run review
+# Run review (temporarily allow non-zero exit to capture result)
+set +e
 echo "$DIFF" | claude --agent adversarial-reviewer \
   --no-session-persistence \
   -p "Quick review of staged changes. Focus on critical issues only."
+REVIEW_EXIT=$?
+set -e
 
-if [ $? -ne 0 ]; then
+if [ $REVIEW_EXIT -ne 0 ]; then
   echo "❌ Review found issues. Fix and retry, or split into smaller commits."
   exit 1
 fi
@@ -179,6 +183,8 @@ fi
 echo "🔒 Security-critical files detected, running adversarial review..."
 echo "$CRITICAL_FILES"
 
+# Run review (temporarily allow non-zero exit to capture result)
+set +e
 git diff origin/main...HEAD | claude --agent adversarial-reviewer \
   --no-session-persistence \
   -p "Review these security-critical changes:
@@ -192,8 +198,10 @@ Focus on:
 - XSS vulnerabilities
 - Race conditions
 - Data exposure"
+REVIEW_EXIT=$?
+set -e
 
-if [ $? -ne 0 ]; then
+if [ $REVIEW_EXIT -ne 0 ]; then
   echo "❌ Security review failed"
   exit 1
 fi
@@ -339,6 +347,13 @@ fi
 
 ### Skipping Review When Appropriate
 
+> **Note**: Application-level skips (`[skip-critic]`, `SKIP_CODE_CRITIC`)
+> are distinct from `--no-verify`. These skip mechanisms are visible in
+> commit history or CI logs, making skipped reviews auditable.
+> `--no-verify` silently bypasses all hooks with no trace. Use
+> application-level skips for docs-only commits or known-safe changes;
+> never use `--no-verify`.
+
 Allow developers to skip when justified:
 
 ```bash
@@ -347,12 +362,14 @@ Allow developers to skip when justified:
 # Check for skip marker in commit message
 if git log -1 --pretty=%B | grep -q '\[skip-critic\]'; then
   echo "⏭️  Skipping Code Critic (commit message contains [skip-critic])"
+  echo "[$(date)] SKIP: Code Critic skipped via [skip-critic]" >> .git/code-critic-skip.log
   exit 0
 fi
 
 # Check for skip marker in environment
 if [ "$SKIP_CODE_CRITIC" = "1" ]; then
   echo "⏭️  Skipping Code Critic (SKIP_CODE_CRITIC=1)"
+  echo "[$(date)] SKIP: Code Critic skipped via SKIP_CODE_CRITIC" >> .git/code-critic-skip.log
   exit 0
 fi
 
diff --git a/plugins/code-critic/docs/USAGE.md b/plugins/code-critic/docs/USAGE.md
@@ -106,15 +106,21 @@ presigned URLs would eliminate.
   reaches your storage layer.
 
 ### Concerns
-- File proxied through API server to S3 adds latency and cost.
-  Clients should upload directly via S3 presigned URLs.
 - No resume capability. A 90%-complete upload that fails means the
   user starts over. Consider the tus protocol for resumable uploads.
 - File metadata stored separately from files in S3. These will
   drift. S3 object tags would keep them co-located.
 - No per-user upload rate limiting. A single user can consume all
   your upload bandwidth.
 
+### Alternative Approaches
+- **Direct-to-S3 presigned URLs**: Instead of proxying uploads
+  through your API server, generate a presigned URL and let the
+  client upload directly to S3. This eliminates server-side
+  bandwidth cost, reduces latency, and removes your API server
+  as a failure point in the upload path. Your API only handles
+  the lightweight presign request and post-upload metadata.
+
 ### Questions
 - Why synchronous upload? This blocks API workers. What's the
   expected file size distribution? Anything over 10MB should be
@@ -184,9 +190,14 @@ if [ -z "$FILES" ]; then
 fi
 
 echo "Running adversarial code review..."
-git diff origin/main...HEAD | claude --agent adversarial-reviewer -p "Review these changes before push" --no-session-persistence -p
+set +e
+git diff origin/main...HEAD | claude --agent adversarial-reviewer \
+  --no-session-persistence \
+  -p "Review these changes before push"
+REVIEW_EXIT=$?
+set -e
 
-if [ $? -ne 0 ]; then
+if [ $REVIEW_EXIT -ne 0 ]; then
   echo "❌ Adversarial review found critical issues"
   exit 1
 fi
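
Note on the hook fix above: the changelog calls the old `if [ $? -ne 0 ]` checks dead code because, with `set -e` active, the script exits at the failing command and never reaches the check. The following is a minimal sketch of that behavior and of the `set +e` / `REVIEW_EXIT=$?` / `set -e` capture pattern the diff adopts. `fake_review` and `capture_status` are hypothetical names used only for illustration; `fake_review` stands in for the `claude --agent adversarial-reviewer` call so the sketch runs without the CLI.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for the reviewer command: exits with the status
# passed as its first argument.
fake_review() {
  return "$1"
}

# Old pattern, run in a child shell: `set -e` terminates the script at the
# failing command, so the echo that inspects $? is never reached.
old_pattern_output=$(bash -c 'set -e; false; echo "reached check: $?"' || true)
echo "old pattern printed: '${old_pattern_output}'"

# New pattern: suspend errexit around the command, save its status,
# then restore errexit before continuing.
capture_status() {
  set +e
  "$@"
  status=$?
  set -e
  printf '%s\n' "$status"
}

capture_status fake_review 0
capture_status fake_review 3
```

An alternative that avoids toggling errexit is `REVIEW_EXIT=0; some_command || REVIEW_EXIT=$?`, since a command on the left of `||` does not trigger `set -e`. The diff does not use that form; it is mentioned here only as a common equivalent.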
