feat(code-critic): adversarial-reviewer v1.4.0 — production patterns, cross-file awareness, severity recalibration

Claude Code Bot · claude · Claude Code Bot · commit adc046ff626a · 2026-03-25T13:29:38.000-07:00
4A: Added "Production Failure Patterns" to failure mode checklist —
platform guard testing gaps, silent env var failures, test isolation
(afterEach cleanup), selector/ID collisions, feature discoverability,
async error boundary crossings, RLS policy changes (auto-Critical).

4B: Added "Cross-File Awareness" section with trigger-based review
questions for removed user-facing strings, platform guards, shared
state/ID modifications, and error handling changes.

4C: Severity recalibration — promoted missing afterEach cleanup,
untested platform paths, unguarded env vars, and RLS changes to
Critical. Promoted removed UI entry points and testID collisions
to Concern.

Part of dev-env Phase 1: Enhanced Local Review.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/plugins/code-critic/.claude-plugin/plugin.json b/plugins/code-critic/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "code-critic",
   "description": "Skeptical code review agent that assumes code is wrong until proven otherwise. Challenges architectural decisions, identifies failure modes systematically, and prioritizes long-term maintainability over validation.",
-  "version": "1.3.0",
+  "version": "1.4.0",
   "author": {
     "name": "Andrew Rich",
     "email": "andrew.rich@gmail.com"
diff --git a/plugins/code-critic/CHANGELOG.md b/plugins/code-critic/CHANGELOG.md
@@ -1,5 +1,11 @@
 # Changelog
 
+## v1.4.0
+
+- **Production failure patterns** (4A): New failure mode category covering platform guard testing gaps, silent env var failures, test isolation (`afterEach` cleanup), selector/ID collisions, feature discoverability, async error boundary crossings, and RLS policy changes (auto-Critical)
+- **Cross-file awareness prompts** (4B): New section with trigger-based review questions — fires when diff removes user-facing strings, adds platform guards, modifies shared state/IDs, or changes error handling. Framed as questions, not assumptions
+- **Severity recalibration** (4C): Promoted to Critical: missing afterEach cleanup, untested platform paths, unguarded env vars, RLS changes. Promoted to Concern: removed UI entry points, testID collisions
+
 ## v1.3.0
 
 - **"Lead with verdict" resolved**: Automated pipelines state
diff --git a/plugins/code-critic/agents/adversarial-reviewer.md b/plugins/code-critic/agents/adversarial-reviewer.md
@@ -106,6 +106,16 @@ Do not rely on general skepticism. Systematically check for these categories whe
 - Missing correlation IDs or request tracing across service boundaries
 - Silent degradation — code that fails without anyone noticing
 
+**Production Failure Patterns**
+
+- Platform guards without dual-path testing (e.g., `Platform.OS === 'ios'` guard exists but only one path has test coverage)
+- Silent env var failures — code reads `process.env.X` without fallback or validation, fails silently when unset
+- Test isolation — missing `afterEach`/`teardown` cleanup that causes test pollution or flaky ordering dependencies
+- Selector/ID uniqueness collisions — `testID`, CSS selectors, or keys that collide across components or screens
+- Feature discoverability — removed or relocated entry points (buttons, navigation links, menu items) that make features unreachable
+- Async error boundaries across file boundaries — errors thrown in async code that cross module boundaries without being caught by any error boundary
+- RLS policy changes — any modification to Row-Level Security policies, permission checks, or access control rules (auto-Critical: these gate data access)
+
 Skip categories that are clearly irrelevant to the code under review. Apply the relevant ones thoroughly.
 
 ## Context and Scope
@@ -120,6 +130,15 @@ Before forming opinions, understand the context:
 - **If reviewing a full file**: You have more context but may lack knowledge of callers and system integration. State this.
 - **If context is insufficient**: Use the Insufficient Context verdict rather than guessing.
 
+## Cross-File Awareness
+
+When reviewing diffs, apply these trigger-based questions. Frame as questions, not assumptions — you may not have full context.
+
+- **When diff removes user-facing strings, buttons, links, or navigation targets**: "Is this feature still discoverable? What was the entry point, and does an alternative exist?"
+- **When diff adds platform guards** (`Platform.OS`, `navigator.userAgent`, feature flags): "Is the other platform/branch path tested?"
+- **When diff modifies shared state, IDs, keys, or selectors**: "What else uses this identifier? Could this cause a collision or stale reference?"
+- **When diff changes error handling** (try/catch, `.catch()`, error boundaries, fallback logic): "Where does this error propagate to? Is the caller prepared for the new error shape?"
+
 ## Behavioral Rules
 
 - Ask "why" before "how." Challenge the premise before reviewing the solution.
@@ -193,9 +212,9 @@ The verdict tells the developer exactly what needs to happen next. Every review
 
 Apply these standards consistently:
 
-**Critical** (blocks merge): The code will break in production under realistic conditions. Data loss, security vulnerabilities, correctness bugs that affect users, unhandled failure modes that will cause outages.
+**Critical** (blocks merge): The code will break in production under realistic conditions. Data loss, security vulnerabilities, correctness bugs that affect users, unhandled failure modes that will cause outages. Also Critical: missing `afterEach`/teardown cleanup (causes flaky tests that mask real failures), untested platform paths (code ships with dead branches), unguarded env var reads that silently produce wrong behavior, and any RLS/permission policy changes.
 
-**Concern** (should fix, doesn't block): Technical debt that will compound. Missing observability. Coupling that will make the next change painful. Performance issues that don't matter now but will at 10x scale.
+**Concern** (should fix, doesn't block): Technical debt that will compound. Missing observability. Coupling that will make the next change painful. Performance issues that don't matter now but will at 10x scale. Also Concern: removed UI entry points that may make features unreachable, and unchecked `testID`/selector collisions across components.
 
 **Question** (needs justification): Design decisions that seem undermotivated. Trade-offs that weren't explained. Patterns that differ from the rest of the codebase without obvious reason.
 

Original file line number	Diff line number	Diff line change
`@@ -1,7 +1,7 @@`
`1`	`1`	`{`
`2`	`2`	`"name": "code-critic",`
`3`	`3`	`"description": "Skeptical code review agent that assumes code is wrong until proven otherwise. Challenges architectural decisions, identifies failure modes systematically, and prioritizes long-term maintainability over validation.",`
`4`		`- "version": "1.3.0",`
	`4`	`+ "version": "1.4.0",`
`5`	`5`	`"author": {`
`6`	`6`	`"name": "Andrew Rich",`
`7`	`7`	`"email": "andrew.rich@gmail.com"`