Improve /validate skill and analyze script

lucaspimentel · lucaspimentel · commit 3570ab9b8ba4 · 2026-03-22T15:37:34.000-04:00
SKILL.md: write JSON output to CWD instead of temp dir, recommend
--details-only flag, expand "What to look for" with modifier-sensitive,
context-sensitive, and concentration patterns.

analyze.py: add --details-only flag (pipe-friendly, no summary), add 6
new suspicious pattern detectors: const field init, struct fields, digit-
suffixed type params, naming in interop files, suggestion rule outliers,
concentrated violations.

🤖 Co-Authored-By: Claude Code <noreply@anthropic.com>
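
Note: the "concentrated violations" detector named in the message can be sketched as a standalone function. This is a minimal illustration, assuming the intended rule is "the three busiest files hold more than 80% of a rule's violations" and that each JSON record carries `ruleId` and `filePath` keys, as the diff does. The function name `concentrated_rules`, its parameters, and the sample data are hypothetical and not part of the commit.

```python
from collections import Counter, defaultdict


def concentrated_rules(violations, top_n=3, threshold=0.8, min_count=5):
    """Return rule IDs whose top_n busiest files hold more than
    `threshold` of that rule's violations (hypothetical helper)."""
    paths_by_rule = defaultdict(list)
    for v in violations:
        paths_by_rule[v["ruleId"]].append(v["filePath"])

    flagged = []
    for rule_id, paths in paths_by_rule.items():
        if len(paths) < min_count:
            continue  # too few violations to judge concentration
        top = Counter(paths).most_common(top_n)
        share = sum(count for _, count in top) / len(paths)
        if share > threshold:
            flagged.append(rule_id)
    return sorted(flagged)


# Hypothetical sample: CSLINT104 clusters in two files, CSLINT210 is spread out.
sample = (
    [{"ruleId": "CSLINT104", "filePath": "Native.cs"}] * 9
    + [{"ruleId": "CSLINT104", "filePath": "Other.cs"}]
    + [{"ruleId": "CSLINT210", "filePath": f"File{i}.cs"} for i in range(10)]
)
print(concentrated_rules(sample))
```

In this sample only `CSLINT104` is reported: all of its violations sit in two files, while `CSLINT210` has one violation in each of ten files, so its top three files account for only 30%.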
diff --git a/.claude/skills/validate/SKILL.md b/.claude/skills/validate/SKILL.md
@@ -35,15 +35,17 @@ If the path is ambiguous, ask the user to clarify before running.
 Run CsLint with `--format json`, save the output, then run the analysis script:
 
 ```bash
-dotnet run --project src/CsLint.Cli -- --format json <path> [--exclude <exclude-pattern>] 2>/dev/null > "$TEMP/cslint-validate.json"
+dotnet run --project src/CsLint.Cli -- --format json <path> [--exclude <exclude-pattern>] 2>/dev/null > cslint-validate.json
 ```
 
+Write the JSON output to the current directory (not `$TEMP` — that path doesn't work reliably on Windows with Git Bash). Clean up `cslint-validate.json` when done.
+
 CsLint exits 0 (clean), 1 (violations found), or 2 (error). Exit code 1 is expected — it means violations were found, which is what we want to analyze. If exit code is 2, report the error and stop.
 
 Then run the bundled analysis script to get a summary and flag suspicious patterns:
 
 ```bash
-python "${SKILL_DIR}/scripts/analyze.py" "$TEMP/cslint-validate.json"
+python "${SKILL_DIR}/scripts/analyze.py" cslint-validate.json
 ```
 
 `${SKILL_DIR}` is the directory containing this SKILL.md file.
@@ -56,10 +58,10 @@ The script prints:
 To get file:line details for specific rules (useful for investigation):
 
 ```bash
-python "${SKILL_DIR}/scripts/analyze.py" "$TEMP/cslint-validate.json" --details CSLINT210 CSLINT104
+python "${SKILL_DIR}/scripts/analyze.py" cslint-validate.json --details-only CSLINT210 CSLINT104
 ```
 
-Use `--details` with no rule IDs to print all violations, or pass specific rule IDs to filter.
+Use `--details-only` (not `--details`) to skip the summary and print only file:line violations — this works well with piping and `head`. Pass rule IDs to filter, or omit them to print all violations.
 
 ## Step 2 — Investigate suspicious patterns
 
@@ -74,10 +76,23 @@ Use `--details` on rules that look suspicious from the summary, then **read the
 - Fields in interop/P/Invoke structs — names must match native APIs
 - Local constants flagged by the class-level constant rule
 
+**Modifier-sensitive rules (rules that should behave differently based on modifiers):**
+- `const` fields flagged by rules that only apply to mutable fields (e.g., unnecessary initialization — constants *require* an initializer)
+- `static readonly` fields flagged by instance-only rules
+- Fields in structs flagged by class-only rules (e.g., "field should be private" — struct fields are commonly public for data carriers, interop, etc.)
+
+**Context-sensitive rules (rules that should consider the containing type/scope):**
+- Fields in `[StructLayout]` interop structs — must be public for marshaling
+- Members in test classes or test harnesses — may follow different conventions
+- Members in nested private types — encapsulation is already provided by the outer type
+
 **Style rules (CSLINT200+):**
 - Rule suggestion doesn't apply to the actual code pattern (e.g., suggesting `??` on a ternary that returns different types)
 - Extremely high violation counts for a single rule vs others (outlier)
 
+**Concentration pattern:**
+- If >80% of a rule's violations come from 3 or fewer files, it often indicates a context the rule doesn't handle (interop files, generated code, lookup tables with alignment whitespace, etc.)
+
 Group confirmed false positives by root cause. A single bug in CsLint can produce many false positives.
 
 ## Step 3 — Report
diff --git a/.claude/skills/validate/scripts/analyze.py b/.claude/skills/validate/scripts/analyze.py
@@ -1,9 +1,10 @@
 """Analyze CsLint JSON output for false positive patterns.
 
 Usage:
-  python analyze.py <results.json>              # summary + suspicious patterns
-  python analyze.py <results.json> --details    # also print file:line for every violation
-  python analyze.py <results.json> --details CSLINT210 CSLINT104  # details for specific rules only
+  python analyze.py <results.json>                              # summary + suspicious patterns
+  python analyze.py <results.json> --details                    # summary + file:line for every violation
+  python analyze.py <results.json> --details CSLINT210          # summary + file:line for specific rules
+  python analyze.py <results.json> --details-only CSLINT210     # file:line ONLY (no summary, pipe-friendly)
 """
 
 import argparse
@@ -98,6 +99,86 @@ def find_suspicious(data: list[dict]) -> dict[str, list[dict]]:
         if any(p in fp for p in (".g.cs", ".Generated.", "AssemblyInfo.cs", ".designer.cs")):
             suspicious["generated_file"].append(d)
 
+        # CSLINT238: "Do not initialize field" on const fields (constants require initializers)
+        if rid == "CSLINT238" and "Do not initialize field" in msg:
+            suspicious["possible_const_field_init"].append(d)
+
+        # CSLINT251: "Field should be private" — may be in struct/interop context
+        if rid == "CSLINT251":
+            suspicious["possible_struct_field"].append(d)
+
+        # CSLINT106: digit-suffixed type params (T0, T1, T2) are valid convention
+        if rid == "CSLINT106":
+            name_match = re.search(r"'(T\d+)'", msg)
+            if name_match:
+                suspicious["digit_suffixed_type_param"].append(d)
+
+    # Build per-file and per-rule indexes for cross-cutting checks
+    by_file: dict[str, list[dict]] = defaultdict(list)
+    by_rule: dict[str, list[dict]] = defaultdict(list)
+    for d in data:
+        by_file[d["filePath"]].append(d)
+        by_rule[d["ruleId"]].append(d)
+
+    # Naming violations in likely interop files (inferred from violation patterns; the source itself is not read)
+    interop_files: set[str] = set()
+    for fp, items in by_file.items():
+        # Check if any violation in this file hints at interop context
+        # (we can't read source, but file names and other rule hits are clues)
+        has_interop_hint = False
+        for d in items:
+            # CSLINT251 in a file strongly hints at structs with public fields
+            if d["ruleId"] == "CSLINT251":
+                has_interop_hint = True
+                break
+            # Field naming violations mentioning ALL_CAPS or native-style names
+            if d["ruleId"] == "CSLINT104" and d["message"]:
+                name_match = re.search(r"'(\w+)'", d["message"])
+                if name_match:
+                    name = name_match.group(1)
+                    # Names with underscores or ALL_CAPS suggest native/interop
+                    if "_" in name or (name.isupper() and len(name) > 2):
+                        has_interop_hint = True
+                        break
+        if has_interop_hint:
+            interop_files.add(fp)
+
+    # Flag naming rule violations (CSLINT100-106) co-located in interop files
+    naming_rules = {"CSLINT100", "CSLINT101", "CSLINT102", "CSLINT103", "CSLINT104", "CSLINT105", "CSLINT106"}
+    for fp in interop_files:
+        for d in by_file[fp]:
+            if d["ruleId"] in naming_rules and d not in suspicious.get("verbatim_identifier", []):
+                suspicious["naming_in_interop_file"].append(d)
+
+    # Suggestion rules with outlier counts (5x+ median suggests rule is too aggressive)
+    suggestion_rules = {
+        "CSLINT200", "CSLINT201", "CSLINT208", "CSLINT209", "CSLINT210",
+        "CSLINT216", "CSLINT218", "CSLINT220", "CSLINT222", "CSLINT306",
+    }
+    suggestion_counts = {rid: len(items) for rid, items in by_rule.items() if rid in suggestion_rules and len(items) > 0}
+    if len(suggestion_counts) >= 3:
+        median_count = sorted(suggestion_counts.values())[len(suggestion_counts) // 2]
+        if median_count > 0:
+            for rid, count in suggestion_counts.items():
+                if count >= median_count * 5:
+                    for d in by_rule[rid]:
+                        suspicious["suggestion_rule_outlier"].append(d)
+
+    # Concentration check: rules where the 3 busiest files hold >80% of violations
+    by_rule: dict[str, list[dict]] = defaultdict(list)
+    for d in data:
+        by_rule[d["ruleId"]].append(d)
+
+    for rid, items in by_rule.items():
+        if len(items) < 5:
+            continue
+        file_counts = Counter(d["filePath"] for d in items)
+        top_files = file_counts.most_common(3)
+        top_count = sum(c for _, c in top_files)
+        if top_count / len(items) > 0.8:
+            for d in items:
+                suspicious["concentrated_violations"].append(d)
+
     return suspicious
 
 
@@ -116,6 +197,12 @@ def print_suspicious(suspicious: dict[str, list[dict]]) -> None:
         "empty_identifier": "Empty identifier names — CsLint reading wrong token",
         "pascal_case_parameter": "PascalCase parameters — may be primary ctor / record params",
         "generated_file": "Violations in generated/auto-generated files",
+        "possible_const_field_init": "Unnecessary init on possible const fields — constants require initializers",
+        "possible_struct_field": "Field visibility in possible structs — struct fields are commonly public",
+        "digit_suffixed_type_param": "Digit-suffixed type parameters (T0, T1) — valid naming convention",
+        "naming_in_interop_file": "Naming violations in likely interop files — names must match native APIs",
+        "suggestion_rule_outlier": "Suggestion rule with 5x+ median count — rule may be too aggressive",
+        "concentrated_violations": "Highly concentrated violations (>80% in ≤3 files) — check for unhandled context",
     }
 
     for pattern, items in suspicious.items():
@@ -155,6 +242,13 @@ def main() -> None:
         metavar="RULE_ID",
         help="Print file:line for every violation. Optionally filter by rule IDs (e.g. --details CSLINT210 CSLINT104)",
     )
+    parser.add_argument(
+        "--details-only",
+        nargs="*",
+        default=None,
+        metavar="RULE_ID",
+        help="Print ONLY file:line details (no summary). Pipe-friendly. Optionally filter by rule IDs.",
+    )
     args = parser.parse_args()
 
     data = load_results(args.results)
@@ -163,6 +257,12 @@ def main() -> None:
         print("No violations found. CsLint reported a clean run.")
         sys.exit(0)
 
+    # --details-only: skip summary, print only file:line details
+    if args.details_only is not None:
+        rule_filter = args.details_only if args.details_only else None
+        print_details(data, rule_filter)
+        return
+
     print_summary(data)
     suspicious = find_suspicious(data)
     print_suspicious(suspicious)
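
Note: the `--details-only` handling above relies on `nargs="*"` combined with `default=None` to tell three cases apart: flag absent, flag given with no rule IDs, and flag given with rule IDs. A minimal, self-contained sketch of that pattern follows. The helper names `build_parser` and `resolve_mode` are hypothetical, and the parser is reduced to the two arguments needed for the illustration.

```python
import argparse


def build_parser():
    # Reduced parser: only the positional results path and --details-only.
    parser = argparse.ArgumentParser()
    parser.add_argument("results")
    parser.add_argument("--details-only", nargs="*", default=None, metavar="RULE_ID")
    return parser


def resolve_mode(argv):
    """Return (mode, rule_filter) for a command line (hypothetical helper).

    mode is "summary" when the flag is absent and "details" otherwise;
    rule_filter is None when every rule should be printed.
    """
    args = build_parser().parse_args(argv)
    if args.details_only is None:
        return "summary", None  # flag not passed at all
    # An empty list means the flag was passed without rule IDs.
    return "details", (args.details_only or None)


print(resolve_mode(["results.json"]))
print(resolve_mode(["results.json", "--details-only"]))
print(resolve_mode(["results.json", "--details-only", "CSLINT210", "CSLINT104"]))
```

Because `nargs="*"` consumes every following non-option token, the results path should come before `--details-only`, as in the usage lines of the docstring. Placed after the rule IDs, it would be read as one more rule ID.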