1 | | -# AI-Driven Metadata Enhancement for apcore-toolkit |
| 1 | +# AI-Driven Metadata Enhancement |
2 | 2 |
3 | | -This document outlines the strategy for using Small Language Models (SLMs) like **Qwen 1.5 (0.6B - 1.7B)** to enhance the metadata extracted by `apcore-toolkit-python`. |
| 3 | +This document specifies how `apcore-toolkit` uses Small Language Models (SLMs) to fill metadata gaps that static analysis cannot resolve. |
4 | 4 |
5 | 5 | ## 1. Goal |
6 | 6 |
7 | | -The toolkit's primary mission is to make existing code "AI-Perceivable". While static analysis (regex, AST) is efficient, it often fails to: |
8 | | -- Generate meaningful `description` and `documentation` for legacy code. |
9 | | -- Create effective `ai_guidance` for complex error handling. |
10 | | -- Infer `input_schema` for functions using `*args` or `**kwargs`. |
| 7 | +The toolkit's primary mission is to make existing code "AI-Perceivable". While static analysis (regex, AST, type hints) is efficient, it often fails to: |
11 | 8 |
12 | | -Using a local SLM allows the toolkit to "understand" the code logic and fill these gaps with high speed and zero cost. |
| 9 | +- Generate meaningful `description` and `documentation` for legacy code with no docstrings. |
| 10 | +- Create effective `ai_guidance` for complex error handling paths. |
| 11 | +- Infer `input_schema` for functions using `*args` or `**kwargs`. |
| 12 | +- Determine behavioral `annotations` (e.g., is this function destructive?) from code logic. |
13 | 13 |
14 | | -## 2. Architecture: Local LLM Provider (Option B) |
| 14 | +A local SLM fills these gaps with high speed, zero cost, and no data leakage. |
15 | 15 |
16 | | -To keep `apcore-toolkit-python` lightweight, we **DO NOT** bundle model weights. Instead, we use an OpenAI-compatible local API provider (e.g., Ollama, vLLM, LM Studio). |
| 16 | +## 2. Architecture |
17 | 17 |
18 | | -### Configuration via Environment Variables |
| 18 | +To keep `apcore-toolkit` lightweight, we **do not** bundle model weights. Instead, we call an OpenAI-compatible local API provider (e.g., Ollama, vLLM, LM Studio).
19 | 19 |
20 | | -The AI enhancement feature is controlled by the following environment variables: |
| 20 | +### Configuration |
21 | 21 |
22 | 22 | | Variable | Description | Default | |
23 | 23 | |----------|-------------|---------| |
24 | | -| `APCORE_AI_ENABLED` | Whether to enable SLM-based metadata enhancement. | `false` | |
25 | | -| `APCORE_AI_ENDPOINT` | The URL of the OpenAI-compatible local API. | `http://localhost:11434/v1` | |
26 | | -| `APCORE_AI_MODEL` | The model name to use (e.g., `qwen:0.6b`). | `qwen:0.6b` | |
27 | | -| `APCORE_AI_THRESHOLD` | Confidence threshold for AI-generated metadata (0-1). | `0.7` | |
28 | | - |
29 | | -## 3. Recommended Setup (Ollama) |
| 24 | +| `APCORE_AI_ENABLED` | Enable SLM-based metadata enhancement. | `false` | |
| 25 | +| `APCORE_AI_ENDPOINT` | URL of the OpenAI-compatible API. | `http://localhost:11434/v1` | |
| 26 | +| `APCORE_AI_MODEL` | Model name (e.g., `qwen:0.6b`). | `qwen:0.6b` | |
| 27 | +| `APCORE_AI_THRESHOLD` | Confidence threshold for accepting AI-generated metadata (0.0–1.0). | `0.7` | |
| 28 | +| `APCORE_AI_BATCH_SIZE` | Number of modules to enhance per API call. | `5` | |
| 29 | +| `APCORE_AI_TIMEOUT` | Timeout in seconds for each API call. | `30` | |
30 | 30 |
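These settings could be loaded with a small helper along the following lines (a sketch; `AIConfig` and its field names are illustrative, not part of the toolkit's public API):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AIConfig:
    """Illustrative container for the APCORE_AI_* settings."""
    enabled: bool
    endpoint: str
    model: str
    threshold: float
    batch_size: int
    timeout: float

    @classmethod
    def from_env(cls) -> "AIConfig":
        # Each variable falls back to the documented default.
        return cls(
            enabled=os.getenv("APCORE_AI_ENABLED", "false").lower() == "true",
            endpoint=os.getenv("APCORE_AI_ENDPOINT", "http://localhost:11434/v1"),
            model=os.getenv("APCORE_AI_MODEL", "qwen:0.6b"),
            threshold=float(os.getenv("APCORE_AI_THRESHOLD", "0.7")),
            batch_size=int(os.getenv("APCORE_AI_BATCH_SIZE", "5")),
            timeout=float(os.getenv("APCORE_AI_TIMEOUT", "30")),
        )
```

With no variables set, `AIConfig.from_env()` yields the defaults from the table, and enhancement stays disabled.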
31 | | -For the best developer experience, we recommend using [Ollama](https://ollama.com/): |
| 31 | +### Recommended Setup (Ollama) |
32 | 32 |
33 | | -1. **Install Ollama**. |
34 | | -2. **Pull the recommended model**: |
| 33 | +1. **Install Ollama**: [ollama.com](https://ollama.com/) |
| 34 | +2. **Pull a model**: |
35 | 35 | ```bash |
36 | 36 | ollama pull qwen:0.6b
37 | 37 | ``` |
38 | | -3. **Configure environment**: |
| 38 | +3. **Configure**: |
39 | 39 | ```bash |
40 | 40 | export APCORE_AI_ENABLED=true |
41 | | - export APCORE_AI_MODEL="qwen:0.6b" |
42 | 41 | ``` |
43 | 42 |
| 43 | +## 3. Enhancement Targets |
| 44 | + |
| 45 | +The enhancer operates on `ScannedModule` instances **after** static scanning is complete. It only fills fields that static analysis left empty or at their default values.
| 46 | + |
| 47 | +### 3.1 Description Generation |
| 48 | + |
| 49 | +**When**: `description` is empty or auto-generated (e.g., copied from function name). |
| 50 | + |
| 51 | +**Prompt strategy**: Send the function signature, docstring (if partial), and first 50 lines of the function body. Ask for a ≤200-character description following apcore's convention. |
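The truncation step could be assembled along these lines (a hypothetical helper; the exact prompt template is an implementation detail):

```python
def build_description_prompt(signature: str, docstring: str, body: str) -> str:
    """Compose a description-generation prompt from the signature, any
    partial docstring, and at most the first 50 lines of the body."""
    snippet = "\n".join(body.splitlines()[:50])
    parts = [
        "Write a one-sentence description (max 200 characters) for this function.",
        f"Signature: {signature}",
    ]
    if docstring:
        parts.append(f"Partial docstring: {docstring}")
    parts.append(f"Body:\n{snippet}")
    return "\n\n".join(parts)
```

The 200-character limit is stated in the prompt itself; the enhancer would still validate the model's response length before accepting it.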
| 52 | +
| 53 | +**Audit tag**: `x-generated-by: slm` in `metadata`.
| 54 | +
| 55 | +### 3.2 Documentation Generation
| 56 | +
| 57 | +**When**: `documentation` is empty and the function has non-trivial logic (>10 lines).
| 58 | +
| 59 | +**Prompt strategy**: Send the full function body. Ask for a ≤5000-character Markdown explanation covering purpose, parameters, return value, and error conditions.
| 60 | +
| 61 | +### 3.3 Annotation Inference
| 62 | +
| 63 | +**When**: All annotations are at their default values (no explicit annotation was set by the scanner).
| 64 | +
| 65 | +This is where the SLM adds the most value — inferring behavioral semantics that static analysis cannot determine reliably.
| 66 | +
| 67 | +**Prompt strategy**: Send the function body and ask the model to classify each annotation with a confidence score:
| 68 | +
| 69 | +| Annotation | What the SLM looks for |
| 70 | +|-----------|----------------------|
| 71 | +| `readonly` | No writes to databases, files, or external services |
| 72 | +| `destructive` | Deletes data, overwrites files, drops resources |
| 73 | +| `idempotent` | Repeated calls with the same input have the same effect as a single call; safe to retry |
| 74 | +| `requires_approval` | Sends money, deletes accounts, modifies permissions |
| 75 | +| `open_world` | HTTP calls, file I/O, database queries, subprocess calls |
| 76 | +| `streaming` | Yields/iterates results incrementally |
| 77 | +
| 78 | +**Acceptance rule**: Only apply an annotation if the SLM's confidence ≥ `APCORE_AI_THRESHOLD`. Otherwise, leave as default and add a warning to `ScannedModule.warnings`.
| 79 | + |
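The acceptance rule amounts to a simple gate. A minimal sketch (the function name and input shape are illustrative, not the toolkit's API):

```python
def apply_annotations(
    proposed: dict[str, tuple[bool, float]],
    threshold: float = 0.7,
) -> tuple[dict[str, bool], list[str]]:
    """Split SLM-proposed annotations into accepted values and warnings.

    `proposed` maps an annotation name to a (value, confidence) pair.
    """
    accepted: dict[str, bool] = {}
    warnings: list[str] = []
    for name, (value, confidence) in proposed.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            # Below-threshold proposals are skipped and surfaced for review.
            warnings.append(
                f"Low confidence ({confidence:.2f}) for annotations.{name} "
                "— skipped. Review manually."
            )
    return accepted, warnings
```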
| 80 | +!!! tip "Inspired by HARNESS.md" |
| 81 | + The annotation inference approach draws from [CLI-Anything's HARNESS.md](https://github.com/HKUDS/CLI-Anything) methodology, which catalogs undo/redo systems to determine destructiveness. For web frameworks, the equivalent is analyzing database transactions, file operations, and external API calls in the function body. |
| 82 | +
| 83 | +### 3.4 Schema Inference for Untyped Functions
| 84 | +
| 85 | +**When**: `input_schema` is empty and the function uses `*args`, `**kwargs`, or `request` objects without type annotations.
| 86 | +
| 87 | +**Prompt strategy**: Send the function body. Ask the model to infer parameter names, types, and whether they are required, based on how `kwargs` keys are accessed in the code.
| 88 | +
| 89 | +**Output format**: A JSON Schema object that the toolkit merges into `ScannedModule.input_schema`.
| 90 | +
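As an illustration, for a hypothetical `create_user(**kwargs)` that reads `kwargs["email"]` unconditionally and calls `kwargs.get("age")`, the inferred schema might look like:

```yaml
input_schema:
  type: object
  properties:
    email:
      type: string
    age:
      type: integer
  required: [email]
```

Keys accessed with bare indexing are treated as required; keys accessed via `.get()` with a fallback are treated as optional.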
44 | 91 | ## 4. Enhancement Workflow |
45 | 92 |
46 | | -When `APCORE_AI_ENABLED` is set to `true`, the `Scanner` will: |
| 93 | +``` |
| 94 | +Scanner.scan() |
| 95 | + │ |
| 96 | + ▼ |
| 97 | +list[ScannedModule] ← static metadata (may have gaps) |
| 98 | + │ |
| 99 | + ▼ |
| 100 | +AIEnhancer.enhance(modules) ← fills gaps using SLM |
| 101 | + │ |
| 102 | + ├─ For each module: |
| 103 | + │ 1. Check which fields are missing/default |
| 104 | + │ 2. Build targeted prompt for each gap |
| 105 | + │ 3. Call SLM API |
| 106 | + │ 4. Parse response, check confidence |
| 107 | + │ 5. Merge accepted enhancements |
| 108 | + │ 6. Tag with x-generated-by: slm |
| 109 | + │ 7. Add warnings for rejected/low-confidence results |
| 110 | + │ |
| 111 | + ▼ |
| 112 | +list[ScannedModule] ← enriched metadata |
| 113 | + │ |
| 114 | + ▼ |
| 115 | +Writer.write(modules) ← output as YAML/Python/Registry |
| 116 | +``` |
| 117 | +
| 118 | +### Integration with BaseScanner
| 119 | +
| 120 | +The enhancer is **not** called automatically by `BaseScanner.scan()`. Framework adapters opt in explicitly:
| 121 | +
| 122 | +=== "Python"
| 123 | +
| 124 | +    ```python
| 125 | +    from apcore_toolkit import AIEnhancer
| 126 | +
| 127 | +    scanner = MyFrameworkScanner()
| 128 | +    modules = scanner.scan()
| 129 | +
| 130 | +    if AIEnhancer.is_enabled():
| 131 | +        enhancer = AIEnhancer()
| 132 | +        modules = enhancer.enhance(modules)
| 133 | +
| 134 | +    writer.write(modules, output_dir="./bindings")
| 135 | +    ```
| 136 | +
| 137 | +=== "TypeScript"
| 138 | +
| 139 | +    ```typescript
| 140 | +    import { AIEnhancer } from "apcore-toolkit";
| 141 | +
| 142 | +    const scanner = new MyFrameworkScanner();
| 143 | +    let modules = scanner.scan();
| 144 | +
| 145 | +    if (AIEnhancer.isEnabled()) {
| 146 | +      const enhancer = new AIEnhancer();
| 147 | +      modules = await enhancer.enhance(modules);
| 148 | +    }
| 149 | +
| 150 | +    writer.write(modules, { outputDir: "./bindings" });
| 151 | +    ```
| 152 | +
| 153 | +## 5. Confidence Scoring
| 154 | +
| 155 | +Each AI-generated field includes a confidence score (0.0–1.0) stored in `metadata`:
| 156 | +
| 157 | +```yaml
| 158 | +metadata:
| 159 | +  x-generated-by: slm
| 160 | +  x-ai-confidence:
| 161 | +    description: 0.92
| 162 | +    annotations.destructive: 0.85
| 163 | +    annotations.readonly: 0.45  # below threshold, not applied
| 164 | +```
| 165 | +
| 166 | +Fields whose confidence falls below `APCORE_AI_THRESHOLD` are **not** applied to the module. Instead, a warning is added:
47 | 167 |
48 | | -1. **Extract static metadata** from docstrings and type hints. |
49 | | -2. **Identify missing fields** (e.g., empty `description` or missing `ai_guidance`). |
50 | | -3. **Send code snippets** to the local SLM with a structured prompt. |
51 | | -4. **Merge the AI-generated metadata** into the final `ScannedModule`, marking them with a `x-generated-by: "slm"` tag for human audit. |
| 168 | +``` |
| 169 | +"Low confidence (0.45) for annotations.readonly — skipped. Review manually." |
| 170 | +``` |
52 | 171 |
53 | | -## 5. Security and Privacy |
| 172 | +## 6. Security and Privacy |
54 | 173 |
55 | | -- **No Data Leakage**: Since the model runs locally, your source code never leaves your machine. |
56 | | -- **Auditability**: All AI-generated fields MUST be reviewed by the developer before committing the generated `apcore.yaml`. |
| 174 | +- **No data leakage**: The model runs locally. Source code never leaves the machine. |
| 175 | +- **Auditability**: All AI-generated fields are tagged with `x-generated-by: slm` for human review. |
| 176 | +- **Opt-in only**: Disabled by default (`APCORE_AI_ENABLED=false`). |
| 177 | +- **Graceful degradation**: If the SLM endpoint is unreachable, the enhancer logs a warning and returns modules unchanged. |
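The graceful-degradation behavior can be sketched as follows (the wrapper below is illustrative, not the toolkit's actual code):

```python
import logging

logger = logging.getLogger("apcore.ai")


def enhance_safely(modules, call_slm):
    """Return enhanced modules, or the originals unchanged if the
    SLM endpoint cannot be reached (graceful degradation)."""
    try:
        return call_slm(modules)
    except (ConnectionError, TimeoutError) as exc:
        # Log and fall back: static metadata is still valid output.
        logger.warning(
            "SLM endpoint unreachable (%s); returning modules unchanged.", exc
        )
        return modules
```

Because the fallback returns the statically scanned modules untouched, enabling AI enhancement can never make the output worse than static analysis alone.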