# AI-Driven Metadata Enhancement for apcore-toolkit

This document outlines the strategy for using Small Language Models (SLMs) like **Qwen3 (0.6B - 1.7B)** to enhance the metadata extracted by `apcore-toolkit-python`.

## 1. Goal

The toolkit's primary mission is to make existing code "AI-Perceivable". While static analysis (regex, AST) is efficient, it often fails to:
- Generate meaningful `description` and `documentation` for legacy code.
- Create effective `ai_guidance` for complex error handling.
- Infer `input_schema` for functions using `*args` or `**kwargs`.

Using a local SLM allows the toolkit to "understand" the code logic and fill these gaps with high speed and zero cost.

## 2. Architecture: Local LLM Provider (Option B)

To keep `apcore-toolkit-python` lightweight, we **DO NOT** bundle model weights. Instead, we use an OpenAI-compatible local API provider (e.g., Ollama, vLLM, LM Studio).

### Configuration via Environment Variables

The AI enhancement feature is controlled by the following environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| `APCORE_AI_ENABLED` | Whether to enable SLM-based metadata enhancement. | `false` |
| `APCORE_AI_ENDPOINT` | URL of the OpenAI-compatible local API. | `http://localhost:11434/v1` |
| `APCORE_AI_MODEL` | Model name to use (e.g., `qwen:0.6b`). | `qwen:0.6b` |
| `APCORE_AI_THRESHOLD` | Confidence threshold (0-1) below which AI-generated metadata is discarded. | `0.7` |

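The variables above could be read into a single config object along these lines. This is a minimal sketch: the `AIConfig` class and `load_ai_config` helper are hypothetical names, not part of the toolkit's public API.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AIConfig:
    """Hypothetical container for the APCORE_AI_* settings."""
    enabled: bool
    endpoint: str
    model: str
    threshold: float


def load_ai_config(env=os.environ) -> AIConfig:
    """Read the APCORE_AI_* variables, falling back to the documented defaults."""
    return AIConfig(
        enabled=env.get("APCORE_AI_ENABLED", "false").lower() in ("1", "true", "yes"),
        endpoint=env.get("APCORE_AI_ENDPOINT", "http://localhost:11434/v1"),
        model=env.get("APCORE_AI_MODEL", "qwen:0.6b"),
        threshold=float(env.get("APCORE_AI_THRESHOLD", "0.7")),
    )
```

Passing the environment mapping as a parameter (rather than reading `os.environ` directly inside the function) keeps the loader easy to unit-test.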
## 3. Recommended Setup (Ollama)

For the best developer experience, we recommend using [Ollama](https://ollama.com/):

1. **Install Ollama**.
2. **Pull the recommended model**:
   ```bash
   ollama pull qwen:0.6b
   ```
3. **Configure environment**:
   ```bash
   export APCORE_AI_ENABLED=true
   export APCORE_AI_MODEL="qwen:0.6b"
   ```

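With Ollama running, the toolkit can talk to it through the standard OpenAI-compatible `/chat/completions` route. A sketch of building such a request with only the standard library follows; `build_chat_request` is an illustrative helper, not a toolkit function.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # deterministic output suits metadata generation
    }
    return urllib.request.Request(
        endpoint.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request requires a running server, e.g.:
#   req = build_chat_request("http://localhost:11434/v1", "qwen:0.6b", "Describe: def f(): ...")
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

Setting `temperature` to 0 is a design choice here: metadata generation should be reproducible across scanner runs.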
## 4. Enhancement Workflow

When `APCORE_AI_ENABLED` is set to `true`, the `Scanner` will:

1. **Extract static metadata** from docstrings and type hints.
2. **Identify missing fields** (e.g., an empty `description` or missing `ai_guidance`).
3. **Send code snippets** to the local SLM with a structured prompt.
4. **Merge the AI-generated metadata** into the final `ScannedModule`, marking each generated field with an `x-generated-by: "slm"` tag for human audit.

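The merge step above can be sketched as follows. The `merge_metadata` helper and the `{"value": ..., "confidence": ...}` shape for SLM output are assumptions for illustration; the sketch also applies the `APCORE_AI_THRESHOLD` filter before tagging.

```python
def merge_metadata(static: dict, generated: dict, threshold: float = 0.7) -> dict:
    """Merge AI-generated fields into statically extracted metadata (sketch).

    Static values always win; generations below the confidence threshold are
    discarded; accepted generations are tagged under "x-generated-by" so a
    human can audit them before committing.
    """
    merged = dict(static)
    for field, item in generated.items():
        if merged.get(field):                # never overwrite static metadata
            continue
        if item["confidence"] < threshold:   # enforce APCORE_AI_THRESHOLD
            continue
        merged[field] = item["value"]
        merged.setdefault("x-generated-by", {})[field] = "slm"
    return merged
```

Recording provenance per field (a mapping of field name to `"slm"`) makes it straightforward to list exactly which entries in the resulting `apcore.yaml` need human review.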
## 5. Security and Privacy

- **No Data Leakage**: Since the model runs locally, your source code never leaves your machine.
- **Auditability**: All AI-generated fields MUST be reviewed by the developer before committing the generated `apcore.yaml`.