Notes:
- In soft mode, the client will require a patched server to accept soft embeddings. The flag ensures no breakage.


## Alternative: GLM API Provider

Instead of running llama.cpp locally, you can use the GLM API (ZhipuAI) as your decoder backend:

**Setup:**
```bash
# In .env
REFRAG_DECODER=1
REFRAG_RUNTIME=glm         # Switch from llamacpp to glm
GLM_API_KEY=your-api-key   # Required
GLM_MODEL=glm-4.6          # Optional, defaults to glm-4.6
```

**How it works:**
- Uses the OpenAI SDK with `base_url="https://api.z.ai/api/paas/v4/"`
- Supports prompt mode only (soft embeddings are ignored)
- Handles GLM-4.6's reasoning mode (`reasoning_content` field)
- Drop-in replacement for llama.cpp: same interface, no code changes needed

**Switch back to llama.cpp:**
```bash
REFRAG_RUNTIME=llamacpp
```

The GLM provider is implemented in `scripts/refrag_glm.py` and automatically selected when `REFRAG_RUNTIME=glm`.
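
As a minimal illustration of the reasoning-mode handling above, answer extraction might look like the sketch below. The function name, the dict-shaped message, and the fallback behavior are assumptions for this sketch, not the actual code in `scripts/refrag_glm.py`:

```python
# Hypothetical sketch: pull the final answer out of a GLM chat message.
# GLM-4.6 may return chain-of-thought in a separate `reasoning_content`
# field alongside the usual `content`; only `content` is the answer.

def extract_answer(message: dict) -> str:
    content = (message.get("content") or "").strip()
    if content:
        return content
    # Fall back to the reasoning text only if no answer was returned.
    return (message.get("reasoning_content") or "").strip()
```

Because the provider normalizes the response this way, callers see the same plain-text answer regardless of whether the model emitted reasoning output.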
## How context_answer works (with decoder)

The `context_answer` MCP tool answers natural-language questions using retrieval + a decoder sidecar.
Pipeline
1) Hybrid search (gate-first): Uses MINI-vector gating when `REFRAG_GATE_FIRST=1` to prefilter candidates, then runs dense+lexical fusion
2) Micro-span budgeting: Merges adjacent micro hits and applies a global token budget (`REFRAG_MODE=1`, `MICRO_BUDGET_TOKENS`, `MICRO_OUT_MAX_SPANS`)
3) Prompt assembly: Builds compact context blocks and a “Sources” footer
4) Decoder call: When `REFRAG_DECODER=1`, calls the configured runtime (`REFRAG_RUNTIME=llamacpp` or `glm`) to synthesize the final answer
5) Return: Answer + citations + usage flags; errors keep citations for debugging
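
The micro-span budgeting in step 2 can be sketched roughly as follows. The name `budget_spans`, the character-offset representation, and the ~4-characters-per-token estimate are illustrative assumptions standing in for the real logic behind `MICRO_BUDGET_TOKENS` and `MICRO_OUT_MAX_SPANS`:

```python
# Illustrative sketch of step 2: merge adjacent/overlapping micro-hit
# character spans, then keep spans until a global token budget is spent.
# The 4-chars-per-token estimate is a stand-in heuristic, not the
# project's tokenizer.

def budget_spans(spans, budget_tokens=512, max_spans=8):
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous span
        else:
            merged.append([start, end])
    out, used = [], 0
    for start, end in merged[:max_spans]:
        cost = (end - start) // 4  # rough token estimate
        if used + cost > budget_tokens:
            break
        out.append((start, end))
        used += cost
    return out
```

Merging before budgeting means two hits that touch count as one span, so the budget is spent on contiguous context rather than fragments.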

Environment toggles