# Changelog

All notable changes to this fork of ComfyUI-Copilot are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.0.0] - 2025-01-XX

This major enhancement release adds autonomous agent capabilities, multi-provider support, and voice I/O, and fixes critical LM Studio integration issues.

### Added

#### Agent Mode - Autonomous Workflow Building
- **PLAN/EXECUTE/VALIDATE/REPORT Loop** (`backend/service/agent_mode.py`)
  - Agent breaks down complex goals into discrete tasks
  - Autonomously searches nodes, builds workflows, sets parameters
  - Validates workflow integrity before presenting to user
  - Provides step-by-step progress reporting

- **Tool Budget System** (`backend/service/agent_mode_tools.py`)
  - Per-tool call limits (e.g., `search_nodes` max 4x, `save_workflow` max 5x)
  - Global budget of 30 tool calls per agent session
  - Loop prevention: aborts the session if the same tool+args combination repeats 3x within the last 8 calls
  - 5-minute total timeout, 25 max agent turns
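
A minimal sketch of how such a budget could be enforced. The class and method names here are illustrative, not the actual `agent_mode_tools.py` API:

```python
from collections import deque

class ToolBudget:
    """Illustrative sketch: per-tool limits, a global cap, and loop prevention."""

    def __init__(self, global_limit=30, per_tool_limits=None,
                 repeat_threshold=3, window=8):
        self.global_limit = global_limit
        self.per_tool_limits = per_tool_limits or {}
        self.repeat_threshold = repeat_threshold
        self.calls = 0
        self.per_tool = {}
        self.recent = deque(maxlen=window)  # signatures of the last N calls

    def allow(self, tool, args):
        sig = (tool, repr(sorted(args.items())))
        if self.calls >= self.global_limit:
            return False  # global budget exhausted
        if self.per_tool.get(tool, 0) >= self.per_tool_limits.get(tool, self.global_limit):
            return False  # per-tool limit hit
        if list(self.recent).count(sig) >= self.repeat_threshold:
            return False  # same tool+args repeated too often: likely a loop
        self.calls += 1
        self.per_tool[tool] = self.per_tool.get(tool, 0) + 1
        self.recent.append(sig)
        return True
```

The key design point is that identical tool+args calls are tracked over a sliding window, so an agent stuck re-issuing the same search is cut off even when it is well under its per-tool limit.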

- **Visual Progress Tracking** (`ui/src/components/chat/AgentModeIndicator.tsx`)
  - Real-time step indicator showing current agent phase
  - Task queue visualization
  - Toggle with robot button in chat input

#### Multi-Provider Support
- **OpenAI-Compatible Provider Architecture** (`backend/utils/globals.py`)
  - `detect_provider()` function with URL pattern matching
  - Provider-specific constants: timeouts, token limits, features

- **Supported Providers**:
  - **OpenAI**: Full feature support with default model `gemini-2.5-flash`
  - **Groq**: Free tier with `llama-3.3-70b-versatile`, reduced tool sets for rate limits
  - **Anthropic**: Via OpenAI compatibility layer, `claude-sonnet-4-20250514`
  - **LM Studio**: Fully local with auto-detection, no API key required

- **4-Tab Settings Modal** (`ui/src/components/chat/ApiKeyModal.tsx`)
  - Auto-fill base URLs per provider
  - Provider-specific placeholders and hints
  - Model dropdown with refresh capability

- **Provider-Aware Optimizations**:
  - Constrained providers get compressed prompts and reduced tool sets
  - HTTP timeout hierarchy: Groq 30s, Anthropic 60s, LM Studio/OpenAI 120s
  - Rate-limit detection with automatic wait-and-retry
  - Frontend SSE timeout: 360s > backend agent: 300s > MCP session: 180s
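
The wait-and-retry behavior can be sketched as follows. The `RateLimitError` type and helper name are illustrative; the real backend maps provider 429 responses in its own way:

```python
import time

class RateLimitError(Exception):
    """Illustrative: raised when a provider responds with HTTP 429."""
    retry_after = None  # seconds, if the provider sent a Retry-After header

def call_with_rate_limit_retry(fn, max_retries=3, base_wait=2.0):
    """Call `fn`, retrying rate-limited requests with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_retries:
                raise  # budget exhausted: surface the error
            # honor Retry-After when present, else back off exponentially
            wait = exc.retry_after or base_wait * (2 ** attempt)
            time.sleep(wait)
```
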

#### Voice I/O - Speech Interaction
- **Speech-to-Text (STT)** (`ui/src/utils/vadRecorder.ts`)
  - Browser-based voice recording with Voice Activity Detection (VAD)
  - Web Audio AnalyserNode with RMS-based silence detection
  - Auto-stops after 1.8 seconds of silence
  - Real-time volume visualization on microphone button
  - Backend endpoint: `/api/voice/speech-to-text`
  - Groq: `whisper-large-v3-turbo` | OpenAI: `whisper-1`
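
The VAD logic itself lives in TypeScript (`vadRecorder.ts`); here is a language-neutral Python sketch of the RMS silence check and the 1.8-second auto-stop rule. The function names and the 100 ms frame size are assumptions for illustration:

```python
import math

def is_silent(samples, threshold=0.01):
    """RMS-based silence check over one audio frame (floats in [-1, 1])."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold

def should_auto_stop(frame_results, frame_ms=100, silence_ms=1800):
    """Auto-stop once the trailing run of silent frames spans >= silence_ms."""
    trailing = 0
    for silent in reversed(frame_results):
        if not silent:
            break
        trailing += 1
    return trailing * frame_ms >= silence_ms
```
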

- **Text-to-Speech (TTS)** (`ui/src/utils/streamingTTS.ts`)
  - Streaming TTS that reads AI responses as they arrive
  - Sentence-boundary detection for natural pacing (min 40 chars per chunk)
  - Gapless audio queue for smooth playback
  - Speaker button toggle (purple when active)
  - Backend endpoints: `/api/voice/text-to-speech`, `/api/voice/capabilities`
  - Groq: Orpheus TTS (200-char chunks, WAV) | OpenAI: `tts-1` (4096-char chunks, MP3)
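
Sentence-boundary chunking with a 40-character minimum can be sketched like this (the helper name is hypothetical; the real implementation is TypeScript in `streamingTTS.ts`):

```python
def split_for_tts(text, min_chars=40, boundaries=".!?"):
    """Split streamed text at sentence boundaries, emitting chunks of
    at least `min_chars` so tiny fragments don't cause choppy audio."""
    chunks, start = [], 0
    for i, ch in enumerate(text):
        # only cut at a boundary once the pending chunk is long enough
        if ch in boundaries and i + 1 - start >= min_chars:
            chunks.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)  # flush whatever remains at stream end
    return chunks
```
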

#### Fine-Tuning Pipeline
- **Dataset Generation** (`training/generate_dataset.py`)
  - 18 conversation generators for ComfyUI tool-calling tasks
  - Augmentation with parameter variations
  - 9 current + 8 future tool schemas
  - 11 workflow templates with parameter pools

- **Dataset Validation** (`training/validate_dataset.py`)
  - 5-pass validation: structural + semantic checks
  - JSON schema validation for tool calls
  - Turn-sequence validation

- **QLoRA Training** (`training/train.py`)
  - Unsloth-based training framework
  - Qwen3 model support with GGUF export
  - Optimized for consumer GPUs (validated on an RTX 5060 8 GB)
  - Chunked cross-entropy loss (128-token chunks, ~37 MB vs 1.18 GB for full logits)
  - Windows WDDM-compatible gradient checkpointing
  - Python 3.14 compatibility patches

### Fixed

#### LM Studio Integration - Complete Overhaul
- **Port Configuration** (`backend/controller/llm_api.py`, `ui/src/components/chat/ApiKeyModal.tsx`)
  - FIXED: Port hint was wrong (1235 → 1234)
  - Correct default URL: `http://localhost:1234/v1`

- **URL Normalization** (`backend/utils/globals.py`)
  - FIXED: `/api/v1` was not being converted to `/v1` for OpenAI SDK compatibility
  - Automatic URL normalization: strips the `/api` prefix and ensures a `/v1` suffix
  - Handles both `http://localhost:1234` and `http://localhost:1234/v1` inputs
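
The normalization rules above can be sketched as a small helper (the function name is an assumption; the actual logic lives in `backend/utils/globals.py`):

```python
def normalize_base_url(url):
    """Normalize an LM Studio base URL for the OpenAI SDK:
    drop trailing slashes, rewrite an `/api` prefix, ensure a `/v1` suffix."""
    url = url.strip().rstrip("/")
    if url.endswith("/api/v1"):
        url = url[: -len("/api/v1")] + "/v1"
    elif url.endswith("/api"):
        url = url[: -len("/api")] + "/v1"
    if not url.endswith("/v1"):
        url += "/v1"
    return url
```
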

- **Model Listing** (`backend/controller/llm_api.py`)
  - FIXED: Did not parse LM Studio's native response format
  - Robust multi-format parser handles both OpenAI and LM Studio response formats
  - LM Studio format: `{"models": [...]}` with `key`/`display_name` fields
  - OpenAI format: `{"data": [...]}` with `id` field
  - 24-hour cache invalidation for model lists
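
A sketch of the multi-format parsing, using the field names listed above (the helper name and the `(id, display_name)` return shape are illustrative):

```python
def parse_model_list(payload):
    """Extract models from either response shape:
    OpenAI:    {"data":   [{"id": "..."}]}
    LM Studio: {"models": [{"key": "...", "display_name": "..."}]}
    Returns a list of (model_id, display_name) tuples."""
    models = []
    for item in payload.get("data") or []:      # OpenAI shape
        if isinstance(item, dict) and item.get("id"):
            models.append((item["id"], item["id"]))
    for item in payload.get("models") or []:    # LM Studio shape
        if isinstance(item, dict) and item.get("key"):
            models.append((item["key"], item.get("display_name") or item["key"]))
    return models
```
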

- **API Key Handling** (`backend/service/mcp_client.py`, UI)
  - FIXED: An API key was required even though LM Studio doesn't need one
  - Uses the `"lmstudio-local"` placeholder when the API key is empty
  - Frontend allows an empty API key for LM Studio

- **Header Forwarding** (`backend/controller/conversation_api.py`)
  - FIXED: The `Openai-Base-Url` header was not being sent from the frontend
  - Proper header forwarding for custom base URLs

- **Auto-Detection** (`backend/utils/globals.py`)
  - Provider detection via URL patterns: `localhost:1234`, `127.0.0.1:1234`, `lmstudio`
  - Automatic feature flagging for local models

#### Metadata Handling
- **None-Safe Operations** (various files)
  - FIXED: Crashes when node metadata was None or malformed
  - Uses the `(meta.get("field") or "").lower()` pattern
  - Guards: `if not isinstance(meta, dict): continue`
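
The pattern in context, as a minimal illustrative helper (the function name and the `"category"` field are examples, not the fork's actual code):

```python
def safe_category(meta):
    """None-safe metadata lookup: tolerates meta being None, a non-dict,
    or a dict whose field is missing or set to None."""
    if not isinstance(meta, dict):
        return ""  # guard against None / malformed metadata
    return (meta.get("category") or "").lower()
```
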

#### Canvas Rule Enforcement
- **Tool Restrictions** (`backend/service/mcp_client.py`, tool implementations)
  - FIXED: Tools could unintentionally modify canvas state
  - Only `save_workflow` modifies the canvas
  - `explain_node` and `search_node` are read-only information tools
  - Enforcement at code level (not just prompts) for local model compatibility

### Changed

#### Architecture
- **Provider Detection** (`backend/utils/globals.py`)
  - Added `detect_provider()` with URL pattern matching
  - Centralized provider constants: `GROQ_HTTP_TIMEOUT`, `ANTHROPIC_HTTP_TIMEOUT`, etc.

- **Agent Factory** (`backend/agent_factory.py`)
  - Provider-aware client configuration
  - Timeout propagation from provider detection

- **Settings Modal** (`ui/src/components/chat/ApiKeyModal.tsx`)
  - Restructured as a 4-tab interface (was a single form)
  - Auto-fill functionality for base URLs
  - Provider-specific help text and placeholders

- **Chat Input** (`ui/src/components/chat/ChatInput.tsx`)
  - Added agent mode toggle button (robot icon)
  - Added voice input button (microphone icon)
  - Visual indicators for active modes

#### API Endpoints
- **New Endpoints** (`backend/controller/conversation_api.py`, `backend/controller/llm_api.py`)
  - `POST /api/workflow/agent-mode-stream` - Agent mode SSE stream
  - `POST /api/voice/speech-to-text` - STT transcription
  - `POST /api/voice/text-to-speech` - TTS audio generation
  - `GET /api/voice/capabilities` - Provider TTS/STT capability check

- **Updated Endpoints**:
  - `GET /api/llm/models` - Now handles multiple provider formats
  - `POST /api/llm/verify` - Added base URL forwarding

#### Documentation
- **README.md** - Complete rewrite with a feature comparison table
- **Added Files**:
  - `HOW_TO_USE_LMSTUDIO.md` - LM Studio setup guide
  - `LMSTUDIO_SETUP.md` - Detailed configuration steps
  - `LMSTUDIO_IMPLEMENTATION.md` - Technical implementation details
- **Authors.txt** - Updated attribution

### Dependencies

#### New Python Packages
- `unsloth` - QLoRA training framework (training pipeline only)
- Enhanced OpenAI SDK usage for multi-provider support

#### Updated Node Packages
- Enhanced React components for agent mode UI
- Added audio recording/playback utilities

### Technical Details

#### Timeout Hierarchy
```
Frontend SSE: 360s
  └─> Backend Agent: 300s
    └─> MCP Session: 180s
      └─> MCP Request: 120s
        └─> Provider HTTP: 30-120s (provider-dependent)
```

#### Tool Budget Enforcement
- Prevents runaway agent loops
- Per-tool limits configurable in `agent_mode_tools.py`
- Hard kill if tool abuse detected (3x same call within 8 turns)

#### Provider Detection Logic
```python
def detect_provider(base_url: str) -> str:
    url_lower = base_url.lower()
    if "groq" in url_lower:
        return "groq"
    if "anthropic" in url_lower:
        return "anthropic"
    if any(p in url_lower for p in ("localhost:1234", "127.0.0.1:1234", "lmstudio")):
        return "lmstudio"
    return "openai"  # default
```

## [2.0.0] - Original Upstream Release

Features from the original [AIDC-AI/ComfyUI-Copilot](https://github.com/AIDC-AI/ComfyUI-Copilot) v2.0:

- Workflow generation with library matching
- One-click debug mode
- Workflow rewriting via natural language
- Parameter tuning (GenLab)
- Node search and recommendations
- Node query tool
- Model recommendations
- Downstream node suggestions
- Multilingual support (English, Chinese)

---

## Upgrade Guide

### From Upstream v2.0 to This Fork v3.0

1. **Install new dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Update API configuration**:
   - Open the settings modal in ComfyUI
   - Your existing OpenAI key will continue to work
   - If using LM Studio, clear the API key field and set the base URL to `http://localhost:1234/v1`

3. **Optional: Try the new features**:
   - Enable Agent Mode with the robot button
   - Enable Voice I/O with the speaker button
   - Test multiple providers by switching tabs in settings

### Breaking Changes

- **LM Studio URL format**: old format `http://localhost:1235/api/v1` → new format `http://localhost:1234/v1`
  - The system auto-normalizes URLs, but update your saved configuration for clarity

### Migration Notes

- All existing workflows are compatible
- Chat history is preserved
- Settings may need to be re-entered if the base URL format changed

---

## Support and Feedback

For issues or questions:
- **This fork**: https://github.com/vehoelite/ComfyUI-Copilot-w-Agent/issues
- **Original project**: https://github.com/AIDC-AI/ComfyUI-Copilot/issues

## Credits

- **Original ComfyUI-Copilot v2.0**: [AIDC-AI](https://github.com/AIDC-AI)
- **Fork enhancements v3.0**: Enhanced by Claude Opus 4.6
- **ComfyUI**: [ComfyUI Project](https://github.com/comfyanonymous/ComfyUI)
- **Unsloth**: [Unsloth Project](https://github.com/unslothai/unsloth)