
Commit 2916691

Merge pull request #1 from vehoelite/copilot/fix-enhance-features
Add upstream contribution documentation infrastructure
2 parents ffa359b + d6f3ac5 commit 2916691

File tree

10 files changed: +1695 -2 lines changed


.github/PR_DOCUMENTATION_README.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# Pull Request Documentation

This directory contains comprehensive documentation for contributing your fork's enhancements back to the original repository.

## Quick Start

**New here?** Start with: [`SUMMARY_FOR_USER.md`](../SUMMARY_FOR_USER.md)

## Documentation Files

### For Quick Answers
- **[QUICK_REFERENCE.md](../QUICK_REFERENCE.md)** - Fast answers, TL;DR, common questions

### For Planning Your Contribution
- **[SUMMARY_FOR_USER.md](../SUMMARY_FOR_USER.md)** - Executive overview
- **[NEXT_STEPS.md](../NEXT_STEPS.md)** - Strategic guidance and recommended paths

### For Submitting a PR
- **[HOW_TO_SUBMIT_PR.md](../HOW_TO_SUBMIT_PR.md)** - Complete step-by-step guide
- **[CONTRIBUTING.md](../CONTRIBUTING.md)** - General contribution guidelines
- **[PULL_REQUEST_TEMPLATE.md](../PULL_REQUEST_TEMPLATE.md)** - PR template

### For Understanding Changes
- **[CHANGELOG.md](../CHANGELOG.md)** - Complete v3.0 documentation
- **[README.md](../README.md)** - Feature comparison with upstream

## Three Paths Forward

1. **LM Studio Fixes First** - Easiest, highest chance of acceptance
2. **Everything at Once** - Comprehensive PR with all v3.0 features
3. **Community Fork** - Maintain independently

Choose your path in [NEXT_STEPS.md](../NEXT_STEPS.md).

## Your Fork's Value

- 5+ major new features (Agent Mode, Multi-Provider, Voice I/O, Training)
- Critical bug fixes (LM Studio completely repaired)
- 45+ KB of comprehensive documentation

## Need Help?

Check these in order:

1. [QUICK_REFERENCE.md](../QUICK_REFERENCE.md)
2. [NEXT_STEPS.md](../NEXT_STEPS.md)
3. [HOW_TO_SUBMIT_PR.md](../HOW_TO_SUBMIT_PR.md)

Good luck! 🚀

Authors.txt

Lines changed: 8 additions & 2 deletions
```diff
@@ -1,4 +1,10 @@
 # Name or Organization <email address>

-AIDC-AI
-Christian Byrne <cbyrne@comfy.org>
+# Original Authors
+AIDC-AI <original project maintainers>
+Christian Byrne <cbyrne@comfy.org>
+
+# Fork Contributors (v3.0 Enhancements)
+vehoelite <maintainer of enhanced fork>
+# Agent Mode, Multi-Provider Support, Voice I/O, LM Studio fixes, Fine-tuning pipeline
+# Implementation assistance by Claude Opus 4.6
```

CHANGELOG.md

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@
# Changelog

All notable changes to this fork of ComfyUI-Copilot are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.0.0] - 2025-01-XX

This is a major enhancement release that adds autonomous agent capabilities, multi-provider support, and voice I/O, and fixes critical LM Studio integration issues.

### Added

#### Agent Mode - Autonomous Workflow Building
- **PLAN/EXECUTE/VALIDATE/REPORT Loop** (`backend/service/agent_mode.py`)
  - Agent breaks down complex goals into discrete tasks
  - Autonomously searches nodes, builds workflows, and sets parameters
  - Validates workflow integrity before presenting to the user
  - Provides step-by-step progress reporting

- **Tool Budget System** (`backend/service/agent_mode_tools.py`)
  - Per-tool call limits (e.g., `search_nodes` max 4x, `save_workflow` max 5x)
  - Global budget of 30 tool calls per agent session
  - Loop prevention: the session is killed if the same tool+args combination repeats 3x within the last 8 calls
  - 5-minute total timeout, 25 max agent turns
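A minimal sketch of how such a budget tracker could work. The class and method names here are hypothetical illustrations of the rules listed above; the actual limits and enforcement live in `agent_mode_tools.py`:

```python
from collections import deque

class ToolBudget:
    """Illustrative tool-budget tracker: global cap, per-tool caps,
    and repeated-call (loop) detection over a sliding window."""

    def __init__(self, global_limit=30, per_tool_limits=None,
                 repeat_window=8, repeat_limit=3):
        self.global_limit = global_limit
        self.per_tool_limits = per_tool_limits or {"search_nodes": 4, "save_workflow": 5}
        self.repeat_limit = repeat_limit
        self.total_calls = 0
        self.per_tool_calls = {}
        self.recent = deque(maxlen=repeat_window)  # last N (tool, args) keys

    def allow(self, tool, args):
        """Return True if this call fits the budget; record it if so."""
        key = (tool, repr(sorted(args.items())))
        if self.total_calls >= self.global_limit:
            return False  # global budget exhausted
        limit = self.per_tool_limits.get(tool)
        if limit is not None and self.per_tool_calls.get(tool, 0) >= limit:
            return False  # per-tool budget exhausted
        if sum(1 for k in self.recent if k == key) >= self.repeat_limit:
            return False  # same tool+args repeated too often: likely a loop
        self.total_calls += 1
        self.per_tool_calls[tool] = self.per_tool_calls.get(tool, 0) + 1
        self.recent.append(key)
        return True
```

Rejected calls can then be surfaced to the agent as a "budget exceeded" tool result, nudging it toward the REPORT phase instead of looping.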
- **Visual Progress Tracking** (`ui/src/components/chat/AgentModeIndicator.tsx`)
  - Real-time step indicator showing the current agent phase
  - Task queue visualization
  - Toggle with the robot button in the chat input

#### Multi-Provider Support
- **OpenAI-Compatible Provider Architecture** (`backend/utils/globals.py`)
  - `detect_provider()` function with URL pattern matching
  - Provider-specific constants: timeouts, token limits, features

- **Supported Providers**:
  - **OpenAI**: Full feature support with default model `gemini-2.5-flash`
  - **Groq**: Free tier with `llama-3.3-70b-versatile`, reduced tool sets for rate limits
  - **Anthropic**: Via OpenAI compatibility layer, `claude-sonnet-4-20250514`
  - **LM Studio**: Fully local with auto-detection, no API key required

- **4-Tab Settings Modal** (`ui/src/components/chat/ApiKeyModal.tsx`)
  - Auto-fill base URLs per provider
  - Provider-specific placeholders and hints
  - Model dropdown with refresh capability

- **Provider-Aware Optimizations**:
  - Constrained providers get compressed prompts and reduced tool sets
  - HTTP timeout hierarchy: Groq 30s, Anthropic 60s, LM Studio/OpenAI 120s
  - Rate-limit detection with automatic wait-and-retry
  - Frontend SSE timeout: 360s > backend agent: 300s > MCP session: 180s

#### Voice I/O - Speech Interaction
- **Speech-to-Text (STT)** (`ui/src/utils/vadRecorder.ts`)
  - Browser-based voice recording with Voice Activity Detection (VAD)
  - Web Audio AnalyserNode with RMS-based silence detection
  - Auto-stops after 1.8 seconds of silence
  - Real-time volume visualization on the microphone button
  - Backend endpoint: `/api/voice/speech-to-text`
  - Groq: `whisper-large-v3-turbo` | OpenAI: `whisper-1`
63+
- **Text-to-Speech (TTS)** (`ui/src/utils/streamingTTS.ts`)
64+
- Streaming TTS that reads AI responses as they arrive
65+
- Sentence-boundary detection for natural pacing (min 40 chars per chunk)
66+
- Gapless audio queue for smooth playback
67+
- Speaker button toggle (purple when active)
68+
- Backend endpoints: `/api/voice/text-to-speech`, `/api/voice/capabilities`
69+
- Groq: Orpheus TTS (200 char chunks, WAV) | OpenAI: tts-1 (4096 char chunks, MP3)
70+
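The sentence-boundary chunking described above can be sketched as follows. This is an illustrative outline of the idea (the function name and regex are assumptions), not the `streamingTTS.ts` code:

```python
import re

def sentence_chunks(stream_text, min_chars=40):
    """Split text at sentence boundaries, emitting a chunk only once it
    reaches `min_chars`, so the TTS engine gets natural-sounding units."""
    chunks, buffer = [], ""
    # Split after ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", stream_text):
        buffer = f"{buffer} {sentence}".strip() if buffer else sentence
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)  # flush the trailing partial sentence
    return chunks
```

In a streaming setting the same logic runs incrementally over the SSE deltas, handing each completed chunk to the gapless audio queue.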
#### Fine-Tuning Pipeline
- **Dataset Generation** (`training/generate_dataset.py`)
  - 18 conversation generators for ComfyUI tool-calling tasks
  - Augmentation with parameter variations
  - 9 current + 8 future tool schemas
  - 11 workflow templates with parameter pools

- **Dataset Validation** (`training/validate_dataset.py`)
  - 5-pass validation: structural + semantic checks
  - JSON schema validation for tool calls
  - Turn-sequence validation

- **QLoRA Training** (`training/train.py`)
  - Unsloth-based training framework
  - Qwen3 model support with GGUF export
  - Optimized for consumer GPUs (validated on an RTX 5060 8 GB)
  - Chunked cross-entropy loss (128-token chunks, ~37 MB vs 1.18 GB for the full logits)
  - Windows WDDM-compatible gradient checkpointing
  - Python 3.14 compatibility patches
### Fixed

#### LM Studio Integration - Complete Overhaul
- **Port Configuration** (`backend/controller/llm_api.py`, `ui/src/components/chat/ApiKeyModal.tsx`)
  - FIXED: The port hint pointed at the wrong port (1235 → 1234)
  - Correct default URL: `http://localhost:1234/v1`

- **URL Normalization** (`backend/utils/globals.py`)
  - FIXED: `/api/v1` was not being converted to `/v1` for OpenAI SDK compatibility
  - Automatic URL normalization: strips the `/api` prefix, ensures a `/v1` suffix
  - Handles both `http://localhost:1234` and `http://localhost:1234/v1` inputs
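A minimal sketch of the normalization rules listed above, assuming a hypothetical helper name (the real logic lives in `backend/utils/globals.py`):

```python
def normalize_base_url(url):
    """Normalize an LM Studio base URL for the OpenAI SDK:
    strip LM Studio's native /api/v1 (or bare /api, /v1) suffix,
    then append the /v1 the SDK expects."""
    url = url.rstrip("/")
    if url.endswith("/api/v1"):
        url = url[: -len("/api/v1")]  # LM Studio's native REST prefix
    elif url.endswith("/v1"):
        url = url[: -len("/v1")]
    elif url.endswith("/api"):
        url = url[: -len("/api")]
    return url + "/v1"
```

With this, `http://localhost:1234`, `http://localhost:1234/v1`, and `http://localhost:1234/api/v1` all resolve to the same SDK-compatible endpoint.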
- **Model Listing** (`backend/controller/llm_api.py`)
  - FIXED: LM Studio's native response format was not being parsed
  - Robust multi-format parser handles both OpenAI and LM Studio response formats
  - LM Studio format: `{"models": [...]}` with `key`/`display_name` fields
  - OpenAI format: `{"data": [...]}` with `id` field
  - 24-hour cache invalidation for model lists
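The multi-format parsing amounts to checking which top-level key is present. An illustrative sketch (function name and the `(id, label)` return shape are assumptions; the field handling in `llm_api.py` may differ):

```python
def parse_model_list(payload):
    """Extract (model_id, display_label) pairs from either response shape."""
    if "data" in payload:  # OpenAI format: {"data": [{"id": ...}]}
        return [(m["id"], m["id"]) for m in payload["data"]]
    if "models" in payload:  # LM Studio native: {"models": [{"key": ..., "display_name": ...}]}
        return [(m["key"], m.get("display_name", m["key"])) for m in payload["models"]]
    return []  # unknown shape: fail soft with an empty list
```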
- **API Key Handling** (`backend/service/mcp_client.py`, UI)
  - FIXED: An API key was required even though LM Studio doesn't need one
  - Uses the `"lmstudio-local"` placeholder when the API key is empty
  - Frontend allows an empty API key for LM Studio

- **Header Forwarding** (`backend/controller/conversation_api.py`)
  - FIXED: The `Openai-Base-Url` header was not being sent from the frontend
  - Proper header forwarding for custom base URLs

- **Auto-Detection** (`backend/utils/globals.py`)
  - Provider detection via URL patterns: `localhost:1234`, `127.0.0.1:1234`, `lmstudio`
  - Automatic feature flagging for local models

#### Metadata Handling
- **None-Safe Operations** (various files)
  - FIXED: Crashes when node metadata was None or malformed
  - Uses the `(meta.get("field") or "").lower()` pattern
  - Guards: `if not isinstance(meta, dict): continue`
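The two guard patterns combine as in this illustrative sketch (the function name and the `"description"` field are hypothetical; the fix applies the same pattern across several files):

```python
def collect_descriptions(nodes):
    """Lowercased descriptions from node metadata that may be None or malformed."""
    out = []
    for meta in nodes:
        if not isinstance(meta, dict):
            continue  # skip None / malformed entries instead of crashing
        # `or ""` turns a None field into an empty string before .lower()
        text = (meta.get("description") or "").lower()
        if text:
            out.append(text)
    return out
```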
#### Canvas Rule Enforcement
- **Tool Restrictions** (`backend/service/mcp_client.py`, tool implementations)
  - FIXED: Tools could unintentionally modify canvas state
  - Only `save_workflow` modifies the canvas
  - `explain_node` and `search_node` are read-only information tools
  - Enforced at the code level (not just in prompts) for local-model compatibility
### Changed

#### Architecture
- **Provider Detection** (`backend/utils/globals.py`)
  - Added `detect_provider()` with URL pattern matching
  - Centralized provider constants: `GROQ_HTTP_TIMEOUT`, `ANTHROPIC_HTTP_TIMEOUT`, etc.

- **Agent Factory** (`backend/agent_factory.py`)
  - Provider-aware client configuration
  - Timeout propagation from provider detection

- **Settings Modal** (`ui/src/components/chat/ApiKeyModal.tsx`)
  - Restructured as a 4-tab interface (was a single form)
  - Auto-fill functionality for base URLs
  - Provider-specific help text and placeholders

- **Chat Input** (`ui/src/components/chat/ChatInput.tsx`)
  - Added agent mode toggle button (robot icon)
  - Added voice input button (microphone icon)
  - Visual indicators for active modes

#### API Endpoints
- **New Endpoints** (`backend/controller/conversation_api.py`, `backend/controller/llm_api.py`)
  - `POST /api/workflow/agent-mode-stream` - Agent mode SSE stream
  - `POST /api/voice/speech-to-text` - STT transcription
  - `POST /api/voice/text-to-speech` - TTS audio generation
  - `GET /api/voice/capabilities` - Provider TTS/STT capability check

- **Updated Endpoints**:
  - `GET /api/llm/models` - Now handles multiple provider response formats
  - `POST /api/llm/verify` - Added base URL forwarding
#### Documentation
- **README.md** - Complete rewrite with a feature comparison table
- **Added Files**:
  - `HOW_TO_USE_LMSTUDIO.md` - LM Studio setup guide
  - `LMSTUDIO_SETUP.md` - Detailed configuration steps
  - `LMSTUDIO_IMPLEMENTATION.md` - Technical implementation details
- **Authors.txt** - Updated attribution

### Dependencies

#### New Python Packages
- `unsloth` - QLoRA training framework (training pipeline only)
- Enhanced OpenAI SDK usage for multi-provider support

#### Updated Node Packages
- Enhanced React components for the agent mode UI
- Added audio recording/playback utilities
### Technical Details

#### Timeout Hierarchy
```
Frontend SSE: 360s
└─> Backend Agent: 300s
    └─> MCP Session: 180s
        └─> MCP Request: 120s
            └─> Provider HTTP: 30-120s (provider-dependent)
```

#### Tool Budget Enforcement
- Prevents runaway agent loops
- Per-tool limits configurable in `agent_mode_tools.py`
- Hard kill if tool abuse is detected (same call 3x within 8 turns)

#### Provider Detection Logic
```python
def detect_provider(base_url: str) -> str:
    url_lower = base_url.lower()
    if "groq" in url_lower:
        return "groq"
    if "anthropic" in url_lower:
        return "anthropic"
    if "localhost:1234" in url_lower or "lmstudio" in url_lower:
        return "lmstudio"
    return "openai"  # default
```
## [2.0.0] - Original Upstream Release

Features from the original [AIDC-AI/ComfyUI-Copilot](https://github.com/AIDC-AI/ComfyUI-Copilot) v2.0:

- Workflow generation with library matching
- One-click debug mode
- Workflow rewriting via natural language
- Parameter tuning (GenLab)
- Node search and recommendations
- Node query tool
- Model recommendations
- Downstream node suggestions
- Multilingual support (English, Chinese)

---

## Upgrade Guide

### From Upstream v2.0 to This Fork v3.0

1. **Install new dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Update API configuration**:
   - Open the settings modal in ComfyUI
   - Your existing OpenAI key will continue to work
   - If using LM Studio, clear the API key field and update the base URL to `http://localhost:1234/v1`

3. **Optional: Try the new features**:
   - Enable Agent Mode with the robot button
   - Enable Voice I/O with the speaker button
   - Test multiple providers by switching tabs in settings

### Breaking Changes

- **LM Studio URL format**: Old format `http://localhost:1235/api/v1` → new format `http://localhost:1234/v1`
  - The system will auto-normalize old URLs, but update your saved configuration for clarity

### Migration Notes

- All existing workflows are compatible
- Chat history is preserved
- Settings may need to be re-entered if the base URL format changed

---

## Support and Feedback

For issues or questions:
- **This fork**: https://github.com/vehoelite/ComfyUI-Copilot-w-Agent/issues
- **Original project**: https://github.com/AIDC-AI/ComfyUI-Copilot/issues

## Credits

- **Original ComfyUI-Copilot v2.0**: [AIDC-AI](https://github.com/AIDC-AI)
- **Fork enhancements v3.0**: Enhanced by Claude Opus 4.6
- **ComfyUI**: [ComfyUI Project](https://github.com/comfyanonymous/ComfyUI)
- **Unsloth**: [Unsloth Project](https://github.com/unslothai/unsloth)
