A community-enhanced fork of AIDC-AI/ComfyUI-Copilot
This fork builds on the excellent ComfyUI-Copilot v2.0 by AIDC-AI, adding significant new capabilities:
| Feature | Upstream v2.0 | This Fork v3.0 |
|---|---|---|
| Agent Mode (autonomous multi-step workflows) | - | Yes |
| Multi-Provider (OpenAI, Groq, Anthropic, LMStudio) | OpenAI only | Yes |
| LM Studio Integration (fixed and working) | Broken | Yes |
| Voice I/O (STT + streaming TTS) | - | Yes |
| Provider-Aware Timeouts and Token Budgets | - | Yes |
| Loop Prevention and Tool Budget Enforcement | - | Yes |
| Fine-Tuning Pipeline (Qwen3 QLoRA for tool-calling) | - | Yes |
| Chat, Debug, Rewrite, GenLab | Yes | Yes |
Agent Mode lets the AI autonomously plan and execute multi-step tasks on your ComfyUI canvas. Instead of asking for one thing at a time, describe your goal and the agent will:
- Plan - Break the goal into discrete tasks
- Execute - Search nodes, build workflows, set parameters
- Validate - Check the workflow for errors
- Report - Summarize what was done and ask for confirmation
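Conceptually the loop has this shape (a minimal sketch; names such as `agent.plan` and `repair_task` are illustrative, and the real orchestration lives in `backend/service/agent_mode.py`):

```python
async def run_agent_mode(goal: str, agent, max_turns: int = 25):
    tasks = await agent.plan(goal)                   # PLAN: break goal into tasks
    for turn, task in enumerate(tasks):
        if turn >= max_turns:                        # hard cap on agent turns
            break
        await agent.execute(task)                    # EXECUTE: search, build, set params
        errors = await agent.validate()              # VALIDATE: check the workflow
        if errors:
            tasks.append(agent.repair_task(errors))  # re-queue a fix task
    return await agent.report()                      # REPORT: summarize, ask to confirm
```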
Toggle Agent Mode with the robot button in the chat input. A visual step tracker shows real-time progress.
Architecture:
- `backend/service/agent_mode.py` - PLAN, EXECUTE, VALIDATE, REPORT loop
- `backend/service/agent_mode_tools.py` - Task queue, tool call tracker, loop prevention
- `ui/src/components/chat/AgentModeIndicator.tsx` - Visual progress indicator
Safety:
- Per-tool call limits (e.g., `search_nodes` max 4x, `save_workflow` max 5x)
- Global tool budget of 30 calls per session
- Hard kill if same tool+args repeated 3x in last 8 calls
- 5-minute total timeout, 25 max agent turns
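A minimal sketch of how these guards can be enforced in code (class and method names here are hypothetical; the fork's actual tracker lives in `backend/service/agent_mode_tools.py`):

```python
from collections import deque

class ToolCallTracker:
    """Refuse tool calls that blow the global budget or repeat in a loop."""
    def __init__(self, window=8, max_repeats=3, budget=30):
        self.recent = deque(maxlen=window)    # sliding window of recent calls
        self.max_repeats = max_repeats
        self.budget = budget
        self.total = 0

    def allow(self, tool: str, args: dict) -> bool:
        self.total += 1
        if self.total > self.budget:          # global 30-call session budget
            return False
        key = (tool, repr(sorted(args.items())))
        self.recent.append(key)
        # hard kill: same tool+args seen 3x within the last 8 calls
        return self.recent.count(key) < self.max_repeats
```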
Use any OpenAI-compatible provider - no lock-in.
| Provider | Base URL | Default Model | Notes |
|---|---|---|---|
| OpenAI | `https://api.openai.com/v1` | `gemini-2.5-flash` | Full feature support |
| Groq | `https://api.groq.com/openai/v1` | `llama-3.3-70b-versatile` | Free tier, blazing fast |
| Anthropic | `https://api.anthropic.com/v1` | `claude-sonnet-4-20250514` | Via OpenAI compatibility |
| LM Studio | `http://localhost:1234/v1` | Auto-detected | Fully local, no API key |
The settings modal has 4 tabs with auto-fill base URLs and provider-specific placeholders. Provider is auto-detected from the base URL.
Provider-aware optimizations:
- Constrained providers (Groq free tier, LMStudio) get reduced tool sets and compressed prompts
- Provider-specific HTTP timeouts (Groq 30s, Anthropic 60s, LMStudio/OpenAI 120s)
- Rate-limit detection with automatic wait-and-retry
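A sketch of how detection and timeout selection can work (the fork's real logic is `detect_provider()` in `backend/utils/globals.py`; the URL patterns below are assumptions):

```python
PROVIDER_TIMEOUTS = {"groq": 30, "anthropic": 60, "lmstudio": 120, "openai": 120}

def detect_provider(base_url: str) -> str:
    url = (base_url or "").lower()
    if "groq.com" in url:
        return "groq"
    if "anthropic.com" in url:
        return "anthropic"
    if "localhost:1234" in url or "lmstudio" in url:
        return "lmstudio"
    return "openai"   # default for any other OpenAI-compatible endpoint

def http_timeout(base_url: str) -> int:
    return PROVIDER_TIMEOUTS[detect_provider(base_url)]
```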
The upstream LM Studio integration had several issues that made it non-functional:
What was broken:
- Port hint was wrong (1235 instead of 1234)
- URL normalization failed: `/api/v1` was not being converted to `/v1` for the OpenAI SDK
- Model listing did not parse LM Studio's native response format (`{"models": [...]}` with `key`/`display_name` fields)
- API key was required even though LM Studio does not need one
- The `Openai-Base-Url` header was not being sent from the frontend
- No cache invalidation for model lists (stale after 24h)
What was fixed:
- Correct default URL: `http://localhost:1234/v1`
- Automatic URL normalization (strips `/api` prefix, ensures `/v1` suffix)
- Robust multi-format model list parser (handles both OpenAI and LM Studio response formats)
- API key is optional: uses an `"lmstudio-local"` placeholder when empty
- Proper header forwarding for base URL
- 24-hour cache invalidation for model lists
- Auto-detection of LM Studio via URL patterns
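The normalization rule itself is small. A sketch under the behavior stated above (strip the `/api` prefix, guarantee a `/v1` suffix; not the exact fork code):

```python
def normalize_lmstudio_url(base_url: str) -> str:
    url = base_url.rstrip("/")
    if url.endswith("/api/v1"):
        url = url[:-len("/api/v1")] + "/v1"   # /api/v1 -> /v1 for the OpenAI SDK
    elif url.endswith("/api"):
        url = url[:-len("/api")] + "/v1"
    elif not url.endswith("/v1"):
        url += "/v1"
    return url

assert normalize_lmstudio_url("http://localhost:1234/api/v1") == "http://localhost:1234/v1"
```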
See HOW_TO_USE_LMSTUDIO.md for setup instructions.
Full voice input/output with per-provider backend support:
Speech-to-Text (STT):
- Browser-based voice recording with Voice Activity Detection (VAD)
- Auto-stops after 1.8 seconds of silence
- Real-time volume visualization on the mic button
- Groq: `whisper-large-v3-turbo` | OpenAI: `whisper-1`
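The silence detection is RMS-based: recording stops once signal energy stays below a threshold for 1.8 seconds. A sketch of the rule in Python (the real recorder is `ui/src/utils/vadRecorder.ts` on the Web Audio AnalyserNode; the threshold here is an assumed value):

```python
import numpy as np

SILENCE_RMS = 0.01       # assumed threshold; tune per microphone
SILENCE_SECONDS = 1.8    # auto-stop after 1.8 s of silence

def should_stop(frames: list, frame_seconds: float) -> bool:
    """frames: recent audio frames as float arrays in [-1, 1]."""
    needed = int(SILENCE_SECONDS / frame_seconds)
    if len(frames) < needed:
        return False
    return all(np.sqrt(np.mean(f ** 2)) < SILENCE_RMS for f in frames[-needed:])
```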
Text-to-Speech (TTS):
- Streaming TTS that reads responses as they arrive
- Sentence-boundary detection for natural pacing (min 40 chars per chunk)
- Gapless audio queue for smooth playback
- Groq: Orpheus TTS (200 char chunks, WAV) | OpenAI: tts-1 (4096 char chunks, MP3)
- Toggle with the speaker button (purple when active)
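The sentence-boundary logic can be pictured as a buffer that releases complete sentences once they cross the 40-character minimum. A minimal sketch (the actual extraction lives in `ui/src/utils/streamingTTS.ts`; this Python version is illustrative):

```python
import re

MIN_CHUNK = 40  # minimum characters per spoken chunk, per the pacing rule above

def extract_sentences(buffer: str):
    """Pull complete sentences of at least MIN_CHUNK chars out of a
    streaming buffer; return (ready_chunks, leftover_text)."""
    chunks, start = [], 0
    for match in re.finditer(r"[.!?](?:\s|$)", buffer):
        if match.end() - start >= MIN_CHUNK:
            chunks.append(buffer[start:match.end()].strip())
            start = match.end()
    return chunks, buffer[start:]
```

Each ready chunk can then be sent to the TTS endpoint and appended to the gapless audio queue.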
Key files:
- `ui/src/utils/streamingTTS.ts` - Sentence extraction, audio queue, gapless playback
- `ui/src/utils/vadRecorder.ts` - Web Audio AnalyserNode, RMS-based silence detection
- `backend/controller/llm_api.py` - `_VOICE_PROVIDER_MAP`, TTS/STT endpoints, `GET /api/voice/capabilities`
A complete training pipeline for fine-tuning Qwen3 models on ComfyUI tool-calling tasks:
```
training/
  generate_dataset.py    # 18 conversation generators, augmentation
  validate_dataset.py    # 5-pass structural + semantic validation
  train.py               # QLoRA training with Unsloth + GGUF export
  tool_schemas.py        # 9 current + 8 future tool definitions
  workflow_templates.py  # 11 workflow templates + parameter pools
```
Designed for consumer GPUs:
- Chunked cross-entropy loss (128-token chunks, ~37 MB vs 1.18 GB full; sketched after this list)
- Windows WDDM-compatible gradient checkpointing (no CPU offloading)
- Python 3.14 compatibility patches
- RTX 5060 8GB validated (Qwen3-4B, 4-bit, 2048 seq len)
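The chunked loss avoids materializing the full `(seq_len, vocab)` logits tensor by projecting one 128-token slice at a time. A minimal PyTorch sketch of the idea (function and argument names here are illustrative, not the trainer's actual API):

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden, lm_head_weight, labels, chunk=128):
    """Compute LM loss one 128-token slice at a time, so only a
    (chunk, vocab) logits tensor ever exists in memory."""
    loss_sum = hidden.new_zeros((), dtype=torch.float32)
    n_valid = (labels != -100).sum().clamp(min=1)   # tokens that count toward loss
    for i in range(0, hidden.size(0), chunk):
        logits = hidden[i:i + chunk] @ lm_head_weight.t()   # small slice only
        loss_sum = loss_sum + F.cross_entropy(
            logits.float(), labels[i:i + chunk],
            ignore_index=-100, reduction="sum",
        )
    return loss_sum / n_valid
```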
All original ComfyUI-Copilot features work as before:
- Workflow Generation - Describe what you want, get 3 library matches + 1 AI-generated workflow
- One-Click Debug - Auto-detect errors, fix parameters, repair connections
- Workflow Rewriting - Modify existing workflows via natural language
- Parameter Tuning (GenLab) - Batch parameter sweeps with visual comparison
- Node Recommendations - Search and discover nodes by description
- Node Query - Deep-dive into any node's inputs, outputs, and usage
- Model Recommendations - Find checkpoints and LoRAs for your use case
- Downstream Node Suggestions - Context-aware next-node recommendations
Installation:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/vehoelite/ComfyUI-Copilot-w-Agent.git
cd ComfyUI-Copilot-w-Agent
pip install -r requirements.txt
```

Windows (embedded Python):

```bash
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-Copilot-w-Agent\requirements.txt
```

Setup:
- Launch ComfyUI and find the Copilot button on the left panel
- Click the settings button to open the API configuration modal
- Choose your provider tab (OpenAI, Groq, Anthropic, or LM Studio)
- Enter your API key (or leave empty for LM Studio)
- Verify the connection and select a model
Using Agent Mode:
- Toggle the robot button in the chat input
- Describe your goal: "Build a workflow that generates an image, upscales it 4x, and fixes faces"
- Watch the agent plan and execute steps automatically
- Review the result and confirm or iterate
Using Voice:
- Enable voice with the speaker button
- Click the microphone to speak your request
- AI responses will be read aloud as they stream in
```
Backend (Python) - OpenAI Agents SDK + aiohttp via ComfyUI's PromptServer
  agent_factory.py             - Creates Agent with AsyncOpenAI client
  service/
    agent_mode.py              - Agent Mode orchestration (PLAN, EXECUTE, VALIDATE, REPORT)
    agent_mode_tools.py        - Tool budget, loop prevention, task queue
    mcp_client.py              - Main chat agent entry point
    debug_agent.py             - Multi-agent workflow debugger
    workflow_rewrite_agent.py  - Workflow modification
  controller/
    conversation_api.py        - SSE streaming endpoints
    llm_api.py                 - Model listing, verification, TTS/STT, voice capabilities
  utils/
    globals.py                 - detect_provider(), provider constants
    comfy_gateway.py           - Wraps ComfyUI HTTP APIs

Frontend (React + Vite + Tailwind)
  workflowChat/workflowChat.tsx - Main chat + agent mode handling
  components/chat/
    ChatInput.tsx               - Agent mode toggle, voice buttons
    AgentModeIndicator.tsx      - Visual step tracker
    ApiKeyModal.tsx             - 4-tab provider configuration
  apis/workflowChatApi.ts       - API layer (streamAgentMode, textToSpeech, etc.)
  utils/
    streamingTTS.ts             - Streaming text-to-speech with sentence extraction
    vadRecorder.ts              - Voice activity detection recorder
```
Timeout Hierarchy: Frontend SSE (360s) > Backend Agent (300s) > MCP session (180s) > MCP request (120s) > Provider HTTP (30-120s)
Canvas Rule: Only `save_workflow` modifies the canvas. `explain_node` and `search_node` are read-only information tools.
Tool Enforcement: Local models ignore prompt-level rules. Enforcement is at code level via tools that refuse to execute, plus stream-level kill switches.
None-safe metadata: Uses `(meta.get("field") or "").lower()` with an `if not isinstance(meta, dict): continue` guard.
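In context, the pattern looks roughly like this (the surrounding loop and variable names are illustrative):

```python
for meta in metadata_entries:      # hypothetical iteration variable
    if not isinstance(meta, dict):
        continue                   # skip malformed entries outright
    value = (meta.get("field") or "").lower()   # None-safe: missing/None becomes ""
```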
Contributions welcome! This fork aims to push ComfyUI-Copilot's capabilities forward while staying compatible with upstream.
- Bug reports - Open an issue with reproduction steps
- Feature requests - Describe the use case
- Pull requests - Fork, branch, PR with clear description
Want to help contribute these enhancements to the original ComfyUI-Copilot project? See:
- HOW_TO_SUBMIT_PR.md - Detailed guide for submitting PRs to upstream
- CONTRIBUTING.md - General contribution guidelines
- NEXT_STEPS.md - Quick start guide for getting your changes merged
We've prepared comprehensive documentation to make it easy to contribute these fixes and features back to AIDC-AI's original repository.
- AIDC-AI - Original ComfyUI-Copilot (v2.0)
- Claude Opus 4.6 - Agent Mode, multi-provider support, voice I/O, LM Studio fixes, fine-tuning pipeline, and all enhancements in this fork
- Unsloth - QLoRA training framework
This project is licensed under the MIT License - see the LICENSE file for details.
Original project: AIDC-AI/ComfyUI-Copilot