ComfyUI-Copilot-w-Agent v3.0: Agent Mode, Multi-Provider, Voice I/O, …#130

Open
vehoelite wants to merge 9 commits into AIDC-AI:main from vehoelite:main

@vehoelite

…LM Studio fixes

Major enhancements over upstream AIDC-AI/ComfyUI-Copilot v2.0:

  • Agent Mode: Autonomous multi-step workflow building with PLAN/EXECUTE/VALIDATE/REPORT loop, tool budget enforcement, loop prevention, visual step tracker
  • Multi-Provider: OpenAI, Groq, Anthropic, LM Studio with auto-detection, provider-aware timeouts, token budgets, and rate-limit retry
  • LM Studio: Fixed broken integration (wrong port, URL normalization, model listing, API key handling, header forwarding, cache invalidation)
  • Voice I/O: Streaming TTS with sentence extraction and gapless playback, VAD-based STT with auto-silence detection, per-provider backend (Groq Orpheus, OpenAI tts-1)
  • Fine-Tuning Pipeline: Complete QLoRA training for Qwen3 tool-calling, dataset generator (18 conversation types), validator, chunked CE loss for 8GB GPUs
  • Bug fixes: None-safe metadata, robust JSON parsing, MCP timeout tuning, nonlocal declaration fix, NullCtx for optional async managers
  • Local agent model (coming soon): ComfyUI-Copilot's own local AI agent/assistant, based on Qwen/Qwen3-4B and trained with QLoRA on premium datasets so that ComfyUI-Copilot tool calls work natively. It is currently in training and under rigorous testing, should arrive in the coming days, and will eventually be published on Hugging Face.

Enhanced by Claude Opus 4.6

vehoelite and others added 6 commits February 14, 2026 10:58
Co-authored-by: vehoelite <145181904+vehoelite@users.noreply.github.com>
@CLAassistant

CLAassistant commented Feb 14, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ vehoelite
❌ Copilot
You have signed the CLA already but the status is still pending? Let us recheck it.

@vehoelite
Author

vehoelite commented Feb 14, 2026

Agent Mode may still have some small bugs; I am working on them and will continue to do so.

I would also like to propose a feature: Live Interaction Logging for Training Data

Add an opt-in telemetry mode that records successful agent interactions during normal ComfyUI-Copilot usage. When a user's request completes successfully (workflow built, nodes connected, parameters set correctly), the full conversation trace — user prompt, tool calls, tool responses, and final result — gets saved as a training example in OpenAI chat-completion format.

How it works:

- User enables logging via a toggle in settings (off by default, privacy-first)
- Only successful interactions are captured (user confirms result or workflow executes without error)
- Failed/abandoned interactions are discarded or flagged as negative examples
- Traces are saved locally as JSONL, same format as the synthetic training dataset
- Users can review/delete logged interactions before contributing

Why this matters:

- Synthetic training data (what the fine-tuning pipeline currently generates) covers template patterns but can't anticipate the full diversity of real user requests
- Real interaction logs capture the actual distribution of how people use ComfyUI: uncommon node combinations, creative workflows, domain-specific terminology
- Creates a data flywheel: better model → more successful interactions → more training data → even better model
- Community-contributed logs (with consent) could build a shared dataset that benefits all users

Implementation scope:

- A logging middleware in the agent pipeline that serializes conversation turns
- A local JSONL writer with configurable output path
- A UI panel to review, approve, or delete captured interactions
- Export format compatible with the existing training pipeline
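
To make the proposal concrete, here is a minimal sketch of the local JSONL writer. The names `InteractionLogger`, `log`, and `read_traces` are hypothetical and not part of this PR, and the `{"messages": [...]}` record shape is my assumption about what "OpenAI chat-completion format" would look like on disk.

```python
import json
from pathlib import Path


class InteractionLogger:
    """Hypothetical opt-in logger; class and method names are illustrative only."""

    def __init__(self, output_path, enabled=False):
        self.output_path = Path(output_path)  # configurable output path
        self.enabled = enabled                # off by default (privacy-first)

    def log(self, messages, success):
        """Append one conversation trace as a JSONL line.

        Returns True if the trace was written, False if it was skipped.
        """
        # Skip when the user has not opted in or the interaction failed.
        if not self.enabled or not success:
            return False
        self.output_path.parent.mkdir(parents=True, exist_ok=True)
        with self.output_path.open("a", encoding="utf-8") as f:
            # Assumed record shape: one chat-style message list per line.
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
        return True


def read_traces(path):
    """Load logged traces back, e.g. for a review/delete panel."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A trace would be passed in as a list of chat messages (user prompt, assistant tool calls, tool responses, final answer), for example `logger.log(messages, success=True)` once the workflow has executed without error. Flagging failed runs as negative examples and the review UI are left out of this sketch.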

If this is feasible, users must of course be able to opt out; privacy and security are a personal priority for me. Let me know what you think.

- create_agent looked for 'model_select' but agent endpoint stored as 'model'
- Now checks both keys so user's model dropdown selection is respected
- Set OpenAI provider default to gpt-4.1-mini instead of gemini-2.5-flash
- Emphasize ALWAYS search_nodes first, never guess class_types
- Emphasize ALWAYS get_node_details before building JSON
- Add COMMON MISTAKES section: single-node, guessed names, string-for-connections
- Require complete pipelines: loader -> processing -> output
- Add list_available_models step for real filenames
- ~700 tokens, fits Groq 6K budget
Add upstream contribution documentation infrastructure
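
For reviewers, the model-key fix described in the commit notes above amounts to something like the following. This is a sketch, not the code in the PR: `resolve_model` is a hypothetical helper, the config is assumed to be a plain dict, and the lookup order (`model_select` first, then `model`) is my assumption.

```python
DEFAULT_OPENAI_MODEL = "gpt-4.1-mini"  # new OpenAI provider default per the commit notes


def resolve_model(config, default=DEFAULT_OPENAI_MODEL):
    """Return the user's selected model, checking both config keys.

    Hypothetical helper: the lookup order is assumed, not taken from the PR.
    """
    for key in ("model_select", "model"):
        value = config.get(key)
        if value:  # ignore missing, None, or empty-string entries
            return value
    return default
```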
@vehoelite
Author

Really strange: the CLA showed as signed for nearly 6 hours, then suddenly it did not. The CLA page no longer allows any changes and acts as if the task is done, while GitHub reports something different.
