Skip to content

Conversation

@christian-byrne
Copy link
Contributor

@christian-byrne christian-byrne commented Jun 8, 2025

Four main goals of this PR:

Improved Task Tracking

Fixed inconsistencies in the progress tracker dialog during installations. Previously, task state wasn't tracked server-side, forcing the frontend to manually track queue state.

  • Tasks now have UUIDs and client IDs for proper identification
  • Maintain task history and queue server-side with thread-safe operations
  • WebSocket notifications when tasks start/complete keep clients synchronized
  • Failed tasks are tracked without interrupting the queue
  • Tasks can be cancelled by ID
  • Frontend receives consistent stdout logs for each task (previously buggy due to manual tracking)

Remove Legacy Features

  • Cleaning up features being replaced by other projects (native model management, subgraphs)
  • Removed unsafe routes like git URL downloads and pip installs - as part of core HTTP API, only registry-validated (security scanned) packages should be downloadable
  • Legacy features still available in legacy sub-package, accessible via --enable_manager_legacy_ui CLI flag

Type-Safe API-Driven Development

  • Generated OpenAPI specification for Manager HTTP API
  • Both frontend and backend generate types/SDKs from the same spec, enforcing contracts
  • Frontend uses TypeScript checking, backend uses Pydantic validation
  • OpenAPI validated against actual server in CI/CD
  • Auto-generated documentation

Set Groundwork for Restore Points

  • Captures environment state before/after task batches (ComfyUI version, frontend version, custom nodes, python environment)
  • State schemas defined in OpenAPI with validation
  • HTTP routes for accessing states (later, migration will be possible as part of snapshot/restore feature)

Additional refactoring includes splitting very large files out into more modules and applying ruff/pylint.

Test plan

  • Test task queue operations with client_id tracking
  • Verify batch history files are created correctly
  • Test OpenAPI model generation with datamodel-codegen
  • Verify all REST endpoints work with new models
  • Test queue status filtering by client_id
  • Verify tasks are partitioned by failed and succeeded

@ltdrdata
Copy link
Member

ltdrdata commented Jun 8, 2025

Was it intentional to include /dist dir?

@christian-byrne christian-byrne marked this pull request as draft June 9, 2025 01:37
@christian-byrne
Copy link
Contributor Author

Was it intentional to include /dist dir?

No. Good catch.

Also, this should be a draft PR until changes are finalized and the test plan is complete.

@christian-byrne christian-byrne marked this pull request as ready for review June 14, 2025 02:27
- Add client_id field to QueueTaskItem and TaskHistoryItem models
- Implement client-specific WebSocket message routing
- Add client filtering to queue status and history endpoints
- Follow ComfyUI patterns for session management
- Create data_models package for better code organization
… models

Enhances ComfyUI Manager with robust batch execution tracking and unified data model architecture:

- Implemented automatic batch history serialization with before/after system state snapshots
- Added comprehensive state management capturing installed nodes, models, and ComfyUI version info
- Enhanced task queue with proper client ID handling and WebSocket notifications
- Migrated all data models to OpenAPI-generated Pydantic models for consistency
- Added documentation for new TaskQueue methods (done_count, total_count, finalize)
- Fixed 64 linting errors with proper imports and code cleanup

Technical improvements:
- All models now auto-generated from openapi.yaml ensuring API/implementation consistency
- Batch tracking captures complete system state at operation start and completion
- Enhanced REST endpoints with comprehensive documentation
- Removed manual model files in favor of single source of truth
- Added helper methods for system state capture and batch lifecycle management
- Removed completed TODO comments about code quality checks and client_id handling
- Updated comments to reflect implemented features
- Fixed ruff linting errors:
  - Removed duplicate constant definitions
  - Added missing locale import
  - Fixed unused imports
  - Moved is_local_mode logic to security_utils module
  - Added model_dir_name_map import to model_utils

All ruff checks now pass successfully.
- Updated all POST endpoints to use proper Pydantic model validation:
  - `/v2/manager/queue/task` - validates QueueTaskItem
  - `/v2/manager/queue/install_model` - validates ModelMetadata
  - `/v2/manager/queue/reinstall` - validates InstallPackParams
  - `/v2/customnode/import_fail_info` - validates cnr_id/url fields

- Added proper error handling with ValidationError for detailed error messages
- Updated TaskQueue.put() to handle both dict and Pydantic model inputs
- Added missing imports: InstallPackParams, ModelMetadata, ValidationError

Benefits:
- Early validation catches invalid data at API boundaries
- Better error messages for clients with specific validation failures
- Type safety throughout the request processing pipeline
- Consistent validation behavior across all endpoints

All ruff checks pass and validation is now enabled by default.
- Replace deprecated openapi-spec-validator with @redocly/cli
- Remove fragile custom regex-based route alignment script
- Use industry-standard OpenAPI validation tooling
- Switch from Python to Node.js for validation pipeline
- New validation catches 41 errors and 141 warnings that old validator missed
- Eliminate TaskQueue.ExecutionStatus NamedTuple in favor of generated TaskExecutionStatus Pydantic model
- Remove manual conversion logic between NamedTuple and Pydantic model
- Use single source of truth for task execution status
- Clean up unused imports (Literal, NamedTuple)
- Maintain consistent data model usage throughout TaskQueue
- Add proper async worker management to TaskQueue class
- Remove redundant task_worker_thread and task_worker_lock global variables
- Replace manual threading with async task management
- Update is_processing() logic to use TaskQueue state instead of thread status
- Implement automatic worker cleanup when queue processing completes
- Simplify queue start endpoint to use TaskQueue.start_worker()
- Convert all nullable: true to OpenAPI 3.1 format using type: [type, 'null']
- Fix invalid array schema definition in ManagerMappings using oneOf
- Add default security: [] configuration to satisfy security-defined rule
- All 41 validation errors resolved, spec now passes with 0 errors
- 141 warnings remain (mostly missing operationId and example validation)
- Updated generated_models.py to reflect OpenAPI 3.1 nullable format changes
- Models now use Optional[type] instead of nullable: true
- All affected models regenerated with datamodel-codegen
- Syntax and linting checks pass
- Add tj-actions/changed-files to detect modified files in PR
- Only run OpenAPI validation if openapi.yaml was changed
- Only run Python linting on changed Python files (excluding legacy/)
- Remove incorrect "pip install ast" dependency
- Remove non-standard AST parsing and import checks
- Makes CI more efficient and prevents unrelated failures
@christian-byrne christian-byrne force-pushed the feat/implement-batch-tracking-clean branch from 6cd9b70 to 7a73f5d Compare June 14, 2025 02:51
- Fix async/sync mismatch in TaskQueue worker implementation
- Use threading.Thread with asyncio.run() as originally designed
- Remove incorrect async task approach that caused blocking issues
- TaskQueue now properly manages its own thread lifecycle
- Resolves WebSocket message delivery and task processing issues
- Convert batch tracking messages to debug level (batch start, history saved)
- Convert task processing details to debug level
- Convert cache update messages to debug level
- Replace print() with logging.debug() for task processing
- Keep user-relevant messages at info level (ComfyUI updates, installation success)
- Resolves verbose output appearing without --verbose flag
@christian-byrne christian-byrne force-pushed the feat/implement-batch-tracking-clean branch from 834f1fd to c888ea6 Compare June 14, 2025 04:20
christian-byrne and others added 4 commits June 17, 2025 10:36
- Add SecurityLevel enum with strong/normal/normal-/weak values
- Add RiskLevel enum with block/high/middle values
- These will be used for security policy management

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
christian-byrne and others added 14 commits June 17, 2025 13:06
- Add SecurityLevel and RiskLevel enums to generated models
- Enhance ComfyUISystemState with additional system information fields:
  - comfyui_root_path: ComfyUI installation directory
  - model_paths: Map of model types to configured paths
  - manager_version: ComfyUI Manager version
  - security_level: Current security configuration
  - network_mode: Network mode (online/offline/private)
  - cli_args: Selected CLI arguments
  - custom_nodes_count: Total number of custom nodes
  - failed_imports: List of failed imports
  - pip_packages: Installed pip packages

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Removed do_update_all function that was never called and only returned an error
- Removed "update-all" from OperationType enum as it's no longer used
- Regenerated data models to reflect the enum change

The update_all functionality now properly creates individual update tasks through the API endpoint rather than being a single monolithic task.

Co-Authored-By: Claude <[email protected]>
- Added query parameter models to OpenAPI spec for GET endpoints
- Regenerated data models to include new query param models
- Replaced manual validation with Pydantic model validation
- Removed obsolete validate_required_params helper function
- Provides better error messages and type safety for API endpoints

Co-Authored-By: Claude <[email protected]>
@christian-byrne
Copy link
Contributor Author

Test: Client ID Filtering ✔️

Test that task queue operations work with multiple connected clients and each client only sees the task states that they queued.

manage-custom-nodes-per-client.mp4

@christian-byrne
Copy link
Contributor Author

Test: Batch history record writing ✔️

Test that after the task queue is empty for 4 seconds consecutively after tasks were in process, a batch history record is written to disk. The batch history record includes a large number of properties representing the state before and after the operations, as well as the operations themselves.

Batch history records are kept for 16 days and are about 25kb per record (json file).

manager-batch-history-demo.mp4

@christian-byrne
Copy link
Contributor Author

Test: Tasks partitioned based on failed and succeeded in the UI ✔️

Test that failed tasks are put into separate section in the UI

manager-demo-task-failed-split.mp4

@christian-byrne
Copy link
Contributor Author

The test plan is now complete. The next step is to package all three feature branches into easily-testable portable release and begin QA testing. This will start tomorrow.

@christian-byrne christian-byrne merged commit cb0fa58 into draft-v4 Jun 22, 2025
6 checks passed
@ltdrdata ltdrdata deleted the feat/implement-batch-tracking-clean branch August 19, 2025 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants