Summary
I plan to contribute several interconnected features to LocalAI covering P2P federation, realtime API security, reasoning support for function calling, and distributed model caching. This issue is a discussion starter before the work is split into smaller, reviewable PRs.
Proposed Changes
1. P2P Node Snapshot and Federation Routing (core/p2p/)
- NodeConfig struct for declarative node configuration
- discoveryTunnels full-node snapshot for cluster state synchronization
- ReplaceNodes for safe cluster membership updates
- HMAC-signed node advertisements for tamper-evident federation
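To make the tamper-evident advertisement idea concrete, here is a minimal sketch of signing and verifying a node advertisement with HMAC-SHA256. The `NodeAdvert` type, its fields, and the `SignAdvert`/`VerifyAdvert` helpers are hypothetical names for illustration only, not the proposed `core/p2p/` API; it also assumes all federation members share one secret.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// NodeAdvert is a hypothetical advertisement payload; the real fields may differ.
type NodeAdvert struct {
	ID      string `json:"id"`
	Address string `json:"address"`
}

// SignAdvert returns a hex-encoded HMAC-SHA256 tag over the JSON encoding of the advert.
func SignAdvert(secret []byte, a NodeAdvert) (string, error) {
	payload, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil)), nil
}

// VerifyAdvert recomputes the tag and compares it in constant time.
func VerifyAdvert(secret []byte, a NodeAdvert, tag string) bool {
	want, err := SignAdvert(secret, a)
	if err != nil {
		return false
	}
	return hmac.Equal([]byte(want), []byte(tag))
}
```

A receiving node would drop any advertisement for which `VerifyAdvert` returns false. Note that a bare HMAC detects tampering but not replay; a timestamp or sequence number in the signed payload would be needed for that.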
2. Reasoning Content Parser (pkg/functions/)
- XMLToolCallFormat extended with three new fields for reasoning content
- ParseMsgWithXMLToolCalls now supports extracting thinking/reasoning content
- Enables models that emit chain-of-thought before tool calls
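As an illustration of the extraction step, the sketch below separates a delimited reasoning block from the rest of a model message before tool-call parsing. `SplitReasoning` and the `<think>` tag used in the usage note are assumptions made for this example; the actual new fields on `XMLToolCallFormat` and the behavior of `ParseMsgWithXMLToolCalls` may differ.

```go
package main

import "strings"

// SplitReasoning extracts the first block delimited by openTag and closeTag
// and returns it together with the remaining message text.
// If no complete block is found, the message is returned unchanged.
func SplitReasoning(msg, openTag, closeTag string) (reasoning, rest string) {
	start := strings.Index(msg, openTag)
	if start < 0 {
		return "", msg
	}
	bodyStart := start + len(openTag)
	end := strings.Index(msg[bodyStart:], closeTag)
	if end < 0 {
		// Unterminated block: leave the message untouched.
		return "", msg
	}
	reasoning = strings.TrimSpace(msg[bodyStart : bodyStart+end])
	rest = strings.TrimSpace(msg[:start] + msg[bodyStart+end+len(closeTag):])
	return reasoning, rest
}
```

For example, calling `SplitReasoning` with the tags `<think>` and `</think>` on a message that starts with a reasoning block would return the reasoning text separately, leaving only the tool-call markup for the existing parser.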
3. Realtime API Ephemeral Keys (core/http/endpoints/openai/realtime.go)
- 60-second HMAC-signed ephemeral keys for secure client authentication
- Optional per-session transcription model
- Any-to-any modality detection for unified audio/text sessions
4. Distributed Replica Cache (pkg/model/loader.go)
- replicaCache layer to reduce redundant FindAndLockNodeWithModel DB calls
- Per-model-ID caching with configurable TTL
5. Config YAML Endpoints (core/http/endpoints/localai/config_meta.go)
- New REST endpoints for retrieving and validating model configuration
- Supports dynamic config reloading without restart
6. MCP HTTP API Improvements (pkg/mcp/localaitools/httpapi/)
- Enhanced client with better error handling
- Additional routes for tool discovery
7. Metrics and Monitoring Enhancements (core/services/monitoring/)
- Additional backend monitoring metrics
- Improved worker file staging telemetry
8. Template Context Pipeline (core/templates/)
- Extended template loader with context-aware evaluation
- Support for dynamic template parameters
Breaking Down the Work
Per maintainer feedback, I will open a separate PR for each self-contained subset, starting with the least controversial:
- PR 1 - Config YAML endpoints (non-controversial, self-contained)
- PR 2 - Context pipeline and template loader improvements
- PR 3 - Metrics and monitoring additions
- PR 4 - MCP HTTP API routes and client hardening
- PR 5 - Reasoning parser (with tests, proper implementation)
- PR 6 - Distributed cache (with tests, document cache invalidation behavior)
- PR 7 - P2P node snapshot and ReplaceNodes (with design doc, cancellation safety)
- PR 8 - Realtime ephemeral key (with tests for HMAC round-trip)
Blocker Items to Resolve
Request for Feedback
I would appreciate early feedback on:
- The overall feature set - are all of these wanted in LocalAI?
- The P2P federation design - is a full-node snapshot the right approach?
- The ephemeral key TTL (60s) - is this appropriate?
- The distributed cache invalidation strategy - TTL vs event-driven?
Once we align on the design, I will open the individual PRs.
Related: This is a follow-up to the previously closed PR that combined all changes into one.