
Feature Discussion: P2P Federation, Realtime Ephemeral Keys, Reasoning Parser, and Distributed Cache #10317

@ghshhf


Summary

I am planning to contribute several interconnected features to LocalAI that enhance P2P federation, Realtime API security, reasoning support in function calling, and distributed model caching, along with smaller supporting changes to config endpoints, MCP, metrics, and templates. This issue is meant to start a discussion before the work is split into smaller, reviewable PRs.

Proposed Changes

1. P2P Node Snapshot and Federation Routing (core/p2p/)

  • NodeConfig struct for declarative node configuration
  • discoveryTunnels full-node snapshot for cluster state synchronization
  • ReplaceNodes for safe cluster membership updates
  • HMAC-signed node advertisements for tamper-evident federation
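A minimal sketch of how a signed node advertisement could work, assuming every node shares a secret (for example, one derived from the P2P network token). NodeAdvert, SignAdvert and VerifyAdvert are hypothetical names, not the existing core/p2p/ API, and the real NodeConfig fields will differ. The MAC is computed over the exact bytes sent, so the verifier never re-encodes before checking.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/json"
	"errors"
)

// NodeAdvert is a hypothetical advertisement payload; the real NodeConfig
// in core/p2p/ may carry different fields.
type NodeAdvert struct {
	ID      string   `json:"id"`
	Address string   `json:"address"`
	Models  []string `json:"models"`
}

// SignedAdvert carries the JSON-encoded advert plus its HMAC-SHA256 tag.
type SignedAdvert struct {
	Payload []byte `json:"payload"`
	MAC     []byte `json:"mac"`
}

var ErrBadSignature = errors.New("p2p: advertisement signature mismatch")

// SignAdvert serializes the advert once and tags exactly those bytes.
func SignAdvert(secret []byte, a NodeAdvert) (SignedAdvert, error) {
	payload, err := json.Marshal(a)
	if err != nil {
		return SignedAdvert{}, err
	}
	m := hmac.New(sha256.New, secret)
	m.Write(payload)
	return SignedAdvert{Payload: payload, MAC: m.Sum(nil)}, nil
}

// VerifyAdvert checks the tag in constant time before decoding, so a
// tampered payload is rejected without being parsed.
func VerifyAdvert(secret []byte, s SignedAdvert) (NodeAdvert, error) {
	m := hmac.New(sha256.New, secret)
	m.Write(s.Payload)
	if !hmac.Equal(m.Sum(nil), s.MAC) {
		return NodeAdvert{}, ErrBadSignature
	}
	var a NodeAdvert
	if err := json.Unmarshal(s.Payload, &a); err != nil {
		return NodeAdvert{}, err
	}
	return a, nil
}
```

HMAC makes an advertisement tamper-evident but does not stop replay of an old, validly signed one; including a timestamp or nonce in the signed payload would address that.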

2. Reasoning Content Parser (pkg/functions/)

  • XMLToolCallFormat extended with three new fields for reasoning content
  • ParseMsgWithXMLToolCalls now supports extracting reasoning ("thinking") content
  • Supports models that emit chain-of-thought reasoning before their tool calls
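A minimal sketch of the reasoning extraction step, assuming the reasoning is delimited by a configurable start and end tag that precedes the XML tool calls. ReasoningTags and ExtractReasoning are hypothetical; they do not reproduce the three new XMLToolCallFormat fields, whose names are not given here.

```go
package main

import "strings"

// ReasoningTags is a hypothetical stand-in for the reasoning-related
// fields added to XMLToolCallFormat.
type ReasoningTags struct {
	Start string // opening delimiter, e.g. "<think>"
	End   string // closing delimiter, e.g. "</think>"
}

// ExtractReasoning splits a model message into the reasoning text and the
// remaining content (which may still contain XML tool calls). When no
// complete reasoning block is present, the message is returned unchanged
// rather than partially consumed.
func ExtractReasoning(msg string, tags ReasoningTags) (reasoning, rest string) {
	if tags.Start == "" || tags.End == "" {
		return "", msg
	}
	start := strings.Index(msg, tags.Start)
	if start < 0 {
		return "", msg
	}
	bodyStart := start + len(tags.Start)
	end := strings.Index(msg[bodyStart:], tags.End)
	if end < 0 {
		return "", msg // unterminated block: leave it for the caller
	}
	bodyEnd := bodyStart + end
	reasoning = strings.TrimSpace(msg[bodyStart:bodyEnd])
	rest = strings.TrimSpace(msg[:start] + msg[bodyEnd+len(tags.End):])
	return reasoning, rest
}
```

The remaining text would then go through the existing tool-call parsing unchanged, which keeps the reasoning step free of the no-op branches listed among the blockers.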

3. Realtime API Ephemeral Keys (core/http/endpoints/openai/realtime.go)

  • 60-second HMAC-signed ephemeral keys for secure client authentication
  • Optional transcription model per-session
  • Any-to-any modality detection for unified audio/text sessions

4. Distributed Replica Cache (pkg/model/loader.go)

  • replicaCache layer to reduce redundant FindAndLockNodeWithModel DB calls
  • Per-model-ID caching with configurable TTL

5. Config YAML Endpoints (core/http/endpoints/localai/config_meta.go)

  • New REST endpoints for retrieving and validating model configuration
  • Supports dynamic config reloading without a restart
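A minimal sketch of validate-then-swap reloading, assuming the active configuration can be held behind a single pointer. ModelConfig here is a hypothetical three-field stand-in for LocalAI's much larger model config, and the Validate rules are illustrative only. atomic.Pointer requires Go 1.19 or later.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// ModelConfig is a hypothetical, trimmed-down model configuration.
type ModelConfig struct {
	Name        string
	Backend     string
	ContextSize int
}

// Validate shows the kind of checks a validation endpoint could run.
func (c ModelConfig) Validate() error {
	if c.Name == "" {
		return errors.New("name is required")
	}
	if c.ContextSize < 0 {
		return errors.New("context_size must be non-negative")
	}
	return nil
}

// ConfigStore lets readers observe either the old or the new config,
// never a half-applied one.
type ConfigStore struct {
	current atomic.Pointer[ModelConfig]
}

// Reload swaps in c only if it validates; on error the previous config
// stays active, so a bad edit cannot take the model offline.
func (s *ConfigStore) Reload(c ModelConfig) error {
	if err := c.Validate(); err != nil {
		return err
	}
	s.current.Store(&c)
	return nil
}

// Get returns the active config, or nil if none has been loaded yet.
func (s *ConfigStore) Get() *ModelConfig { return s.current.Load() }
```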

6. MCP HTTP API Improvements (pkg/mcp/localaitools/httpapi/)

  • Enhanced client with better error handling
  • Additional routes for tool discovery

7. Metrics and Monitoring Enhancements (core/services/monitoring/)

  • Additional backend monitoring metrics
  • Improved worker file staging telemetry

8. Template Context Pipeline (core/templates/)

  • Extended template loader with context-aware evaluation
  • Support for dynamic template parameters
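A minimal sketch of evaluating a prompt template against dynamic parameters, assuming templates are Go text/template templates. RenderPrompt and the parameter names are hypothetical. The missingkey=error option makes an absent or misspelled parameter fail loudly instead of rendering "<no value>".

```go
package main

import (
	"strings"
	"text/template"
)

// RenderPrompt parses tmpl and executes it against params. Any key the
// template references but params lacks is reported as an error.
func RenderPrompt(tmpl string, params map[string]any) (string, error) {
	t, err := template.New("prompt").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, params); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```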

Breaking Down the Work

Per maintainer feedback, I will open separate PRs for each non-controversial subset:

  1. PR 1 - Config YAML endpoints (non-controversial, self-contained)
  2. PR 2 - Context pipeline and template loader improvements
  3. PR 3 - Metrics and monitoring additions
  4. PR 4 - MCP HTTP API routes and client hardening
  5. PR 5 - Reasoning parser (with tests, proper implementation)
  6. PR 6 - Distributed cache (with tests, document cache invalidation behavior)
  7. PR 7 - P2P node snapshot and ReplaceNodes (with design doc, cancellation safety)
  8. PR 8 - Realtime ephemeral key (with tests for HMAC round-trip)

Blocker Items to Resolve

  • Dead code: GetConfigEndpointShutdown, StartReplicaCache, LoadModel - remove or implement with tests
  • Reasoning parser no-op branches: proper implementation needed
  • P2P cancellation safety: needs test under churn
  • HMAC ephemeral key details: sign userID, fix response fields
  • Tests: HMAC handshake, P2P snapshot/cancel, flag/validation round-trips
  • AI-assisted contributions: add an Assisted-by trailer to commits per project guidelines

Request for Feedback

I would appreciate early feedback on:

  1. The overall feature set - are all of these wanted in LocalAI?
  2. The P2P federation design - is a full-node snapshot the right approach?
  3. The ephemeral key TTL (60s) - is this appropriate?
  4. The distributed cache invalidation strategy - TTL vs event-driven?

Once we align on the design, I will open the individual PRs.


Related: This is a follow-up to the previously closed PR that combined all changes into one.
