[Architecture] Top 3 key gaps: triple LLM execution paths, half-adopted provider adapters, multi-agent-unsafe global singletons #1362

@MervinPraison

Description

Summary

After an in-depth read of src/praisonai-agents/praisonaiagents/ (core SDK) and src/praisonai/ (wrapper), three architectural gaps stand out as the most impactful to fix. Each directly contradicts one of the stated MUST principles (protocol-driven core, DRY, multi-agent + async safe by default, minimal API, agent-centric).

This issue deliberately ignores docs, tests, coverage, file sizes and line counts — it focuses only on architectural/feature gaps. All three have concrete file:line references and clean fix paths that don't need a rewrite.


1. Triple LLM execution paths in Agent._chat_completion (DRY + protocol-driven violation)

Where: src/praisonai-agents/praisonaiagents/agent/chat_mixin.py:478-653

Agent._chat_completion() currently branches into three mutually exclusive code paths to call the model. This is the single biggest source of drift and bugs in the core SDK.

# chat_mixin.py:541  (Path A — opt-in, NEW unified dispatch)
if getattr(self, '_use_unified_llm_dispatch', False):
    final_response = self._execute_unified_chat_completion(...)

# chat_mixin.py:556  (Path B — LEGACY custom-LLM path via LLM.get_response)
elif self._using_custom_llm and hasattr(self, 'llm_instance'):
    final_response = self.llm_instance.get_response(...)

# chat_mixin.py:627  (Path C — LEGACY direct OpenAI client path)
else:
    final_response = self._openai_client.chat_completion_with_tools(**chat_kwargs)

Why it's a problem

  • Duplicated behaviour at massive scale. Path B lives in llm/llm.py (tool-calling loop, streaming, token tracking, retries, display). Path C lives in llm/openai_client.py (a parallel tool-calling loop, streaming, token tracking, retries, display). Every feature has to be implemented, debugged and kept consistent twice.
  • Async path regressions. achat() / _execute_unified_achat_completion only uses the unified path, so features added to Path B or C (e.g. provider-specific tool parsing, per-call response_format) silently do not apply in async.
  • Dead code. chat_mixin.py:583-607 contains an if False: block that's been abandoned mid-refactor:
    else:
        # Non-streaming with custom LLM - don't show streaming-like behavior
        if False:  # Don't use display_generating when stream=False ...
            # This block is disabled to maintain consistency with the OpenAI path fix
            with _get_live()(...):
                final_response = self.llm_instance.get_response(...)
        else:
            final_response = self.llm_instance.get_response(...)
    This is exactly the "guessing, not proof of done" anti-pattern the principles warn against.
  • Inconsistent features between paths. e.g. Path C sets "max_iterations": 10 and "emit_events": True inline (chat_mixin.py:646-648); Path B has its own iteration cap and event-emission plumbing inside llm.py that does not honour the same kwargs. Tool approval, tool timeout (P8/G11), retry logic and token accounting diverge accordingly.
  • Bypasses the half-finished adapter protocol (see gap 2 below).

Concrete proof

  • llm/llm.py defines get_response at line 1630, get_response_stream at 3127, get_response_async at 3417 — three parallel implementations of the same loop.
  • llm/openai_client.py is a 2200+ line second implementation of the same loop, imported only via agent._openai_client and Path C above.
  • The "unified" escape hatch exists (_execute_unified_chat_completion at chat_mixin.py:772) but is opt-in via _use_unified_llm_dispatch=False and therefore never runs for normal users.

Suggested fix (non-breaking)

  1. Flip the unified dispatch on by default: make _execute_unified_chat_completion the single entry point for both sync and async (chat, achat, _chat_completion, _achat_completion).
  2. Collapse openai_client.chat_completion_with_tools into a thin adapter behind an LLMProviderAdapter (see gap 2) — not a parallel executor.
  3. Delete the if False: dead block at chat_mixin.py:583-607.
  4. Route the sync path through the same loop as async via a small sync wrapper (asyncio.run / run_coroutine_threadsafe), so that any feature added in one place ships in both surfaces automatically.
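Step 4 can be sketched as follows. This is illustrative only: the class is hypothetical and the method names merely echo the SDK's, the point being that the sync surface becomes a thin wrapper instead of a second implementation:

```python
import asyncio

class AgentSketch:
    """Minimal sketch: one async execution loop, one thin sync wrapper.

    Stands in for routing _chat_completion through the same loop as
    _achat_completion so any new feature ships in both surfaces.
    """

    async def _achat_completion(self, messages: list) -> str:
        # The single unified loop: provider dispatch, tool calling,
        # retries and token tracking would live here, and only here.
        return f"response to {messages[-1]}"

    def _chat_completion(self, messages: list) -> str:
        # Sync surface is just a wrapper around the async loop.
        # (When called from inside a running event loop, asyncio.run
        # raises; run_coroutine_threadsafe on a background loop is the
        # fallback the fix list mentions.)
        return asyncio.run(self._achat_completion(messages))

print(AgentSketch()._chat_completion(["hello"]))  # response to hello
```

The payoff is structural: features like per-call response_format or provider-specific tool parsing can no longer "exist in sync but not async", because there is only one place for them to exist.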

2. Half-adopted provider-adapter protocol — llm.py is still full of hardcoded provider branches

Where:

  • Adapter scaffolding: src/praisonai-agents/praisonaiagents/llm/adapters/__init__.py (238 lines, docstring literally says "This demonstrates the protocol-driven approach for Gap 2")
  • Usage: src/praisonai-agents/praisonaiagents/llm/llm.py:416, 525-548, 826-834, 1346-1347, 1392-1393, 3679-3680, 3905-3906
  • Hardcoded provider checks that bypass the adapter: llm/llm.py:503, 507, 519, 550-568, 920, 1099, 1419, 2087, 2438, 2542, 2596, 2616, 2674, 2745, 3291, 3306, 3388, 3753, 3775, 3788, 3805, 3824, 3845, 4011, 4450, 4591-4593 (~30 call-sites)

Why it's a problem

The repo already has DefaultAdapter, OllamaAdapter, AnthropicAdapter, GeminiAdapter, plus a registry and get_provider_adapter(). The Agent even stores self._provider_adapter at llm.py:416. But the adapter is only consulted in six places, mostly for supports_prompt_caching, should_summarize_tools, supports_streaming[_with_tools] and get_default_settings.

The entire tool-calling loop, the streaming path, and the async path still have inline branches like:

# llm/llm.py:2438
if self._is_ollama_provider() and not tool_calls and response_text and formatted_tools:
    # custom JSON parsing to recover tool calls from Ollama
    ...

# llm/llm.py:2087
if use_streaming and formatted_tools and self._is_gemini_model():
    # Skip streaming for Gemini + tools
    ...

# llm/llm.py:4591-4593
if self._is_ollama_provider(): ...
if self._is_gemini_model(): ...

This directly contradicts "Core SDK: protocols/hooks/adapters only; heavy code in wrapper/tools" and "no provider leakage into core API". Net effect:

  • First-class vs second-class providers. OpenAI and Gemini get bespoke support; Anthropic cache-control, Bedrock, Ollama thinking-tokens, o1/Gemini reasoning-token accounting are all either missing or inconsistent between sync and async.
  • New provider = patch 30 call-sites instead of implementing one protocol.
  • Adapter creep. DefaultAdapter hard-codes sensible-looking but wrong defaults (e.g. supports_structured_output → False, get_max_iteration_threshold → 10) that silently degrade any provider not in the registry.
  • Dual branching. Because the sync and async variants of get_response are separate functions, the same provider branch is duplicated in both (e.g. the Ollama tool-recovery logic exists at both 2438 and 3775).

Suggested fix

  1. Promote the adapter protocol to cover every decision currently done inline: parse_tool_calls(raw_response), stream_tool_calls_supported(), format_tool_result_message(), inject_cache_control(), extract_reasoning_tokens(), should_skip_streaming_with_tools(), recover_tool_calls_from_text().
  2. Replace every if self._is_ollama_provider() / if self._is_gemini_model() / if "claude" in model_lower branch with self._provider_adapter.<method>().
  3. Move the tool-recovery-from-text Ollama logic (llm.py:2438 and 3775) into OllamaAdapter.recover_tool_calls_from_text() — it'll be called from one place, in both sync and async.
  4. Auto-register adapters by provider prefix so a new provider only has to subclass DefaultAdapter.

This unblocks true multi-provider parity and kills roughly 30 if branches without touching business logic.
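A minimal sketch of what the promoted protocol could look like. The method names follow the list above but are proposals, not existing SDK API, and the recovery logic is a stand-in for whatever llm.py:2438 actually does:

```python
import json
from typing import List, Optional, Protocol

class LLMProviderAdapter(Protocol):
    """Proposed protocol surface (names mirror the fix list; all are
    suggestions, not the current SDK API)."""
    def should_skip_streaming_with_tools(self) -> bool: ...
    def recover_tool_calls_from_text(self, text: str) -> Optional[List[dict]]: ...

class DefaultAdapter:
    def should_skip_streaming_with_tools(self) -> bool:
        return False

    def recover_tool_calls_from_text(self, text: str) -> Optional[List[dict]]:
        return None  # most providers emit structured tool calls directly

class GeminiAdapter(DefaultAdapter):
    def should_skip_streaming_with_tools(self) -> bool:
        # Replaces the inline `_is_gemini_model()` branch at llm.py:2087.
        return True

class OllamaAdapter(DefaultAdapter):
    def recover_tool_calls_from_text(self, text: str) -> Optional[List[dict]]:
        # Stand-in for the JSON-recovery branch currently duplicated at
        # llm.py:2438 (sync) and llm.py:3775 (async).
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, list) else [parsed]
```

The ~30 call-sites then shrink to self._provider_adapter.should_skip_streaming_with_tools() and friends, with no provider names left in the core loop.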


3. Multi-agent safety is broken by module-level singletons and shared mutable state

Where:

  • src/praisonai-agents/praisonaiagents/knowledge/index.py:101-123 — IndexRegistry with _instance: Optional["IndexRegistry"] = None and __new__ singleton
  • src/praisonai-agents/praisonaiagents/memory/adapters/registry.py:37 — _memory_registry = MemoryAdapterRegistry() at module import
  • src/praisonai-agents/praisonaiagents/knowledge/ — same singleton pattern in query_engine.py, rerankers.py, vector_store.py
  • src/praisonai-agents/praisonaiagents/agent/agent.py:172-177
    # Global variables for API server (protected by _server_lock for thread safety)
    _server_lock = threading.Lock()
    _server_started = {}         # Dict of port -> started boolean
    _registered_agents = {}      # Dict of port -> Dict of path -> agent_id
    _shared_apps = {}            # Dict of port -> FastAPI app

Why it's a problem

The philosophy is explicit: "Multi-agent + async safe by default. No global singletons. No heavy module-level work." The current state violates this in several ways:

  1. IndexRegistry is a classic __new__ singleton. Two agents in the same process that want different vector stores, different index factories, or different rerankers are forced to share one process-wide registry. Agent-local knowledge isolation is impossible without monkey-patching cls._instance = None between agents.
    # knowledge/index.py:101
    class IndexRegistry:
        _instance: Optional["IndexRegistry"] = None
        def __new__(cls) -> "IndexRegistry":
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._indices: Dict[str, Callable[..., IndexProtocol]] = {}
            return cls._instance
  2. _memory_registry is module-level mutable state. First import wins. Tests and multi-tenant servers cannot hold two configurations side-by-side without races.
  3. _server_started / _registered_agents / _shared_apps in agent.py are process-global dicts keyed by port. The lock only protects dict mutation — not the FastAPI app state underneath. Two agents that both call .start_server(port=8000) end up sharing a single FastAPI instance, no TTL / cleanup exists, and route registration races are possible. This is a production foot-gun the moment anyone runs more than one agent in the same interpreter.
  4. __init__.py:93 _lazy_cache = {} is also a plain module-level dict; _lazy_cache_local below it is thread-local but unused by the main lazy path (the threading.local is only kept "for backward compatibility with tests"). The main cache is shared across threads without a lock.

Why this is critical

Multi-agent and async-safe execution are the product. If two agents in the same process can't have independent knowledge indices or independent API-server state, the "production-ready" promise of the core SDK is not met regardless of how good the single-agent path is.

Suggested fix

  1. Kill the knowledge singletons. Move IndexRegistry, QueryEngineRegistry, RerankerRegistry, VectorStoreRegistry from __new__-based singletons to plain classes. Attach instances to the owning Knowledge / Agent (or pass them explicitly). Keep the class-name unchanged for compatibility.
  2. Kill the _memory_registry module global. Instantiate it inside MemoryManager or pass it in. Expose a process-wide default as MemoryAdapterRegistry.default() only for convenience, not as the only path.
  3. Make API-server state per-app. Replace the three module dicts with a ServerRegistry class stored on either the running FastAPI app or on the Agent that created it. Add TTL / deregistration on agent.close().
  4. Wrap the lazy import cache in threading.Lock or convert it to functools.lru_cache style β€” a plain {} shared across threads is a latent data race.
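Fixes 1 and 2 amount to the same pattern, sketched here for IndexRegistry. Only the class name comes from the repo; register/get/default() are illustrative stand-ins for its real surface:

```python
import threading
from typing import Callable, Dict, Optional

class IndexRegistry:
    """Plain class, no __new__ singleton: each Knowledge/Agent owns its
    own instance, so two agents in one process no longer share index
    factories. Class name kept for compatibility."""

    _default: Optional["IndexRegistry"] = None
    _default_lock = threading.Lock()

    def __init__(self) -> None:
        self._indices: Dict[str, Callable] = {}

    def register(self, name: str, factory: Callable) -> None:
        self._indices[name] = factory

    def get(self, name: str) -> Optional[Callable]:
        return self._indices.get(name)

    @classmethod
    def default(cls) -> "IndexRegistry":
        # Process-wide instance kept only as an opt-in convenience,
        # mirroring the MemoryAdapterRegistry.default() idea in fix 2.
        with cls._default_lock:
            if cls._default is None:
                cls._default = cls()
            return cls._default

# Two agents can now hold fully independent registries:
a, b = IndexRegistry(), IndexRegistry()
a.register("vector", dict)
assert b.get("vector") is None
```

The same per-owner pattern covers fix 3: a ServerRegistry instance attached to the FastAPI app (or the Agent that created it) instead of three module-level dicts.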

Impact if these three are fixed

  • Gap 1 deletes a ~2000-line duplicated execution stack and unifies sync/async feature parity.
  • Gap 2 lets the SDK add a new provider by subclassing one adapter instead of patching 30 branches, and finally matches the "protocol-driven core" principle.
  • Gap 3 makes it safe to run multiple agents, multiple knowledge bases, or a long-lived API server in one process — the intended production deployment model.

None of these require a rewrite. Each is a focused refactor behind stable public APIs (Agent(...), agent.chat(), agent.achat(), Knowledge(...), Memory(...)).


Intentionally out of scope

Per the request, this issue deliberately does not raise: docs / examples parity, test coverage, file sizes, line counts, minor perf micro-optimisations, or the many smaller wrapper-layer gaps (multiple API servers, CLI ↔ YAML ↔ Python parity enforcement, chainlit/gradio eager imports, deprecated Agent.__init__ params). Those are real, but strictly secondary to the three above.


Metadata

Assignees: no one assigned
Labels: bug (Something isn't working), claude (Auto-trigger Claude analysis), documentation (Improvements or additions to documentation)
Projects: none
Milestone: none