[Architecture] Top 3 key gaps: triple LLM execution paths, half-adopted provider adapters, multi-agent-unsafe global singletons #1362

@MervinPraison

Description

Summary

After an in-depth read of src/praisonai-agents/praisonaiagents/ (core SDK) and src/praisonai/ (wrapper), three architectural gaps stand out as the most impactful to fix. Each directly contradicts one of the stated MUST principles (protocol-driven core, DRY, multi-agent + async safe by default, minimal API, agent-centric).

This issue deliberately ignores docs, tests, coverage, file sizes and line counts — it focuses only on architectural/feature gaps. All three have concrete file:line references and clean fix paths that don't need a rewrite.


1. Triple LLM execution paths in Agent._chat_completion (DRY + protocol-driven violation)

Where: src/praisonai-agents/praisonaiagents/agent/chat_mixin.py:478-653

Agent._chat_completion() currently branches into three mutually exclusive code paths to call the model. This is the single biggest source of drift and bugs in the core SDK.

# chat_mixin.py:541  (Path A — opt-in, NEW unified dispatch)
if getattr(self, '_use_unified_llm_dispatch', False):
    final_response = self._execute_unified_chat_completion(...)

# chat_mixin.py:556  (Path B — LEGACY custom-LLM path via LLM.get_response)
elif self._using_custom_llm and hasattr(self, 'llm_instance'):
    final_response = self.llm_instance.get_response(...)

# chat_mixin.py:627  (Path C — LEGACY direct OpenAI client path)
else:
    final_response = self._openai_client.chat_completion_with_tools(**chat_kwargs)

Why it's a problem

  • Duplicated behaviour at massive scale. Path B lives in llm/llm.py (tool-calling loop, streaming, token tracking, retries, display). Path C lives in llm/openai_client.py (a parallel tool-calling loop, streaming, token tracking, retries, display). Every feature has to be implemented, debugged and kept consistent twice.
  • Async path regressions. achat() / _execute_unified_achat_completion only uses the unified path, so features added to Path B or C (e.g. provider-specific tool parsing, per-call response_format) silently do not apply in async.
  • Dead code. chat_mixin.py:583-607 contains an if False: block that's been abandoned mid-refactor:
    else:
        # Non-streaming with custom LLM - don't show streaming-like behavior
        if False:  # Don't use display_generating when stream=False ...
            # This block is disabled to maintain consistency with the OpenAI path fix
            with _get_live()(...):
                final_response = self.llm_instance.get_response(...)
        else:
            final_response = self.llm_instance.get_response(...)
    This is exactly the "guessing, not proof of done" anti-pattern the principles warn against.
  • Inconsistent features between paths. e.g. Path C sets "max_iterations": 10 and "emit_events": True inline (chat_mixin.py:646-648); Path B has its own iteration cap and event-emission plumbing inside llm.py that does not honour the same kwargs. Tool approval, tool timeout (P8/G11), retry logic and token accounting diverge accordingly.
  • Bypasses the half-finished adapter protocol (see gap 2 below).

Concrete proof

  • llm/llm.py defines get_response at line 1630, get_response_stream at 3127, get_response_async at 3417 — three parallel implementations of the same loop.
  • llm/openai_client.py is a 2200+ line second implementation of the same loop, imported only via agent._openai_client and Path C above.
  • The "unified" escape hatch exists (_execute_unified_chat_completion at chat_mixin.py:772) but is opt-in via _use_unified_llm_dispatch=False and therefore never runs for normal users.

Suggested fix (non-breaking)

  1. Flip the unified dispatch on by default: make _execute_unified_chat_completion the single entry point for both sync and async (chat, achat, _chat_completion, _achat_completion).
  2. Collapse openai_client.chat_completion_with_tools into a thin adapter behind an LLMProviderAdapter (see gap 2) — not a parallel executor.
  3. Delete the if False: dead block at chat_mixin.py:583-607.
  4. Route the sync path through the same loop as async via a small sync wrapper (asyncio.run / run_coroutine_threadsafe), so that any feature added in one place ships in both surfaces automatically.
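Step 4 can be sketched as follows. This is illustrative only: the class is hypothetical and the method names merely echo the SDK's, the point being that the sync surface becomes a thin wrapper instead of a second implementation:

```python
import asyncio

class AgentSketch:
    """Minimal sketch: one async execution loop, one thin sync wrapper.

    Stands in for routing _chat_completion through the same loop as
    _achat_completion so any new feature ships in both surfaces.
    """

    async def _achat_completion(self, messages: list) -> str:
        # The single unified loop: provider dispatch, tool calling,
        # retries and token tracking would live here, and only here.
        return f"response to {messages[-1]}"

    def _chat_completion(self, messages: list) -> str:
        # Sync surface is just a wrapper around the async loop.
        # (When called from inside a running event loop, asyncio.run
        # raises; run_coroutine_threadsafe on a background loop is the
        # fallback the fix list mentions.)
        return asyncio.run(self._achat_completion(messages))

print(AgentSketch()._chat_completion(["hello"]))  # response to hello
```

The payoff is structural: features like per-call response_format or provider-specific tool parsing can no longer "exist in sync but not async", because there is only one place for them to exist.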

2. Half-adopted provider-adapter protocol — llm.py is still full of hardcoded provider branches

Where:

  • Adapter scaffolding: src/praisonai-agents/praisonaiagents/llm/adapters/__init__.py (238 lines, docstring literally says "This demonstrates the protocol-driven approach for Gap 2")
  • Usage: src/praisonai-agents/praisonaiagents/llm/llm.py:416, 525-548, 826-834, 1346-1347, 1392-1393, 3679-3680, 3905-3906
  • Hardcoded provider checks that bypass the adapter: llm/llm.py:503, 507, 519, 550-568, 920, 1099, 1419, 2087, 2438, 2542, 2596, 2616, 2674, 2745, 3291, 3306, 3388, 3753, 3775, 3788, 3805, 3824, 3845, 4011, 4450, 4591-4593 (~30 call-sites)

Why it's a problem

The repo already has DefaultAdapter, OllamaAdapter, AnthropicAdapter, GeminiAdapter, plus a registry and get_provider_adapter(). The Agent even stores self._provider_adapter at llm.py:416. But the adapter is only consulted in six places, mostly for supports_prompt_caching, should_summarize_tools, supports_streaming[_with_tools] and get_default_settings.

The entire tool-calling loop, the streaming path, and the async path still have inline branches like:

# llm/llm.py:2438
if self._is_ollama_provider() and not tool_calls and response_text and formatted_tools:
    # custom JSON parsing to recover tool calls from Ollama
    ...

# llm/llm.py:2087
if use_streaming and formatted_tools and self._is_gemini_model():
    # Skip streaming for Gemini + tools
    ...

# llm/llm.py:4591-4593
if self._is_ollama_provider(): ...
if self._is_gemini_model(): ...

This directly contradicts "Core SDK: protocols/hooks/adapters only; heavy code in wrapper/tools" and "no provider leakage into core API". Net effect:

  • First-class vs second-class providers. OpenAI and Gemini get bespoke support; Anthropic cache-control, Bedrock, Ollama thinking-tokens, o1/Gemini reasoning-token accounting are all either missing or inconsistent between sync and async.
  • New provider = patch 30 call-sites instead of implementing one protocol.
  • Adapter creep. DefaultAdapter hard-codes sensible-looking but wrong defaults (e.g. supports_structured_output → False, get_max_iteration_threshold → 10) that silently degrade any provider not in the registry.
  • Dual branching. Because the sync and async variants of get_response are separate functions, the same provider branch is duplicated in both (e.g. the Ollama tool-recovery logic exists at both 2438 and 3775).

Suggested fix

  1. Promote the adapter protocol to cover every decision currently done inline: parse_tool_calls(raw_response), stream_tool_calls_supported(), format_tool_result_message(), inject_cache_control(), extract_reasoning_tokens(), should_skip_streaming_with_tools(), recover_tool_calls_from_text().
  2. Replace every if self._is_ollama_provider() / if self._is_gemini_model() / if "claude" in model_lower branch with self._provider_adapter.<method>().
  3. Move the tool-recovery-from-text Ollama logic (llm.py:2438 and 3775) into OllamaAdapter.recover_tool_calls_from_text() — it'll be called from one place, in both sync and async.
  4. Auto-register adapters by provider prefix so a new provider only has to subclass DefaultAdapter.

This unblocks true multi-provider parity and kills roughly 30 if branches without touching business logic.
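A minimal sketch of what the promoted protocol could look like. The method names follow the list above but are proposals, not existing SDK API, and the recovery logic is a stand-in for whatever llm.py:2438 actually does:

```python
import json
from typing import List, Optional, Protocol

class LLMProviderAdapter(Protocol):
    """Proposed protocol surface (names mirror the fix list; all are
    suggestions, not the current SDK API)."""
    def should_skip_streaming_with_tools(self) -> bool: ...
    def recover_tool_calls_from_text(self, text: str) -> Optional[List[dict]]: ...

class DefaultAdapter:
    def should_skip_streaming_with_tools(self) -> bool:
        return False

    def recover_tool_calls_from_text(self, text: str) -> Optional[List[dict]]:
        return None  # most providers emit structured tool calls directly

class GeminiAdapter(DefaultAdapter):
    def should_skip_streaming_with_tools(self) -> bool:
        # Replaces the inline `_is_gemini_model()` branch at llm.py:2087.
        return True

class OllamaAdapter(DefaultAdapter):
    def recover_tool_calls_from_text(self, text: str) -> Optional[List[dict]]:
        # Stand-in for the JSON-recovery branch currently duplicated at
        # llm.py:2438 (sync) and llm.py:3775 (async).
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, list) else [parsed]
```

The ~30 call-sites then shrink to self._provider_adapter.should_skip_streaming_with_tools() and friends, with no provider names left in the core loop.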


3. Multi-agent safety is broken by module-level singletons and shared mutable state

Where:

  • src/praisonai-agents/praisonaiagents/knowledge/index.py:101-123 — IndexRegistry with _instance: Optional["IndexRegistry"] = None and __new__ singleton
  • src/praisonai-agents/praisonaiagents/memory/adapters/registry.py:37 — _memory_registry = MemoryAdapterRegistry() at module import
  • src/praisonai-agents/praisonaiagents/knowledge/ — same singleton pattern in query_engine.py, rerankers.py, vector_store.py
  • src/praisonai-agents/praisonaiagents/agent/agent.py:172-177
    # Global variables for API server (protected by _server_lock for thread safety)
    _server_lock = threading.Lock()
    _server_started = {}         # Dict of port -> started boolean
    _registered_agents = {}      # Dict of port -> Dict of path -> agent_id
    _shared_apps = {}            # Dict of port -> FastAPI app

Why it's a problem

The philosophy is explicit: "Multi-agent + async safe by default. No global singletons. No heavy module-level work." The current state violates this in several ways:

  1. IndexRegistry is a classic __new__ singleton. Two agents in the same process that want different vector stores, different index factories, or different rerankers are forced to share one process-wide registry. Agent-local knowledge isolation is impossible without monkey-patching cls._instance = None between agents.
    # knowledge/index.py:101
    class IndexRegistry:
        _instance: Optional["IndexRegistry"] = None
        def __new__(cls) -> "IndexRegistry":
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._indices: Dict[str, Callable[..., IndexProtocol]] = {}
            return cls._instance
  2. _memory_registry is module-level mutable state. First import wins. Tests and multi-tenant servers cannot hold two configurations side-by-side without races.
  3. _server_started / _registered_agents / _shared_apps in agent.py are process-global dicts keyed by port. The lock only protects dict mutation — not the FastAPI app state underneath. Two agents that both call .start_server(port=8000) end up sharing a single FastAPI instance, no TTL / cleanup exists, and route registration races are possible. This is a production foot-gun the moment anyone runs more than one agent in the same interpreter.
  4. __init__.py:93 _lazy_cache = {} is also a plain module-level dict; _lazy_cache_local below it is thread-local but unused by the main lazy path (the threading.local is only kept "for backward compatibility with tests"). The main cache is shared across threads without a lock.

Why this is critical

Multi-agent and async-safe execution are the product. If two agents in the same process can't have independent knowledge indices or independent API-server state, the "production-ready" promise of the core SDK is not met regardless of how good the single-agent path is.

Suggested fix

  1. Kill the knowledge singletons. Move IndexRegistry, QueryEngineRegistry, RerankerRegistry, VectorStoreRegistry from __new__-based singletons to plain classes. Attach instances to the owning Knowledge / Agent (or pass them explicitly). Keep the class-name unchanged for compatibility.
  2. Kill the _memory_registry module global. Instantiate it inside MemoryManager or pass it in. Expose a process-wide default as MemoryAdapterRegistry.default() only for convenience, not as the only path.
  3. Make API-server state per-app. Replace the three module dicts with a ServerRegistry class stored on either the running FastAPI app or on the Agent that created it. Add TTL / deregistration on agent.close().
  4. Wrap the lazy import cache in threading.Lock or convert it to functools.lru_cache style β€” a plain {} shared across threads is a latent data race.
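Fixes 1 and 2 amount to the same pattern, sketched here for IndexRegistry. Only the class name comes from the repo; register/get/default() are illustrative stand-ins for its real surface:

```python
import threading
from typing import Callable, Dict, Optional

class IndexRegistry:
    """Plain class, no __new__ singleton: each Knowledge/Agent owns its
    own instance, so two agents in one process no longer share index
    factories. Class name kept for compatibility."""

    _default: Optional["IndexRegistry"] = None
    _default_lock = threading.Lock()

    def __init__(self) -> None:
        self._indices: Dict[str, Callable] = {}

    def register(self, name: str, factory: Callable) -> None:
        self._indices[name] = factory

    def get(self, name: str) -> Optional[Callable]:
        return self._indices.get(name)

    @classmethod
    def default(cls) -> "IndexRegistry":
        # Process-wide instance kept only as an opt-in convenience,
        # mirroring the MemoryAdapterRegistry.default() idea in fix 2.
        with cls._default_lock:
            if cls._default is None:
                cls._default = cls()
            return cls._default

# Two agents can now hold fully independent registries:
a, b = IndexRegistry(), IndexRegistry()
a.register("vector", dict)
assert b.get("vector") is None
```

The same per-owner pattern covers fix 3: a ServerRegistry instance attached to the FastAPI app (or the Agent that created it) instead of three module-level dicts.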

Impact if these three are fixed

  • Gap 1 deletes a ~2000-line duplicated execution stack and unifies sync/async feature parity.
  • Gap 2 lets the SDK add a new provider by subclassing one adapter instead of patching 30 branches, and finally matches the "protocol-driven core" principle.
  • Gap 3 makes it safe to run multiple agents, multiple knowledge bases, or a long-lived API server in one process — the intended production deployment model.

None of these require a rewrite. Each is a focused refactor behind stable public APIs (Agent(...), agent.chat(), agent.achat(), Knowledge(...), Memory(...)).


Intentionally out of scope

Per the request, this issue deliberately does not raise: docs / examples parity, test coverage, file sizes, line counts, minor perf micro-optimisations, or the many smaller wrapper-layer gaps (multiple API servers, CLI ↔ YAML ↔ Python parity enforcement, chainlit/gradio eager imports, deprecated Agent.__init__ params). Those are real, but strictly secondary to the three above.


Metadata

Assignees: no one assigned
Labels: bug (Something isn't working), claude (Auto-trigger Claude analysis), documentation (Improvements or additions to documentation)
Projects: none
Milestone: none