After an in-depth read of src/praisonai-agents/praisonaiagents/ (core SDK) and src/praisonai/ (wrapper), three architectural gaps stand out as the most impactful to fix. Each directly contradicts one of the stated MUST principles (protocol-driven core, DRY, multi-agent + async safe by default, minimal API, agent-centric).
This issue deliberately ignores docs, tests, coverage, file sizes and line counts; it focuses only on architectural/feature gaps. All three have concrete file:line references and clean fix paths that don't need a rewrite.
1. Triple LLM execution paths in Agent._chat_completion() (DRY + protocol-driven violation)
Where: src/praisonai-agents/praisonaiagents/agent/chat_mixin.py:478-653
Agent._chat_completion() currently branches into three mutually exclusive code paths to call the model. This is the single biggest source of drift and bugs in the core SDK.
```python
# chat_mixin.py:541 (Path A - opt-in, NEW unified dispatch)
if getattr(self, '_use_unified_llm_dispatch', False):
    final_response = self._execute_unified_chat_completion(...)
# chat_mixin.py:556 (Path B - LEGACY custom-LLM path via LLM.get_response)
elif self._using_custom_llm and hasattr(self, 'llm_instance'):
    final_response = self.llm_instance.get_response(...)
# chat_mixin.py:627 (Path C - LEGACY direct OpenAI client path)
else:
    final_response = self._openai_client.chat_completion_with_tools(**chat_kwargs)
```
Why it's a problem
Duplicated behaviour at massive scale. Path B lives in llm/llm.py (tool-calling loop, streaming, token tracking, retries, display). Path C lives in llm/openai_client.py (a parallel tool-calling loop, streaming, token tracking, retries, display). Every feature has to be implemented, debugged and kept consistent twice.
Async path regressions. achat() / _execute_unified_achat_completion only uses the unified path, so features added to Path B or C (e.g. provider-specific tool parsing, per-call response_format) silently do not apply in async.
Dead code. chat_mixin.py:583-607 contains an if False: block that's been abandoned mid-refactor:
```python
else:
    # Non-streaming with custom LLM - don't show streaming-like behavior
    if False:  # Don't use display_generating when stream=False ...
        # This block is disabled to maintain consistency with the OpenAI path fix
        with _get_live()(...):
            final_response = self.llm_instance.get_response(...)
    else:
        final_response = self.llm_instance.get_response(...)
```
This is exactly the "guessing, not proof of done" anti-pattern the principles warn against.
Inconsistent features between paths. For example, Path C sets "max_iterations": 10 and "emit_events": True inline (chat_mixin.py:646-648); Path B has its own iteration cap and event-emission plumbing inside llm.py that does not honour the same kwargs. Tool approval, tool timeout (P8/G11), retry logic and token accounting diverge accordingly.
Concrete proof
llm/llm.py defines get_response at line 1630, get_response_stream at 3127, get_response_async at 3417: three parallel implementations of the same loop.
llm/openai_client.py is a 2200+ line second implementation of the same loop, imported only via agent._openai_client and Path C above.
The "unified" escape hatch exists (_execute_unified_chat_completion at chat_mixin.py:772) but is opt-in (_use_unified_llm_dispatch defaults to False) and therefore never runs for normal users.
Suggested fix (non-breaking)
Flip the unified dispatch on by default: make _execute_unified_chat_completion the single entry point for both sync and async (chat, achat, _chat_completion, _achat_completion).
Fold openai_client.chat_completion_with_tools into a thin adapter behind an LLMProviderAdapter (see gap 2), not a parallel executor.
Delete the if False: dead block at chat_mixin.py:583-607.
Route the sync path through the same loop as async via a small sync wrapper (asyncio.run / run_coroutine_threadsafe), so that any feature added in one place ships in both surfaces automatically.
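The sync-wrapper idea above could look like the following minimal sketch. It is illustrative only: `run_sync` is a hypothetical helper, not an existing SDK function, and the real dispatch would pass through `_execute_unified_achat_completion` rather than an arbitrary coroutine.

```python
import asyncio
import concurrent.futures


def run_sync(coro):
    """Run a coroutine from sync code, whether or not an event loop is running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run is safe and simplest.
        return asyncio.run(coro)
    # A loop is already running (notebook, server handler, etc.):
    # asyncio.run would raise, so execute the coroutine on a fresh
    # loop in a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

With this shape, `_chat_completion` becomes `run_sync(self._achat_completion(...))`, so any feature added to the async loop ships in the sync surface automatically.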
2. Half-adopted provider-adapter protocol: llm.py is still full of hardcoded provider branches
Where:
Adapter scaffolding: src/praisonai-agents/praisonaiagents/llm/adapters/__init__.py (238 lines; the docstring literally says "This demonstrates the protocol-driven approach for Gap 2")
Adapter consulted at: src/praisonai-agents/praisonaiagents/llm/llm.py:416, 525-548, 826-834, 1346-1347, 1392-1393, 3679-3680, 3905-3906
Inline provider branches: llm/llm.py:503, 507, 519, 550-568, 920, 1099, 1419, 2087, 2438, 2542, 2596, 2616, 2674, 2745, 3291, 3306, 3388, 3753, 3775, 3788, 3805, 3824, 3845, 4011, 4450, 4591-4593 (~30 call-sites)
The repo already has DefaultAdapter, OllamaAdapter, AnthropicAdapter, GeminiAdapter, plus a registry and get_provider_adapter(). The Agent even stores self._provider_adapter at llm.py:416. But the adapter is only consulted in six places, mostly for supports_prompt_caching, should_summarize_tools, supports_streaming[_with_tools] and get_default_settings.
The entire tool-calling loop, the streaming path, and the async path still have inline branches like:
```python
# llm/llm.py:2438
if self._is_ollama_provider() and not tool_calls and response_text and formatted_tools:
    # custom JSON parsing to recover tool calls from Ollama
    ...

# llm/llm.py:2087
if use_streaming and formatted_tools and self._is_gemini_model():
    # Skip streaming for Gemini + tools
    ...

# llm/llm.py:4591-4593
if self._is_ollama_provider(): ...
if self._is_gemini_model(): ...
```
This directly contradicts "Core SDK: protocols/hooks/adapters only; heavy code in wrapper/tools" and "no provider leakage into core API". Net effect:
First-class vs second-class providers. OpenAI and Gemini get bespoke support; Anthropic cache-control, Bedrock, Ollama thinking-tokens, o1/Gemini reasoning-token accounting are all either missing or inconsistent between sync and async.
New provider = patch 30 call-sites instead of implementing one protocol.
Adapter creep. DefaultAdapter hard-codes sensible-looking but wrong defaults (e.g. supports_structured_output → False, get_max_iteration_threshold → 10) that silently degrade any provider not in the registry.
Dual branching. Because the sync and async variants of get_response are separate functions, the same provider branch is duplicated in both (e.g. the Ollama tool-recovery logic exists at both 2438 and 3775).
Suggested fix
Promote the adapter protocol to cover every decision currently done inline: parse_tool_calls(raw_response), stream_tool_calls_supported(), format_tool_result_message(), inject_cache_control(), extract_reasoning_tokens(), should_skip_streaming_with_tools(), recover_tool_calls_from_text().
Replace every if self._is_ollama_provider() / if self._is_gemini_model() / if "claude" in model_lower branch with self._provider_adapter.<method>().
Move the tool-recovery-from-text Ollama logic (llm.py:2438 and 3775) into OllamaAdapter.recover_tool_calls_from_text(); it'll be called from one place, in both sync and async.
Auto-register adapters by provider prefix so a new provider only has to subclass DefaultAdapter.
This unblocks true multi-provider parity and kills roughly 30 if branches without touching business logic.
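A promoted adapter protocol along these lines could be sketched as below. The method names come from the suggestions above; the signatures, the `_ADAPTERS` prefix table, and the JSON-recovery heuristic are all illustrative assumptions, not the SDK's actual code.

```python
import json
from typing import Any, Optional, Protocol


class ProviderAdapter(Protocol):
    """Structural protocol: any class with these methods qualifies."""

    def parse_tool_calls(self, raw_response: Any) -> list: ...
    def stream_tool_calls_supported(self) -> bool: ...
    def should_skip_streaming_with_tools(self) -> bool: ...
    def recover_tool_calls_from_text(self, text: str) -> Optional[list]: ...


class DefaultAdapter:
    """Safe fallback behaviour for providers not in the registry."""

    def parse_tool_calls(self, raw_response):
        return getattr(raw_response, "tool_calls", None) or []

    def stream_tool_calls_supported(self):
        return True

    def should_skip_streaming_with_tools(self):
        return False

    def recover_tool_calls_from_text(self, text):
        return None


class OllamaAdapter(DefaultAdapter):
    def recover_tool_calls_from_text(self, text):
        # Ollama sometimes emits a tool call as a bare JSON object in the
        # text body; try to recover it (heuristic for illustration only).
        try:
            data = json.loads(text)
        except (TypeError, ValueError):
            return None
        if isinstance(data, dict) and "name" in data:
            return [data]
        return None


# Auto-registration by provider prefix: a new provider only subclasses
# DefaultAdapter and adds one table entry.
_ADAPTERS = {"ollama/": OllamaAdapter}


def get_provider_adapter(model: str) -> ProviderAdapter:
    for prefix, cls in _ADAPTERS.items():
        if model.startswith(prefix):
            return cls()
    return DefaultAdapter()
```

Every inline `if self._is_ollama_provider(): ...` branch then collapses to `self._provider_adapter.recover_tool_calls_from_text(...)`, called identically from the sync and async loops.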
3. Multi-agent safety is broken by module-level singletons and shared mutable state
Where:
src/praisonai-agents/praisonaiagents/knowledge/index.py:101-123 - IndexRegistry with _instance: Optional["IndexRegistry"] = None and a __new__ singleton
src/praisonai-agents/praisonaiagents/memory/adapters/registry.py:37 - _memory_registry = MemoryAdapterRegistry() at module import
src/praisonai-agents/praisonaiagents/knowledge/ - the same singleton pattern in query_engine.py, rerankers.py, vector_store.py
src/praisonai-agents/praisonaiagents/agent/agent.py:172-177 - the global API-server state below:
```python
# Global variables for API server (protected by _server_lock for thread safety)
_server_lock = threading.Lock()
_server_started = {}       # Dict of port -> started boolean
_registered_agents = {}    # Dict of port -> Dict of path -> agent_id
_shared_apps = {}          # Dict of port -> FastAPI app
```
Why it's a problem
The philosophy is explicit: "Multi-agent + async safe by default. No global singletons. No heavy module-level work." The current state violates this in several ways:
IndexRegistry is a classic __new__ singleton. Two agents in the same process that want different vector stores, different index factories, or different rerankers are forced to share one process-wide registry. Agent-local knowledge isolation is impossible without monkey-patching cls._instance = None between agents.
_memory_registry is module-level mutable state. First import wins. Tests and multi-tenant servers cannot hold two configurations side-by-side without races.
_server_started / _registered_agents / _shared_apps in agent.py are process-global dicts keyed by port. The lock only protects dict mutation, not the FastAPI app state underneath. Two agents that both call .start_server(port=8000) end up sharing a single FastAPI instance, no TTL / cleanup exists, and route registration races are possible. This is a production foot-gun the moment anyone runs more than one agent in the same interpreter.
__init__.py:93: _lazy_cache = {} is also a plain module-level dict; _lazy_cache_local below it is thread-local but unused by the main lazy path (the threading.local is only kept "for backward compatibility with tests"). The main cache is shared across threads without a lock.
Why this is critical
Multi-agent and async-safe execution are the product. If two agents in the same process can't have independent knowledge indices or independent API-server state, the "production-ready" promise of the core SDK is not met regardless of how good the single-agent path is.
Suggested fix
Kill the knowledge singletons. Move IndexRegistry, QueryEngineRegistry, RerankerRegistry, VectorStoreRegistry from __new__-based singletons to plain classes. Attach instances to the owning Knowledge / Agent (or pass them explicitly). Keep the class-name unchanged for compatibility.
Kill the _memory_registry module global. Instantiate it inside MemoryManager or pass it in. Expose a process-wide default as MemoryAdapterRegistry.default() only for convenience, not as the only path.
Make API-server state per-app. Replace the three module dicts with a ServerRegistry class stored on either the running FastAPI app or on the Agent that created it. Add TTL / deregistration on agent.close().
Wrap the lazy import cache in a threading.Lock or convert it to functools.lru_cache style; a plain {} shared across threads is a latent data race.
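The per-app server-state idea could be sketched as below. `ServerRegistry`, its method names, and the `app_factory` parameter are hypothetical; the point is that all state lives on an instance owned by an Agent or app, with explicit deregistration instead of process-global dicts.

```python
import threading


class ServerRegistry:
    """Per-instance replacement for the three module-level server dicts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._apps = {}    # port -> app instance
        self._routes = {}  # port -> {path: agent_id}

    def register(self, port, path, agent_id, app_factory):
        """Register an agent route; create the app for this port on first use."""
        with self._lock:
            app = self._apps.setdefault(port, app_factory())
            routes = self._routes.setdefault(port, {})
            if path in routes:
                raise ValueError(f"path {path!r} already registered on port {port}")
            routes[path] = agent_id
            return app

    def deregister(self, port, path):
        """Drop a route; tear down the port's app when its last route goes."""
        with self._lock:
            self._routes.get(port, {}).pop(path, None)
            if not self._routes.get(port):
                self._apps.pop(port, None)
                self._routes.pop(port, None)
```

Two agents on the same port now share an app only when they share a registry, and `agent.close()` can call `deregister` instead of leaking state for the life of the process.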
Impact if these three are fixed
#1 deletes a ~2000-line duplicated execution stack and unifies sync/async feature parity.
#2 unblocks true multi-provider parity and removes roughly 30 inline provider branches from llm.py.
#3 makes it safe to run multiple agents, multiple knowledge bases, or a long-lived API server in one process, which is the intended production deployment model.
None of these require a rewrite. Each is a focused refactor behind stable public APIs (Agent(...), agent.chat(), agent.achat(), Knowledge(...), Memory(...)).
Intentionally out of scope
Per the request, this issue deliberately does not raise: docs / examples parity, test coverage, file sizes, line counts, minor perf micro-optimisations, or the many smaller wrapper-layer gaps (multiple API servers, CLI/YAML/Python parity enforcement, chainlit/gradio eager imports, deprecated Agent.__init__ params). Those are real, but strictly secondary to the three above.