Wrapper layer gaps: shadowed legacy deploy.py, mutable host-app singleton, and stale CLI integration cache

## Scope

In-depth review of `src/praisonai/praisonai` (wrapper layer only, per AGENTS.md).
Focus: real, validated, architectural gaps — **not** docs / tests / coverage / file size / LOC.

Three issues stood out as the highest-impact gaps for the wrapper. Each was reproduced and the proof is inline.

---

## 1. `deploy.py` is dead, shadowed by `deploy/` package — and pins a stale `praisonai` version

### Where

- `src/praisonai/praisonai/deploy.py` (legacy `CloudDeployer`, ~224 lines)
- `src/praisonai/praisonai/deploy/__init__.py` (new `Deploy` package — wins the import race)

Both live side-by-side. Python resolves `praisonai.deploy` to the **package** (`deploy/__init__.py`) and the module file is silently unreachable.

### Proof it's dead code

```bash
$ python -c "import sys; sys.path.insert(0, 'src/praisonai'); import praisonai.deploy as d; print(d.__file__); print('CloudDeployer reachable:', hasattr(d, 'CloudDeployer'))"
/.../src/praisonai/praisonai/deploy/__init__.py
CloudDeployer reachable: False
```

Every consumer in the codebase (`praisonai/__init__.py`, `cli/features/deploy.py`, `mcp_server/adapters/cli_tools.py`, `deploy/main.py`) routes to the package. The standalone `deploy.py` is unreachable from any normal import path. Confirmed via grep — no live code path references `CloudDeployer`.

### Why this is a real gap (not just stale code)

`deploy.py:60` hard-codes the install pin:

```python
# src/praisonai/praisonai/deploy.py:60
file.write("RUN pip install flask praisonai==4.6.52 gunicorn markdown\n")
```

The current `src/praisonai/praisonai/version.py` says `__version__ = "4.6.52"` — so the pin **looks** valid today and quietly rots on every release. This dead file:

1. **Traps engineers** — anyone editing "deploy" will edit this file (it's the obvious name) and observe zero behavioural change. The package wins.
2. **Misleads search/security tooling** — GitHub code search, SCA tools, dependency scanners, and AI assistants all pick this file up and treat its `praisonai==4.6.52` pin as authoritative.
3. **Duplicates real logic** — `deploy/docker.py:37` already does this correctly with no pin:
   ```python
   # src/praisonai/praisonai/deploy/docker.py:37
   RUN pip install --no-cache-dir praisonai flask flask-cors gunicorn
   ```

### Fix

Delete `src/praisonai/praisonai/deploy.py` entirely. The package supersedes it; no live importer needs it. If any external user is importing `praisonai.deploy:CloudDeployer` (highly unlikely — it's not exported anywhere), surface a deprecation via `deploy/__init__.py.__getattr__`. While here, audit `deploy/docker.py` to confirm no analogous pin lurks (it doesn't today — verified).

---

## 2. `integration/host_app.py` — module-level `_CONFIGURED` singleton + duplicate `except Exception` swallow real failures

### Where

`src/praisonai/praisonai/integration/host_app.py`

### Gap A — module-global state breaks multi-call reconfiguration

```python
# src/praisonai/praisonai/integration/host_app.py:12, 41, 115, 183
_CONFIGURED = False

def configure_host(*, ...):
    global _CONFIGURED
    ...
    _CONFIGURED = True   # set once per process

def create_host_app():
    from praisonaiui.server import create_app
    if not _CONFIGURED:
        configure_host()
    return create_app()
```

This breaks the **"multi-agent + async safe by default"** pillar in `AGENTS.md`.

- Two callers in the same process that need different host configurations cannot coexist — whoever calls `configure_host()` first wins, and `_CONFIGURED=True` swallows the second call's intent at the `create_host_app()` check.
- Worse: `configure_host()` itself does **not** guard on `_CONFIGURED` — so a second `configure_host(title="A")` followed by `configure_host(title="B")` re-mutates `aiui` global state and silently overwrites the first. The flag only short-circuits `create_host_app()`, never `configure_host()`. That asymmetry hides bugs.
- Tests cannot reliably reset host state between cases without monkey-patching the module.

### Gap B — duplicate `except Exception` is dead code that hides real failures

```python
# src/praisonai/praisonai/integration/host_app.py:171-176
        register_kanban_backends()
        
    except Exception as exc:
        log.debug("aiui backend injection failed: %s", exc)
    except Exception as exc:                                   # <-- UNREACHABLE
        log.warning("aiui backend injection failed: %s", exc)
```

Proof the second handler never runs:

```bash
$ python3 -c "
def f():
    try: raise RuntimeError('real failure')
    except Exception as exc: print('FIRST caught:', exc)
    except Exception as exc: print('SECOND (unreachable):', exc)
f()
"
FIRST caught: real failure
```

So the **real** behaviour is:
- Any failure in workflow/hooks/usage/kanban backend wiring is logged at **DEBUG** (`log.debug`), not `log.warning`.
- In production (LOGLEVEL=INFO/WARNING, the default), users see **nothing** when their hooks/workflows/usage/approvals/kanban backend wiring fails. The UI just silently doesn't work for those features.

### Fix

1. Replace the module global with caller-owned state. Either:
   - a `HostConfig` dataclass returned by `configure_host()` and passed into `create_host_app(config)`, or
   - a `contextvars.ContextVar[bool]` so each async context has its own configuration flag.
2. Remove the dead `except Exception` at line 175. Promote the surviving handler to `log.warning` (or split into `ImportError → debug` / `Exception → warning` to match the pattern used a few lines earlier in the same function for `usage_bridge` and `schedules_runner`).

---

## 3. `integrations/` — process-lifetime stale `is_available` cache and `ExternalAgentRegistry.create()` silently violates the parent's contract

### Where

- `src/praisonai/praisonai/integrations/base.py:63-110` — `BaseCLIIntegration._availability_cache`
- `src/praisonai/praisonai/integrations/registry.py:107-121` — `ExternalAgentRegistry.create()`

### Gap A — process-lifetime cache never invalidates

```python
# src/praisonai/praisonai/integrations/base.py:63-66
class BaseCLIIntegration(ABC):
    _availability_cache: Dict[str, bool] = {}      # class attribute, process-wide
    _availability_cache_lock = threading.Lock()

# base.py:92-110
@property
def is_available(self) -> bool:
    cmd = self.cli_command
    cache = BaseCLIIntegration._availability_cache
    if cmd in cache:
        return cache[cmd]                          # cached for entire process lifetime
    with BaseCLIIntegration._availability_cache_lock:
        if cmd not in cache:
            cache[cmd] = shutil.which(cmd) is not None
        return cache[cmd]
```

Consequence: any long-lived process (the dashboard UI host, the daemon scheduler, the MCP server, an interactive `praisonai chat` session) that checks `claude.is_available` once at startup will **never** notice when the user later runs `npm i -g @anthropic-ai/claude-code`. The integration stays "unavailable" for the rest of the process.

Compare with the sibling `_framework_availability.py`, which already does this correctly:

```python
# src/praisonai/praisonai/_framework_availability.py:53-58
def invalidate(name: str | None = None) -> None:
    with _lock:
        if name is None:
            _cache.clear()
        else:
            _cache.pop(name, None)
```

The wrapper has two near-identical caches (framework probes vs CLI probes) but only one of them has an invalidation hook. That's the gap.

### Gap A — fix

Add an `invalidate()` classmethod alongside the cache and call it from a CLI subcommand (`praisonai doctor --refresh`) and on `SIGHUP`/scheduler tick:

```python
# Proposed addition to base.py
@classmethod
def invalidate_availability(cls, command: str | None = None) -> None:
    with cls._availability_cache_lock:
        if command is None:
            cls._availability_cache.clear()
        else:
            cls._availability_cache.pop(command, None)
```

For long-running hosts, optionally cache with a short TTL (e.g., 30s) rather than process-lifetime — matches what users intuitively expect.

### Gap B — `ExternalAgentRegistry.create()` silently returns `None` and breaks the parent's typed contract

```python
# src/praisonai/praisonai/_registry.py:181-195  (parent)
def create(self, name: str, *args, **kwargs) -> T:
    cls = self.resolve(name)          # raises ValueError if missing
    return cls(*args, **kwargs)

# src/praisonai/praisonai/integrations/registry.py:107-121  (child override)
def create(self, name: str, **kwargs: Any) -> Optional[BaseCLIIntegration]:
    try:
        return super().create(name, **kwargs)
    except ValueError:
        return None                   # silent swallow
```

Proof of the divergence:

```bash
$ python3 -c "
import sys; sys.path.insert(0, 'src/praisonai')
from praisonai._registry import PluginRegistry

class C(PluginRegistry):
    def create(self, name, **kw):
        try: return super().create(name, **kw)
        except ValueError: return None

print('Parent contract -> raises; Child contract -> returns:', C(entry_point_group='x', builtins={}).create('typoed'))
"
Parent contract -> raises; Child contract -> returns: None
```

Why this matters:

- The Liskov-violating return type (`T` vs `Optional[T]`) means downstream code following the parent's contract (`agent = registry.create("clade")` — note typo) gets `None` instead of a clear "unknown integration: 'clade'. Available: [claude, gemini, codex, cursor]". The agent silently does nothing.
- The shim was added in `registry.py:118-121` purely to spare callers a `try/except`, but it costs them the diagnostic that lists available names — the most useful piece of the parent error.
- `create_integration()` (the public factory at `registry.py:208`) inherits the silent failure: a user typo in YAML produces a no-op, not an error.

### Gap B — fix

Remove the swallow. Let `ValueError` propagate (as parent does), or — if the caller really wants `Optional` semantics — expose a separate `try_create()` that returns `None` and keep `create()` honest. The narrower `try_create()` is closer to what the existing two callers in `_unified_registry.py` actually want.

```python
# Proposed
def try_create(self, name: str, **kwargs: Any) -> Optional[BaseCLIIntegration]:
    try:
        return super().create(name, **kwargs)
    except ValueError:
        return None
# remove the override of create()
```

---

## Why these three and not others

I reviewed several other candidates and rejected them as either lower-leverage or not strictly architectural:

- Lazy `__getattr__` if/elif ladder in `praisonai/__init__.py` (26 elif branches) when `_registry.create_lazy_getattr` exists — DRY issue, but cosmetic.
- `cli/main.py:225-236` configures root logging at module import — real, but well-known pattern.
- `BaseAutoGenerator` duplicates structured-completion in sync + async variants — refactor opportunity, not a gap.
- The async bridge daemon thread (`_async_bridge._BackgroundLoop`) — correctly handles cancellation; documented daemon semantics.

The three above are the ones that (a) actively trap users today, (b) directly contradict a stated pillar in `AGENTS.md` (multi-agent safe by default, no global singletons, protocol-driven core), and (c) have small, low-risk fixes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wrapper layer gaps: shadowed legacy deploy.py, mutable host-app singleton, and stale CLI integration cache #1846

Scope

1. `deploy.py` is dead, shadowed by `deploy/` package — and pins a stale `praisonai` version

Where

Proof it's dead code

Why this is a real gap (not just stale code)

Fix

2. `integration/host_app.py` — module-level `_CONFIGURED` singleton + duplicate `except Exception` swallow real failures

Where

Gap A — module-global state breaks multi-call reconfiguration

Gap B — duplicate `except Exception` is dead code that hides real failures

Fix

3. `integrations/` — process-lifetime stale `is_available` cache and `ExternalAgentRegistry.create()` silently violates the parent's contract

Where

Gap A — process-lifetime cache never invalidates

Gap A — fix

Gap B — `ExternalAgentRegistry.create()` silently returns `None` and breaks the parent's typed contract

Gap B — fix

Why these three and not others

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Wrapper layer gaps: shadowed legacy deploy.py, mutable host-app singleton, and stale CLI integration cache #1846

Description

Scope

1. deploy.py is dead, shadowed by deploy/ package — and pins a stale praisonai version

Where

Proof it's dead code

Why this is a real gap (not just stale code)

Fix

2. integration/host_app.py — module-level _CONFIGURED singleton + duplicate except Exception swallow real failures

Where

Gap A — module-global state breaks multi-call reconfiguration

Gap B — duplicate except Exception is dead code that hides real failures

Fix

3. integrations/ — process-lifetime stale is_available cache and ExternalAgentRegistry.create() silently violates the parent's contract

Where

Gap A — process-lifetime cache never invalidates

Gap A — fix

Gap B — ExternalAgentRegistry.create() silently returns None and breaks the parent's typed contract

Gap B — fix

Why these three and not others

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. `deploy.py` is dead, shadowed by `deploy/` package — and pins a stale `praisonai` version

2. `integration/host_app.py` — module-level `_CONFIGURED` singleton + duplicate `except Exception` swallow real failures

Gap B — duplicate `except Exception` is dead code that hides real failures

3. `integrations/` — process-lifetime stale `is_available` cache and `ExternalAgentRegistry.create()` silently violates the parent's contract

Gap B — `ExternalAgentRegistry.create()` silently returns `None` and breaks the parent's typed contract