docs: update guardrails, thread-safety, and resource-lifecycle pages for PR #1514 architectural fixes

## Context

This issue tracks documentation updates required by the architectural fixes landed in **MervinPraison/PraisonAI#1514** (merged 2026-04-22, fixes issue #1507).

PR #1514 changes runtime behavior in three areas that are user-visible and already documented. The existing docs pages now describe behavior that no longer matches the SDK. Any future agent reading the docs will write code against behaviors that changed, so these pages must be corrected.

> Placement rule reminder (per AGENTS.md §1.8): **do not edit `docs/concepts/`**. All updates below go to `docs/features/` and `docs/best-practices/`. `docs/concepts/guardrails.mdx` is mentioned only as a cross-reference — do **not** modify it without explicit human approval.

---

## Source of truth — PR #1514

**PR:** https://github.com/MervinPraison/PraisonAI/pull/1514
**Head SHA:** `971d217c44d8643b77b4aa13fc3da94f4a3da8e6`
**Files changed (in `praisonai-package/src/praisonai-agents/` → map to repo-root `praisonaiagents/` in PraisonAIDocs):**

- `praisonaiagents/agent/agent.py` (+18 / −2)
- `praisonaiagents/agents/agents.py` (+34 / −2)
- `praisonaiagents/memory/memory.py` (+9 / −0)
- `praisonaiagents/process/process.py` (+31 / −17)
- `praisonaiagents/task/task.py` (+77 / −60)
- `test_architectural_fixes.py` (tests)

Before writing any documentation content, an implementing agent **must read the SDK files above from the synced `praisonaiagents/` tree at repo root** (per AGENTS.md §1.1 and §1.3 — SDK-first verification).

---

## Gap 1 — Guardrail retry now actually runs

### What changed in the SDK

- `praisonaiagents/agents/agents.py:1017-1038` — guardrail validation was moved **out of the callback** and **into the main async execution loop** (`arun_task`).
- `praisonaiagents/task/task.py` — the guardrail branch was **removed from `execute_callback()`**. `execute_callback` now only runs memory/user callbacks. The docstring explicitly says: *"Guardrail validation has been moved to the execution path in agents.py to ensure proper retry behavior."*
- `execute_callback_sync()` no longer uses fire-and-forget `loop.create_task(...)`. It always goes through `run_coroutine_from_any_context(...)` so exceptions from the callback propagate instead of being silently swallowed.
- On guardrail failure the executor now does the real retry: increments `task.retry_count`, sets `task.status = "in progress"`, logs the retry, and `continue`s the loop. On final failure it raises `"Task failed guardrail validation after {max_retries} retries"`.
- On guardrail success with a modified result, `task_output.raw` (or the whole `TaskOutput`) is replaced and `task.result` is updated **before** the task is marked completed.
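
The difference between fire-and-forget scheduling and an awaited call (the `execute_callback_sync()` bullet above) can be illustrated with plain `asyncio`. This is a minimal sketch, not SDK code: `failing_callback`, `fire_and_forget`, and `awaited` are hypothetical names, and the SDK's `run_coroutine_from_any_context(...)` helper is deliberately not reproduced here.

```python
import asyncio


async def failing_callback() -> None:
    # Hypothetical stand-in for a memory/user callback that fails.
    raise ValueError("callback failed")


async def fire_and_forget() -> asyncio.Task:
    # Schedule the coroutine and return without awaiting its result.
    # The exception is stored on the Task; the caller never sees it raised.
    task = asyncio.get_running_loop().create_task(failing_callback())
    await asyncio.sleep(0)  # yield once so the scheduled task body runs
    return task


async def awaited() -> None:
    # Awaiting the coroutine lets its exception propagate to the caller.
    await failing_callback()


async def main():
    task = await fire_and_forget()   # returns normally despite the failure
    swallowed = task.exception()     # only visible by inspecting the Task
    try:
        await awaited()
        propagated = None
    except ValueError as exc:
        propagated = exc
    return swallowed, propagated


swallowed, propagated = asyncio.run(main())
```

In the first path the error exists only as state on the `Task` object, which is why a failing callback could previously go unnoticed.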

### Why this matters for docs

`docs/features/guardrails.mdx` (and `docs/best-practices/agent-retry-strategies.mdx` where it intersects with guardrails) already tell users that `max_retries` / `retry_delay` / `retry_with_feedback` drive retries on failed validations. Before PR #1514 that was only partially true, and only in the sync path; the async path bypassed the retry. Users who wrote production async workflows before PR #1514 may have had guardrail failures pass silently, with no retry. Docs should:

1. State plainly that guardrail retries apply to **both sync and async** execution paths.
2. Make clear the retry happens **before** the task is marked `completed` (ordering matters for memory/user callbacks — those only fire on a guardrail-passing result).
3. Clarify that when a guardrail returns a modified `TaskOutput`/`str`, the downstream `task.result` and any memory callback receive the **modified** value, not the original.
4. Remove (or correct) any older wording implying that a failed guardrail in async mode would be logged but not retried.
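
The ordering the docs need to convey (guardrail, then retry-or-pass, then callbacks, then `completed`) can be sketched without the SDK. Everything below is a simplified illustration under stated assumptions: `SketchTask`, `run_with_guardrail`, `produce`, and `on_complete` are hypothetical names, and the exact retry-count bookkeeping in `agents.py` should be verified against the source before documenting it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class SketchTask:
    # Hypothetical stand-in for the SDK Task; only the fields the loop needs.
    guardrail: Callable[[str], Tuple[bool, Any]]
    max_retries: int = 3
    retry_count: int = 0
    status: str = "not started"
    result: Optional[str] = None


def run_with_guardrail(task, produce, on_complete):
    """Validate first, retry on failure, and only then fire callbacks."""
    while True:
        task.status = "in progress"
        raw = produce(task.retry_count)
        ok, value = task.guardrail(raw)
        if not ok:
            if task.retry_count >= task.max_retries:
                raise RuntimeError(
                    f"Task failed guardrail validation after {task.max_retries} retries"
                )
            task.retry_count += 1
            continue  # retry before anything is marked completed
        task.result = value          # the guardrail may have modified the output
        on_complete(task.result)     # callbacks see the modified value
        task.status = "completed"
        return task.result


def must_mention_price(raw):
    if "$" not in raw:
        return False, "Rewrite and include a price in USD."
    return True, raw.strip()  # passing result, lightly modified


attempts = ["A sturdy mug. ", " A sturdy mug for $12. "]
seen = []
task = SketchTask(guardrail=must_mention_price)
final = run_with_guardrail(task, lambda n: attempts[n], seen.append)
```

The callback list `seen` receives one value only, the stripped passing output, which mirrors points 2 and 3 above.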

### Files to update

| File | Action |
|---|---|
| `docs/features/guardrails.mdx` | Update. Audit the "Retry behaviour" / "How retries work" sections (around lines 257–266, 465–474, 682, 893). Add a short "Execution order" subsection: guardrail → retry-or-pass → memory/user callbacks → `completed`. |
| `docs/best-practices/agent-retry-strategies.mdx` | Update. Add a short subsection or callout clarifying that `Task(guardrail=..., max_retries=...)` drives first-class retries inside the executor, distinct from the generic `ExponentialBackoffRetry` patterns the page already teaches. |

### Minimum content to add (agent-centric, beginner-friendly, per AGENTS.md §1.1 rule 9)

```python
from praisonaiagents import Agent, Task, PraisonAIAgents

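def must_mention_price(output):
    # A guardrail returns a (passed, result_or_feedback) tuple.
    ok = "$" in output.raw
    # On failure, the second element is the reason passed back for the retry.
    return (ok, output if ok else "Rewrite and include a price in USD.")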

agent = Agent(name="Writer", instructions="Write a one-line product blurb.")

task = Task(
    description="Write a blurb for a coffee mug.",
    agent=agent,
    guardrail=must_mention_price,
    max_retries=3,
)

PraisonAIAgents(agents=[agent], tasks=[task]).start()
```

Do **not** introduce a new page — extend the existing one.

---

## Gap 2 — Thread-safe state: `_set_workflow_finished`, `_execution_context`, and locked memory init

### What changed in the SDK

- `praisonaiagents/process/process.py`
  - New **async-locked** setter `_set_workflow_finished(value)` backed by `self._get_state_lock()`.
  - `_check_all_tasks_completed()` is now **async** and uses the setter; a new sync sibling `_check_all_tasks_completed_sync()` is used from the sync workflow path.
  - Both the async (`aworkflow`) and sync (`workflow`) paths **no longer mutate `task.description`**. The per-execution context is now stored in a dedicated `current_task._execution_context` field. The previous destructive `task.description.split('Input data from previous tasks:')[0]` reset and the `task.description = task._original_description + context` concatenation have been removed.
- `praisonaiagents/task/task.py`
  - `initialize_memory()` now uses a **threading.Lock with double-checked locking**.
  - **New** `async def initialize_memory_async(self)` uses an **asyncio.Lock** and offloads `Memory(...)` construction with `asyncio.to_thread(...)` so it doesn't block the event loop.
  - Config access is defensive: `self.config.get('memory_config', {}).get('storage', {}).get('path')` instead of the old `self.config['memory_config']['storage']['path']`.
  - `execute_callback()` now calls `await self.initialize_memory_async()` (not the sync one) when running in async context.
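
The locking pattern described for `initialize_memory()` and `initialize_memory_async()` can be sketched as follows. This is an illustration under assumptions, not the SDK implementation: `FakeMemory` and `SketchTask` are hypothetical stand-ins, the lock is scoped per task here (the real scope should be confirmed in `task.py`), and the `asyncio.Lock` is created lazily so the sketch also runs on older Python versions.

```python
import asyncio
import threading


class FakeMemory:
    # Hypothetical stand-in for Memory(...); counts constructions.
    instances = 0

    def __init__(self, config):
        FakeMemory.instances += 1
        # Defensive access: missing keys yield None instead of KeyError.
        self.path = config.get("memory_config", {}).get("storage", {}).get("path")


class SketchTask:
    def __init__(self, config):
        self.config = config
        self.memory = None
        self._lock = threading.Lock()
        self._async_lock = None  # created lazily inside the event loop

    def initialize_memory(self):
        if self.memory is None:              # first check, without the lock
            with self._lock:
                if self.memory is None:      # second check, under the lock
                    self.memory = FakeMemory(self.config)
        return self.memory

    async def initialize_memory_async(self):
        if self._async_lock is None:
            self._async_lock = asyncio.Lock()
        if self.memory is None:
            async with self._async_lock:
                if self.memory is None:
                    # Construct off the event loop so it is not blocked.
                    self.memory = await asyncio.to_thread(FakeMemory, self.config)
        return self.memory


# Sync path: eight threads race, one construction happens.
task = SketchTask({})  # no memory_config at all; defensive access copes
threads = [threading.Thread(target=task.initialize_memory) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
sync_count = FakeMemory.instances


# Async path: eight coroutines race, one more construction happens.
async def main():
    shared = SketchTask({"memory_config": {"storage": {"path": "./shared.db"}}})
    return await asyncio.gather(*(shared.initialize_memory_async() for _ in range(8)))


async_results = asyncio.run(main())
```

The second check under the lock is what prevents a duplicate store when two callers pass the first check at the same time.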

### Why this matters for docs

`docs/features/thread-safety.mdx` and `docs/best-practices/memory-cleanup.mdx` already describe concurrent use, but neither mentions:

- That concurrent tasks sharing a `memory_config` are now safe to initialize from many threads/coroutines (previously a benign-looking race).
- That user code **should not** read `task.description` expecting per-execution context to be appended to it — that mutation is gone. The per-run context lives on `_execution_context`. This is the documented boundary: user-facing `task.description` is now stable across runs.
- That the sync and async workflow paths use different completion-check methods (informational, but relevant for anyone subclassing `Process`).
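
The user-observable guarantee in the second bullet (a stable `task.description` across runs) can be shown with a small sketch. The names `SketchTask`, `prepare_run`, and `build_prompt` are hypothetical, and how the SDK actually combines the per-run context with the description is an assumption here; only the non-mutation of `description` comes from the PR notes.

```python
from dataclasses import dataclass


@dataclass
class SketchTask:
    # Hypothetical stand-in for the SDK Task.
    description: str
    _execution_context: str = ""


def prepare_run(task: SketchTask, context: str) -> None:
    # Per-run context is stored on a dedicated field;
    # the user-facing description is never rewritten.
    task._execution_context = context


def build_prompt(task: SketchTask) -> str:
    # Assumed composition, for illustration only.
    return task.description + task._execution_context


task = SketchTask(description="Summarize doc 1.")
original = task.description

prepare_run(task, "\nInput data from previous tasks: first run output")
first_prompt = build_prompt(task)

prepare_run(task, "\nInput data from previous tasks: second run output")
second_prompt = build_prompt(task)
```

After both runs `task.description` still equals `original`, and each prompt carries only the context of its own run.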

### Files to update

| File | Action |
|---|---|
| `docs/features/thread-safety.mdx` | Update. Add a subsection **"What changed in PR #1514"** mirroring the existing "What changed in PR #1488" callout at line 46. Cover: locked memory init (sync + async variants), async-locked `workflow_finished`, and non-mutating per-run task context. |
| `docs/best-practices/memory-cleanup.mdx` | Update. In section "2. Agent Memory Management" (line 90), add a short note that `Memory` construction is now thread-/async-safe and that concurrent `Task`s sharing a `memory_config` will coordinate through the lock rather than each creating a duplicate store. |

### Minimum content to add

A thread-safety snippet showing the supported case (multiple tasks, one shared `memory_config`, concurrent init):

```python
from praisonaiagents import Agent, Task, PraisonAIAgents

memory_config = {"storage": {"path": "./shared.db"}, "provider": "file"}

# Four agents with one task each; every task points at the same memory_config.
agents = [Agent(name=f"A{i}", instructions="Summarize one line.") for i in range(4)]
tasks = [
    Task(
        description=f"Summarize doc {i}.",
        agent=agents[i],
        config={"memory_config": memory_config},
    )
    for i in range(4)
]

PraisonAIAgents(agents=agents, tasks=tasks).start()
```

Do **not** document the private names (`_set_workflow_finished`, `_execution_context`, `_memory_init_lock`) as user-facing API — they are internal. Just describe the user-observable guarantees.

---

## Gap 3 — Agent/Memory lifecycle: MongoDB cleanup + lightweight `__del__`

### What changed in the SDK

- `praisonaiagents/memory/memory.py` → `close_connections()` now also closes the MongoDB client if one exists:
  ```python
  if hasattr(self, 'mongo_client') and self.mongo_client:
      self.mongo_client.close()
      self.mongo_client = None
  ```
- `praisonaiagents/agent/agent.py`
  - New instance flag `self._closed = False` initialized in `__init__`.
  - `__del__` changed from a no-op ("*Destructor safely does nothing to avoid GC pollution in test loops*") to a **lightweight finalizer** that, if `_closed` is still `False`, calls `self._memory_instance.close_connections()` inside a try/except and then sets `_closed = True`. Exceptions during GC are swallowed (finalizers must not raise).
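
The lifecycle described above can be sketched end to end with stand-in classes. This is a hedged illustration, not the SDK code: `FakeMongoClient`, `SketchMemory`, and `SketchAgent` are hypothetical, and whether the real `Agent.close()` also sets `_closed` is an assumption to verify in `agent.py`.

```python
import gc


class FakeMongoClient:
    # Hypothetical stand-in for a MongoDB client; counts close() calls.
    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


class SketchMemory:
    def __init__(self, mongo_client=None):
        self.mongo_client = mongo_client

    def close_connections(self):
        # Same shape as the snippet from memory.py quoted above.
        if hasattr(self, "mongo_client") and self.mongo_client:
            self.mongo_client.close()
            self.mongo_client = None


class SketchAgent:
    def __init__(self, memory):
        self._memory_instance = memory
        self._closed = False

    def close(self):
        if self._closed:          # second call is a no-op
            return
        self._memory_instance.close_connections()
        self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def __del__(self):
        # Safety net only: best effort, and never raise from a finalizer.
        try:
            if not getattr(self, "_closed", True):
                self._memory_instance.close_connections()
                self._closed = True
        except Exception:
            pass


# Preferred path: explicit cleanup through the context manager.
client = FakeMongoClient()
with SketchAgent(SketchMemory(client)) as agent:
    pass
agent.close()  # safe to call again

# Fallback path: the agent is dropped without close().
leaked_client = FakeMongoClient()
leaked = SketchAgent(SketchMemory(leaked_client))
del leaked     # CPython runs __del__ here; other interpreters may delay it
gc.collect()
```

The fallback works in this sketch, but finalizer timing is interpreter-dependent, which is why explicit cleanup stays the documented path.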

### Why this matters for docs

`docs/features/resource-lifecycle.mdx` and `docs/best-practices/memory-cleanup.mdx` already teach `async with` / `with` / explicit `.close()`. What they do not yet cover:

1. **MongoDB users specifically** now get their client closed when `Memory.close_connections()` runs. Previously this was a real leak in long-running apps using Mongo-backed memory.
2. GC-time cleanup is now a **safety net**, not the recommended path. Docs must still tell users to use context managers or call `.close()` explicitly — the `__del__` fallback is intentionally minimal and silent, and is not a substitute for explicit cleanup.
3. Calling `.close_connections()` twice is safe (idempotent via `_closed` / `mongo_client = None`).

### Files to update

| File | Action |
|---|---|
| `docs/features/resource-lifecycle.mdx` | Update. Extend "How It Works" (line 69) and "Best Practices" (line 182) with: (a) MongoDB included in `close_connections`, (b) `__del__` as a safety net only, (c) idempotency note. Keep the existing `async with` / explicit `.close()` guidance front-and-center. |
| `docs/best-practices/memory-cleanup.mdx` | Update. In "3. Resource Pool Management" (line 156), add MongoDB to the list of connections cleaned up. In "Automatic Garbage Collection" (line 323), clarify that the new `Agent.__del__` runs a best-effort `close_connections()` but may be skipped by the interpreter and must not be relied on. |

### Minimum content to add

Explicit-cleanup example (preferred):

```python
from praisonaiagents import Agent

with Agent(name="Analyst", instructions="Analyze quarterly numbers.") as agent:
    agent.start("Summarize Q1 revenue.")
# MongoDB / SQLite / registered connections closed here.
```

Async form (mention that `aclose()` and `async with` both route through the same cleanup):

```python
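from praisonaiagents import Agent
# Run inside a coroutine, e.g. one started with asyncio.run(...).
async with Agent(name="Analyst", instructions="...") as agent:
    await agent.astart("...")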
```

---

## Out of scope / do not touch

- `docs/concepts/guardrails.mdx` — concepts folder is human-approved only (AGENTS.md §1.8).
- `docs/js/**`, `docs/rust/**`, and anything under `docs/sdk/reference/**` — auto-generated, do not edit by hand (AGENTS.md §1.7).
- `docs.json` — only update if a new page is added. This issue requests **updates to existing pages**, so no `docs.json` changes should be needed. If a new sub-page is considered, it must be placed under `docs/features/` and added to the "Features" group only (not "Concepts").

---

## Acceptance checklist for the implementing agent

Before opening the PR, confirm:

- [ ] Read the 5 SDK files listed under "Source of truth" from repo-root `praisonaiagents/` (not `src/`). Verify every behavior claim in the updated docs against that source. No guessing.
- [ ] `docs/features/guardrails.mdx` updated: retries work in sync + async; execution order (guardrail → retry/pass → memory/user callbacks → completed); modified `TaskOutput` propagates.
- [ ] `docs/best-practices/agent-retry-strategies.mdx` updated: short callout tying `Task(guardrail=..., max_retries=...)` to the built-in executor-level retry.
- [ ] `docs/features/thread-safety.mdx` updated: new "What changed in PR #1514" subsection in the style of the existing PR #1488 one.
- [ ] `docs/best-practices/memory-cleanup.mdx` updated: concurrent memory init is safe; GC cleanup is a safety net, not the recommended path.
- [ ] `docs/features/resource-lifecycle.mdx` updated: MongoDB included in `close_connections`; `__del__` described as best-effort; idempotency noted.
- [ ] No changes to `docs/concepts/`, `docs/js/`, `docs/rust/`, or `docs/sdk/reference/`.
- [ ] All code examples are copy-paste runnable and use the friendly `from praisonaiagents import ...` imports (AGENTS.md §6.1); each new section opens with an agent-centric example.
- [ ] Every new/updated section has a Mermaid diagram using the standard color scheme where the content teaches a flow or decision (AGENTS.md §3).
- [ ] No private names (`_set_workflow_finished`, `_execution_context`, `_memory_init_lock`, `_closed`) surfaced as user API.
- [ ] Frontmatter (`title`, `sidebarTitle`, `description`, `icon`) preserved; existing `<Steps>`, `<AccordionGroup>`, `<CardGroup>` structure respected.

cc @MervinPraison

