
Add new capabilities abstraction + make agents serializable #4640

Draft
DouweM wants to merge 121 commits into main from capabilities

Conversation

Collaborator

@DouweM DouweM commented Mar 13, 2026

Introduces ExecutionEnvironment ABC and three implementations
(LocalEnvironment, DockerEnvironment, MemoryEnvironment) along with
ExecutionEnvironmentToolset for exposing coding-agent-style tools
(ls, shell, read_file, write_file, replace_str, glob, grep).

This is the foundation for building coding agents and other agents
that need shell and filesystem access, split out from the broader
code-mode work for independent review and merge.
When multiple agent.run() calls execute concurrently, a shared environment
means they all operate on the same filesystem and processes. The new
environment_factory parameter creates a fresh, isolated environment per
async-with entry using ContextVar-scoped state.

Also renames environment → shared_environment to make concurrency semantics
explicit (positional arg, so existing callers still work).
Mark huggingface and outlines-vllm-offline extras as conflicting in uv,
and exclude outlines-vllm-offline from --all-extras in CI and Makefile.
- Fix _recv_stream EOF check to distinguish zero-size frames from actual EOF
- Make MemoryEnvironment.capabilities dynamic: include 'shell' when command_handler is set
- Fix LocalEnvironment.grep to use rglob for recursive file search with glob_pattern
- Fix glob_match to use regex for all patterns (fnmatch incorrectly matches '/' with '*')
- Fix build_glob_cmd: add parentheses for correct find operator precedence, fix ./ prefix for -path
- Add double-enter guard in DockerEnvironment._setup to prevent container leak
- Add DockerEnvironment.hardened() convenience constructor for security best practices
- Rename docker-sandbox optional dependency to docker-environment
- Rename 'env' variable to 'environment' in docs to avoid confusion with env vars
- Add lifecycle tip about pre-starting the toolset in docs
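For illustration, a toy `glob_match` along the lines described above, where `*` stays within one path segment while `fnmatch` does not (a sketch under those assumptions, not the PR's implementation):

```python
import fnmatch
import re

def glob_match(pattern: str, path: str) -> bool:
    """Toy glob matcher: '*' and '?' never cross '/', while '**/' spans levels."""
    regex = ''
    i = 0
    while i < len(pattern):
        if pattern[i:i + 3] == '**/':
            regex += '(?:.*/)?'   # zero or more whole directory levels
            i += 3
        elif pattern[i] == '*':
            regex += '[^/]*'      # stay within one path segment
            i += 1
        elif pattern[i] == '?':
            regex += '[^/]'
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.fullmatch(regex, path) is not None

# fnmatch happily lets '*' cross '/', which is the bug described above:
assert fnmatch.fnmatch('a/b.py', '*.py')
assert not glob_match('*.py', 'a/b.py')
assert glob_match('**/b.py', 'a/b.py') and glob_match('**/b.py', 'b.py')
```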
Tools are now registered unconditionally at init time and filtered in
get_tools() based on the current environment's capabilities. This fixes
the issue where environment_factory or use_environment() could expose
tools unsupported by the runtime environment.

Also unifies the Capability type — removes the toolset-level Capability
(with edit_file) and EditStrategy types, using the environment-level
Capability (with replace_str/apply_patch) everywhere.
- Add `ToolName` literal type for tool-level names exposed to the model
  (`edit_file` instead of `edit_file:replace_str`/`edit_file:apply_patch`)
- `include`/`exclude` now accept `ToolName` values (e.g. `edit_file`)
  instead of env-level `Capability` values
- Rename `_resolve_capabilities` → `_resolve_tool_names`, which maps env
  capabilities to tool names then applies include/exclude filtering
- Rename `replace_str` tool → `edit_file` (the function exposed to models)
- Update `Capability` values: `replace_str` → `edit_file:replace_str`,
  `apply_patch` → `edit_file:apply_patch` in all environments
- Update docs and tests
…rep glob filtering

- Rename `Capability` to `EnvCapability` for clarity
- Remove unused `instructions()` method from base class
- Fix `_resolve_edit_tool` to fall back to auto-detection when env doesn't support the explicit strategy
- Fix `MemoryEnvironment.grep` to skip glob filtering for exact file paths, matching `LocalEnvironment` behavior
- Rename `Capability` → `EnvCapability` to free up the name for other use
- `_resolve_edit_tool` now falls back to auto-detection when the explicit
  `edit_strategy` isn't supported by the environment
- Remove `instructions` method from base class and DockerEnvironment,
  along with associated tests
- Update all imports and type annotations across environments and tests
Collapse the two separate Literal types (EnvCapability for what environments
can do, ToolName for what's exposed to models) into a single EnvToolName,
since they now map 1:1. Remove the premature apply_patch method, the
edit_strategy parameter, and the _resolve_edit_tool() machinery.
- Move shell_escape, build_read_file_cmd, build_grep_cmd, build_glob_cmd,
  filter_grep_count_output, parse_glob_output from _base.py to docker.py
  as private helpers (_shell_escape, etc.)
- Fix grep skipping explicitly-specified hidden files in LocalEnvironment
  and MemoryEnvironment (e.g. grep(pattern, path='.env') now works)
Docker's grep defaults to BRE where |, +, ? are literal characters.
Local/Memory environments use Python's re.compile() which is closer to
ERE. Adding -E makes Docker grep behavior consistent.
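The discrepancy is easy to see from Python, whose `re` module treats these metacharacters the way ERE (`grep -E`) does (a rough illustration of the behavior gap, not the PR's code):

```python
import re

# Python's re treats |, + and ? as metacharacters, much like POSIX ERE
# (grep -E). Plain grep defaults to BRE, where these are literal.
assert re.search(r'cat|dog', 'hotdog') is not None   # ERE-style alternation
assert re.search(r'ab+c', 'abbbc') is not None       # ERE-style repetition

# The BRE reading of the same pattern is the literal string 'cat|dog',
# which does not occur in 'hotdog':
assert re.search(re.escape('cat|dog'), 'hotdog') is None
```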
- Add tests for Docker process wait polling, recv_stderr, stream
  buffering, hardened constructor, setup early return, is_alive,
  read_file binary fallback, ls edge cases
- Add tests for Local recv without timeout, EndOfStream, binary
  read_file, grep truncation
- Add tests for Memory ls dedup, grep truncation
- Mark defensive Docker branches with # pragma: no cover
- Mark Docker __aenter__/__aexit__ with # pragma: lax no cover
Aligns edit_file exception handling with read_file and write_file,
which already catch these errors for path traversal and OS-level failures.
- Raise ValueError when offset exceeds file length, matching Local/Memory
- Catch docker.errors.NotFound in _read_file_bytes_sync, convert to FileNotFoundError
- Update MockContainer awk handler to simulate offset/limit behavior
ExecutionEnvironmentToolset.get_tools() now pulls tool descriptions from
the active environment's method docstrings when present, replacing the
generic defaults. This lets each environment document its specific behavior
for the LLM (e.g. regex syntax for grep).

- DockerEnvironment.grep: documents POSIX ERE (grep -E) limitations
- LocalEnvironment.grep / MemoryEnvironment.grep: notes Python re syntax
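A hedged sketch of the docstring-preference idea (class and helper names here are hypothetical, and the generic default is invented for illustration):

```python
import inspect

GENERIC_DESCRIPTIONS = {'grep': 'Search file contents for a pattern.'}

class LocalEnv:
    def grep(self, pattern: str, path: str) -> str:
        """Search files using Python `re` syntax (not POSIX ERE)."""
        return ''

class BareEnv:
    pass  # no grep method defined, so the generic description applies

def tool_description(env: object, name: str) -> str:
    """Prefer the active environment's method docstring, else the generic default."""
    method = getattr(env, name, None)
    doc = inspect.getdoc(method) if method is not None else None
    return doc or GENERIC_DESCRIPTIONS[name]

assert 'Python `re` syntax' in tool_description(LocalEnv(), 'grep')
assert tool_description(BareEnv(), 'grep') == GENERIC_DESCRIPTIONS['grep']
```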
Avoids a subtle interaction where use_environment() override could be
entered into the shared exit stack instead of the actual shared
environment.
find's -path treats the literal / in **/ as requiring at least one
directory level. Generalize the existing startswith('**/') handling
to cover **/ appearing anywhere in the pattern by generating all
collapsed variants.
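The collapsed-variants idea can be sketched as follows (a hypothetical helper, not the PR's code): each `**/` in the pattern is either kept or dropped, so `find -path` gets one variant per combination, including the zero-directory-level reading that `-path` alone cannot express.

```python
def collapsed_variants(pattern: str) -> list[str]:
    """All patterns obtained by optionally collapsing each '**/' to nothing.

    find's -path treats the '/' in '**/' as requiring at least one directory
    level, so 'src/**/x.py' must also be tried as 'src/x.py'.
    """
    idx = pattern.find('**/')
    if idx == -1:
        return [pattern]
    rest_variants = collapsed_variants(pattern[idx + 3:])
    kept = [pattern[:idx] + '**/' + r for r in rest_variants]
    dropped = [pattern[:idx] + r for r in rest_variants]
    return kept + dropped

assert set(collapsed_variants('src/**/x.py')) == {'src/**/x.py', 'src/x.py'}
assert len(collapsed_variants('a/**/b/**/c')) == 4  # 2 choices per '**/'
```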
…, API docs

- Mock DockerEnvironment with LocalEnvironment in test harness so 11 of 15
  environment doc examples now run in CI (up from 2)
- Add public `files` property to MemoryEnvironment for test assertions
- Add EnvToolName to API reference members list
… handling

- Rename DockerEnvironmentProcess → _DockerEnvironmentProcess (internal impl detail)
- Rename LocalEnvironmentProcess → _LocalEnvironmentProcess (internal impl detail)
- Rename .container → ._required_container (avoid coupling users to docker-py)
- Narrow except Exception → except (DockerException, OSError) in teardown/is_alive
- Remove unnecessary r-prefix from ExecutionProcess docstring
# thinking settings. Cast needed because ModelSettings is a TypedDict and
# these provider-specific keys aren't in the base type.
# Providers covered: OpenAI, Anthropic, Google (google.genai SDK), Gemini (direct API)
super().__init__(
Contributor

Thinking extends ModelSettings (a @dataclass) but bypasses the dataclass-generated __init__ with a custom __init__ that calls super().__init__(cast(..., {...})). This is fragile: if ModelSettings gains additional fields in the future, this __init__ won't set them.

A cleaner approach would be to just call the dataclass __init__ directly: super().__init__(settings=cast(_ModelSettings, {...})). That way the dataclass machinery handles field initialization properly.

Collaborator Author

Fixed: now uses super().__init__(settings=cast(...)) with the keyword arg.

'Thinking() does not accept arguments yet — configurable parameters will be available once'
' #3894 lands. Use ModelSettings capability for custom thinking settings.'
)
return cls()
Contributor

The from_spec error message references a GitHub issue number (#3894) which is opaque to users who encounter this error. Consider replacing it with a user-friendly message that just says configurable thinking parameters aren't supported yet, and suggests using ModelSettings as a workaround, without the issue reference.

Collaborator Author

Fixed: removed issue reference from user-facing error message.

assert r is not None
return r

_wrap_task = asyncio.create_task(run_capability.wrap_run(run_ctx, handler=_do_run))
Contributor

The asyncio.create_task + asyncio.Event cooperative hand-off pattern here (and in the streaming wrap_model_request path in _agent_graph.py) is quite subtle. A comment block explaining the protocol would help future readers:

# wrap_run cooperative hand-off:
# 1. _do_run runs before_run, signals readiness, waits for completion
# 2. wrap_run wraps _do_run via capability middleware chain
# 3. Caller waits for readiness, yields agent_run, then signals completion
# 4. On error: cancels the wrap task; on success: awaits wrap result + after_run
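A runnable miniature of this hand-off on the success path (simplified, with hypothetical names; the real code also handles short-circuits, errors, and cancellation):

```python
import asyncio

async def demo() -> list[str]:
    log: list[str] = []
    run_ready = asyncio.Event()   # handler has started
    run_done = asyncio.Event()    # caller finished yielding

    async def handler() -> str:          # plays the role of _do_run
        log.append('before_run')
        run_ready.set()                  # 1. signal readiness
        await run_done.wait()            # ...then park until the caller is done
        return 'result'

    async def wrap_run(h) -> str:        # 2. middleware wrapping the handler
        log.append('wrap:enter')
        out = await h()
        log.append('wrap:exit')
        return out

    wrap_task = asyncio.create_task(wrap_run(handler))
    await run_ready.wait()               # 3. caller waits for readiness...
    log.append('caller:yield')           # ...here the real code yields agent_run
    run_done.set()                       # 4. success path: signal completion...
    log.append(await wrap_task)          # ...and await the wrapped result
    return log

assert asyncio.run(demo()) == ['wrap:enter', 'before_run', 'caller:yield', 'wrap:exit', 'result']
```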

Collaborator Author

Comments already added in a previous round.


By default, a capability instance is shared across all runs of an agent. If your capability accumulates mutable state that should not leak between runs, override [`for_run`][pydantic_ai.capabilities.AbstractCapability.for_run] to return a fresh instance:

```python {title="per_run_state.py" test="skip"}
Contributor

The per_run_state.py example uses test="skip" but there's nothing here that requires skipping — it's a pure in-memory example with no external dependencies. Per docs guidelines, test="skip" should be avoided unless unavoidable.

Collaborator Author

Fixed in 0881700.


# Short name is intentional — passing a dict is enough to get type checking,
# and users rarely need both this and settings.ModelSettings in the same scope.
from .model_settings import ModelSettings
Contributor

This has been discussed before and the short name is kept intentionally, but I want to note one concrete risk: from pydantic_ai.capabilities import ModelSettings and from pydantic_ai import ModelSettings resolve to different types (a @dataclass capability class vs a TypedDict). A user who does from pydantic_ai.capabilities import * alongside from pydantic_ai import ModelSettings (or vice versa) will silently get the wrong one. At minimum, consider adding a # noqa comment that explains the shadowing is intentional, and ensure the capabilities docs page explicitly warns about this when showing the import.

Collaborator Author

Intentional — discussed with maintainer. The short name is kept for ergonomics.

DouweM and others added 2 commits March 21, 2026 14:44
…able

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
agent = Agent(
'anthropic:claude-sonnet-4-20250514',
capabilities=[
Instructions('You are a research assistant. Be thorough and cite sources.'),
Contributor

What do we expect to happen if someone passes in two Instructions capabilities that contradict each other? This example is a little non-deterministic, but the broader point applies to capabilities that interact with the internals of the code rather than just the prompt, yet enforce different constraints.

Collaborator Author

We just concatenate all instructions with \n\n, so I guess it's up to the user not to be contradictory, and up to capabilities to only add instructions that relate to them and are unlikely to affect other capabilities' instructions and user instructions. Of course the user could add 2 capabilities that are fundamentally incompatible -- is that what you're thinking of?

Contributor

Of course the user could add 2 capabilities that are fundamentally incompatible -- is that what you're thinking of?

Yes, we should detect incompatible capabilities at from_spec() time. Capabilities could have a field marking which ones they are incompatible with; I'm not sure how that would scale, but I'd rather the agent crash up front than do weird stuff in prod. Throwing it out there.
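Purely as illustration of the idea floated in this thread, such a check could look roughly like this (`incompatible_with`, `check_compatible`, and the capability classes are all hypothetical, not part of the PR):

```python
class Capability:
    # Hypothetical field: names of capabilities this one cannot coexist with.
    incompatible_with: frozenset[str] = frozenset()

class StrictJson(Capability):
    incompatible_with = frozenset({'FreeformProse'})

class FreeformProse(Capability):
    pass

def check_compatible(caps: list[Capability]) -> None:
    """Fail fast at construction time instead of misbehaving at run time."""
    names = {type(c).__name__ for c in caps}
    for cap in caps:
        clash = cap.incompatible_with & names
        if clash:
            raise ValueError(
                f'{type(cap).__name__} is incompatible with {sorted(clash)}'
            )

check_compatible([StrictJson()])  # fine on its own
try:
    check_compatible([StrictJson(), FreeformProse()])
except ValueError as e:
    assert 'incompatible' in str(e)
```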

Contributor

@devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 10 additional findings in Devin Review.


Comment on lines +56 to +62
def resolve(ctx: RunContext[AgentDepsT]) -> ModelSettings:
merged = static_settings
for func in dynamic_settings:
merged = merge_model_settings(merged, func(ctx))
return merged if merged is not None else ModelSettings()

return resolve
Contributor

🚩 CombinedCapability dynamic model settings don't update ctx.model_settings between capability callables

In CombinedCapability.get_model_settings() (combined.py:56-60), when multiple capabilities provide dynamic (callable) settings, the resolve() closure calls each function sequentially but does NOT update ctx.model_settings between calls. Compare with the agent-level resolver (agent/__init__.py:990-1013) which explicitly sets run_context.model_settings = merged between each layer.

This means if capabilities A and B both provide dynamic settings within the same CombinedCapability, B's callable will see ctx.model_settings from the layer before the entire capability group — it won't see A's contribution. The docs at docs/capabilities.md:176 say the callable sees "the merged result of all layers resolved before this capability", which is ambiguous about whether "this capability" means the individual capability or the combined group. Current behavior treats all capabilities as a single layer, which is internally consistent but could surprise users.


Comment on lines +1121 to +1166
async def _do_run() -> AgentRunResult[Any]:
await run_capability.before_run(run_ctx)
_run_ready.set()
await _run_done.wait()
if _run_error is not None:
raise _run_error
r = agent_run.result
assert r is not None
return r

_wrap_task = asyncio.create_task(run_capability.wrap_run(run_ctx, handler=_do_run))

# Wait for handler to start or wrap_run to complete (short-circuit)
_ready_waiter = asyncio.create_task(_run_ready.wait())
await asyncio.wait({_ready_waiter, _wrap_task}, return_when=asyncio.FIRST_COMPLETED)
_ready_waiter.cancel()

_short_circuited = _wrap_task.done() and not _run_ready.is_set()
if _short_circuited:
_result = _wrap_task.result()
_result = await run_capability.after_run(run_ctx, result=_result)
agent_run._result_override = _result # pyright: ignore[reportPrivateUsage]

try:
yield agent_run
except BaseException as _exc:
_run_error = _exc
raise
finally:
if agent_run.result is not None:
run_metadata = self._resolve_and_store_metadata(agent_run.ctx, metadata)
else:
run_metadata = graph_run.state.metadata

if not _short_circuited:
_run_done.set()
if _run_error is None and agent_run.result is not None:
_result = await _wrap_task
_result = await run_capability.after_run(run_ctx, result=_result)
agent_run._result_override = _result # pyright: ignore[reportPrivateUsage]
elif not _wrap_task.done():
_wrap_task.cancel()
try:
await _wrap_task
except (asyncio.CancelledError, BaseException):
pass
Contributor

🚩 wrap_run task coordination in iter() handles error propagation correctly but doesn't support wrap_run error recovery

The iter() method's _do_run / _wrap_task coordination (agent/__init__.py:1115-1166) correctly propagates user exceptions: when the user's code raises inside async with agent.iter(...) as agent_run:, _run_error is set and re-raised, then _wrap_task is cancelled in the finally block.

However, if a wrap_run implementation catches the error from handler() and returns a recovery result, that result is silently discarded — the user's original exception always propagates. This is because the finally block only awaits _wrap_task for its result when _run_error is None. Whether this is intentional depends on whether wrap_run error recovery is a supported use case. The docs don't mention it, and the test suite doesn't test for it. Worth documenting this limitation if wrap_run is intended to support try/catch patterns around handler().


DouweM and others added 8 commits March 21, 2026 20:17
- HistoryProcessorCapability → HistoryProcessor (brevity)
- _instructions.Instructions → AgentInstructions (like AgentModelSettings, AgentMetadata)
- BeforeModelRequestContext → ModelRequestContext (used in wrap too, not just before)
- wrap_run_step → wrap_node_run (distinguishes from ctx.run_step which counts model requests)
- Add AgentToolset type alias (AbstractToolset | ToolsetFunc, like AgentModelSettings pattern)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Port serializes_as_string_keyed_dict guard from pydantic_evals to
  _spec.py so NamedSpec.serialize() doesn't misinterpret a dict with
  all-string keys as kwargs on round-trip (affects ModelSettings etc.)
- Add PrepareTools capability that wraps a ToolsPrepareFunc callable,
  like Toolset wraps AbstractToolset. Not spec-serializable.
- Deduplicate the helper: pydantic_evals now imports from pydantic_ai._spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace MathTools with pre-built toolset (not dynamically generated)
- Make template_instructions.py testable (no test=skip)
- Replace AdaptiveTokenLimit with ThinkingOnRetry (more realistic)
- Clearer hook tables with full type signatures and validation timing
- Add wrap_node_run example (NodeLogger)
- Add wrap_run_event_stream example (StreamLogger), reference UI docs
- Replace tool approval guardrail with PII redaction guardrail
- Move Skip exceptions to their own section before hook tables
- Replace cost tracker with logging middleware example using wrap_*
- Fix AgentSpec instructions field to show TemplateStr type
- Remove unnecessary from_spec override in RateLimit example
- Add PrepareTools to built-in capabilities table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When multiple capabilities provide dynamic (callable) model settings,
update ctx.model_settings between each callable so later capabilities
can see earlier capabilities' contributions, matching the agent-level
resolver behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
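A toy model of this fix (all names are stand-ins for the real `merge_model_settings`/resolver machinery, and `Ctx` is a fabricated context object):

```python
from typing import TypedDict

class ModelSettings(TypedDict, total=False):
    temperature: float
    max_tokens: int

class Ctx:
    def __init__(self):
        self.model_settings = None  # settings visible to dynamic callables

def merge(a, b):
    """Later layer wins on key conflicts; None is a no-op merge target."""
    if a is None:
        return b
    if b is None:
        return a
    return {**a, **b}

def resolve(ctx, static_settings, dynamic_settings):
    merged = static_settings
    for func in dynamic_settings:
        # The fix: publish the layers resolved so far before calling the
        # next callable, so later capabilities see earlier contributions.
        ctx.model_settings = merge(ctx.model_settings, merged)
        merged = merge(merged, func(ctx))
    return merged

seen = []

def cap_a(ctx):
    seen.append(dict(ctx.model_settings or {}))
    return ModelSettings(temperature=0.2)

def cap_b(ctx):
    seen.append(dict(ctx.model_settings or {}))
    return ModelSettings(max_tokens=100)

ctx = Ctx()
out = resolve(ctx, ModelSettings(temperature=1.0), [cap_a, cap_b])
assert seen[1] == {'temperature': 0.2}  # cap_b sees cap_a's contribution
assert out == {'temperature': 0.2, 'max_tokens': 100}
```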
Change GraphRun._run_tracked_task to catch exceptions from node
execution and send them through the memory stream as error results,
instead of letting them propagate into the anyio TaskGroup (which
transforms them into CancelledError/ExceptionGroup). The original
exception is re-raised in iter_graph on the caller's side.

This preserves the original exception through the entire chain, allowing
Agent.iter()'s wrap_run hook to catch and recover from errors. If
wrap_run catches the error from handler() and returns a recovery result,
the exception is suppressed. If not, it propagates normally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
instrument: bool | None = None
metadata: dict[str, Any] | None = None
if capability_schema_types: # pragma: no branch
capabilities: list[Union[tuple(capability_schema_types)]] = [] # pyright: ignore # noqa: UP007
Contributor

_AgentSpecSchema manually duplicates every field from AgentSpec, and even drops TemplateStr from several fields (e.g. description is str | None here but TemplateStr | str | None in AgentSpec, and same for instructions). This has been flagged in multiple previous review rounds but remains unaddressed.

The schema divergence means the generated JSON schema doesn't reflect the actual validation behavior — users who rely on the schema for IDE autocompletion or validation will get incorrect type information, and any future field additions to AgentSpec that aren't mirrored here will silently produce an incomplete schema.

Consider deriving _AgentSpecSchema programmatically from AgentSpec (e.g. AgentSpec.model_fields), or at minimum add a test that asserts the field names match between the two classes to catch drift. @DouweM

@description.setter
@abstractmethod
def description(self, value: str | None) -> None:
def description(self, value: TemplateStr[AgentDepsT] | str | None) -> None:
Contributor

The abstract setter declares str | None but the concrete Agent.description.setter at agent/__init__.py:844 accepts TemplateStr[AgentDepsT] | str | None. This means a WrapperAgent or other AbstractAgent subclass that follows the abstract contract would reject TemplateStr values, and Pyright will flag the override as incompatible.

The abstract setter should be updated to match: TemplateStr[AgentDepsT] | str | None.

ctx.model_settings = merge_model_settings(ctx.model_settings, merged)
resolved = func(ctx)
merged = merge_model_settings(merged, resolved)
return merged if merged is not None else ModelSettings()
Contributor

Regarding `merged if merged is not None else ModelSettings()`: ModelSettings is a TypedDict, so `ModelSettings()` creates an empty dict, which is falsy. This is fine for the `is not None` check, but the returned callable's return type annotation is ModelSettings while the overall get_model_settings return type includes `| None`. When there are no dynamic settings and static_settings is None, the method correctly returns None from line 54.

However, inside this closure, if every dynamic callable returns None-ish settings and static_settings is also None, merged will be None, and this produces an empty `ModelSettings()` dict. That empty dict is then merged at the agent level as a no-op, which is fine functionally, but it means the callable never returns None: it always returns a dict. This is subtly different from static_settings returning None on line 54. Consider using `return merged or None` to be consistent (a no-op merge target is indistinguishable from None).

ModelSettings,
Thinking,
WebSearch,
)
Contributor

DEFAULT_CAPABILITY_TYPES (used for spec schema generation) includes only Instructions, ModelSettings, Thinking, and WebSearch, while CAPABILITY_TYPES (used for the registry) also includes HistoryProcessor and Toolset. Since DEFAULT_CAPABILITY_TYPES drives the JSON schema output in model_json_schema_with_capabilities, HistoryProcessor and Toolset won't appear in the generated schema unless passed as custom_capability_types.

If the intent is that these two are not useful in YAML/JSON specs (since they take non-serializable callables/objects), that's reasonable but should be documented. A comment explaining why DEFAULT_CAPABILITY_TYPES is a subset of CAPABILITY_TYPES would help future readers.

)
return cls()

def __init__(self):
Contributor

Defining __init__ on a @dataclass subclass bypasses the dataclass-generated __init__, which means Thinking doesn't participate in the standard dataclass field protocol. This works because super().__init__(settings=cast(...)) calls ModelSettings.__init__ directly, but it's fragile — if ModelSettings gains additional fields, Thinking.__init__ won't pass them.

Since Thinking is essentially ModelSettings with hardcoded settings, consider making it a classmethod factory instead, or using field(default_factory=...) on the settings field to avoid the manual __init__ override entirely:

@dataclass
class Thinking(ModelSettings[AgentDepsT]):
    settings: _ModelSettings = field(default_factory=lambda: cast(_ModelSettings, {
        'openai_reasoning_effort': 'high',
        ...
    }))

assert r is not None
return r

_wrap_task = asyncio.create_task(run_capability.wrap_run(run_ctx, handler=_do_run))
Contributor

The asyncio.create_task + Event cooperative hand-off pattern for wrap_run is quite complex and has been flagged repeatedly. The pattern has at least one subtle edge case: if wrap_run raises before calling handler() (e.g. a precondition check), _run_ready is never set, and _wrap_task.done() becomes True with _run_ready.is_set() False, triggering the short-circuit path that calls _wrap_task.result() — which will re-raise the exception from wrap_run. That exception isn't caught here, so it'll propagate out of the iter() context manager.

This seems correct but is very hard to reason about. A thorough comment block explaining the state machine (which combinations of _run_ready, _run_done, _run_error, _short_circuited, and _wrap_task.done() are possible and what each means) would make this much more maintainable.

for template compilation, schema validation, and rendering.

Example:
```python {test="skip"}
Contributor

The docstring example uses test="skip" which is understandable since pydantic-handlebars is an optional dependency. However, the entire TemplateStr class (including __init__, render, __call__, and the Pydantic schema hook) calls _import_pydantic_handlebars() or uses compiled templates that require it. This means these code paths have zero test coverage in CI unless pydantic-handlebars is installed in the test environment.

Looking at pyproject.toml, handlebars = ["pydantic-handlebars>=0.1.0"] is added as an optional dependency group but I don't see it included in the test dependencies. The test_template.py tests presumably need this. Can you confirm that pydantic-handlebars is installed in CI for the test runs that exercise TemplateStr?

from pydantic_ai.capabilities import Instructions, ModelSettings, Thinking, WebSearch

agent = Agent(
'anthropic:claude-sonnet-4-20250514',
Contributor

This example uses 'anthropic:claude-sonnet-4-20250514' which is not a frontier model. Per the docs guidelines, use the latest frontier model — e.g. 'anthropic:claude-opus-4-6' or 'openai:gpt-5.2'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

feature: New feature request, or PR implementing a feature (enhancement)
size: XL: Extra large PR (>1500 weighted lines)

Projects

None yet

3 participants