Tool framework extension #22995
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58c3ecc24d
AAraKKe left a comment:
Hi @luisorofino, this review was generated with Claude Code and has been looked over and approved before posting. I believe the comments below are valid — feel free to discuss any you disagree with.
Comment legend

- praise: no action needed, just celebrate!
- note: informational, no action required.
- question: seeking clarification or understanding your approach.
- nit: minor, non-blocking (style, typo). Feel free to ignore.
- suggestion: optional improvement, recommended but not required.
- request: a change I believe is necessary before merging.
```python
@@ -0,0 +1,392 @@
# (C) Datadog, Inc. 2026-present
```
Request: Please split this file into one test file per tool, matching the module structure under `ddev/src/ddev/ai/tools/fs/` (e.g. `test_read_file.py`, `test_create_file.py`, `test_edit_file.py`, `test_append_file.py`). This makes tests easy to locate and keeps each file focused.
If any fixtures are shared across multiple tool test files, extract them to a `conftest.py` in `ddev/tests/ai/tools/fs/`.
Suggestion: As part of this split, make sure each tool's test file covers filesystem errors. The write tools (`create_file`, `edit_file`, `append_file`) currently have no tests for `OSError` scenarios (permission denied, disk full, etc.); once error handling is added to those tools (see comment on create_file.py:38), tests should cover those paths.
```python
                hint=result.meta.hint,
            )

        return ToolResult(success=response.is_success, data=output)
```
Request: `success=response.is_success` is incorrect here. `ToolResult.success` should indicate whether the tool execution itself succeeded, i.e., whether the HTTP call completed and a response was received, not whether the response carried a 2xx status code. A 404 or 500 is a fully valid tool outcome: the network call worked, the server responded, and the agent can inspect the status code from `data` to decide what to do next. Only actual failures where the tool could not complete (`httpx.TimeoutException`, `httpx.RequestError`) should set `success=False`. Both return paths (this line and line 51 inside the truncated branch) should use `success=True`.
Suggestion: The `if result.truncated and result.meta is not None` branching to build the final `ToolResult` is already duplicated verbatim in shell/base.py:55–67. Consider extracting a helper in `core/truncation.py` that encapsulates this pattern. One clean approach using a walrus operator:

```python
if truncated := _truncated_result(success=True, data=output, meta=result):
    return truncated
return ToolResult(success=True, data=output)
```

where `_truncated_result` returns a `ToolResult` if the result was truncated, or `None` otherwise.
Nit: The annotation `result: TruncateResult = truncate(body)` on line 44 is redundant: `truncate()` is already typed to return `TruncateResult`. Drop the explicit annotation: `result = truncate(body)`.
```python
    with patch_httpx(fake_response(status_code, "error body")):
        result = asyncio.run(http_tool.run({"url": "http://localhost:9090/metrics"}))

    assert result.success is False
```
Request: `assert result.success is False` codifies the wrong behavior; it will prevent a correct implementation from passing. Once the implementation is fixed, update this to `assert result.success is True` and verify the status code appears in `result.data`.
Suggestion: There's no test covering the truncated response path for a non-2xx status. Both return branches in `http_get.py` currently use `response.is_success`, so the truncated error-status path is never exercised. Add a case to the truncation test (or a new parametrize entry) with a non-2xx status code.
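The missing case can be sketched with a stand-in that mirrors the desired behavior: truncation and a non-2xx status are both ordinary outcomes, so `success` stays `True` either way. The stub below is illustrative only, not the PR's `HttpGetTool`:

```python
def run_stub_tool(status_code: int, body: str, limit: int = 10) -> dict:
    # Stand-in for the HTTP tool: success reflects only that the call
    # completed; the status code and truncation flag travel in the data.
    truncated = len(body) > limit
    return {
        "success": True,
        "status_code": status_code,
        "body": body[:limit],
        "truncated": truncated,
    }


# The parametrized cases the review asks for: truncation combined with
# both 2xx and non-2xx statuses.
for status in (200, 404, 500):
    result = run_stub_tool(status, "x" * 50)
    assert result["success"] is True
    assert result["truncated"] is True
    assert result["status_code"] == status
```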
```python
        except (OSError, UnicodeDecodeError) as e:
            return ToolResult(success=False, error=f"{tool_input.path}: {e}")

        self._refresh_if_known(tool_input.path, content)
```
Request: The intended behavior is read-before-write: an agent should be able to modify any file it has previously read. Currently `_refresh_if_known` is a no-op for files not already in the registry, so reading a pre-existing file grants no write access; only files the agent created can ever be modified.
Replace `_refresh_if_known` with `_record` here so that reading a file unconditionally registers it:

```python
self._record(tool_input.path, content)
```

This also simplifies the code: `_refresh_if_known` exists solely to avoid registering unread files, but with `_record` here that concern goes away.
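A minimal sketch of the registry behavior this request describes. The class is trimmed down for illustration; the real `FileRegistry` also normalizes paths and hands out per-file locks:

```python
import hashlib


class FileRegistry:
    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}

    def record(self, path: str, content: str) -> None:
        # Registering a path stores the SHA-256 of its last-known content.
        self._hashes[path] = hashlib.sha256(content.encode()).hexdigest()

    def is_known(self, path: str) -> bool:
        return path in self._hashes


registry = FileRegistry()
# Reading a pre-existing file registers it unconditionally, so the agent
# gains write access to any file it has previously read, not just files
# it created itself.
registry.record("/srv/app/config.yaml", "content read from disk")
assert registry.is_known("/srv/app/config.yaml")
assert not registry.is_known("/srv/app/other.yaml")
```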
```python
        return "create_file"

    async def __call__(self, tool_input: CreateFileInput) -> ToolResult:
        path = Path(tool_input.path)
```
Suggestion: `path = Path(tool_input.path)` does not resolve the path. `FileRegistry._normalize` always calls `.resolve()` and stores an absolute POSIX key, but `path.exists()`, `path.parent.mkdir()`, and `path.write_text()` all operate on the unresolved path. If the cwd ever changes between a `create_file` and a subsequent `edit_file` call for the same relative path, the registry lookup will resolve to a different absolute path than the file was written to.
Resolving up front is a free defensive improvement:

```python
path = Path(tool_input.path).resolve()
```

This applies to `create_file.py`, `edit_file.py`, `append_file.py`, and `read_file.py`.
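A small demonstration of the failure mode, using temporary directories: an unresolved relative `Path` silently re-targets when the working directory changes, while an eagerly resolved one stays pinned to the original location.

```python
import os
import tempfile
from pathlib import Path

cwd = os.getcwd()
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    try:
        os.chdir(d1)
        pinned = Path("notes.txt").resolve()   # absolute: <d1>/notes.txt
        os.chdir(d2)
        drifted = Path("notes.txt").resolve()  # now <d2>/notes.txt
        # Same relative path, two different files on disk.
        assert pinned != drifted
    finally:
        os.chdir(cwd)
```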
```python
    assert f.read_text(encoding="utf-8") == "nested"


def test_create_file_fails_if_file_already_exists(create_tool: CreateFileTool, tmp_path) -> None:
```
Nit: The test verifies the file content is untouched but doesn't assert the file is not registered. Add:

```python
assert not registry.is_known(str(f))
```

This guards against a regression that accidentally calls `_record` on the failure path.
```python
        ("line two\n", "", None, "line two"),
    ],
)
def test_edit_file_success(
```
Nit: `test_edit_file_success` uses `if expected_in is not None` inside the assertion body to distinguish replace-vs-delete cases. Two explicitly named test functions would produce clearer failure output and remove the conditional from the assertion body.
```python
# ---------------------------------------------------------------------------


def test_request_timeout(http_tool: HttpGetTool) -> None:
```
Nit: `assert "timed out" in result.error` is a looser check than the actual message `f"Request timed out after {timeout}s"`. Tighten it to catch format regressions: `assert "timed out after 1.0s" in result.error`.
```python
        if self._registry.is_known(path):
            self._registry.record(path, content)

    def _record(self, path: str, content: str) -> None:
```
Nit: `_record` is a one-line passthrough to `self._registry.record(...)`. Renaming it to `_register` would better convey its intent (registering agent ownership/access for a file path) and distinguish it from the underlying registry method.
```python
    def is_known(self, path: str) -> bool:
        return self._normalize(path) in self._hashes

    def lock_for(self, path: str) -> asyncio.Lock:
```
Nit: `dict.setdefault` with `asyncio.Lock()` is correct for single-threaded asyncio, but worth a brief inline comment noting the assumption, since `asyncio.Lock` binds to the running event loop and is not thread-safe.
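A sketch of the pattern with the suggested inline comment; the class is reduced to just the lock bookkeeping for illustration:

```python
import asyncio


class FileRegistry:
    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = {}

    def lock_for(self, path: str) -> asyncio.Lock:
        # setdefault is safe here only because all callers share one
        # event loop: asyncio.Lock is not thread-safe, and it binds to
        # the loop that first awaits it.
        return self._locks.setdefault(path, asyncio.Lock())


async def main() -> None:
    registry = FileRegistry()
    lock = registry.lock_for("a.txt")
    # Repeated calls for the same path return the identical lock object.
    assert registry.lock_for("a.txt") is lock
    assert registry.lock_for("b.txt") is not lock
    async with lock:
        pass


asyncio.run(main())
```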
What does this PR do?

Extends the AI tool framework with two new tool modules:

1. `ddev/src/ddev/ai/tools/fs/` - filesystem tools backed by a `FileRegistry`:
   - `ReadFileTool`: reads file contents with line numbers and offset/limit pagination
   - `CreateFileTool`: creates a new file and registers it; auto-creates parent directories
   - `EditFileTool`: replaces an exact string within a registered file (fails if ambiguous or if the file was modified externally)
   - `AppendFileTool`: appends content to the end of a registered file
2. `ddev/src/ddev/ai/tools/http/` - HTTP tools:
   - `HttpGetTool`: performs an HTTP GET request and returns the status code and body (truncated if large)

The `FileRegistry` is the key addition: it tracks a SHA-256 hash of each file's last-known content and issues a per-file `asyncio.Lock`. The write tools (`EditFileTool`, `AppendFileTool`) verify the hash before writing; if the file was modified since the last read, the tool fails and asks the agent to re-read before retrying. This prevents lost writes when the agent's view of a file is stale. It also lets us prevent the agent from writing to a file that it did not previously create, avoiding security problems.

Motivation

The agent currently has only shell and grep tools, with no way to safely create or modify files. Adding write tools naively creates a risk of the agent overwriting concurrent changes or working off a stale mental model of the file. The registry+hash pattern addresses this: the agent must read a file (or create it) before it can write to it, and any external modification is detected before the write is committed. Also, only files created by the agent can be written to.
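The read-then-verified-write flow described above can be sketched as follows; `checked_write` is an illustrative name for the pattern, not the PR's actual API:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def checked_write(path: Path, known_hash: str, new_content: str) -> bool:
    # Refuse the write when the on-disk content no longer matches the
    # last-known hash: the agent's view is stale and it must re-read.
    if sha256(path.read_text()) != known_hash:
        return False
    path.write_text(new_content)
    return True


with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "cfg.txt"
    f.write_text("v1")
    known = sha256("v1")           # hash recorded when the agent read the file
    f.write_text("v2-external")    # a concurrent external modification
    # The stale write is rejected instead of clobbering the new content.
    assert checked_write(f, known, "agent edit") is False
    assert f.read_text() == "v2-external"
```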
The `HttpGetTool` enables the agent to probe metrics endpoints directly, which is useful for validating that a service is reachable and for inspecting its raw output during integration development.

Review checklist (to be filled by reviewers)

- Add the `qa/skip-qa` label if the PR doesn't need to be tested during QA.
- Add a `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.