fix: preserve tool-call history across thread hydration to prevent model re-attempts (#568) #670
Summary of Changes (Gemini Code Assist): This pull request addresses an issue where the LLM would re-attempt tool calls after a page reload because tool-call history was lost during thread hydration. The changes ensure that the complete interaction sequence, including tool calls and their results, is faithfully reconstructed from the database, providing a consistent and accurate conversational context for the LLM and improving the reliability of agent interactions involving tools.
Fixes nearai#568. Thread hydration discarded tool-call history, causing the LLM to re-attempt completed tool calls on page reload. This change emits full tool-call sequences in messages(), enriches persisted JSON for DB rebuild, sanitizes before recording in all paths, and fixes multi-stage alignment. No schema migration; backward compatible. 13 regression tests added.
Code Review
This pull request aims to fix an issue where tool call history is not preserved during thread hydration. However, it introduces several security vulnerabilities, including an IDOR vulnerability in maybe_hydrate_thread that could allow unauthorized access to other users' conversation history. Furthermore, truncated XML-wrapped tool outputs in session.rs and thread_ops.rs may lead to context breakout in the LLM prompt, and unsanitized tool error messages pose a risk of prompt injection. A separate issue was also identified in the database hydration logic, potentially causing doubly-prefixed error messages.
```rust
truncate_preview(err, 1000)
} else if let Some(ref res) = tc.result {
    let raw = match res {
        serde_json::Value::String(s) => s.clone(),
        other => other.to_string(),
    };
    truncate_preview(&raw, 1000)
```
Improper Output Handling (Context Breakout via Truncated XML)
The system truncates tool results to 1000 characters to limit context size. However, this truncation is performed on the already XML-wrapped string (e.g., <tool_output>...</tool_output>). If the content is longer than 1000 characters, the closing </tool_output> tag is lost. This results in malformed XML in the prompt, which can cause the LLM to misinterpret subsequent messages (such as the next user message) as being part of the "sanitized" tool output. This context confusion can be exploited to bypass security boundaries or manipulate the LLM's behavior.
Remediation: Truncate the tool output content before wrapping it in XML tags, or ensure that the closing tag is always appended after truncation.
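The remediation can be sketched as follows. The helper names `truncate_chars` and `wrap_tool_output` are illustrative, not the crate's real functions; the point is that truncating the inner content first means the closing tag always survives:

```rust
// Hypothetical helper: character-aware truncation of the inner content.
fn truncate_chars(s: &str, max: usize) -> String {
    if s.chars().count() <= max {
        s.to_string()
    } else {
        let cut: String = s.chars().take(max).collect();
        format!("{}…[truncated]", cut)
    }
}

// Wrapping AFTER truncation guarantees the XML stays well-formed.
fn wrap_tool_output(content: &str, max: usize) -> String {
    format!("<tool_output>{}</tool_output>", truncate_chars(content, max))
}
```

With this ordering, even a multi-kilobyte tool result is emitted with its closing `</tool_output>` tag intact, so subsequent messages cannot be absorbed into the tool output.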
References
- Always truncate tool output for previews or status updates to a reasonable maximum length to prevent excessive memory/bandwidth usage and reduce the risk of leaking sensitive information.
- When truncating a UTF-8 string, use character-aware methods (e.g., `char_indices()` or iterating over `chars()`) to avoid panics caused by slicing in the middle of a multi-byte character.
- When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using `is_char_boundary` to prevent panics.
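The second rule above can be expressed as a small byte-budget helper; this is a generic sketch, not the repository's `truncate_preview`:

```rust
// Truncate to at most `max_bytes`, walking backwards from the byte
// budget until a valid UTF-8 character boundary is found, so slicing
// never panics mid-character.
fn truncate_at_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while end > 0 && !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```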
```rust
_ => None,
})
.collect();
chat_messages = rebuild_chat_messages_from_db(&db_messages);
```
Broken Access Control (IDOR in Thread Hydration)
The maybe_hydrate_thread function loads conversation history from the database using a user-supplied external_thread_id (UUID) without verifying that the thread belongs to the authenticated user. An attacker can provide the UUID of a thread belonging to another user to have that thread's history hydrated into their own session, granting them unauthorized access to private conversations and allowing them to continue the session as the victim.
Remediation: Before loading messages from the database in maybe_hydrate_thread, verify that the thread_uuid belongs to the message.user_id by calling store.conversation_belongs_to_user.
References
- Tools that interact with user-owned resources must verify that the authenticated user ID matches the resource owner's ID before performing any read or write operations to prevent unauthorized cross-user access.
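A minimal sketch of the suggested check, assuming a `conversation_belongs_to_user` lookup as the review describes; the `Store` type here is an illustrative in-memory stand-in, not the project's real store:

```rust
use std::collections::HashMap;

// In-memory stand-in for the persistence layer (assumed API).
struct Store {
    owners: HashMap<String, u64>, // thread_uuid -> owning user_id
}

impl Store {
    fn conversation_belongs_to_user(&self, thread_uuid: &str, user_id: u64) -> bool {
        self.owners.get(thread_uuid) == Some(&user_id)
    }
}

// Hydration must refuse threads the caller does not own, closing the IDOR.
fn maybe_hydrate_thread(store: &Store, thread_uuid: &str, user_id: u64) -> Result<(), String> {
    if !store.conversation_belongs_to_user(thread_uuid, user_id) {
        return Err("thread does not belong to user".into());
    }
    // ... load messages from the DB and rebuild chat history ...
    Ok(())
}
```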
```rust
    serde_json::Value::String(s) => truncate_preview(s, 1000),
    other => truncate_preview(&other.to_string(), 1000),
};
obj["result"] = serde_json::Value::String(full_result);
```
Improper Output Handling (Context Breakout via Truncated XML)
Similar to the issue in session.rs, persist_tool_calls truncates the XML-wrapped tool result to 1000 characters before storing it in the database. This leads to malformed XML being stored and subsequently used to rebuild the LLM context, potentially causing context confusion.
Remediation: Truncate the content before wrapping it in XML, or ensure the closing tag is preserved.
References
- Always truncate tool output for previews or status updates to a reasonable maximum length to prevent excessive memory/bandwidth usage and reduce the risk of leaking sensitive information.
- When truncating a UTF-8 string, use character-aware methods (e.g., `char_indices()` or iterating over `chars()`) to avoid panics caused by slicing in the middle of a multi-byte character.
- When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using `is_char_boundary` to prevent panics.
```rust
let name = c["name"].as_str().unwrap_or("unknown").to_string();
let content = if let Some(err) = c.get("error").and_then(|v| v.as_str()) {
    format!("Error: {}", err)
```
Prepending "Error: " here can lead to a double prefix (e.g., "Error: Error: ..."). The error strings stored in the database from process_approval already include an "Error: " prefix, and the string from run_agentic_loop is also fully formatted. The error content from the database should be used directly to avoid this duplication, which is consistent with how session.rs::messages() handles errors.
Suggested change:

```diff
- format!("Error: {}", err)
+ err.to_string()
```
```rust
&& let Some(turn) = thread.last_turn_mut()
{
    if is_tool_error {
        turn.record_tool_error(result_content.clone());
```
Improper Output Handling (Unsanitized Tool Errors)
Tool execution error messages are recorded in the thread without being sanitized. While successful tool outputs are sanitized using the SafetyLayer (line 733), error messages are formatted as simple strings (line 740) and pushed directly to the context. If a tool (e.g., an HTTP tool) includes untrusted data from an external service in its error message, an attacker could perform a prompt injection attack when this history is fed back to the LLM.
Remediation: Ensure that all tool outputs, including error messages, are passed through the SafetyLayer sanitization process before being recorded in the thread or added to the LLM context.
References
- Sanitization should only be applied to data paths sent to external services, such as an LLM, to prevent issues like prompt injection.
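One way to close this gap is to funnel both arms of the tool outcome through the same sanitization step before anything is recorded. The `sanitize` function below is a placeholder for the SafetyLayer call, whose real signature is not shown in the diff:

```rust
// Placeholder for the SafetyLayer sanitization step (assumed behavior):
// neutralize the tag delimiters an injected payload would rely on.
fn sanitize(s: &str) -> String {
    s.replace('<', "&lt;").replace('>', "&gt;")
}

// Route success output AND error text through the same sanitizer, so an
// attacker-influenced error message cannot inject prompt structure.
fn record_tool_outcome(outcome: Result<String, String>) -> String {
    match outcome {
        Ok(out) => sanitize(&out),
        Err(e) => sanitize(&format!("Error: {}", e)),
    }
}
```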
```rust
        sanitized.was_modified,
    )
}
Err(e) => format!("Error: {}", e),
```
Improper Output Handling (Unsanitized Tool Errors)
Tool execution error messages are not sanitized before being recorded in the thread. If an error message contains untrusted data from an external service, it could lead to prompt injection when the thread history is sent back to the LLM.
Remediation: Sanitize error messages using the SafetyLayer before recording them.
References
- Sanitization should only be applied to data paths sent to external services, such as an LLM, to prevent issues like prompt injection.
Fixes #568
Thread hydration from DB was discarding tool-call history, causing the
LLM to re-attempt prior tool calls on page reload. This fix ensures the
full assistant_with_tool_calls → tool_result sequence is preserved.
Changes:

session.rs:
- emits the full tool-call sequence from messages() (assistant_with_tool_calls followed by one tool_result per call) instead of bare user/assistant pairs
- fixes multi-stage alignment via the call_base_idx offset; stores all content as result (no "Error: " prefix inference)

thread_ops.rs:
- enriches persisted tool-call JSON with the fields needed for LLM context rebuild
- parses enriched tool_calls rows into the full LLM message sequence
- records tool calls before the auth intercept early return; uses an is_err() boolean instead of string prefix matching

dispatcher.rs:
- records sanitized tool output (not raw output); uses is_err() for error field consistency
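The hydration idea described above can be sketched roughly as follows; the message types and helper name here are illustrative, not the crate's actual definitions:

```rust
// Each persisted tool-call row expands into one assistant_with_tool_calls
// entry followed by one tool_result entry per call, so the LLM sees the
// completed calls on rehydration instead of a bare user/assistant pair.
#[derive(Debug)]
enum ChatMessage {
    AssistantWithToolCalls { calls: Vec<String> },
    ToolResult { call: String, result: String },
}

fn rebuild_from_row(calls: &[(String, String)]) -> Vec<ChatMessage> {
    let mut msgs = vec![ChatMessage::AssistantWithToolCalls {
        calls: calls.iter().map(|(name, _)| name.clone()).collect(),
    }];
    for (name, result) in calls {
        msgs.push(ChatMessage::ToolResult {
            call: name.clone(),
            result: result.clone(),
        });
    }
    msgs
}
```

Because every stored call reappears as a completed tool_result in the rebuilt context, the model has no reason to re-attempt it after a reload.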