
fix: preserve tool-call history across thread hydration to prevent model re-attempts (#568)#670

Open
reidliu41 wants to merge 1 commit into nearai:main from reidliu41:fix/568-thread-hydration-tool-history

Conversation

@reidliu41
Contributor

Fixes #568

Thread hydration from DB was discarding tool-call history, causing the
LLM to re-attempt prior tool calls on page reload. This fix ensures the
full assistant_with_tool_calls → tool_result sequence is preserved.

Changes:

session.rs:

  • messages() now emits tool-call sequences (assistant_with_tool_calls +
    tool_result per call) instead of bare user/assistant pairs
  • Tool results truncated to 1000 chars via truncate_preview()
  • Error branch passes .error through directly (no double "Error: " wrap)
  • restore_from_messages() handles multi-stage tool calls via
    call_base_idx offset; stores all content as result (no "Error: "
    prefix inference)
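The `truncate_preview()` helper named above is not shown in this page, so the following is a minimal illustrative sketch only, assuming its job is to cap a string at a byte budget without panicking on a multi-byte UTF-8 character (the real implementation in session.rs may differ):

```rust
/// Illustrative sketch of a char-boundary-safe preview truncation helper.
/// The name `truncate_preview` comes from the PR; this body is an assumption.
fn truncate_preview(s: &str, max: usize) -> String {
    if s.len() <= max {
        return s.to_string();
    }
    // Walk back from `max` to the nearest char boundary so slicing a
    // multi-byte character never panics.
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}… [truncated]", &s[..end])
}
```

Walking back with `is_char_boundary` matters because a naive `&s[..1000]` panics if byte 1000 falls inside a multi-byte character.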

thread_ops.rs:

  • persist_tool_calls() enriches JSON with call_id, parameters, result
    fields for LLM context rebuild
  • maybe_hydrate_thread() uses new rebuild_chat_messages_from_db() to
    parse enriched tool_calls rows into full LLM message sequence
  • Legacy rows (without call_id) gracefully skipped
  • Approval and deferred tool paths: sanitize before recording, record
    before auth intercept early return, use is_err() boolean instead of
    string prefix matching
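The rebuild step described above can be sketched as follows. `DbToolCall` and `ChatMessage` are hypothetical stand-ins for the actual types in thread_ops.rs, which are not shown on this page; the sketch only illustrates the pairing of each enriched row into an assistant_with_tool_calls + tool_result message pair, with legacy rows (no `call_id`) skipped:

```rust
/// Hypothetical shape of an enriched tool_calls row from the DB.
#[derive(Debug, Clone)]
struct DbToolCall {
    call_id: Option<String>, // legacy rows lack this field
    name: String,
    parameters: String,
    result: String,
}

/// Hypothetical LLM-facing message variants.
#[derive(Debug, PartialEq)]
enum ChatMessage {
    AssistantWithToolCalls { call_id: String, name: String, arguments: String },
    ToolResult { call_id: String, content: String },
}

fn rebuild_chat_messages_from_db(rows: &[DbToolCall]) -> Vec<ChatMessage> {
    let mut out = Vec::new();
    for row in rows {
        // Legacy rows without a call_id cannot be paired with a result:
        // skip them gracefully rather than emitting a broken sequence.
        let Some(call_id) = row.call_id.clone() else { continue };
        out.push(ChatMessage::AssistantWithToolCalls {
            call_id: call_id.clone(),
            name: row.name.clone(),
            arguments: row.parameters.clone(),
        });
        out.push(ChatMessage::ToolResult {
            call_id,
            content: row.result.clone(),
        });
    }
    out
}
```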

dispatcher.rs:

  • Main tool path: record sanitized result after safety layer processing
    (not raw output), use is_err() for error field consistency
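The dispatcher change can be sketched roughly as below. `sanitize` is a stand-in for the PR's SafetyLayer pass (not the real API), and the point is the two behaviors named above: record the sanitized content, and derive the error flag from `is_err()` rather than matching on a string prefix:

```rust
// Stand-in for the SafetyLayer sanitization pass (hypothetical).
fn sanitize(s: &str) -> String {
    s.replace('<', "&lt;").replace('>', "&gt;")
}

/// Sketch: record sanitized output and a boolean error flag derived
/// from the Result itself, not from string-prefix matching.
fn record_result(raw: &Result<String, String>) -> (String, bool) {
    let is_tool_error = raw.is_err();
    let content = match raw {
        Ok(s) => sanitize(s),
        Err(e) => format!("Error: {}", e),
    };
    (content, is_tool_error)
}
```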

@github-actions bot added labels on Mar 7, 2026: scope: agent (Agent core: agent loop, router, scheduler), size: XL (500+ changed lines), risk: medium (business logic, config, or moderate-risk modules), contributor: new (first-time contributor)
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the LLM would re-attempt tool calls after a page reload due to lost tool-call history during thread hydration. The changes ensure that the complete interaction sequence, including tool calls and their results, is faithfully reconstructed from the database, providing a consistent and accurate conversational context for the LLM. This significantly improves the reliability and efficiency of agent interactions involving tools.

Highlights

  • Tool Call History Preservation: The system now correctly preserves the full sequence of assistant_with_tool_calls and tool_result messages across thread hydration from the database, preventing the LLM from re-attempting previously completed tool calls.
  • Enhanced Message Reconstruction: A new function, rebuild_chat_messages_from_db, was introduced to accurately parse and reconstruct complex tool call sequences from database entries into LLM-compatible chat messages, including handling legacy data gracefully.
  • Sanitized Tool Result Recording: Tool results are now sanitized and recorded in the thread immediately after safety layer processing, ensuring consistency and proper context for the LLM, and error states are tracked using is_err() booleans.
  • Tool Result Truncation: Tool results emitted for context building are now truncated to 1000 characters to manage context size effectively.
Changelog
  • src/agent/dispatcher.rs
    • Removed redundant tool result recording logic.
    • Introduced a boolean flag to explicitly track tool error states.
    • Relocated and updated tool result recording to occur after safety layer processing, ensuring sanitized content is stored.
  • src/agent/session.rs
    • Imported the truncate_preview utility for content shortening.
    • Updated the messages() method to generate full LLM-compatible tool call sequences, including assistant tool calls and their results, with truncation for large results.
    • Refactored restore_from_messages() to correctly interpret and reconstruct multi-stage tool call sequences from stored chat messages.
    • Added new unit tests to validate the correct inclusion and restoration of tool call history.
  • src/agent/thread_ops.rs
    • Imported the ToolCall struct for tool message handling.
    • Modified thread hydration to use a new rebuild_chat_messages_from_db function for comprehensive message reconstruction.
    • Enhanced persist_tool_calls() to store additional metadata like call_id, parameters, and a truncated full result for improved context rebuilding.
    • Adjusted tool result handling in approval and deferred paths to sanitize and record results earlier and use is_err() for error detection.
    • Removed duplicate tool result processing logic in approval and deferred paths.
    • Introduced a new private helper function, rebuild_chat_messages_from_db, with extensive tests, to reconstruct LLM chat messages from database records, including enriched tool call data.
Activity
  • No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.

  Fixes nearai#568

  Thread hydration discarded tool-call history, causing the LLM to
  re-attempt completed tool calls on page reload. Emit full tool-call
  sequences in messages(), enrich persisted JSON for DB rebuild, sanitize
  before recording in all paths, fix multi-stage alignment. No schema
  migration; backward compatible. 13 regression tests added.
@reidliu41 force-pushed the fix/568-thread-hydration-tool-history branch from e1e0676 to bb8871a on March 7, 2026 at 16:24

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue where tool call history is not preserved during thread hydration. However, it introduces several security vulnerabilities, including an IDOR vulnerability in maybe_hydrate_thread that could allow unauthorized access to other users' conversation history. Furthermore, truncated XML-wrapped tool outputs in session.rs and thread_ops.rs may lead to context breakout in the LLM prompt, and unsanitized tool error messages pose a risk of prompt injection. A separate issue was also identified in the database hydration logic, potentially causing doubly-prefixed error messages.

Comment on lines +354 to +360
truncate_preview(err, 1000)
} else if let Some(ref res) = tc.result {
    let raw = match res {
        serde_json::Value::String(s) => s.clone(),
        other => other.to_string(),
    };
    truncate_preview(&raw, 1000)

security-high

Improper Output Handling (Context Breakout via Truncated XML)

The system truncates tool results to 1000 characters to limit context size. However, this truncation is performed on the already XML-wrapped string (e.g., <tool_output>...</tool_output>). If the content is longer than 1000 characters, the closing </tool_output> tag is lost. This results in malformed XML in the prompt, which can cause the LLM to misinterpret subsequent messages (such as the next user message) as being part of the "sanitized" tool output. This context confusion can be exploited to bypass security boundaries or manipulate the LLM's behavior.

Remediation: Truncate the tool output content before wrapping it in XML tags, or ensure that the closing tag is always appended after truncation.

References
  1. Always truncate tool output for previews or status updates to a reasonable maximum length to prevent excessive memory/bandwidth usage and reduce the risk of leaking sensitive information.
  2. When truncating a UTF-8 string, use character-aware methods (e.g., char_indices() or iterating over chars()) to avoid panics caused by slicing in the middle of a multi-byte character.
  3. When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using is_char_boundary to prevent panics.

_ => None,
})
.collect();
chat_messages = rebuild_chat_messages_from_db(&db_messages);

security-high

Broken Access Control (IDOR in Thread Hydration)

The maybe_hydrate_thread function loads conversation history from the database using a user-supplied external_thread_id (UUID) without verifying that the thread belongs to the authenticated user. An attacker can provide the UUID of a thread belonging to another user to have that thread's history hydrated into their own session, granting them unauthorized access to private conversations and allowing them to continue the session as the victim.

Remediation: Before loading messages from the database in maybe_hydrate_thread, verify that the thread_uuid belongs to the message.user_id by calling store.conversation_belongs_to_user.

References
  1. Tools that interact with user-owned resources must verify that the authenticated user ID matches the resource owner's ID before performing any read or write operations to prevent unauthorized cross-user access.
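The suggested ownership check can be sketched as below. `Store` and the in-memory `owners` map are hypothetical stand-ins for the real persistence layer; only the method name `conversation_belongs_to_user` comes from the remediation text:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real DB-backed store.
struct Store {
    owners: HashMap<String, String>, // thread_uuid -> owning user_id
}

impl Store {
    fn conversation_belongs_to_user(&self, thread_uuid: &str, user_id: &str) -> bool {
        self.owners.get(thread_uuid).map(|o| o == user_id).unwrap_or(false)
    }
}

/// Sketch: verify thread ownership before any history is hydrated.
fn maybe_hydrate_thread(store: &Store, thread_uuid: &str, user_id: &str) -> Result<(), String> {
    if !store.conversation_belongs_to_user(thread_uuid, user_id) {
        return Err("thread does not belong to the authenticated user".into());
    }
    // ... load rows and rebuild the chat message sequence here ...
    Ok(())
}
```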

Comment on lines +477 to +480
    serde_json::Value::String(s) => truncate_preview(s, 1000),
    other => truncate_preview(&other.to_string(), 1000),
};
obj["result"] = serde_json::Value::String(full_result);

security-high

Improper Output Handling (Context Breakout via Truncated XML)

Similar to the issue in session.rs, persist_tool_calls truncates the XML-wrapped tool result to 1000 characters before storing it in the database. This leads to malformed XML being stored and subsequently used to rebuild the LLM context, potentially causing context confusion.

Remediation: Truncate the content before wrapping it in XML, or ensure the closing tag is preserved.

References
  1. Always truncate tool output for previews or status updates to a reasonable maximum length to prevent excessive memory/bandwidth usage and reduce the risk of leaking sensitive information.
  2. When truncating a UTF-8 string, use character-aware methods (e.g., char_indices() or iterating over chars()) to avoid panics caused by slicing in the middle of a multi-byte character.
  3. When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using is_char_boundary to prevent panics.

let name = c["name"].as_str().unwrap_or("unknown").to_string();
let content = if let Some(err) = c.get("error").and_then(|v| v.as_str()) {
    format!("Error: {}", err)

high

Prepending "Error: " here can lead to a double prefix (e.g., "Error: Error: ..."). The error strings stored in the database from process_approval already include an "Error: " prefix, and the string from run_agentic_loop is also fully formatted. The error content from the database should be used directly to avoid this duplication, which is consistent with how session.rs::messages() handles errors.

Suggested change:
- format!("Error: {}", err)
+ err.to_string()

&& let Some(turn) = thread.last_turn_mut()
{
    if is_tool_error {
        turn.record_tool_error(result_content.clone());

security-medium

Improper Output Handling (Unsanitized Tool Errors)

Tool execution error messages are recorded in the thread without being sanitized. While successful tool outputs are sanitized using the SafetyLayer (line 733), error messages are formatted as simple strings (line 740) and pushed directly to the context. If a tool (e.g., an HTTP tool) includes untrusted data from an external service in its error message, an attacker could perform a prompt injection attack when this history is fed back to the LLM.

Remediation: Ensure that all tool outputs, including error messages, are passed through the SafetyLayer sanitization process before being recorded in the thread or added to the LLM context.

References
  1. Sanitization should only be applied to data paths sent to external services, such as an LLM, to prevent issues like prompt injection.
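The remediation can be sketched by routing both the success and error branches through the same sanitizer before recording. `sanitize` below is a hypothetical stand-in for the SafetyLayer, not its real API:

```rust
// Hypothetical stand-in for the SafetyLayer: escape markup so embedded
// tags in untrusted error text cannot break the prompt context.
fn sanitize(s: &str) -> String {
    s.replace('<', "&lt;").replace('>', "&gt;")
}

/// Sketch: sanitize error text exactly like successful output before
/// the result is recorded in the thread.
fn record_tool_output(raw: Result<String, String>) -> String {
    match raw {
        Ok(s) => sanitize(&s),
        Err(e) => format!("Error: {}", sanitize(&e)),
    }
}
```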

sanitized.was_modified,
)
}
Err(e) => format!("Error: {}", e),

security-medium

Improper Output Handling (Unsanitized Tool Errors)

Tool execution error messages are not sanitized before being recorded in the thread. If an error message contains untrusted data from an external service, it could lead to prompt injection when the thread history is sent back to the LLM.

Remediation: Sanitize error messages using the SafetyLayer before recording them.

References
  1. Sanitization should only be applied to data paths sent to external services, such as an LLM, to prevent issues like prompt injection.


Development

Successfully merging this pull request may close these issues.

bug: thread hydration loses tool-call history on page reload — model re-attempts prior tool calls

1 participant