fix: preserve tool-call history across thread hydration to prevent model re-attempts (#568) #670
Summary of Changes (Gemini Code Assist): This pull request addresses an issue where the LLM would re-attempt tool calls after a page reload because tool-call history was lost during thread hydration. The changes ensure that the complete interaction sequence, including tool calls and their results, is faithfully reconstructed from the database, providing a consistent and accurate conversational context for the LLM and improving the reliability of agent interactions involving tools.
Fixes nearai#568. Thread hydration discarded tool-call history, causing the LLM to re-attempt completed tool calls on page reload. This change emits full tool-call sequences in messages(), enriches persisted JSON for DB rebuild, sanitizes before recording in all paths, and fixes multi-stage alignment. No schema migration; backward compatible. 13 regression tests added.
Code Review
This pull request aims to fix an issue where tool call history is not preserved during thread hydration. However, it introduces several security vulnerabilities, including an IDOR vulnerability in maybe_hydrate_thread that could allow unauthorized access to other users' conversation history. Furthermore, truncated XML-wrapped tool outputs in session.rs and thread_ops.rs may lead to context breakout in the LLM prompt, and unsanitized tool error messages pose a risk of prompt injection. A separate issue was also identified in the database hydration logic, potentially causing doubly-prefixed error messages.
```rust
truncate_preview(err, 1000)
} else if let Some(ref res) = tc.result {
    let raw = match res {
        serde_json::Value::String(s) => s.clone(),
        other => other.to_string(),
    };
    truncate_preview(&raw, 1000)
```
Improper Output Handling (Context Breakout via Truncated XML)
The system truncates tool results to 1000 characters to limit context size. However, this truncation is performed on the already XML-wrapped string (e.g., <tool_output>...</tool_output>). If the content is longer than 1000 characters, the closing </tool_output> tag is lost. This results in malformed XML in the prompt, which can cause the LLM to misinterpret subsequent messages (such as the next user message) as being part of the "sanitized" tool output. This context confusion can be exploited to bypass security boundaries or manipulate the LLM's behavior.
Remediation: Truncate the tool output content before wrapping it in XML tags, or ensure that the closing tag is always appended after truncation.
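The remediation can be sketched as follows. The helper names `truncate_chars` and `wrap_tool_output` are illustrative, not the crate's real functions; the point is that truncating the inner content first means the closing tag always survives:

```rust
// Hypothetical helper: character-aware truncation of the inner content.
fn truncate_chars(s: &str, max: usize) -> String {
    if s.chars().count() <= max {
        s.to_string()
    } else {
        let cut: String = s.chars().take(max).collect();
        format!("{}…[truncated]", cut)
    }
}

// Wrapping AFTER truncation guarantees the XML stays well-formed.
fn wrap_tool_output(content: &str, max: usize) -> String {
    format!("<tool_output>{}</tool_output>", truncate_chars(content, max))
}
```

With this ordering, even a multi-kilobyte tool result is emitted with its closing `</tool_output>` tag intact, so subsequent messages cannot be absorbed into the tool output.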
References
- Always truncate tool output for previews or status updates to a reasonable maximum length to prevent excessive memory/bandwidth usage and reduce the risk of leaking sensitive information.
- When truncating a UTF-8 string, use character-aware methods (e.g., `char_indices()` or iterating over `chars()`) to avoid panics caused by slicing in the middle of a multi-byte character.
- When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using `is_char_boundary` to prevent panics.
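The second rule above can be expressed as a small byte-budget helper; this is a generic sketch, not the repository's `truncate_preview`:

```rust
// Truncate to at most `max_bytes`, walking backwards from the byte
// budget until a valid UTF-8 character boundary is found, so slicing
// never panics mid-character.
fn truncate_at_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while end > 0 && !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```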
```rust
_ => None,
})
.collect();
chat_messages = rebuild_chat_messages_from_db(&db_messages);
```
Broken Access Control (IDOR in Thread Hydration)
The maybe_hydrate_thread function loads conversation history from the database using a user-supplied external_thread_id (UUID) without verifying that the thread belongs to the authenticated user. An attacker can provide the UUID of a thread belonging to another user to have that thread's history hydrated into their own session, granting them unauthorized access to private conversations and allowing them to continue the session as the victim.
Remediation: Before loading messages from the database in maybe_hydrate_thread, verify that the thread_uuid belongs to the message.user_id by calling store.conversation_belongs_to_user.
References
- Tools that interact with user-owned resources must verify that the authenticated user ID matches the resource owner's ID before performing any read or write operations to prevent unauthorized cross-user access.
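A minimal sketch of the suggested check, assuming a `conversation_belongs_to_user` lookup as the review describes; the `Store` type here is an illustrative in-memory stand-in, not the project's real store:

```rust
use std::collections::HashMap;

// In-memory stand-in for the persistence layer (assumed API).
struct Store {
    owners: HashMap<String, u64>, // thread_uuid -> owning user_id
}

impl Store {
    fn conversation_belongs_to_user(&self, thread_uuid: &str, user_id: u64) -> bool {
        self.owners.get(thread_uuid) == Some(&user_id)
    }
}

// Hydration must refuse threads the caller does not own, closing the IDOR.
fn maybe_hydrate_thread(store: &Store, thread_uuid: &str, user_id: u64) -> Result<(), String> {
    if !store.conversation_belongs_to_user(thread_uuid, user_id) {
        return Err("thread does not belong to user".into());
    }
    // ... load messages from the DB and rebuild chat history ...
    Ok(())
}
```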
```rust
    serde_json::Value::String(s) => truncate_preview(s, 1000),
    other => truncate_preview(&other.to_string(), 1000),
};
obj["result"] = serde_json::Value::String(full_result);
```
Improper Output Handling (Context Breakout via Truncated XML)
Similar to the issue in session.rs, persist_tool_calls truncates the XML-wrapped tool result to 1000 characters before storing it in the database. This leads to malformed XML being stored and subsequently used to rebuild the LLM context, potentially causing context confusion.
Remediation: Truncate the content before wrapping it in XML, or ensure the closing tag is preserved.
References
- Always truncate tool output for previews or status updates to a reasonable maximum length to prevent excessive memory/bandwidth usage and reduce the risk of leaking sensitive information.
- When truncating a UTF-8 string, use character-aware methods (e.g., `char_indices()` or iterating over `chars()`) to avoid panics caused by slicing in the middle of a multi-byte character.
- When truncating a UTF-8 string at a byte boundary, walk backwards from the desired length until a valid character boundary is found using `is_char_boundary` to prevent panics.
```rust
let name = c["name"].as_str().unwrap_or("unknown").to_string();
let content = if let Some(err) = c.get("error").and_then(|v| v.as_str()) {
    format!("Error: {}", err)
```
Prepending "Error: " here can lead to a double prefix (e.g., "Error: Error: ..."). The error strings stored in the database from process_approval already include an "Error: " prefix, and the string from run_agentic_loop is also fully formatted. The error content from the database should be used directly to avoid this duplication, which is consistent with how session.rs::messages() handles errors.
Suggested change:

```diff
- format!("Error: {}", err)
+ err.to_string()
```
```rust
&& let Some(turn) = thread.last_turn_mut()
{
    if is_tool_error {
        turn.record_tool_error(result_content.clone());
```
Improper Output Handling (Unsanitized Tool Errors)
Tool execution error messages are recorded in the thread without being sanitized. While successful tool outputs are sanitized using the SafetyLayer (line 733), error messages are formatted as simple strings (line 740) and pushed directly to the context. If a tool (e.g., an HTTP tool) includes untrusted data from an external service in its error message, an attacker could perform a prompt injection attack when this history is fed back to the LLM.
Remediation: Ensure that all tool outputs, including error messages, are passed through the SafetyLayer sanitization process before being recorded in the thread or added to the LLM context.
References
- Sanitization should only be applied to data paths sent to external services, such as an LLM, to prevent issues like prompt injection.
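One way to close this gap is to funnel both arms of the tool outcome through the same sanitization step before anything is recorded. The `sanitize` function below is a placeholder for the SafetyLayer call, whose real signature is not shown in the diff:

```rust
// Placeholder for the SafetyLayer sanitization step (assumed behavior):
// neutralize the tag delimiters an injected payload would rely on.
fn sanitize(s: &str) -> String {
    s.replace('<', "&lt;").replace('>', "&gt;")
}

// Route success output AND error text through the same sanitizer, so an
// attacker-influenced error message cannot inject prompt structure.
fn record_tool_outcome(outcome: Result<String, String>) -> String {
    match outcome {
        Ok(out) => sanitize(&out),
        Err(e) => sanitize(&format!("Error: {}", e)),
    }
}
```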
```rust
        sanitized.was_modified,
    )
}
Err(e) => format!("Error: {}", e),
```
Improper Output Handling (Unsanitized Tool Errors)
Tool execution error messages are not sanitized before being recorded in the thread. If an error message contains untrusted data from an external service, it could lead to prompt injection when the thread history is sent back to the LLM.
Remediation: Sanitize error messages using the SafetyLayer before recording them.
References
- Sanitization should only be applied to data paths sent to external services, such as an LLM, to prevent issues like prompt injection.
Fixes #568
Thread hydration from DB was discarding tool-call history, causing the
LLM to re-attempt prior tool calls on page reload. This fix ensures the
full assistant_with_tool_calls → tool_result sequence is preserved.
Changes:

session.rs:
- emits the full tool-call sequence from messages() (assistant_with_tool_calls followed by one tool_result per call) instead of bare user/assistant pairs
- fixes multi-stage alignment via the call_base_idx offset; stores all content as result (no "Error: " prefix inference)

thread_ops.rs:
- enriches persisted tool-call JSON with the fields needed for LLM context rebuild
- parses enriched tool_calls rows into the full LLM message sequence
- records tool calls before the auth intercept early return; uses an is_err() boolean instead of string prefix matching

dispatcher.rs:
- records sanitized tool output (not raw output); uses is_err() for error field consistency
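The hydration idea described above can be sketched roughly as follows; the message types and helper name here are illustrative, not the crate's actual definitions:

```rust
// Each persisted tool-call row expands into one assistant_with_tool_calls
// entry followed by one tool_result entry per call, so the LLM sees the
// completed calls on rehydration instead of a bare user/assistant pair.
#[derive(Debug)]
enum ChatMessage {
    AssistantWithToolCalls { calls: Vec<String> },
    ToolResult { call: String, result: String },
}

fn rebuild_from_row(calls: &[(String, String)]) -> Vec<ChatMessage> {
    let mut msgs = vec![ChatMessage::AssistantWithToolCalls {
        calls: calls.iter().map(|(name, _)| name.clone()).collect(),
    }];
    for (name, result) in calls {
        msgs.push(ChatMessage::ToolResult {
            call: name.clone(),
            result: result.clone(),
        });
    }
    msgs
}
```

Because every stored call reappears as a completed tool_result in the rebuilt context, the model has no reason to re-attempt it after a reload.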