feat(tools): add scratchpad tool for agent working memory#1397
feat(tools): add scratchpad tool for agent working memory#1397ezelanza wants to merge 32 commits intoi-am-bee:mainfrom
Conversation
Add a new scratchpad tool that provides agents with a persistent working memory to track their reasoning, actions, and observations. This helps agents avoid repeating actions and maintain context across interactions. Features: - Read, write, append, and clear operations - Key-value pair merging to prevent duplicates - Session-based storage with default persistence - Comprehensive unit tests with 21 test cases The tool follows the existing code style and includes proper copyright headers, type annotations, and linting compliance. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Summary of ChangesHello @ezelanza, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a fundamental enhancement for AI agents by providing a dedicated scratchpad tool. This tool acts as a persistent working memory, allowing agents to store and retrieve information about their ongoing tasks, decisions, and outcomes. The primary goal is to improve agent efficiency and coherence by enabling them to recall past actions and observations, thereby reducing redundant efforts and maintaining a consistent operational context. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a useful ScratchpadTool for agents. The implementation is well-structured and comes with a comprehensive test suite. However, I've identified a critical concurrency issue due to the use of a shared class variable without locking, which could lead to race conditions in a production environment. I've also found a high-severity issue in the key-value parsing logic that makes it brittle. I've provided detailed suggestions to address these points. Additionally, I've suggested two new test cases to cover the parsing edge case and to verify the concurrency fix. Overall, this is a great feature addition, and with these changes, it will be robust and safe for concurrent use.
Add a test to verify that concurrent write operations to the scratchpad do not corrupt state. This test helps ensure thread safety and proper handling of race conditions in the key-value merging logic. The test runs 10 concurrent writes and verifies that the final state contains a single consolidated entry, as expected with the merging behavior. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Add a test to verify that the scratchpad correctly handles key-value pairs where the value contains commas. This is an important edge case for the parsing logic. The test writes "item: milk, bread, eggs, priority: high" and verifies that both key-value pairs are correctly parsed and stored, with the comma-containing value preserved intact. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Replace the simple comma-split parsing logic with a regex-based approach that correctly handles values containing commas. The previous implementation would incorrectly split on commas within values, leading to data loss. The new regex pattern matches key-value pairs by looking for the pattern: - word characters followed by a colon (the key) - everything until the next key-value pair or end of string (the value) This ensures that "item: milk, bread, eggs, priority: high" is correctly parsed as two separate key-value pairs instead of being incorrectly split. Fixes the edge case identified in the test suite where values can contain commas as part of their data. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Add proper concurrency protection to prevent race conditions when multiple concurrent requests access the shared scratchpad state. Changes: - Add asyncio.Lock as a class variable - Wrap all scratchpad operations in 'async with _lock' context - Move session initialization inside the lock - Ensure atomic read-modify-write operations This is critical for production environments (e.g., web servers) where multiple concurrent requests could corrupt the shared _scratchpads dictionary. The issue is especially severe since all sessions use the hardcoded "default" session ID, meaning all requests share the same data. Without this lock, concurrent writes could lead to: - Lost updates (last write wins) - Corrupted key-value pairs - Race conditions in list operations The concurrent writes test validates this behavior. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a ScratchpadTool to provide agents with a working memory. The implementation is well-structured, covering read, write, append, and clear operations with a key-value merging mechanism, and is accompanied by a comprehensive test suite that includes concurrency tests. My review focuses on a critical issue with the tool's state management, which currently uses a global singleton pattern instead of session-based storage, posing a risk in multi-user environments. I've also identified a couple of medium-severity issues related to improving the robustness of the key-value parsing logic and removing some redundant code.
Address critical and medium-severity issues identified in code review: 1. CRITICAL - Session Isolation: - Fix hardcoded "default" session ID that created a global singleton - Now uses context.group_id for proper session isolation - Prevents data leakage between users/sessions in multi-user environments - Falls back to "default" only when context is unavailable (e.g., tests) 2. MEDIUM - Improve Key Pattern Regex: - Change pattern from (\w+): to ([^:]+): - Now supports keys with hyphens (e.g., Content-Type, user-id) - More robust parsing for real-world key formats - Add test case for hyphenated keys 3. MEDIUM - Remove Redundant Validation: - Remove unnecessary 'if not operation' check - Pydantic already validates operation field before _run is called - Operation field is required and enum-constrained - Eliminates dead code These fixes ensure the scratchpad is production-ready with proper session isolation and robust key-value parsing. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Significantly improve session ID management with better validation, caching, and multi-attribute fallback support. Implementation Changes: - Add _cached_session_id instance variable for session persistence - Cache session ID on first extraction for consistency - Try multiple context attributes in preference order: 1. run_id (persists across tool calls in same run) 2. conversation_id (persists across conversation) 3. agent_id (unique per agent instance) - Add comprehensive error handling with descriptive messages - Add debug/info logging for session initialization Breaking Changes: - Now requires valid RunContext with session identifier - Raises ValueError if no context or no valid identifier found - Removes "default" fallback to prevent unintended data sharing Test Updates: - Add mock_context fixture with test run_id - Update tool fixture to initialize session with context - Add 4 new tests for session ID behavior: * test_session_id_requires_context * test_session_id_requires_valid_identifier * test_session_id_caching * test_session_id_preference_order - Fix tests that accessed _get_session_id directly - Total: 22 comprehensive tests Benefits: - Stronger session isolation guarantees - More predictable behavior with caching - Better error messages for debugging - Supports multiple context patterns - Prevents accidental data leakage Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new ScratchpadTool for agents, providing persistent working memory. The implementation includes operations like read, write, append, and clear, with robust session management and key-value pair merging. The code adheres to modern Python practices, including type annotations and comprehensive unit tests. Overall, the feature is well-designed and tested, enhancing agent capabilities significantly. I've identified a few areas for improvement related to API clarity, error handling consistency, and a minor correctness concern in key-value parsing, along with a test assertion that could be more specific.
Address code review feedback to improve API design, error handling,
and test quality.
API Improvements:
1. Remove deprecated session_id parameter from __init__
- Was marked as deprecated and unused
- Simplifies API and prevents confusion
- Session ID now exclusively comes from RunContext
2. Remove rstrip(",") from value parsing
- Previously could remove legitimate trailing commas from values
- Example: "item: apple," would become "item: apple"
- Now preserves all value content as-is
3. Improve write operation feedback message
- Before: "Updated scratchpad: key1: val1, key2: val2" (only new pairs)
- After: "Updated scratchpad to: key1: val1, key2: val2, key3: val3" (full state)
- Provides clearer picture of current scratchpad state
Error Handling Improvements:
4. Raise ToolInputValidationError for missing content
- Before: Returned error string for write/append without content
- After: Raises ToolInputValidationError like other validation errors
- More consistent and structured error handling
5. Add _raise_input_validation_error helper method
- Centralizes error raising logic
- Maintains consistent error handling pattern
- Easier to extend with custom error handling
Test Improvements:
6. Update tests to expect ToolInputValidationError
- test_write_without_content_fails now uses pytest.raises
- test_append_without_content_fails now uses pytest.raises
- Consistent with test_invalid_operation pattern
7. Improve concurrent writes test assertion
- Now verifies the actual value is one of 0-9
- Extracts and validates the numeric value
- Provides stronger assurance of correct merging behavior
All changes maintain backward compatibility except for the error handling
improvement, which is a breaking change that improves API quality.
Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new ScratchpadTool that provides agents with a persistent working memory. The implementation includes operations like read, write, append, and clear, along with key-value pair merging and session-based storage. The code generally follows good practices, including type annotations, linting, and comprehensive unit tests. However, there are several areas identified for improvement related to clarity of behavior, encapsulation of internal state, and concurrency safety, particularly concerning the _scratchpads data structure and session management. Additionally, some test assertions need to be aligned with the actual error handling logic.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Address code review feedback about regex complexity by significantly enhancing documentation and code clarity. Documentation Improvements: 1. Expanded docstring with 4 concrete examples: - Simple key-value pair - Multiple pairs - Comma in value (the critical case) - Hyphenated keys 2. Added "Implementation Note" explaining why simple splitting fails - Shows the problem: "item: milk, bread" would be incorrectly split - Explains the regex solution 3. Added detailed regex pattern breakdown: - Line-by-line explanation of each regex component - Comments showing what each part matches - Explains the lookahead mechanism 4. Improved inline comments for better readability Bug Fixes: 5. Fixed duplicate function definition in test_invalid_operation - Removed duplicate line that was causing syntax issues - Restored proper exception testing with pytest.raises 6. Updated test_clear_session_class_method to use await - clear_session is now async (uses the lock) - Test now properly awaits the async method The regex pattern remains unchanged (it's correct and necessary), but is now much easier to understand and maintain with comprehensive documentation explaining why this approach is required. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Address code review feedback about the unexpected consolidation behavior when using key-value pairs in the scratchpad. Documentation Improvements: 1. Class-level docstring: - Added clear explanation of two content types - Documented that key-value pairs consolidate to ONE entry - Explained that plain text creates separate entries - Provided design rationale 2. _merge_entries method: - Added "IMPORTANT BEHAVIOR" section highlighting consolidation - Added "Design Rationale" explaining why this is intentional - Provided concrete example of consolidation behavior - Clarified difference from plain text entries - Enhanced return value documentation 3. _write_scratchpad method: - Restructured docstring with numbered sections - Documented behavior for both key-value and plain text - Added example showing consolidation in action - Clarified that duplicate keys are overridden - Explained design rationale Key Behavioral Clarifications: - Key-Value Pairs: "city: Boston" + "date: 2025-01-28" → ONE entry - Plain Text: "Note 1" + "Note 2" → TWO separate entries - This ensures structured state remains consolidated while allowing free-form notes to accumulate The behavior is intentional and now clearly documented to prevent confusion about why the _scratchpads list contains a single entry when using key-value pairs. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Simplify the result message logic in _write_scratchpad. The 'else' branch handling "no content" was unreachable because the block is only entered if new_pairs is not empty, and _merge_entries guarantees an entry is created in that case. This removes unnecessary conditional logic and makes the success message generation cleaner. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Refactor the _run method in ScratchpadTool to perform content validation for 'write' and 'append' operations explicitly before handler dispatch. This removes complex conditional logic from the lambda functions in the handlers dictionary, making the code more readable and maintainable. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
The pull request introduces a new ScratchpadTool that provides agents with a persistent working memory. The implementation includes robust handling for key-value pair merging, plain text appending, and session management, protected by an asyncio.Lock for concurrency. The comprehensive test suite covers various operations, edge cases, and session ID extraction logic, which is commendable. However, there is a significant issue with the caching of the session ID within the ScratchpadTool instance, which can lead to incorrect session isolation.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a ScratchpadTool, providing a valuable working memory for agents. The implementation is well-structured, with clear separation of concerns for reading, writing, appending, and clearing the scratchpad. The handling of both plain text and key-value pairs with automatic merging is a great feature. The test suite is comprehensive and covers a wide range of scenarios.
My review identifies a couple of issues. The most significant is that the session ID caching mechanism is currently commented out, which will cause several tests to fail and leads to inefficient re-computation of the session ID. I've also noted a minor code style issue regarding a local import. Once these points are addressed, this will be a solid contribution.
Re-enable the session ID caching mechanism that was temporarily commented out. This ensures correct test behavior and efficient session management. Also moved the local import of ToolInputValidationError to the top-level imports to follow PEP 8 and avoid potential circular dependencies. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new scratchpad tool for agents, providing persistent working memory. The implementation includes read, write, append, and clear operations, with robust handling for key-value pair merging and session-based storage. The code adheres to existing style conventions, includes comprehensive type annotations, and is covered by a thorough suite of unit tests. Overall, the feature is well-designed and implemented.
Refactor ScratchpadTool based on code review: 1. Improve Type Annotation: Update _scratchpads to list[str] for better type safety. 2. Remove Dead Code: Remove the unreachable error handling branch for unknown operations in _run. The operation field is validated by the Pydantic schema (enum), guaranteeing it matches one of the expected values. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new ScratchpadTool for agents to manage their working memory, supporting read, write, append, and clear operations. The implementation includes robust handling for key-value pair merging and plain text entries, along with comprehensive unit tests. The session management prioritizes run_id, conversation_id, and agent_id for session identification, and concurrent access to the scratchpad is protected by an asyncio.Lock. The test suite is thorough, covering various scenarios including edge cases and concurrency, which is excellent.
However, there are a few areas for improvement, primarily concerning the global nature of the scratchpad storage and error handling consistency.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Remove the redundant call to self._ensure_session(session_id) in the _run method. This check is already performed by all operation handlers (read, write, etc.) via their call to self._get_entries(), making the top-level check unnecessary duplicate work. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Replace generic ValueError with ToolInputValidationError when a valid session identifier cannot be found in the RunContext. This maintains consistency with other tool input validation errors. Updated corresponding tests to expect ToolInputValidationError. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
Add a critical design note to the ScratchpadTool docstring. Since storage is in-memory and process-global, it clarifies that consumers must explicitly clear sessions to prevent unbounded memory growth in long-running applications. Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a ScratchpadTool, a valuable addition for providing agents with working memory. The implementation is robust, featuring good concurrency handling with locks and a clear separation between structured (key-value) and unstructured data. The accompanying test suite is comprehensive, covering a wide range of scenarios, including edge cases and concurrency tests. My feedback consists of a few suggestions to enhance code clarity and maintainability, primarily by refining type hints and using more idiomatic decorators.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Eze Lanza (Eze) <40643766+ezelanza@users.noreply.github.com>
Tomas2D
left a comment
There was a problem hiding this comment.
Thank @ezelanza for your contribution.
Would you mind adding some practical examples showcasing the tool in action? Probably in some long-running process or in a multi-agent setup, as I don't currently see a motivation for scratchpads to be bound to the class as one can share the tool instance among multiple agents during initialization. In addition, some attributes don't even exist on the RunContext. Please see my comments.
Thanks @Tomas2D! I've found Scratchpad to solve a problem in my scenario where I have multiple agents behind a supervisor, and I'm using a relatively Small local model (8B), which most of the times get confused to follow the rules of the prompt (a 200-line one), and forcing it (the supervisor) to read the scratchpad was a game-changer. I didn't use it to share info between agents; I've used it to improve an agent's understanding and memory in that scenario. Will update the fields you highlighted |
Signed-off-by: Ezequiel Lanza <ezequiel.lanza@gmail.com>
40e3ba1 to
91aaf4b
Compare
9590342 to
cea6d3d
Compare
|
@Tomas2D I've updated the code based on your comments. Could you take a look and let me know if there is anything else to change or modify? Thanks! |
|
Hello @ezelanza . I don't see any changes relevant to my latest comment, which addresses the problems mentioned. |
Summary
This PR adds a new scratchpad tool that provides agents with a persistent working memory to track their reasoning, actions, and observations. This helps agents avoid repeating actions and maintain context across interactions.
Features
Implementation Details
Testing
All tests pass locally. The implementation includes:
Checklist