feat(gemini): add rate limit retry and improve thought signature persistence

Mateusz · factory-droid[bot] · Mateusz · commit 93537abfb763 · 2025-12-11T21:16:25.000+01:00
- Implemented automatic retry for rate limit errors in Gemini connector
- Added secondary index to ThoughtSignatureManager for better persistence
- Updated CBOR capture documentation
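
The retry behaviour in the first bullet can be sketched in isolation as follows. This is a minimal illustration, not the connector code: `RateLimitError`, `call_with_single_retry`, and `demo` are hypothetical names, and the delay is assumed to arrive as a number of seconds on the exception. What it shares with the diff is the shape of the logic: retry only when a positive delay is known, and retry at most once by threading a flag through the recursive call.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit-like backend error."""

    def __init__(self, retry_after: float | None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


async def call_with_single_retry(
    send: Callable[[], Awaitable[str]],
    *,
    _rate_limit_retry_attempted: bool = False,
) -> str:
    """Await send(); on a rate limit with a known delay, sleep and retry once."""
    try:
        return await send()
    except RateLimitError as e:
        delay = e.retry_after
        # Retry only if a positive delay is known and no retry has happened yet.
        if delay and delay > 0 and not _rate_limit_retry_attempted:
            await asyncio.sleep(delay)
            return await call_with_single_retry(
                send, _rate_limit_retry_attempted=True
            )
        # No usable delay, or the single retry was already used: propagate.
        raise


def demo() -> tuple[str, int]:
    """Run a call that is rate limited once, then succeeds."""
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RateLimitError(retry_after=0.01)
        return "ok"

    result = asyncio.run(call_with_single_retry(flaky))
    return result, calls
```

Passing the flag as an argument rather than storing it on the instance keeps the retry budget local to one request, so concurrent calls do not interfere with each other.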

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
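
For reviewers, the lookup order introduced by the secondary index can be sketched as below. `SignatureStore` is a hypothetical, simplified stand-in for `ThoughtSignatureManager`, and the literal `anon` prefix is assumed from the lookup path in the diff. The sketch only shows the three-step fallback: session-scoped key, then anonymous key, then the index keyed by tool call ID alone.

```python
from __future__ import annotations


class SignatureStore:
    """Hypothetical, simplified model of the two-level signature cache."""

    def __init__(self) -> None:
        # Primary cache keyed by "session_id:tool_call_id".
        self._cache: dict[str, str] = {}
        # Secondary index keyed by tool_call_id only.
        self._by_tool_call: dict[str, str] = {}

    def store(self, session_id: str | None, tc_id: str, sig: str) -> None:
        """Record a signature under both the scoped key and the bare tool call ID."""
        key = f"{session_id}:{tc_id}" if session_id else f"anon:{tc_id}"
        self._cache[key] = sig
        self._by_tool_call[tc_id] = sig

    def lookup(self, session_id: str, tc_id: str) -> str | None:
        """Return the signature, falling back from the most to least specific key."""
        return (
            self._cache.get(f"{session_id}:{tc_id}")
            or self._cache.get(f"anon:{tc_id}")
            or self._by_tool_call.get(tc_id)
        )


store = SignatureStore()
store.store("session-a", "call_1", "sig-1")
# The session ID changed between store and lookup; the secondary index still hits.
found_after_rekey = store.lookup("session-b", "call_1")
missing = store.lookup("session-b", "call_2")
```

One consequence of this design, at least in the sketch, is that the secondary index assumes tool call IDs do not collide across sessions: a later store with the same ID overwrites the earlier entry.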
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -13,11 +13,15 @@
   - **Unit Tests**: 18 new unit tests in `test_gemini_oauth_auth_retry.py`
   - **Behavioral Tests**: 12 new behavioral tests in `test_gemini_oauth_auth_retry_behavior.py`
 
+- **CBOR Documentation**: Comprehensive guide for CBOR wire capture in `docs/user_guide/debugging/cbor-capture.md`
+
 - New Gemini connector modules: `generation_config_builder.py`, `model_validation.py`, `response_accumulator.py`, `response_text_extractor.py`, `retry_delay_parser.py`, `thought_signature_manager.py`, `user_prompt_id_generator.py`
 - Database schema and usage tracking documentation in `docs/database-*.md`
 
 ### Fixed
 
+- **Gemini Rate Limits**: Implemented automatic retry for rate limit errors with `retry-after` header support in `GeminiOAuthBaseConnector`
+- **Thought Signatures**: Added secondary index by tool call ID to `ThoughtSignatureManager` to persist signatures across session ID changes
 - Extensive fixes in debugging scripts for streaming, tool calls, CBOR, and Gemini issues
 - Gemini connector refactoring for improved reliability and error handling
 - Core services enhancements for backend request management, response processing, tool call reactor middleware, and translation
diff --git a/docs/user_guide/debugging/cbor-capture.md b/docs/user_guide/debugging/cbor-capture.md
@@ -1,3 +1,105 @@
-# CBOR Wire Capture
-
-CBOR (Concise Binary
+# CBOR Wire Capture
+
+CBOR (Concise Binary Object Representation) wire capture provides a high-performance, compact alternative to JSON-based wire capture. It records full HTTP requests and responses in a binary format that is faster to write and takes up less space.
+
+## Overview
+
+CBOR wire capture is designed for high-throughput environments where minimizing I/O overhead is critical. It captures the same detailed information as the JSON format but uses the efficient CBOR binary standard.
+
+## Enabling CBOR Capture
+
+You can enable CBOR capture via the CLI or configuration file.
+
+### Via CLI
+
+```bash
+python -m src.core.cli --cbor-capture-file var/wire_captures_cbor/session.cbor
+```
+
+### Via Configuration
+
+```yaml
+logging:
+  cbor_capture_file: "var/wire_captures_cbor/session.cbor"
+```
+
+## Inspecting CBOR Captures
+
+Since CBOR is a binary format, you cannot read it directly with a text editor. The project provides a dedicated inspection tool: `scripts/inspect_cbor_capture.py`.
+
+### Basic Usage
+
+```bash
+python scripts/inspect_cbor_capture.py var/wire_captures_cbor/session.cbor
+```
+
+This will print a summary of the capture file, including session ID, duration, and entry counts.
+
+### Filtering Entries
+
+You can filter entries to find specific requests or time ranges.
+
+#### By Time Range
+
+Filter entries by timestamp. You can pass Unix timestamps, ISO datetime strings, or time-only strings, which are assumed to refer to today's date.
+
+```bash
+# Filter by Unix timestamp
+python scripts/inspect_cbor_capture.py session.cbor --start-time 1702300000 --end-time 1702400000
+
+# Filter by ISO datetime
+python scripts/inspect_cbor_capture.py session.cbor --start-time "2024-01-15T10:00:00"
+
+# Filter by time of day
+python scripts/inspect_cbor_capture.py session.cbor --start-time "10:30:00" --end-time "11:00:00"
+```
+
+#### By Backend
+
+```bash
+# Show only entries for the OpenAI backend
+python scripts/inspect_cbor_capture.py session.cbor --backend openai
+```
+
+#### By Direction
+
+```bash
+# Show only backend responses
+python scripts/inspect_cbor_capture.py session.cbor --direction backend_to_proxy
+```
+
+### Advanced Analysis
+
+The tool includes analysis modes for detecting common issues, pairing requests with responses, and viewing traffic over time.
+
+#### Detect Issues
+
+Automatically scan for common problems like errors, slow responses, or rate limits.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --detect-issues
+```
+
+#### Request/Response Analysis
+
+Analyze paired requests and responses to see latency, token usage, and content.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --analyze
+```
+
+#### Timeline View
+
+Visualize traffic over time to identify gaps or latency spikes.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --timeline
+```
+
+### Exporting to JSON
+
+If you need to process the data with other tools (like `jq`), you can export it to JSON.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --json > export.json
+```
diff --git a/src/connectors/gemini_base/connector.py b/src/connectors/gemini_base/connector.py
@@ -1432,6 +1432,7 @@ async def _chat_completions_code_assist(
         effective_model: str,
         _in_graceful_degradation: bool = False,
         _auth_retry_attempted: bool = False,
+        _rate_limit_retry_attempted: bool = False,
         **kwargs: Any,
     ) -> ResponseEnvelope | StreamingResponseEnvelope:
         """Handle chat completions using the Code Assist API.
@@ -1521,6 +1522,22 @@ def thought_signature_callback(
         except BackendError as e:
             if self._is_rate_limit_like_error(e):
                 logger.info("Backend rate limited during API call: %s", e)
+                retry_after = self._extract_retry_delay(e)
+                if retry_after and retry_after > 0 and not _rate_limit_retry_attempted:
+                    logger.info(
+                        "Retrying after rate limit in %.2fs (non-streaming)",
+                        retry_after,
+                    )
+                    await asyncio.sleep(retry_after)
+                    return await self._chat_completions_code_assist(
+                        request_data=request_data,
+                        processed_messages=processed_messages,
+                        effective_model=effective_model,
+                        _in_graceful_degradation=_in_graceful_degradation,
+                        _auth_retry_attempted=_auth_retry_attempted,
+                        _rate_limit_retry_attempted=True,
+                        **kwargs,
+                    )
             else:
                 logger.error(f"Backend error during API call: {e}", exc_info=True)
             raise
@@ -1536,6 +1553,7 @@ async def _chat_completions_code_assist_streaming(
         request_data: Any,
         processed_messages: list[Any],
         effective_model: str,
+        _rate_limit_retry_attempted: bool = False,
         **kwargs: Any,
     ) -> StreamingResponseEnvelope:
         """Handle streaming chat completions using the Code Assist API.
@@ -1707,6 +1725,20 @@ async def auth_error_stream() -> AsyncGenerator[ProcessedResponse, None]:
         except BackendError as e:
             if self._is_rate_limit_like_error(e):
                 logger.info("Backend rate limited during streaming API call: %s", e)
+                retry_after = self._extract_retry_delay(e)
+                if retry_after and retry_after > 0 and not _rate_limit_retry_attempted:
+                    logger.info(
+                        "Retrying streaming call after %.2fs due to rate limit",
+                        retry_after,
+                    )
+                    await asyncio.sleep(retry_after)
+                    return await self._chat_completions_code_assist_streaming(
+                        request_data=request_data,
+                        processed_messages=processed_messages,
+                        effective_model=effective_model,
+                        _rate_limit_retry_attempted=True,
+                        **kwargs,
+                    )
             else:
                 logger.error(
                     f"Backend error during streaming API call: {e}", exc_info=True
diff --git a/src/connectors/gemini_base/thought_signature_manager.py b/src/connectors/gemini_base/thought_signature_manager.py
@@ -23,8 +23,10 @@ class ThoughtSignatureManager:
     Key format: "session_id:tool_call_id" -> thought_signature
     """
 
-    def __init__(self) -> None:
-        self._cache: dict[str, str] = {}
+    def __init__(self) -> None:
+        self._cache: dict[str, str] = {}
+        # Secondary index by tool_call_id to survive session-id changes
+        self._by_tool_call: dict[str, str] = {}
 
     @property
     def cache(self) -> dict[str, str]:
@@ -81,10 +83,13 @@ def _inject_signature_for_tool_call(self, tc: Any, session_id: str) -> None:
 
         # Look up in cache
         cache_key = f"{session_id}:{tc_id}"
-        sig = self._cache.get(cache_key)
+        sig = self._cache.get(cache_key)
         if not sig:
             # Try anonymous cache if session_id was missing at store time
             sig = self._cache.get(f"anon:{tc_id}")
+        if not sig:
+            # Fallback to global index by tool_call_id (handles session re-keying)
+            sig = self._by_tool_call.get(tc_id)
         if not sig:
             # No cached signature available; avoid injecting placeholders that
             # can trigger "corrupted thought signature" errors.
@@ -134,14 +139,15 @@ def store_signatures_from_tool_calls(
             cache_key = (
                 f"{session_id}:{tc_id}" if session_id else f"{anonymous_key}:{tc_id}"
             )
-            if cache_key:
-                self._cache[cache_key] = sig
-                if logger.isEnabledFor(logging.DEBUG):
-                    logger.debug(
-                        "Stored thought_signature for tool_call_id=%s (key=%s)",
-                        tc_id,
-                        cache_key[:16],
-                    )
+            if cache_key:
+                self._cache[cache_key] = sig
+                self._by_tool_call[tc_id] = sig
+                if logger.isEnabledFor(logging.DEBUG):
+                    logger.debug(
+                        "Stored thought_signature for tool_call_id=%s (key=%s)",
+                        tc_id,
+                        cache_key[:16],
+                    )
 
     def log_signature_state(
         self,