Commit 93537ab

Mateusz and factory-droid[bot] committed
feat(gemini): add rate limit retry and improve thought signature persistence
- Implemented automatic retry for rate limit errors in Gemini connector
- Added secondary index to ThoughtSignatureManager for better persistence
- Updated CBOR capture documentation

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
1 parent b129adc commit 93537ab

4 files changed, +158 -14 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -13,11 +13,15 @@
 - **Unit Tests**: 18 new unit tests in `test_gemini_oauth_auth_retry.py`
 - **Behavioral Tests**: 12 new behavioral tests in `test_gemini_oauth_auth_retry_behavior.py`
 
+- **CBOR Documentation**: Comprehensive guide for CBOR wire capture in `docs/user_guide/debugging/cbor-capture.md`
+
 - New Gemini connector modules: `generation_config_builder.py`, `model_validation.py`, `response_accumulator.py`, `response_text_extractor.py`, `retry_delay_parser.py`, `thought_signature_manager.py`, `user_prompt_id_generator.py`
 - Database schema and usage tracking documentation in `docs/database-*.md`
 
 ### Fixed
 
+- **Gemini Rate Limits**: Implemented automatic retry for rate limit errors with `retry-after` header support in `GeminiOAuthBaseConnector`
+- **Thought Signatures**: Added secondary index by tool call ID to `ThoughtSignatureManager` to persist signatures across session ID changes
 - Extensive fixes in debugging scripts for streaming, tool calls, CBOR, and Gemini issues
 - Gemini connector refactoring for improved reliability and error handling
 - Core services enhancements for backend request management, response processing, tool call reactor middleware, and translation

docs/user_guide/debugging/cbor-capture.md

Lines changed: 105 additions & 3 deletions
@@ -1,3 +1,105 @@
-# CBOR Wire Capture
-
-CBOR (Concise Binary
+# CBOR Wire Capture
+
+CBOR (Concise Binary Object Representation) wire capture provides a high-performance, compact alternative to JSON-based wire capture. It records full HTTP requests and responses in a binary format that is faster to write and takes up less space.
+
+## Overview
+
+CBOR wire capture is designed for high-throughput environments where minimizing I/O overhead is critical. It captures the same detailed information as the JSON format but uses the efficient CBOR binary standard.
+
+## Enabling CBOR Capture
+
+You can enable CBOR capture via the CLI or configuration file.
+
+### Via CLI
+
+```bash
+python -m src.core.cli --cbor-capture-file var/wire_captures_cbor/session.cbor
+```
+
+### Via Configuration
+
+```yaml
+logging:
+  cbor_capture_file: "var/wire_captures_cbor/session.cbor"
+```
+
+## Inspecting CBOR Captures
+
+Since CBOR is a binary format, you cannot read it directly with a text editor. The project provides a dedicated inspection tool: `scripts/inspect_cbor_capture.py`.
+
+### Basic Usage
+
+```bash
+python scripts/inspect_cbor_capture.py var/wire_captures_cbor/session.cbor
+```
+
+This will print a summary of the capture file, including session ID, duration, and entry counts.
+
+### Filtering Entries
+
+You can filter entries to find specific requests or time ranges.
+
+#### By Time Range
+
+Filter entries based on timestamps. You can use Unix timestamps, ISO datetime strings, or time-only strings (assumes today's date).
+
+```bash
+# Filter by Unix timestamp
+python scripts/inspect_cbor_capture.py session.cbor --start-time 1702300000 --end-time 1702400000
+
+# Filter by ISO datetime
+python scripts/inspect_cbor_capture.py session.cbor --start-time "2024-01-15T10:00:00"
+
+# Filter by time of day
+python scripts/inspect_cbor_capture.py session.cbor --start-time "10:30:00" --end-time "11:00:00"
+```
+
+#### By Backend
+
+```bash
+# Show only entries for the OpenAI backend
+python scripts/inspect_cbor_capture.py session.cbor --backend openai
+```
+
+#### By Direction
+
+```bash
+# Show only backend responses
+python scripts/inspect_cbor_capture.py session.cbor --direction backend_to_proxy
+```
+
+### Advanced Analysis
+
+The tool includes powerful analysis features to help debug issues.
+
+#### Detect Issues
+
+Automatically scan for common problems like errors, slow responses, or rate limits.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --detect-issues
+```
+
+#### Request/Response Analysis
+
+Analyze paired requests and responses to see latency, token usage, and content.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --analyze
+```
+
+#### Timeline View
+
+Visualize traffic over time to identify gaps or latency spikes.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --timeline
+```
+
+### Exporting to JSON
+
+If you need to process the data with other tools (like `jq`), you can export it to JSON.
+
+```bash
+python scripts/inspect_cbor_capture.py session.cbor --json > export.json
+```

src/connectors/gemini_base/connector.py

Lines changed: 32 additions & 0 deletions
@@ -1432,6 +1432,7 @@ async def _chat_completions_code_assist(
         effective_model: str,
         _in_graceful_degradation: bool = False,
         _auth_retry_attempted: bool = False,
+        _rate_limit_retry_attempted: bool = False,
         **kwargs: Any,
     ) -> ResponseEnvelope | StreamingResponseEnvelope:
         """Handle chat completions using the Code Assist API.
@@ -1521,6 +1522,22 @@ def thought_signature_callback(
         except BackendError as e:
             if self._is_rate_limit_like_error(e):
                 logger.info("Backend rate limited during API call: %s", e)
+                retry_after = self._extract_retry_delay(e)
+                if retry_after and retry_after > 0 and not _rate_limit_retry_attempted:
+                    logger.info(
+                        "Retrying after rate limit in %.2fs (non-streaming)",
+                        retry_after,
+                    )
+                    await asyncio.sleep(retry_after)
+                    return await self._chat_completions_code_assist(
+                        request_data=request_data,
+                        processed_messages=processed_messages,
+                        effective_model=effective_model,
+                        _in_graceful_degradation=_in_graceful_degradation,
+                        _auth_retry_attempted=_auth_retry_attempted,
+                        _rate_limit_retry_attempted=True,
+                        **kwargs,
+                    )
             else:
                 logger.error(f"Backend error during API call: {e}", exc_info=True)
             raise
@@ -1536,6 +1553,7 @@ async def _chat_completions_code_assist_streaming(
         request_data: Any,
         processed_messages: list[Any],
         effective_model: str,
+        _rate_limit_retry_attempted: bool = False,
         **kwargs: Any,
     ) -> StreamingResponseEnvelope:
         """Handle streaming chat completions using the Code Assist API.
@@ -1707,6 +1725,20 @@ async def auth_error_stream() -> AsyncGenerator[ProcessedResponse, None]:
         except BackendError as e:
             if self._is_rate_limit_like_error(e):
                 logger.info("Backend rate limited during streaming API call: %s", e)
+                retry_after = self._extract_retry_delay(e)
+                if retry_after and retry_after > 0 and not _rate_limit_retry_attempted:
+                    logger.info(
+                        "Retrying streaming call after %.2fs due to rate limit",
+                        retry_after,
+                    )
+                    await asyncio.sleep(retry_after)
+                    return await self._chat_completions_code_assist_streaming(
+                        request_data=request_data,
+                        processed_messages=processed_messages,
+                        effective_model=effective_model,
+                        _rate_limit_retry_attempted=True,
+                        **kwargs,
+                    )
             else:
                 logger.error(
                     f"Backend error during streaming API call: {e}", exc_info=True
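
Note on the retry hunks above: both methods follow the same pattern, which is to sleep for the server-suggested delay and re-enter the call exactly once, with a boolean flag guarding against a second retry. The sketch below isolates that pattern outside the connector. `RateLimitError`, `call_with_single_retry`, and the injectable `sleep` parameter are hypothetical stand-ins, not names from the repository; the real code uses `BackendError`, `_is_rate_limit_like_error`, and `_extract_retry_delay`.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit-like backend error."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after


async def call_with_single_retry(
    call: Callable[[], Awaitable[str]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    _retry_attempted: bool = False,
) -> str:
    """Run `call`; on a rate limit with a positive delay, wait and retry once."""
    try:
        return await call()
    except RateLimitError as exc:
        delay = exc.retry_after
        if delay and delay > 0 and not _retry_attempted:
            await sleep(delay)
            # The flag ensures a second rate limit propagates instead of looping.
            return await call_with_single_retry(call, sleep, _retry_attempted=True)
        raise


async def demo():
    """Fail once with a 0.5s retry hint, then succeed."""
    attempts = 0
    slept = []

    async def flaky_call() -> str:
        nonlocal attempts
        attempts += 1
        if attempts == 1:
            raise RateLimitError("429 Too Many Requests", retry_after=0.5)
        return "ok"

    async def fake_sleep(delay: float) -> None:
        slept.append(delay)  # record the delay instead of really sleeping

    result = await call_with_single_retry(flaky_call, sleep=fake_sleep)
    return result, slept, attempts
```

A single retry bounds the added latency to one server-suggested delay. The cost is that a backend which stays rate limited still surfaces the error to the caller on the second attempt.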

src/connectors/gemini_base/thought_signature_manager.py

Lines changed: 17 additions & 11 deletions
@@ -23,8 +23,10 @@ class ThoughtSignatureManager:
     Key format: "session_id:tool_call_id" -> thought_signature
     """
 
-    def __init__(self) -> None:
-        self._cache: dict[str, str] = {}
+    def __init__(self) -> None:
+        self._cache: dict[str, str] = {}
+        # Secondary index by tool_call_id to survive session-id changes
+        self._by_tool_call: dict[str, str] = {}
 
     @property
     def cache(self) -> dict[str, str]:
@@ -81,10 +83,13 @@ def _inject_signature_for_tool_call(self, tc: Any, session_id: str) -> None:
 
         # Look up in cache
         cache_key = f"{session_id}:{tc_id}"
-        sig = self._cache.get(cache_key)
+        sig = self._cache.get(cache_key)
         if not sig:
             # Try anonymous cache if session_id was missing at store time
             sig = self._cache.get(f"anon:{tc_id}")
+        if not sig:
+            # Fallback to global index by tool_call_id (handles session re-keying)
+            sig = self._by_tool_call.get(tc_id)
         if not sig:
             # No cached signature available; avoid injecting placeholders that
             # can trigger "corrupted thought signature" errors.
@@ -134,14 +139,15 @@ def store_signatures_from_tool_calls(
             cache_key = (
                 f"{session_id}:{tc_id}" if session_id else f"{anonymous_key}:{tc_id}"
             )
-            if cache_key:
-                self._cache[cache_key] = sig
-                if logger.isEnabledFor(logging.DEBUG):
-                    logger.debug(
-                        "Stored thought_signature for tool_call_id=%s (key=%s)",
-                        tc_id,
-                        cache_key[:16],
-                    )
+            if cache_key:
+                self._cache[cache_key] = sig
+                self._by_tool_call[tc_id] = sig
+                if logger.isEnabledFor(logging.DEBUG):
+                    logger.debug(
+                        "Stored thought_signature for tool_call_id=%s (key=%s)",
+                        tc_id,
+                        cache_key[:16],
+                    )
 
     def log_signature_state(
         self,
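
Note on the lookup order introduced above: a signature is now stored under both the session-scoped key and a bare tool call ID, and lookup falls back from the session key, to the anonymous key, to the tool-call index. The following is a reduced sketch of that behavior. The class name `SignatureCache` and the methods `store` and `lookup` are hypothetical, and the sketch assumes the anonymous key prefix is `anon`, as the lookup path in the diff suggests.

```python
from typing import Optional


class SignatureCache:
    """Reduced sketch of a primary cache plus a secondary tool-call index."""

    def __init__(self) -> None:
        self._cache: dict[str, str] = {}         # "session_id:tool_call_id" -> signature
        self._by_tool_call: dict[str, str] = {}  # tool_call_id -> signature

    def store(self, session_id: Optional[str], tool_call_id: str, signature: str) -> None:
        prefix = session_id if session_id else "anon"  # assumed anonymous prefix
        self._cache[f"{prefix}:{tool_call_id}"] = signature
        # Secondary index: survives a later change of session ID.
        self._by_tool_call[tool_call_id] = signature

    def lookup(self, session_id: str, tool_call_id: str) -> Optional[str]:
        # Same fallback order as the diff: session key, anonymous key, tool-call index.
        return (
            self._cache.get(f"{session_id}:{tool_call_id}")
            or self._cache.get(f"anon:{tool_call_id}")
            or self._by_tool_call.get(tool_call_id)
        )


cache = SignatureCache()
cache.store("session-a", "call_1", "sig-abc")
found_after_rekey = cache.lookup("session-b", "call_1")  # resolved via the secondary index
```

One consequence of indexing by tool call ID alone: if two sessions ever reuse the same tool call ID, the most recently stored signature wins in the fallback path.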
