
Conversation

contextablemark
Contributor

Addresses #400 and adds tests to guard against such issues in the future.
Note: Despite all attempts to resolve quota issues, I was unable to get access to the Claude Sonnet LLM running on Vertex. However, one of the new test cases replicates this sequence.

contextablemark and others added 10 commits October 2, 2025 00:00
Replace fragile usage_metadata-based logic with robust streaming detection
that checks multiple explicit streaming indicators.

**Problem:**
The original logic relied on `not adk_event.usage_metadata` to determine
if an event should be processed as streaming. This was fragile because
Claude models can include usage_metadata even in streaming chunks,
causing responses to disappear.

**Solution:**
Implement comprehensive streaming detection that checks:
- `partial` attribute (explicitly marked as partial)
- `turn_complete` attribute (live streaming completion status)
- `is_final_response()` method (final response indicator)
- `finish_reason` attribute (fallback for content without finish reason)

This ensures all streaming content is captured regardless of usage_metadata
presence, fixing compatibility with Claude Sonnet 4 and other models.
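As a rough illustration of the detection described above (a hypothetical sketch, not the middleware's actual code; `FakeEvent` is a made-up stand-in, and the attribute names come from this commit message):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FakeEvent:
    """Minimal stand-in for an ADK event, for illustration only."""
    partial: bool = False
    turn_complete: Optional[bool] = None
    finish_reason: Optional[str] = None
    final: bool = False
    usage_metadata: dict = field(default_factory=dict)

    def is_final_response(self) -> bool:
        return self.final

def is_streaming_chunk(event) -> bool:
    """Use explicit streaming indicators, not the absence of usage_metadata."""
    if getattr(event, "partial", False):
        return True                                   # explicitly marked partial
    if getattr(event, "turn_complete", None) is False:
        return True                                   # live turn not yet complete
    if event.is_final_response():
        return False                                  # final response, not a chunk
    return not getattr(event, "finish_reason", None)  # no finish reason: still mid-stream

# A Claude-style chunk that carries usage_metadata is still treated as streaming:
chunk = FakeEvent(partial=True, usage_metadata={"input_tokens": 12})
print(is_streaming_chunk(chunk))  # True
```

The key difference from the old `not adk_event.usage_metadata` check is visible in the last two lines: the presence of token counts no longer disqualifies a chunk from the streaming path.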

**Testing:**
✅ All 277 tests pass
✅ Streaming detection works across different model providers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…k-agent

Add regression test for partial final ADK chunks
Change TextMessageContentEvent to TextMessageChunkEvent in test to match
actual AG-UI protocol event types.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…ttranslator

Add test for ADK streaming fallback branch
contextablemark changed the title from "Fix/issue 400 clean" to "Fix/issue 400" on Oct 3, 2025
@tylerslaton
Contributor

Thanks Mark! Going to create a fork so our tests can run :)

@tylerslaton
Contributor

tylerslaton commented Oct 3, 2025

Seeing some failures in the ADK e2e - could you give that a look @contextablemark

@contextablemark
Contributor Author

@tylerslaton I'm a bit confused here... I tried running the same tests (using "pnpm test -- tests/adkMiddlewareTests") on my local machine, but I'm seeing only one failure:
🎭 Running 11 tests...

✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat sends and receives a message
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and approve asked changes (skipped)
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and reject asked changes (skipped)
✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat changes background on message and reset
✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat retains memory of user messages during a conversation
✅ Shared State Feature: [ADK Middleware] should interact with the chat to get a recipe on prompt
✅ Shared State Feature: [ADK Middleware] should share state between UI and chat
✅ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
✅ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
✅ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and display verification
❌ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and UI consistency for two different prompts

Drilling into the actual failure yields:
Error: expect(received).toBe(expected) // Object.is equality

Expected: "我は勝つ常に勝利を掴み取るのみ"
Received: "月は輝き夜空を照らす静寂の光"

which boils down to: Expected "I will win, I will always seize victory" versus Received "The moon shines, illuminating the night sky with silent light". Those carry two entirely different sentiments, so I'll look into that. But it's still different from what I'm seeing in the CI build, which shows a number of additional failures:
18:33:20 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
💥 Error: expect.toBeVisible: Error: strict mode violation: getByTestId('select-steps') resolved to 2 elements:
🔑 Likely cause: AI service down or API key issue

18:33:49 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
💥 Error: expect.toBeVisible: Error: strict mode violation: getByTestId('select-steps') resolved to 2 elements:
🔑 Likely cause: AI service down or API key issue

18:34:14 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
💥 Error: expect.toBeVisible: Error: strict mode violation: getByTestId('select-steps') resolved to 2 elements:
🔑 Likely cause: AI service down or API key issue

18:34:59 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
💥 Element not found: UI element

18:35:49 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
💥 Element not found: UI element

18:36:39 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
💥 Element not found: UI element

18:37:28 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
💥 Element not found: UI element

18:38:18 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
💥 Element not found: UI element

⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and approve asked changes (skipped)
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and reject asked changes (skipped)
18:40:20 ❌ Shared State Feature: [ADK Middleware] should interact with the chat to get a recipe on prompt
💥 Element not found: .ingredient-card:has(input.ingredient-name-input[value="Pasta"])

Any suggestions on what I could try changing in my local environment to try reproducing these issues?

  • Mark

The Tool Based Generative UI haiku test was exhibiting flaky behavior,
sometimes passing and sometimes failing under identical conditions.
The test was more reliable when run with --headed than when run
headless, suggesting a timing-related issue.

Root cause: The extractMainDisplayHaikuContent() method was concatenating
ALL visible haiku lines from the main display, while the chat extraction
only captured the most recent haiku. When multiple haikus were displayed
simultaneously (due to rendering timing), this caused mismatches.

Fix: Modified extractMainDisplayHaikuContent() to extract only the last
3 lines (the most recent haiku), matching the behavior of the chat
extraction and eliminating timing-related flakiness.

This affects all 10 platform integration tests that use ToolBaseGenUIPage.
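The core of the fix can be sketched as follows (a hypothetical Python illustration; the real `extractMainDisplayHaikuContent` helper lives in the TypeScript e2e suite, and the function and sample lines here are invented for demonstration):

```python
def extract_recent_haiku(visible_lines: list[str]) -> str:
    """Return only the most recent haiku: the last 3 non-empty lines.

    The old behavior concatenated every visible haiku line, so when two
    haikus were rendered at once (a timing artifact), the comparison
    against the single haiku extracted from the chat would fail.
    """
    lines = [line.strip() for line in visible_lines if line.strip()]
    return "\n".join(lines[-3:])

# Two haikus visible at once due to rendering timing; only the newest survives:
display = [
    "An old haiku here",
    "lingering on the main view",
    "before the re-render",
    "The moon shines brightly",
    "over the quiet water",
    "night settles softly",
]
print(extract_recent_haiku(display))
```

Taking the tail of the list rather than the whole concatenation makes the extraction insensitive to how many stale haikus happen to still be on screen.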

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@contextablemark
Contributor Author

@tylerslaton I was still unable to reproduce any of the test failures other than "❌ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and UI consistency for two different prompts". Even this one exhibited flakiness: running the test with "--headed" would result in it passing almost all of the time, whereas running it headless would result in it usually (but not always) failing.

So I updated the behavior of "extractMainDisplayHaikuContent" to match that of "extractChatHaikuContent" and only extract the most recent haiku rather than all of them concatenated. This results in a less flaky test that now passes consistently.

@contextablemark
Contributor Author

@tylerslaton So I got set up with Depot so that I could replicate the CI environment and turn on the debug logs to try to figure out some of these issues.

Turns out the "AI service down or API key issue" message mapped in my logs to:

2025-10-04T21:49:08.9049539Z [ADK Middleware] Background execution error: 429 Too Many Requests:

{
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.\n* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 15\nPlease retry in 2.742536337s.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": { "location": "global", "model": "gemini-2.0-flash" },
            "quotaValue": "15"
          }
        ]
      },
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Learn more about Gemini API quotas",
            "url": "https://ai.google.dev/gemini-api/docs/rate-limits"
          }
        ]
      },
      {
        "@type": "type.googleapis.com/google.rpc.RetryInfo",
        "retryDelay": "2s"
      }
    ]
  }
}

In my case, this was because the project where I had created the API key for the CI build had no billing account associated with it and was therefore on the free tier, where gemini-2.0-flash is limited to 15 requests per minute (the quotaValue in the error above) and Gemini 2.5 Flash to 10; both limits are easily exhausted by the Dojo tests. As soon as I attached it to a billing account, I was able to get to this:

🎭 Running 11 tests...
22:43:59 ✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat sends and receives a message
22:44:05 ✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat changes background on message and reset
22:44:15 ✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat retains memory of user messages during a conversation
22:44:28 ✅ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
22:44:41 ✅ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and approve asked changes (skipped)
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and reject asked changes (skipped)
22:44:50 ✅ Shared State Feature: [ADK Middleware] should interact with the chat to get a recipe on prompt
22:44:57 ✅ Shared State Feature: [ADK Middleware] should share state between UI and chat
22:45:09 ✅ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and display verification
22:45:19 ❌ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and UI consistency for two different prompts
💥 TimeoutError: locator.waitFor: Timeout 10000ms exceeded.
22:45:41 ✅ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and UI consistency for two different prompts

So I suggest you take a look at the billing account (or lack thereof) associated with the API key that the tests are running against to see whether that's the issue you're running into as well.

  • Mark

contextablemark and others added 3 commits October 4, 2025 22:33
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add fallback logic to detect streaming completion using finish_reason
when is_final_response returns False but finish_reason is set.

**Problem:**
Gemini returns events with partial=True and is_final_response()=False
even on the final chunk that contains finish_reason="STOP". This caused
streaming messages to remain open and require force-closing, resulting
in warnings.

**Solution:**
Enhanced should_send_end logic to check for finish_reason as a fallback:
- Check if finish_reason attribute exists and is truthy
- If streaming is active and finish_reason is present, emit TEXT_MESSAGE_END
- Formula: should_send_end = (is_final_response and not is_partial) or
           (has_finish_reason and self._is_streaming)
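Sketched out, the formula above might look like this (a hypothetical illustration with a made-up `FakeEvent` stand-in; the real translator tracks streaming state internally):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeEvent:
    """Minimal stand-in for an ADK event, for illustration only."""
    partial: bool = False
    finish_reason: Optional[str] = None
    final: bool = False

    def is_final_response(self) -> bool:
        return self.final

def should_send_end(event, streaming_active: bool) -> bool:
    """Close the text message on a true final response, or when an active
    stream's event carries a finish_reason even though is_final_response()
    still reports False (as Gemini does on its last partial chunk)."""
    is_partial = bool(getattr(event, "partial", False))
    has_finish_reason = bool(getattr(event, "finish_reason", None))
    return (event.is_final_response() and not is_partial) or (
        has_finish_reason and streaming_active
    )

# Gemini's last chunk: partial=True, is_final_response()=False, finish_reason="STOP".
# The fallback branch catches it, so TEXT_MESSAGE_END is emitted:
print(should_send_end(FakeEvent(partial=True, finish_reason="STOP"), True))  # True
```

Note the `streaming_active` guard in the fallback branch: a finish_reason on an event outside an open stream does not trigger a spurious TEXT_MESSAGE_END.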

**Testing:**
✅ All 277 tests pass
✅ Added test_partial_with_finish_reason to verify the fix
✅ Eliminates "Force-closing unterminated streaming message" warnings
✅ Properly emits TEXT_MESSAGE_END for events with finish_reason

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Prefer LRO routing in ADKAgent when long‑running tool call IDs are present in event.content.parts (prevents misrouting into streaming path and tool loops; preserves HITL pause)
- Force‑close any active streaming text before emitting LRO tool events (guarantees TEXT_MESSAGE_END precedes TOOL_CALL_START)
- Harden EventTranslator.translate to filter out long‑running tool calls from the general path; only emit non‑LRO calls (avoids duplicate tool events)
- Add tests:
  * test_lro_filtering.py (translator‑level filtering + LRO‑only emission)
  * test_integration_mixed_partials.py (streaming → non‑LRO → final LRO: order, no duplicates, correct IDs)
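The filtering in the hardened translate path could be sketched like this (a hypothetical Python illustration; `FunctionCall`, `Part`, and `split_tool_calls` are invented names standing in for the actual ADK types and translator internals):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FunctionCall:
    id: str
    name: str

@dataclass
class Part:
    """A content part that may or may not carry a tool call."""
    function_call: Optional[FunctionCall] = None

def split_tool_calls(parts, long_running_ids):
    """Partition tool calls so the general translate path emits only
    non-LRO calls; LRO calls are routed separately (and emitted once),
    avoiding duplicate tool events and preserving the HITL pause."""
    lro, regular = [], []
    for part in parts:
        call = part.function_call
        if call is None:
            continue  # text-only part, nothing to route
        (lro if call.id in long_running_ids else regular).append(call)
    return lro, regular

parts = [
    Part(FunctionCall("t1", "get_weather")),
    Part(),  # plain text part
    Part(FunctionCall("t2", "ask_human")),
]
lro, regular = split_tool_calls(parts, {"t2"})
print([c.name for c in lro], [c.name for c in regular])
```

Splitting by ID up front is what guarantees the ordering property the second test checks: any active streaming text can be force-closed before the LRO tool events go out, and the general path never re-emits an LRO call.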
@contextablemark
Contributor Author

I think I might have been running against the wrong commit when I generated the test run above, especially after looking more closely at #474. In any case, the tests should now pass reliably:

🎭 Running 11 tests...
07:16:54 ✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat sends and receives a message
07:17:00 ✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat changes background on message and reset
07:17:11 ✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat retains memory of user messages during a conversation
07:17:25 ✅ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
07:17:38 ✅ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and approve asked changes (skipped)
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and reject asked changes (skipped)
07:17:49 ✅ Shared State Feature: [ADK Middleware] should interact with the chat to get a recipe on prompt
07:17:57 ✅ Shared State Feature: [ADK Middleware] should share state between UI and chat
07:18:04 ✅ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and display verification
07:18:18 ✅ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and UI consistency for two different prompts
Notice: 2 skipped
9 passed (1.5m)
