Fix/issue 400 #471
Conversation
Replace fragile usage_metadata-based logic with robust streaming detection that checks multiple explicit streaming indicators.

**Problem:** The original logic relied on `not adk_event.usage_metadata` to determine if an event should be processed as streaming. This was fragile because Claude models can include usage_metadata even in streaming chunks, causing responses to disappear.

**Solution:** Implement comprehensive streaming detection that checks:
- `partial` attribute (explicitly marked as partial)
- `turn_complete` attribute (live streaming completion status)
- `is_final_response()` method (final response indicator)
- `finish_reason` attribute (fallback for content without a finish reason)

This ensures all streaming content is captured regardless of usage_metadata presence, fixing compatibility with Claude Sonnet 4 and other models.

**Testing:**
✅ All 277 tests pass
✅ Streaming detection works across different model providers

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
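For illustration, a minimal sketch of the kind of multi-signal check described above. The attribute names (`partial`, `turn_complete`, `is_final_response()`, `finish_reason`) come from the commit message; the helper name and exact structure are assumptions, not the middleware's actual code.

```python
# Illustrative sketch only: a multi-signal streaming check along the lines the
# commit message describes. Attribute names follow the PR text; the helper
# itself is hypothetical.

def looks_like_streaming_chunk(adk_event) -> bool:
    """Return True when an ADK event should be handled as a streaming chunk."""
    # Explicitly marked as a partial chunk.
    if getattr(adk_event, "partial", False):
        return True
    # Live streaming where the turn has not completed yet.
    if getattr(adk_event, "turn_complete", None) is False:
        return True
    # Neither a final response nor carrying a finish_reason: still mid-stream.
    if not adk_event.is_final_response() and not getattr(adk_event, "finish_reason", None):
        return True
    return False
```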
…k-agent Add regression test for partial final ADK chunks
Change TextMessageContentEvent to TextMessageChunkEvent in test to match actual AG-UI protocol event types. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…ttranslator Add test for ADK streaming fallback branch
…e for streaming event
Thanks Mark! Going to create a fork so our tests can run :)
Seeing some failures in the ADK e2e - could you give that a look @contextablemark
@tylerslaton I'm a little bit confused here... I tried running the same tests (using `pnpm test -- tests/adkMiddlewareTests`) on my local machine, but I'm seeing only one failure:

✅ Agentic Chat Feature: [ADK Middleware] Agentic Chat sends and receives a message

Drilling into the actual failure yields:

Expected: "我は勝つ常に勝利を掴み取るのみ"

which boils down to Expected "I will win, I will always seize victory" and Received "The moon shines, illuminating the night sky with a silent light", which um... yeah... carries two entirely different sentiments. I'll look into that, but it's still different from what I'm seeing on the CI build (which shows a number of additional failures):

18:33:49 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
18:34:14 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
18:34:59 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat and perform steps
18:35:49 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
18:36:39 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
18:37:28 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
18:38:18 ❌ Human in the Loop Feature: [ADK Middleware] should interact with the chat using predefined prompts and perform steps
⏭ Predictive State Updates Feature: [ADK Middleware] should interact with agent and approve asked changes (skipped)

Any suggestions on what I could try changing in my local environment to try reproducing these issues?
The Tool Based Generative UI haiku test was exhibiting flaky behavior where it would sometimes pass and sometimes fail with the same test conditions. The test was more reliable when run with --headed than when run headless, suggesting a timing-related issue.

Root cause: The extractMainDisplayHaikuContent() method was concatenating ALL visible haiku lines from the main display, while the chat extraction only captured the most recent haiku. When multiple haikus were displayed simultaneously (due to rendering timing), this caused mismatches.

Fix: Modified extractMainDisplayHaikuContent() to extract only the last 3 lines (the most recent haiku), matching the behavior of the chat extraction and eliminating timing-related flakiness. This affects all 10 platform integration tests that use ToolBaseGenUIPage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
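The actual change lives in the TypeScript Playwright page object; as a language-agnostic sketch of the "most recent haiku only" idea, something along these lines (function name and input shape are assumptions for illustration):

```python
# Sketch of keeping only the most recent haiku: take the last three non-empty
# lines instead of concatenating every visible haiku line.

def extract_most_recent_haiku(visible_lines: list[str]) -> str:
    lines = [line.strip() for line in visible_lines if line.strip()]
    return "\n".join(lines[-3:])  # a haiku renders as three lines
```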
@tylerslaton I was still unable to reproduce any of the test failures other than "❌ Tool Based Generative UI Feature: [ADK Middleware] Haiku generation and UI consistency for two different prompts". Even this one exhibited flakiness: running the test with --headed would result in it passing almost all of the time, whereas running it headless would result in it almost always (but not always) failing. So I updated the behavior of extractMainDisplayHaikuContent to match that of extractChatHaikuContent and only extract the most recent haiku rather than all of them concatenated. This results in a less flaky test that consistently passes.
Setup Workload Identity Federation (cherry picked from commit 979b3dc)
@tylerslaton So I got set up with Depot so that I could replicate the CI environment and turn on the debug logs to try and figure out some of these issues. Turns out the "AI service down or API Key issues" error was mapping in my logs to:
In my case, this was because the project where I had created the API key for the CI build was on the free tier (it didn't have a billing account associated with it), so gemini-2.0-flash was limited to 30 requests per minute and Gemini 2.5 Flash was limited to ten requests per minute - easily exhausted in the Dojo tests. As soon as I attached it to a billing account, I was able to get to this:

🎭 Running 11 tests...

So I suggest you take a look at the billing account (or lack thereof) associated with the API key that the tests are running against to see whether that's the issue you're running into as well.
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Add fallback logic to detect streaming completion using finish_reason when is_final_response returns False but finish_reason is set.

**Problem:** Gemini returns events with partial=True and is_final_response()=False even on the final chunk that contains finish_reason="STOP". This caused streaming messages to remain open and require force-closing, resulting in warnings.

**Solution:** Enhanced should_send_end logic to check for finish_reason as a fallback:
- Check if the finish_reason attribute exists and is truthy
- If streaming is active and finish_reason is present, emit TEXT_MESSAGE_END
- Formula: should_send_end = (is_final_response and not is_partial) or (has_finish_reason and self._is_streaming)

**Testing:**
✅ All 277 tests pass
✅ Added test_partial_with_finish_reason to verify the fix
✅ Eliminates "Force-closing unterminated streaming message" warnings
✅ Properly emits TEXT_MESSAGE_END for events with finish_reason

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
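A sketch of that formula in context, assuming a translator object that tracks an `_is_streaming` flag; names mirror the commit message, and the enclosing class is omitted, so treat this as illustrative rather than the actual implementation.

```python
# Sketch of the should_send_end decision described above. Attribute and flag
# names (partial, finish_reason, is_final_response, _is_streaming) mirror the
# commit message; the surrounding class is assumed.

def should_send_end(self, adk_event) -> bool:
    """Decide whether TEXT_MESSAGE_END should be emitted for this event."""
    is_partial = bool(getattr(adk_event, "partial", False))
    has_finish_reason = bool(getattr(adk_event, "finish_reason", None))
    is_final = adk_event.is_final_response()
    # Normal completion, plus the finish_reason fallback for partial finals.
    return (is_final and not is_partial) or (has_finish_reason and self._is_streaming)
```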
- Prefer LRO routing in ADKAgent when long-running tool call IDs are present in event.content.parts (prevents misrouting into the streaming path and tool loops; preserves the HITL pause)
- Force-close any active streaming text before emitting LRO tool events (guarantees TEXT_MESSAGE_END precedes TOOL_CALL_START)
- Harden EventTranslator.translate to filter out long-running tool calls from the general path; only emit non-LRO calls (avoids duplicate tool events)
- Add tests:
  * test_lro_filtering.py (translator-level filtering + LRO-only emission)
  * test_integration_mixed_partials.py (streaming → non-LRO → final LRO: order, no duplicates, correct IDs)
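For the filtering step, a rough sketch under the assumption that the long-running tool call IDs are known up front and that tool calls appear as function_call entries in event.content.parts (as the bullets above describe); the helper and its parameters are hypothetical.

```python
# Rough sketch of separating long-running (HITL) tool calls from regular ones
# before emitting AG-UI tool events. The event shape (content.parts with
# function_call entries) follows the PR description; the helper itself and the
# long_running_tool_ids parameter are assumptions for illustration.

def split_tool_calls(adk_event, long_running_tool_ids: set) -> tuple[list, list]:
    lro_calls, regular_calls = [], []
    for part in getattr(adk_event.content, "parts", None) or []:
        call = getattr(part, "function_call", None)
        if call is None:
            continue
        if call.id in long_running_tool_ids:
            lro_calls.append(call)       # routed through the LRO/HITL path
        else:
            regular_calls.append(call)   # emitted as ordinary TOOL_CALL_* events
    return lro_calls, regular_calls
```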
I think I might have been running against the wrong commit when I generated the test run above, especially after looking more closely at #474. In any case, the tests should now be passing reliably:

🎭 Running 11 tests...
Addressing #400 and adding additional tests to guard against such issues in the future.
Note: Despite all attempts to resolve quota issues, I was unable to get access to the Claude Sonnet LLM running on Vertex. However, one of the new test cases replicates this sequence.