Tracking: Flaky compaction E2E tests skipped in 3 of 4 SDKs with no tracking issue #1227

@edburns

Summary

Compaction E2E tests are skipped in 3 of 4 SDKs with the comment "Compaction tests are skipped due to flakiness — re-enable once stabilized", but no tracking issue was filed.

This was discovered while running the Java SDK port's CompactionTest on Windows and macOS: the test produced different results on each platform. Investigation revealed that the reference implementation had already skipped these tests in Node.js, Python, and Go, but left them running in .NET (and, by extension, in the Java port).

Current status by SDK

| SDK | Status | File | Skip mechanism |
| --- | --- | --- | --- |
| Node.js | ❌ Skipped | nodejs/test/e2e/compaction.e2e.test.ts | describe.skip(...) |
| Python | ❌ Skipped | python/e2e/test_compaction_e2e.py | pytest.mark.skip(reason=...) |
| Go | ❌ Skipped | go/internal/e2e/compaction_e2e_test.go | t.Skip(...) |
| .NET | ✅ Active | dotnet/test/E2E/CompactionE2ETests.cs | [Fact] (no skip) |

Root cause

The compaction snapshot replay is non-deterministic: after compaction, the CLI reconstructs user messages with a summary whose exact content can vary between runs. When the summary differs from the recorded one, the replaying proxy fails to match post-compaction messages against the recorded snapshot.
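To make the failure mode concrete, here is a minimal Python sketch of why exact matching breaks and how normalizing the summary before comparison might avoid it. Everything in it is an assumption for illustration: the `<summary>...</summary>` wrapper, the placeholder text, and the function names are hypothetical and do not reflect the actual CLI message format or the replaying proxy's code.

```python
import re

# Hypothetical marker format: this issue does not describe how the CLI
# actually wraps the compaction summary inside a reconstructed message.
SUMMARY_RE = re.compile(r"<summary>.*?</summary>", re.DOTALL)


def normalize(message: str) -> str:
    """Replace the variable summary body with a fixed placeholder."""
    return SUMMARY_RE.sub("<summary>[COMPACTED]</summary>", message)


def matches_exact(recorded: list[str], replayed: list[str]) -> bool:
    """Strict comparison: any difference in summary wording is a mismatch."""
    return recorded == replayed


def matches_normalized(recorded: list[str], replayed: list[str]) -> bool:
    """Compare messages after masking the non-deterministic summary text."""
    return [normalize(m) for m in recorded] == [normalize(m) for m in replayed]


# Two runs of the same conversation whose summaries are worded differently.
recorded = [
    "<summary>User asked about sorting; assistant explained quicksort.</summary>\n"
    "What about merge sort?"
]
replayed = [
    "<summary>The conversation covered quicksort.</summary>\n"
    "What about merge sort?"
]

print(matches_exact(recorded, replayed))       # strict match on raw text
print(matches_normalized(recorded, replayed))  # match after masking summaries
```

The deterministic part of each message (here, the follow-up question) still has to match, so a normalized comparison of this kind would not hide a genuinely different request.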

Impact

  • The Java SDK port (github/copilot-sdk-java) inherited the .NET test and experienced intermittent CI failures on main (e.g., CompactionTest.testShouldTriggerCompactionWithLowThresholdAndEmitEvents).
  • PR copilot-sdk-java#166, "Merge reference implementation SDK changes (2026-05-06)", worked around this by adding rerunFailingTestsCount=2 to Maven Surefire, which masks the flakiness rather than fixing it.

Request

  1. File this as the tracking issue so the skip comments can reference it.
  2. Stabilize the compaction test snapshot or make the assertion resilient to non-deterministic compaction output.
  3. Re-enable the tests across all SDKs once stabilized.
  4. Decide whether .NET should also skip the test for consistency until this is resolved.
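
For item 2, one possible shape of a resilient assertion is to check that compaction happened and produced some summary, without pinning the summary's wording. The Python sketch below is illustrative only: the `Event` type, the event names `compaction_start` and `compaction_complete`, and the `summary` field are hypothetical stand-ins, not the SDKs' actual event model.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for an SDK session event."""
    type: str
    data: dict


def assert_compaction_happened(events: list[Event]) -> None:
    """Assert on event structure and order, not on exact summary content."""
    types = [e.type for e in events]

    # Event names are assumptions made for this sketch.
    assert "compaction_start" in types, "no compaction start event"
    assert "compaction_complete" in types, "no compaction complete event"
    assert types.index("compaction_start") < types.index("compaction_complete")

    # Require a non-empty summary, but accept any wording.
    complete = next(e for e in events if e.type == "compaction_complete")
    summary = complete.data.get("summary", "")
    assert isinstance(summary, str) and summary.strip(), "empty summary"


# Usage: two runs with different summary text both pass.
run_a = [Event("compaction_start", {}),
         Event("compaction_complete", {"summary": "Discussed quicksort."})]
run_b = [Event("compaction_start", {}),
         Event("compaction_complete", {"summary": "User asked about sorting."})]
assert_compaction_happened(run_a)
assert_compaction_happened(run_b)
```

An assertion like this only helps if the replaying proxy can also serve the post-compaction requests, so it complements stabilizing the snapshot rather than replacing it.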
