@manerow (Contributor) commented Jan 27, 2026

Describe your changes

Fixes https://github.com/open-metadata/openmetadata-collate/issues/2769

TestSuite testCaseResultSummary suffered from two independent data-loss bugs, which caused customers to see only ~10% of their test results (e.g., 8 out of 78).

Root causes

  1. Lost-update race condition
    updateTestSuiteSummary() in TestCaseResultRepository performed concurrent read-modify-write operations on the TestSuite entity without locking.
    With 78 parallel test results, each thread read a stale copy of the summary, added its own result, and saved it, so later saves overwrote earlier ones and most updates were lost.

  2. Elasticsearch terms aggregation truncation
    getResultSummary() queried Elasticsearch with a terms aggregation hardcoded to size=100.
    With more than 100 unique test cases globally, only the top 100 buckets by document count were returned.
    The target suite’s test cases competed with every other suite’s test cases for those slots, so the response contained only part of the suite’s results, and that partial response prevented the DB fallback from triggering.
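
To make the two root causes concrete, here is a minimal stand-alone sketch. It is not OpenMetadata code: the class, method names, and document counts are invented for illustration, and the lost update is modelled as a fixed worst-case interleaving (every writer reads before any writer saves) rather than with real threads, so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the two root causes; none of these names exist in OpenMetadata.
public class RootCauseSketch {

  // Root cause 1: read-modify-write without locking.
  // Every writer reads the same stale summary, adds its own result to a
  // private copy, and stores that copy. The last store wins.
  static int lostUpdate(int writers) {
    List<String> stored = new ArrayList<>(); // summary held by the "entity"
    List<List<String>> pending = new ArrayList<>();
    for (int i = 0; i < writers; i++) {
      List<String> copy = new ArrayList<>(stored); // read (stale snapshot)
      copy.add("testCase-" + i);                   // modify
      pending.add(copy);
    }
    for (List<String> copy : pending) {
      stored = copy; // write: replaces whatever an earlier writer stored
    }
    return stored.size();
  }

  // Root cause 2: a global top-N cut by document count.
  // Returns how many of the target suite's buckets survive the cut.
  static int survivingBuckets(Map<String, Integer> docCounts, int size, String suitePrefix) {
    List<Map.Entry<String, Integer>> buckets = new ArrayList<>(docCounts.entrySet());
    buckets.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue()); // highest count first
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    int kept = 0;
    for (int i = 0; i < Math.min(size, buckets.size()); i++) {
      if (buckets.get(i).getKey().startsWith(suitePrefix)) {
        kept++;
      }
    }
    return kept;
  }

  // Invented data: 78 test cases in the target suite, 150 in other suites.
  static Map<String, Integer> sampleCounts() {
    Map<String, Integer> counts = new HashMap<>();
    for (int i = 0; i < 78; i++) {
      counts.put("target." + i, i < 8 ? 10 : 1); // only 8 have many documents
    }
    for (int i = 0; i < 150; i++) {
      counts.put("other." + i, 5);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(lostUpdate(78));                                // prints 1
    System.out.println(survivingBuckets(sampleCounts(), 100, "target.")); // prints 8
  }
}
```

In a real run the interleaving varies, so the number of surviving writes is somewhere between 1 and 78 rather than always 1. The second method shows why the truncated response still looked valid: it is non-empty, just incomplete.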

Changes

  • Removed updateTestSuiteSummary() from TestCaseResultRepository
    Eliminates the race condition entirely (78 concurrent writes per pipeline → 0).

  • Switched getResultSummary() to DB-only in TestSuiteRepository
    Uses listLastTestCaseResultsForTestSuite, which has:

    • no size limit
    • no eventual consistency issues
    • direct joins via entity_relationship
  • Persisted the entity at pipeline completion in onTestSuiteExecutionComplete()
    Fetches the entity with a fresh DB-computed summary, bumps the version, calls storeEntity(), and updates the search index once.

  • Refactored createTestSuiteCompletionChangeEvent()

    • Reads the summary from the already-computed entity
    • Tracks the correct version transition
    • Uses the pipeline user instead of hardcoded "admin"
  • Added integration test test_pipelineCompletionUpdatesSearchIndex in TestSuiteResourceIT
    Verifies that after pipeline completion:

    • the TestSuite version is bumped
    • DB summary counts (total/success/failed) are correct
    • the Elasticsearch search index document reflects the updated summary
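
The overall shape of the fix (derive the summary from the latest result per test case, then persist once at completion) can be sketched as below. This is an in-memory illustration under stated assumptions, not the actual repository code: the Result, Summary, and Suite classes, the "Success"/"Failed" status strings, and the 0.1 version step are all hypothetical stand-ins, and the storeCalls counter stands in for a single storeEntity() call plus one search index update.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "compute once from the source of truth, persist once".
public class CompletionSketch {

  static final class Result {
    final String testCase;
    final long timestamp;
    final String status; // assumed values: "Success", "Failed"

    Result(String testCase, long timestamp, String status) {
      this.testCase = testCase;
      this.timestamp = timestamp;
      this.status = status;
    }
  }

  static final class Summary {
    final int total;
    final int success;
    final int failed;

    Summary(int total, int success, int failed) {
      this.total = total;
      this.success = success;
      this.failed = failed;
    }
  }

  // Stand-in for the persisted TestSuite entity.
  static final class Suite {
    double version = 0.1;
    Summary summary;
    int storeCalls = 0;
  }

  // Keeps only the most recent result per test case, with no size limit.
  static Summary summarize(List<Result> results) {
    Map<String, Result> latest = new HashMap<>();
    for (Result r : results) {
      latest.merge(r.testCase, r, (old, cur) -> cur.timestamp >= old.timestamp ? cur : old);
    }
    int success = 0;
    int failed = 0;
    for (Result r : latest.values()) {
      if ("Success".equals(r.status)) {
        success++;
      } else if ("Failed".equals(r.status)) {
        failed++;
      }
    }
    return new Summary(latest.size(), success, failed);
  }

  // Called once per pipeline run instead of once per test result.
  static void onExecutionComplete(Suite suite, List<Result> results) {
    suite.summary = summarize(results);
    suite.version = Math.round((suite.version + 0.1) * 10) / 10.0; // assumed version step
    suite.storeCalls++; // one write, so there is nothing to race with
  }

  public static void main(String[] args) {
    Suite suite = new Suite();
    onExecutionComplete(suite, List.of(
        new Result("a", 1, "Failed"),
        new Result("a", 2, "Success"),
        new Result("b", 1, "Failed"),
        new Result("c", 1, "Success")));
    System.out.println(suite.summary.total + " total, "
        + suite.summary.success + " success, "
        + suite.summary.failed + " failed, "
        + suite.storeCalls + " store call(s)");
  }
}
```

Because the summary is recomputed from the stored results rather than accumulated incrementally, a single write at the end is enough, and the number of parallel test results no longer affects correctness.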

Type of change

  • Bug fix

Checklist

Bug fix

  • I have added a test that covers the exact scenario we are fixing. For complex issues, the issue number is referenced in the test.


Labels

alerts and notifications, backend, safe to test, To release
