Fix: Flaky Workflow Test cross polluted by SystemResource test by yan-3005 · Pull Request #25789 · open-metadata/OpenMetadata

yan-3005 · 2026-02-10T11:49:39Z

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

Fixed cross-test pollution:
- test_assetCertificationSettings in SystemResourceIT.java now resets AssetCertificationSettings to defaults after test completion
Defensive setup:
- setupCertificationTags_SDK in WorkflowDefinitionResourceIT.java ensures Certification.Gold tag exists and resets settings before workflow tests
Test isolation improvement:
- Prevents flaky failures caused by shared state modifications between test classes

_{This will update automatically on new commits.}

gitar-bot · 2026-02-10T13:32:45Z

🔍 CI failure analysis for 0c617ff: CI failures span backend integration tests (search indexing timeouts) and frontend Playwright tests (UI timing issues). All failures are unrelated to this PR's AssetCertificationSettings cleanup changes.

Issue

The CI failures across multiple test suites are pre-existing flaky tests unrelated to this PR's changes. This PR only modifies test cleanup logic for AssetCertificationSettings in SystemResourceIT and WorkflowDefinitionResourceIT, while failures occur in completely different areas.

Root Cause

Two Categories of Unrelated Failures:

1. Backend Integration Tests: Search Indexing Infrastructure Issues

All failing backend tests involve Elasticsearch/OpenSearch search indexing operations experiencing timing issues and race conditions:

AppsResourceIT.test_triggerApp_200 (Lines 382-384): SearchIndexingApplication conflict - "Job is already running" after 2-minute timeout
ColumnGridResourceIT.test_getColumnGrid_withServiceFilter (Line 244): Created table columns not appearing in search results
ColumnGridResourceIT.test_getColumnGrid_withGlossaryTermsFilter (Line 1037): Glossary-filtered columns missing from search results
APICollectionResourceIT.updateDescriptionAndCheckInSearch (Line 4739): Entity not appearing in OpenSearch index within 60s timeout
ClassificationResourceIT.updateDescriptionAndCheckInSearch (BaseEntityIT:4739): Entity not appearing in search index within 60s timeout
SearchResourceIT.testEntityTypeCountsWithEmptyResults (Line 794): OpenSearch client error - "Missing required property 'ShardFailure.primary'"
DataProductDomainMigrationIT tests (Lines 302, 1072): Assets not appearing in domain endpoint - expected 2, found 0

2. Frontend E2E Tests: Playwright Browser/Timing Issues

Playwright tests failing due to UI timing and browser stability issues:

Hard Failures:

Roles.spec.ts:512: Strict mode violation - 2 loader elements visible when expecting 0
CustomizeWidgets.spec.ts: Test timeout - browser/page closed unexpectedly while waiting for KPI API responses

Flaky Tests (8 passed on retry):

HyperlinkCustomProperty, Tag, Users, AutoPilot, ColumnBulkOperations, SysadminUsers specs
All exhibit element timing issues, navigation timeouts, or UI state inconsistencies

Details

Infrastructure Problems:

Elasticsearch/OpenSearch Shard Failures: Recurring search_phase_execution_exception: all shards failed errors throughout test runs at: 12:03, 12:05, 12:06, 12:07, 12:14

OpenSearch Mapping Exception:

mapper_parsing_exception: failed to parse field [lifeCycle] of type [keyword]
in document with id '2c1af314-c907-4f35-a902-9d052a0ecd3c'

Search index schema mismatch - expects keyword string but receives object.

Thread Interruption: Event processing threads forcefully interrupted during shutdown (LMAX Disruptor cleanup issues)

Browser Instability: Playwright tests experiencing unexpected page closures and element rendering delays in CI environment

Why Unrelated to This PR:

This PR modifies: Only test cleanup in SystemResourceIT.java and WorkflowDefinitionResourceIT.java for AssetCertificationSettings
Failing tests involve: Search indexing (AppsResource, ColumnGrid, Classification, APICollection, SearchResource, DataProductDomainMigration) and UI components (Roles, Widgets, Users, Tags, AutoPilot)
No functional overlap: PR adds defensive test cleanup code; does not touch search infrastructure, UI components, or any failing test areas
High pass rates: Backend ~99.97% (10,619/10,622 tests), Frontend ~93-96% (981/1032 tests) - indicates environmental flakiness, not systematic bugs
Consistent pattern across environments: Same failures in both MySQL+Elasticsearch and PostgreSQL+OpenSearch configurations

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Good fix for cross-test pollution with a two-pronged approach (source cleanup + defensive initialization). One minor suggestion to use try/finally for more robust cleanup.

💡 Edge Case: Reset in SystemResourceIT skipped if assertions fail

📄 openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SystemResourceIT.java:1031

The cleanup code that resets AssetCertificationSettings back to defaults (lines 1031-1044) runs after the assertions on lines 1028-1029. If either assertEquals fails (e.g., due to a regression in the settings update API), the reset code is never reached, and the cross-test pollution persists.

A more robust pattern would wrap the test logic in a try/finally block:

try {
    // ... update settings and assert ...
    assertEquals("NewCertification", updatedCertificationConfig.getAllowedClassification());
    assertEquals("P60D", updatedCertificationConfig.getValidityPeriod());
} finally {
    // Reset to original values to avoid cross-test pollution
    certificationConfig.setAllowedClassification("Certification");
    certificationConfig.setValidityPeriod("P30D");
    // ... PUT reset settings ...
}

This is minor because the defensive setup in WorkflowDefinitionResourceIT.setupCertificationTags_SDK() already guards against this scenario from the consumer side. However, the try/finally pattern would make the source-side cleanup more robust and protect any other tests that might depend on these settings in the future.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

`Auto-apply`	`Compact`
`gitar auto-apply:on`	`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

github-actions · 2026-02-10T15:38:13Z

Failed to cherry-pick changes to the 1.11.9 branch.
Please cherry-pick the changes manually.
You can find more details here.

Fix: Flaky Workflow Test cross polluted by SystemResource test

0c617ff

yan-3005 self-assigned this Feb 10, 2026

yan-3005 added safe to test Add this label to run secure Github workflows on PRs backend labels Feb 10, 2026

yan-3005 had a problem deploying to test February 10, 2026 11:49 — with GitHub Actions Error

yan-3005 temporarily deployed to test February 10, 2026 11:49 — with GitHub Actions Inactive

yan-3005 added the To release Will cherry-pick this PR into the release branch label Feb 10, 2026

yan-3005 temporarily deployed to test February 10, 2026 13:18 — with GitHub Actions Inactive

yan-3005 had a problem deploying to test February 10, 2026 13:18 — with GitHub Actions Failure

yan-3005 temporarily deployed to test February 10, 2026 13:18 — with GitHub Actions Inactive

harshach approved these changes Feb 10, 2026

View reviewed changes

yan-3005 merged commit d8265b7 into main Feb 10, 2026
31 of 51 checks passed

yan-3005 deleted the ram/fix-workflows-flaky-test branch February 10, 2026 15:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Flaky Workflow Test cross polluted by SystemResource test#25789

Fix: Flaky Workflow Test cross polluted by SystemResource test#25789
yan-3005 merged 1 commit intomainfrom
ram/fix-workflows-flaky-test

yan-3005 commented Feb 10, 2026 •

edited by gitar-bot bot

Loading

Uh oh!

gitar-bot bot commented Feb 10, 2026 •

edited

Loading

Issue

Root Cause

1. Backend Integration Tests: Search Indexing Infrastructure Issues

2. Frontend E2E Tests: Playwright Browser/Timing Issues

Details

Infrastructure Problems:

Why Unrelated to This PR:

Uh oh!

Uh oh!

github-actions bot commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yan-3005 commented Feb 10, 2026 • edited by gitar-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes:

Type of change:

Checklist:

Summary by Gitar

Uh oh!

gitar-bot bot commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Root Cause

1. Backend Integration Tests: Search Indexing Infrastructure Issues

2. Frontend E2E Tests: Playwright Browser/Timing Issues

Details

Infrastructure Problems:

Why Unrelated to This PR:

Uh oh!

Uh oh!

github-actions bot commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yan-3005 commented Feb 10, 2026 •

edited by gitar-bot bot

Loading

gitar-bot bot commented Feb 10, 2026 •

edited

Loading