Attempt to fix tag recognizer workflows flakiness #25770

Merged
harshach merged 3 commits into main from ci-issues
Feb 9, 2026

edg956 commented Feb 9, 2026

Describe your changes:

Fixes

I worked on ... because ...


Summary by Gitar

  • Workflow data prefetching:
    • Fetch RecognizerFeedback entity once in WorkflowEventConsumer.handleTagRecognizerFeedback() and pass as serialized JSON workflow variable to eliminate redundant database queries during task execution
    • Refactored 4 workflow task implementations (ApplyRecognizerFeedbackImpl, RejectRecognizerFeedbackImpl, CheckFeedbackSubmitterIsReviewerImpl, CreateRecognizerFeedbackApprovalTaskImpl) to use pre-fetched data
  • Schema updates:
    • Added recognizerFeedback input parameter to 3 task schemas with proper namespace mapping and regenerated TypeScript types
  • Test improvements:
    • Made TagRecognizerFeedbackIT tests retryable (@RetryingTest(3)) with increased timeout (3→5 minutes) to handle transient failures
    • Added cleanup steps in TestSuiteBootstrap for proper test isolation
    • Recovered Maven profiles configuration and updated Docker images in pom.xml
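
The prefetching change summarized above can be illustrated with a minimal, dependency-free sketch. It is not the PR's code: the `Feedback` class, the `CountingRepository`, the hand-rolled JSON, and the use of `recognizerFeedback` as a variable key in a plain `Map` are all assumptions made for illustration. It only shows the shape of the idea: one repository read at trigger time, then every task reads the serialized snapshot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefetchSketch {

  /** Hypothetical, heavily simplified stand-in for the RecognizerFeedback entity. */
  static final class Feedback {
    private static final Pattern JSON =
        Pattern.compile("\\{\"id\":\"([^\"]+)\",\"status\":\"([^\"]+)\"\\}");
    final UUID id;
    final String status;

    Feedback(UUID id, String status) {
      this.id = id;
      this.status = status;
    }

    // Hand-rolled JSON keeps the sketch dependency-free; real code would use a JSON library.
    String toJson() {
      return "{\"id\":\"" + id + "\",\"status\":\"" + status + "\"}";
    }

    static Feedback fromJson(String json) {
      Matcher m = JSON.matcher(json);
      if (!m.matches()) {
        throw new IllegalArgumentException("Unexpected payload: " + json);
      }
      return new Feedback(UUID.fromString(m.group(1)), m.group(2));
    }
  }

  /** Hypothetical repository that counts how often it is read. */
  static final class CountingRepository {
    private final Map<UUID, Feedback> rows = new HashMap<>();
    final AtomicInteger reads = new AtomicInteger();

    void save(Feedback feedback) {
      rows.put(feedback.id, feedback);
    }

    Feedback get(UUID id) {
      reads.incrementAndGet();
      return rows.get(id);
    }
  }

  // Trigger side: the only place that reads from the repository.
  static Map<String, Object> trigger(CountingRepository repo, UUID id) {
    Map<String, Object> variables = new HashMap<>();
    variables.put("recognizerFeedback", repo.get(id).toJson()); // assumed variable key
    return variables;
  }

  // Task side: deserializes the snapshot from the variables, with no repository access.
  static Feedback readSnapshot(Map<String, Object> variables) {
    return Feedback.fromJson((String) variables.get("recognizerFeedback"));
  }

  public static void main(String[] args) {
    CountingRepository repo = new CountingRepository();
    UUID id = UUID.randomUUID();
    repo.save(new Feedback(id, "PENDING"));

    Map<String, Object> variables = trigger(repo, id);
    for (int task = 0; task < 4; task++) { // four tasks share one snapshot
      System.out.println("task " + task + " sees status " + readSnapshot(variables).status);
    }
    System.out.println("repository reads: " + repo.reads.get());
  }
}
```

In this sketch the read counter stays at one regardless of how many tasks consume the snapshot, which is the property the summary attributes to the refactor.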


Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

github-actions bot commented Feb 9, 2026

Jest test Coverage

UI tests summary

  • Lines: 65% (coverage badge)
  • Statements: 65.76% (56045/85231)
  • Branches: 45.15% (29339/64979)
  • Functions: 47.91% (8850/18471)

github-actions bot commented Feb 9, 2026

TypeScript types have been updated based on the JSON schema changes in the PR

Recover lost profiles configuration

Add cleanup steps and opensearch configuration needed in test suite bootstrap

Make tag recognizer tests retryable
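
For readers unfamiliar with retryable tests, the sketch below imitates the behavior in plain Java. It assumes the semantics of `@RetryingTest(3)` are "run the test up to three times and pass on the first success"; it does not use the actual annotation or any test framework, and `TestBody` and `runWithRetries` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {

  /** Hypothetical functional interface for a test body that may throw. */
  interface TestBody {
    void run() throws Exception;
  }

  // Runs the body up to maxAttempts times and returns the attempt number that passed.
  // If every attempt fails, the last failure is rethrown wrapped in an AssertionError.
  static int runWithRetries(int maxAttempts, TestBody body) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Throwable last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        body.run();
        return attempt;
      } catch (Exception | AssertionError e) {
        last = e; // remember the failure and try again
      }
    }
    throw new AssertionError("failed after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Simulated flaky test: fails on the first two calls, passes on the third.
    int passedOn = runWithRetries(3, () -> {
      if (calls.incrementAndGet() < 3) {
        throw new IllegalStateException("transient failure");
      }
    });
    System.out.println("passed on attempt " + passedOn);
  }
}
```

Retries of this kind hide transient failures rather than removing them, so they complement the prefetching fix and do not replace it.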

github-actions bot commented Feb 9, 2026

The Java checkstyle failed.

Please run mvn spotless:apply in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Java code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

gitar-bot bot commented Feb 9, 2026

🔍 CI failure analysis for e86d5e8: 6 DataProductResourceTest failures in maven-sonarcloud-ci are unrelated to PR (no code overlap). PR only modifies workflow/recognizer feedback code. Previous 99% integration test improvement maintained.

Issue

New CI run shows failures unrelated to PR:

  • maven-sonarcloud-ci: 6 test failures in DataProductResourceTest
  • Test Report: Cascading failure

Maven SonarCloud CI Failures (UNRELATED)

Job 62986027649

Test Class: DataProductResourceTest

Failures (6 total):

  1. testDataProductBulkOutputPorts:850

    • expected: <success> but was: <failure>
  2. testDataProductDomainMigrationWithInputOutputPorts:1732

    • Output port should be in target domain after migration
    • expected: <a29f9048-9b11-4b75-92a0-ed1b2ed2635a> but was: <88a98c2d-46bd-43d3-b404-3b9977ed31ce>
  3. testGetOutputPortsReturnsFullEntities:1012

    • expected: <1> but was: <0>
  4. testGetPortsByNameEndpoints:1108

    • expected: <1> but was: <0>
  5. testGetPortsViewEndpoint:1066

    • expected: <1> but was: <0>
  6. testDataProductBulkPortsViaApi:904

    • HttpResponse status code: 400, reason: Error reading response

Overall Results:

  • Tests run: 7919
  • Failures: 5
  • Errors: 1
  • Skipped: 701

Root Cause

Relationship to PR: Completely unrelated

Evidence:

PR modifies workflow/recognizer feedback code only:

  • WorkflowEventConsumer.java
  • 4 workflow task implementations (ApplyRecognizerFeedback, RejectRecognizerFeedback, CheckFeedbackSubmitterIsReviewer, CreateRecognizerFeedbackApprovalTask)
  • Integration test files (TagRecognizerFeedbackIT, TestSuiteBootstrap)
  • JSON schemas for workflow tasks
  • TypeScript generated types

Failing tests are in DataProductResourceTest, which tests:

  • Data product bulk operations
  • Domain migration with input/output ports
  • Port retrieval endpoints

No code overlap between PR changes and failing test area.


Test Report Failure (CASCADING)

Job 63008818112

Status: Cascading from maven-sonarcloud-ci failures


Previous Analysis Context (Still Applicable)

Integration Test Results (EXCEPTIONAL IMPROVEMENT)

OpenSearch (postgres-opensearch) - Job 62986031220

  • Total tests: 10,622
  • Errors: 3 (down from 29)
  • Success Rate: 99.97%

Elasticsearch (mysql-elasticsearch) - Job 62986021178

  • Total tests: 10,622
  • Errors: 1 (down from 269!)
  • Success Rate: 99.99%

Improvement: ~99% reduction (298 → 4) 🎉

PR's changes successfully resolved search engine API issues:

  • OpenSearch ShardFailure.primary API errors: 29 → 0 (100% fixed)
  • Elasticsearch media-type header errors: 269 → 0 (100% fixed)

Python Failures (UNRELATED)

Both Python 3.10 & 3.11:

  • ModuleNotFoundError: No module named 'cachetools'
  • Environment mismatch issue, unrelated to PR's Java changes

Failure Type Classification

  1. maven-sonarcloud-ci (6 failures): test (DataProduct functionality, unrelated to PR)
  2. Test Report: test (cascading)
  3. Previous Python failures: dependency + infrastructure (unrelated)
  4. Previous integration errors (4): test (minor timing, related to PR)

Conclusion

New failures in this run:

  • 6 DataProductResourceTest failures are pre-existing or flaky issues unrelated to PR's workflow changes
  • No code overlap between PR (workflow/recognizer feedback) and failing tests (data product ports)

Previous exceptional success maintained:

  • Integration tests: 99% improvement (298 → 4 errors)
  • All search engine API compatibility issues resolved
  • PR's core functionality working correctly

The new failures do not impact or relate to the PR's workflow improvements.

Code Review ✅ Approved 0 resolved / 1 findings

Clean refactor that pre-fetches RecognizerFeedback once at workflow trigger time, eliminating redundant DB queries and reducing race conditions. Test improvements with retryability and cleanup are appropriate for addressing flakiness.

💡 Edge Case: Stale feedback snapshot may overwrite concurrent DB changes

📄 openmetadata-service/src/main/java/org/openmetadata/service/governance/workflows/elements/nodes/automatedTask/impl/ApplyRecognizerFeedbackImpl.java:34

The RecognizerFeedback entity is now fetched once at workflow trigger time in WorkflowEventConsumer.handleTagRecognizerFeedback() and serialized as a JSON variable. When applyFeedback() or rejectFeedback() later executes, it operates on this pre-fetched snapshot and calls update(feedback) to persist the result.

If the feedback entity were modified in the database between trigger time and task execution time (e.g., by an admin or another process), the stale snapshot would overwrite those changes. The status == PENDING check at the start of applyFeedback/rejectFeedback checks the deserialized object (which will always be PENDING since it was captured at trigger time), not the current DB state.

In practice, this is unlikely for this specific entity type since feedback items typically flow through a single workflow, and this trade-off eliminates the race conditions that caused flakiness. Just noting it for awareness. If this becomes a concern, consider re-fetching the entity's current status from DB before update(), or using optimistic locking (version field) on the entity.
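
The optimistic-locking suggestion can be sketched as follows. This is an illustration under stated assumptions, not the project's repository API: `Feedback`, `Store`, `updateIfUnchanged`, the `version` field, and the `APPLIED` and `REJECTED` status names are hypothetical. The store accepts a write only if the stored row is still `PENDING` and its version matches the caller's snapshot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class OptimisticLockSketch {

  /** Hypothetical immutable feedback row carrying a version counter. */
  static final class Feedback {
    final UUID id;
    final String status;
    final long version;

    Feedback(UUID id, String status, long version) {
      this.id = id;
      this.status = status;
      this.version = version;
    }

    // Returns a copy with a new status but the same version the caller originally read.
    Feedback withStatus(String newStatus) {
      return new Feedback(id, newStatus, version);
    }
  }

  /** Hypothetical in-memory store standing in for the database. */
  static final class Store {
    private final Map<UUID, Feedback> rows = new HashMap<>();

    synchronized void insert(Feedback feedback) {
      rows.put(feedback.id, feedback);
    }

    synchronized Feedback get(UUID id) {
      return rows.get(id);
    }

    // Compare-and-set: rejects the write when the stored row changed since the snapshot
    // was taken, or when it is no longer PENDING.
    synchronized boolean updateIfUnchanged(Feedback updated) {
      Feedback current = rows.get(updated.id);
      if (current == null
          || current.version != updated.version
          || !"PENDING".equals(current.status)) {
        return false;
      }
      rows.put(updated.id, new Feedback(updated.id, updated.status, current.version + 1));
      return true;
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    UUID id = UUID.randomUUID();
    store.insert(new Feedback(id, "PENDING", 1));

    Feedback snapshot = store.get(id); // captured at workflow trigger time

    // Another actor changes the row before the workflow task runs.
    boolean adminWrite = store.updateIfUnchanged(store.get(id).withStatus("REJECTED"));
    // The workflow task now tries to persist its stale snapshot.
    boolean staleWrite = store.updateIfUnchanged(snapshot.withStatus("APPLIED"));

    System.out.println("admin write accepted: " + adminWrite);
    System.out.println("stale workflow write accepted: " + staleWrite);
    System.out.println("final status: " + store.get(id).status);
  }
}
```

With this check in place the stale snapshot cannot silently overwrite the concurrent change; the caller would still need to decide whether to retry, re-fetch, or abort the task.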

@edg956 edg956 enabled auto-merge (squash) February 9, 2026 15:34

sonarqubecloud bot commented Feb 9, 2026

@harshach harshach disabled auto-merge February 9, 2026 18:49
@harshach harshach merged commit 111af90 into main Feb 9, 2026
29 of 35 checks passed
@harshach harshach deleted the ci-issues branch February 9, 2026 18:49

github-actions bot commented Feb 9, 2026

Failed to cherry-pick changes to the 1.11.9 branch.
Please cherry-pick the changes manually.
You can find more details here.

Labels

  • governance
  • Ingestion
  • safe to test: Add this label to run secure Github workflows on PRs
  • To release: Will cherry-pick this PR into the release branch
