
Conversation

@yan-3005
Contributor

  • Tag clearing fix: Add deleteTagsByTarget before applying new tags in batch imports to match single entity import behavior, ensuring empty CSV fields properly clear existing tags (see the sketch after this list)
  • Circular dependency detection fix: Pre-track entities in dryRunCreatedEntities before parent resolution to enable proper circular reference validation during CSV team imports
  • Resolves test failures in TeamResourceIT.test_importCsv_circularDependency_trueRun and tag-related import issues
  • Maintains batch import performance while restoring pre-batch-import validation contracts
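
A minimal sketch of the ordering behind the tag clearing fix, assuming a tag-usage DAO exposing the deleteTagsByTarget method named above; targetFqn, csvRowTags, and applyTags are illustrative names, not the exact codebase API:

// Sketch only: mirror single-entity import behavior by clearing the target's
// existing tags before applying whatever the CSV row carries, so an empty CSV
// field leaves the entity with no tags rather than stale ones.
// deleteTagsByTarget is named in the PR description; the rest is hypothetical.
daoCollection.tagUsageDAO().deleteTagsByTarget(targetFqn); // clear existing tags first
applyTags(csvRowTags, targetFqn);                          // then apply tags parsed from the row (possibly none)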

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

List<EntityInterface> toUpdate = new ArrayList<>();
List<EntityInterface> originals = new ArrayList<>();

for (PendingEntityOperation op : ops) {

⚠️ Edge Case: Missing originalEntity in update silently drops operation

Details

When an update operation has a null originalEntity, the code logs a warning but doesn't actually handle the entity:

} else {
  // Verify we have the original entity for update
  if (op.originalEntity != null) {
    toUpdate.add(op.entity);
    originals.add(op.originalEntity);
  } else {
    // Should not happen if createEntity logic is correct, but fallback safely
    LOG.warn(
        "Missing original entity for update operation: {}",
        op.entity.getFullyQualifiedName());
    // Treat as potential create or individual fallback?
    // Safest is to let it fail or try individual update fallback
  }
}

The entity is silently dropped from processing - no exception is thrown, no fallback is executed, and no failure is recorded to the CSV import results. This could lead to data loss where users think their update succeeded (since no error is reported in the import results) but the entity was never actually updated.

Consider either (a combined sketch of options 1 and 2 follows the list):

  1. Adding the entity to a separate fallback list for individual processing
  2. Recording an import failure for this row
  3. Throwing an exception if this is truly unexpected



// Validate hierarchy now that entity is pre-tracked
if (processRecord) {
  TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);

⚠️ Bug: Redundant processRecord check in overridden createEntity

Details

In the overridden createEntity method, processRecord is checked again, but the check is almost certainly redundant:

@Override
protected void createEntity(CSVPrinter resultsPrinter, CSVRecord csvRecord, Team entity)
    throws IOException {

  // Validate hierarchy now that entity is pre-tracked
  if (processRecord) {  // This check is redundant/suspicious
    TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);

The createEntity method is only called when processRecord is already true (from the createRecord method). Checking processRecord again inside createEntity is redundant and could be confusing. Additionally, if processRecord is modified elsewhere between the call and this check, it could lead to inconsistent state.

Consider removing the redundant check or documenting why it's necessary.
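
A sketch of the simplified form, assuming createRecord remains the only caller and already guards on processRecord; the hierarchy-validation body is elided:

@Override
protected void createEntity(CSVPrinter resultsPrinter, CSVRecord csvRecord, Team entity)
    throws IOException {
  // Sketch only: the redundant processRecord guard is dropped, since
  // createRecord dispatches here only when processRecord is already true.
  TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);
  // ... validate hierarchy now that the entity is pre-tracked ...
}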



- Add batch lock check to HierarchicalLockManager
- Add batch cache write to EntityRepository
- Fix createManyEntitiesForImport with batched operations
- Fix updateManyEntitiesForImport with batched operations
- Add change event creation in flushPendingEntityOperations
@gitar-bot

gitar-bot bot commented Jan 29, 2026

🔍 CI failure analysis for c417d98: CSV import regression persists across backend (Java) and frontend (Playwright) tests. Python S3 test failure is infrastructure-related (AWS STS API version mismatch), NOT caused by PR changes. PR remains critically blocked by CSV import bug.

Issue

The CSV import regression PERSISTS across ALL TEST LEVELS after commit c417d98

Root Cause

Commit c417d98 added storeEntities() overrides to 50+ repositories but did NOT resolve the underlying entity tracking issue in CSV imports.

COMPREHENSIVE TEST FAILURE ANALYSIS

Integration Tests - Backend (Java)

PostgreSQL OpenSearch (Job 61823228366):

  • ❌ 3 test failures (8459 tests run, 433 skipped)
  • ❌ DatabaseSchemaResourceIT.testImportExportWithTableConstraints:1108 - CRITICAL
  • ❌ 2 testCase CSV limitations (pre-existing)

MySQL Elasticsearch (Job 61823228381):

  • ❌ 4 test failures (8459 tests run, 433 skipped)
  • ❌ DatabaseSchemaResourceIT.testImportExportWithTableConstraints:1108 - CRITICAL
  • ❌ 2 testCase CSV limitations (pre-existing)
  • ❌ 1 DashboardResourceIT search index timeout (possible batch storage side effect)

E2E Tests - Frontend (Playwright)

Shard 2 (Job 61823228355):

  • Bulk Import Export › Database service - FAILED (5.0m)
  • Bulk Import Export › Database - FAILED (5.0m)
  • Bulk Import Export › Database Schema - FAILED (4.0m)
  • ❌ Test Case Bulk Import validation errors - FAILED
  • All failures occurred AFTER RETRIES

Shards 4 & 6 (Jobs 61823228347, 61823228346):

  • ❌ Multiple generic test timeouts and element visibility failures
  • No CSV-specific failures in these shards

Shards 1 & 5:

  • ✅ PASSED

Python Tests (Job 61823228333)

Python 3.10:

  • ❌ 1 test failed: test_s3_storage.py::test_s3_ingestion
  • ✅ 532 tests passed
  • ⏭️ 21 tests skipped
  • Duration: 1h 17m

Error:

An error occurred (MissingParameter) when calling the ListMetrics operation: Invalid STS API version 2010-08-01, expecting 2011-06-15

Analysis: This is an AWS SDK/STS API version compatibility issue in the test infrastructure, NOT related to the PR changes:

  • The PR modifies only Java backend files (EntityCsv.java, EntityRepository.java, etc.)
  • No Python files are modified
  • The error is about AWS STS API version mismatch (boto3/botocore version issue)
  • This is an infrastructure/CI environment issue

Status: ⚠️ Infrastructure issue - NOT blocking (not caused by PR changes)

CRITICAL FINDING: CSV Import Regression Confirmed at ALL Java/Frontend Test Levels

Backend Integration Tests:

InvalidRequest: Table not found: postgresService_[...].constraint_test_schema.source_table Updated via CSV import.user_ref

Frontend E2E Tests:

  • Same CSV bulk import operations failing through the UI
  • Tests: Database service, Database, Database Schema
  • All involve CSV export and re-import flows

This confirms the CSV import regression affects BOTH the backend API layer AND the frontend UI layer, indicating a fundamental bug in the CSV import/export logic.

Analysis

Why did commit c417d98 fail to fix the issue?

The commit added storeEntities() methods across repositories for batch storage. However, the root cause is in the entity tracking and preparation logic in EntityCsv.java, not in the repository storage methods.

The actual problem:

  1. Entities are being pre-tracked in dryRunCreatedEntities BEFORE prepareInternal() completes
  2. This causes entities to be tracked with incorrect/incomplete fully qualified names
  3. When foreign key constraints try to resolve references, they can't find the entities because the FQNs don't match
  4. The "Updated via CSV import" suffix in the error message suggests the entity was updated, but the constraint resolver is looking for it with that suffix embedded in the FQN

Impact on E2E Tests:

The E2E tests exercise the full CSV import/export workflow through the UI:

  1. Export entities to CSV
  2. Potentially modify the CSV
  3. Re-import the CSV
  4. Verify the import succeeded

Since the backend CSV import logic is broken, the E2E tests fail when attempting to re-import the CSV files.

What c417d98 did:

  • Added batch storage methods (storeEntities()) to repositories
  • This helps with performance and batch operations
  • But it doesn't fix the entity tracking issue during CSV preparation
  • May have introduced a new search indexing timing issue (DashboardResourceIT timeout)

What's still broken:

  • Entity FQN tracking during CSV import
  • Foreign key constraint resolution finding entities by FQN
  • CSV import workflow through both API and UI
  • Potentially: Search indexing for batch-stored entities

Solution Required

The fix should address the entity tracking issue (point 1 is sketched after this list):

  1. Ensure entities are tracked with correct FQNs after prepareInternal() completes successfully
  2. Fix constraint resolution to properly find entities during CSV import (handle FQN variations, suffixes, etc.)
  3. Review the entity preparation flow in EntityCsv.java to ensure entities are findable by their references
  4. Investigate search indexing for batch-stored entities - ensure proper index updates are triggered
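
A minimal sketch of point 1, reusing names from the review excerpts above; the prepareInternal signature and the reordering are assumptions drawn from this analysis, not the current code:

// Sketch only: finalize the FQN before tracking, so constraint resolution
// later in the import can find the entity under its real name.
repository.prepareInternal(entity, false); // prepare assigns the final fully qualified name
dryRunCreatedEntities.put(entity.getFullyQualifiedName(), entity); // track only once the FQN is final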

The batch storage improvements in c417d98 are good for performance, but they don't address the core issue with entity tracking and constraint resolution.

Test Results Summary

Integration Tests (Java):

PostgreSQL OpenSearch (61823228366):

  • ✅ 8459 tests run
  • ⏭️ 433 tests skipped
  • ❌ 3 tests failed
    • 1 CRITICAL regression (DatabaseSchemaResourceIT - Table not found)
    • 2 pre-existing limitations (testCase CSV not supported)

MySQL Elasticsearch (61823228381):

  • ✅ 8459 tests run
  • ⏭️ 433 tests skipped
  • ❌ 4 tests failed
    • 1 CRITICAL regression (DatabaseSchemaResourceIT - Table not found)
    • 2 pre-existing limitations (testCase CSV not supported)
    • 1 NEW search index timeout (DashboardResourceIT)

E2E Tests (Playwright):

Shard 1 (61823228345):

  • ✅ PASSED (36 minutes)

Shard 2 (61823228355):

  • ❌ FAILED (1h 27m)
  • 3 CSV Bulk Import Export tests failed (Database service, Database, Database Schema)
  • 1 Test Case validation test failed

Shard 4 (61823228347):

  • ❌ FAILED (1h 26m)
  • Multiple generic test timeouts/element not visible

Shard 5 (61823228351):

  • ✅ PASSED (1h 22m)

Shard 6 (61823228346):

  • ❌ FAILED (1h 27m)
  • Multiple generic test timeouts/element not visible

Python Tests:

Python 3.10 (61823228333):

  • ✅ 532 tests passed
  • ⏭️ 21 tests skipped
  • ❌ 1 test failed (⚠️ Infrastructure: AWS STS API version mismatch - NOT blocking)

Impact

The PR remains CRITICALLY BLOCKED by CSV import regression:

  • Core CSV import regression NOT FIXED by commit c417d98
  • The "Table not found" error is confirmed across ALL database configurations (PostgreSQL/OpenSearch and MySQL/Elasticsearch)
  • E2E tests confirm the bug affects the UI layer - users cannot successfully import CSV files through the web interface
  • This is a fundamental bug in the CSV import logic that affects both backend API and frontend UI
  • The regression blocks all CSV import/export workflows including Database service, Database, and Database Schema entities
  • Potential new issue with search indexing for batch operations

Python S3 test failure is NOT blocking - it's an infrastructure issue (AWS STS API version mismatch) unrelated to the PR changes (no Python files modified).

Code Review: ⚠️ Changes requested (0 resolved / 4 findings)

Large batch import refactoring with storeEntities implementations across 50+ repositories. Previous code quality issues remain unaddressed: duplicate comment/assignment lines and a silent data-loss edge case for missing originalEntity.

⚠️ Bug: Redundant processRecord check in overridden createEntity

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TeamRepository.java:1082

In the overridden createEntity method, processRecord is checked again, but the check is almost certainly redundant:

@Override
protected void createEntity(CSVPrinter resultsPrinter, CSVRecord csvRecord, Team entity)
    throws IOException {

  // Validate hierarchy now that entity is pre-tracked
  if (processRecord) {  // This check is redundant/suspicious
    TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);

The createEntity method is only called when processRecord is already true (from the createRecord method). Checking processRecord again inside createEntity is redundant and could be confusing. Additionally, if processRecord is modified elsewhere between the call and this check, it could lead to inconsistent state.

Consider removing the redundant check or documenting why it's necessary.

⚠️ Edge Case: Missing originalEntity in update silently drops operation

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1241

When an update operation has a null originalEntity, the code logs a warning but doesn't actually handle the entity:

} else {
  // Verify we have the original entity for update
  if (op.originalEntity != null) {
    toUpdate.add(op.entity);
    originals.add(op.originalEntity);
  } else {
    // Should not happen if createEntity logic is correct, but fallback safely
    LOG.warn(
        "Missing original entity for update operation: {}",
        op.entity.getFullyQualifiedName());
    // Treat as potential create or individual fallback?
    // Safest is to let it fail or try individual update fallback
  }
}

The entity is silently dropped from processing - no exception is thrown, no fallback is executed, and no failure is recorded to the CSV import results. This could lead to data loss where users think their update succeeded (since no error is reported in the import results) but the entity was never actually updated.

Consider either:

  1. Adding the entity to a separate fallback list for individual processing
  2. Recording an import failure for this row
  3. Throwing an exception if this is truly unexpected

💡 Bug: Duplicate responseStatus assignment in dry-run branch

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1130

In the createEntity method overload, there's a duplicate line:

responseStatus = exists ? Response.Status.OK : Response.Status.CREATED;
responseStatus = exists ? Response.Status.OK : Response.Status.CREATED;

The same assignment appears twice consecutively. This is a copy-paste error that should be removed.

💡 Bug: Duplicate comment lines in createEntity method

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1062

There's a duplicate comment in the dry-run branch:

// Track the dryRun created entities, as they may be referred by other entities being created
// during import
// Track the dryRun created entities, as they may be referred by other entities being created
// during import
dryRunCreatedEntities.put(entity.getFullyQualifiedName(), entity);

The same comment appears twice consecutively. This is a copy-paste error that should be cleaned up.


@yan-3005 yan-3005 changed the title Fix tag clearing and circular dependency detection in batch CSV imports Optimised Internal methods for adding relationships for import, fixed circular dependency, generated changeEvents Jan 29, 2026
@yan-3005 yan-3005 merged commit eb28a7b into import-export-improvements Jan 29, 2026
10 of 21 checks passed
@yan-3005 yan-3005 deleted the ram-sonika/import-export-improvements branch January 29, 2026 06:00