
Conversation

@yan-3005
Contributor

  • Tag clearing fix: Add deleteTagsByTarget before applying new tags in batch imports to match single entity import behavior, ensuring empty CSV fields properly clear existing tags (see the sketch after this list)
  • Circular dependency detection fix: Pre-track entities in dryRunCreatedEntities before parent resolution to enable proper circular reference validation during CSV team imports
  • Resolves test failures in TeamResourceIT.test_importCsv_circularDependency_trueRun and tag-related import issues
  • Maintains batch import performance while restoring pre-batch-import validation contracts
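
A minimal sketch of the ordering behind the tag clearing fix, assuming a tag-usage DAO exposing the deleteTagsByTarget method named above; targetFqn, csvRowTags, and applyTags are illustrative names, not the exact codebase API:

// Sketch only: mirror single-entity import behavior by clearing the target's
// existing tags before applying whatever the CSV row carries, so an empty CSV
// field leaves the entity with no tags rather than stale ones.
// deleteTagsByTarget is named in the PR description; the rest is hypothetical.
daoCollection.tagUsageDAO().deleteTagsByTarget(targetFqn); // clear existing tags first
applyTags(csvRowTags, targetFqn);                          // then apply tags parsed from the row (possibly none)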

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

List<EntityInterface> toUpdate = new ArrayList<>();
List<EntityInterface> originals = new ArrayList<>();

for (PendingEntityOperation op : ops) {

⚠️ Edge Case: Missing originalEntity in update silently drops operation

Details

When an update operation has a null originalEntity, the code logs a warning but doesn't actually handle the entity:

} else {
  // Verify we have the original entity for update
  if (op.originalEntity != null) {
    toUpdate.add(op.entity);
    originals.add(op.originalEntity);
  } else {
    // Should not happen if createEntity logic is correct, but fallback safely
    LOG.warn(
        "Missing original entity for update operation: {}",
        op.entity.getFullyQualifiedName());
    // Treat as potential create or individual fallback?
    // Safest is to let it fail or try individual update fallback
  }
}

The entity is silently dropped from processing - no exception is thrown, no fallback is executed, and no failure is recorded to the CSV import results. This could lead to data loss where users think their update succeeded (since no error is reported in the import results) but the entity was never actually updated.

Consider either (a combined sketch of options 1 and 2 follows the list):

  1. Adding the entity to a separate fallback list for individual processing
  2. Recording an import failure for this row
  3. Throwing an exception if this is truly unexpected



// Validate hierarchy now that entity is pre-tracked
if (processRecord) {
  TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);

⚠️ Bug: Redundant processRecord check in overridden createEntity

Details

In the overridden createEntity method, processRecord is checked again, but the check is almost certainly redundant:

@Override
protected void createEntity(CSVPrinter resultsPrinter, CSVRecord csvRecord, Team entity)
    throws IOException {

  // Validate hierarchy now that entity is pre-tracked
  if (processRecord) {  // This check is redundant/suspicious
    TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);

The createEntity method is only called when processRecord is already true (from the createRecord method). Checking processRecord again inside createEntity is redundant and could be confusing. Additionally, if processRecord is modified elsewhere between the call and this check, it could lead to inconsistent state.

Consider removing the redundant check or documenting why it's necessary.
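
A sketch of the simplified form, assuming createRecord remains the only caller and already guards on processRecord; the hierarchy-validation body is elided:

@Override
protected void createEntity(CSVPrinter resultsPrinter, CSVRecord csvRecord, Team entity)
    throws IOException {
  // Sketch only: the redundant processRecord guard is dropped, since
  // createRecord dispatches here only when processRecord is already true.
  TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);
  // ... validate hierarchy now that the entity is pre-tracked ...
}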



- Add batch lock check to HierarchicalLockManager
- Add batch cache write to EntityRepository
- Fix createManyEntitiesForImport with batched operations
- Fix updateManyEntitiesForImport with batched operations
- Add change event creation in flushPendingEntityOperations
@gitar-bot

gitar-bot bot commented Jan 29, 2026

🔍 CI failure analysis for c417d98: CSV import regression persists across backend (Java) and frontend (Playwright) tests. Python S3 test failure is infrastructure-related (AWS STS API version mismatch), NOT caused by PR changes. PR remains critically blocked by CSV import bug.

Issue

The CSV import regression PERSISTS across ALL TEST LEVELS after commit c417d98

Root Cause

Commit c417d98 added storeEntities() overrides to 50+ repositories but did NOT resolve the underlying entity tracking issue in CSV imports.

COMPREHENSIVE TEST FAILURE ANALYSIS

Integration Tests - Backend (Java)

PostgreSQL OpenSearch (Job 61823228366):

  • ❌ 3 test failures (8459 tests run, 433 skipped)
  • ❌ DatabaseSchemaResourceIT.testImportExportWithTableConstraints:1108 - CRITICAL
  • ❌ 2 testCase CSV limitations (pre-existing)

MySQL Elasticsearch (Job 61823228381):

  • ❌ 4 test failures (8459 tests run, 433 skipped)
  • ❌ DatabaseSchemaResourceIT.testImportExportWithTableConstraints:1108 - CRITICAL
  • ❌ 2 testCase CSV limitations (pre-existing)
  • ❌ 1 DashboardResourceIT search index timeout (possible batch storage side effect)

E2E Tests - Frontend (Playwright)

Shard 2 (Job 61823228355):

  • Bulk Import Export › Database service - FAILED (5.0m)
  • Bulk Import Export › Database - FAILED (5.0m)
  • Bulk Import Export › Database Schema - FAILED (4.0m)
  • ❌ Test Case Bulk Import validation errors - FAILED
  • All failures occurred AFTER RETRIES

Shards 4 & 6 (Jobs 61823228347, 61823228346):

  • ❌ Multiple generic test timeouts and element visibility failures
  • No CSV-specific failures in these shards

Shards 1 & 5:

  • ✅ PASSED

Python Tests (Job 61823228333)

Python 3.10:

  • ❌ 1 test failed: test_s3_storage.py::test_s3_ingestion
  • ✅ 532 tests passed
  • ⏭️ 21 tests skipped
  • Duration: 1h 17m

Error:

An error occurred (MissingParameter) when calling the ListMetrics operation: Invalid STS API version 2010-08-01, expecting 2011-06-15

Analysis: This is an AWS SDK/STS API version compatibility issue in the test infrastructure, NOT related to the PR changes:

  • The PR modifies only Java backend files (EntityCsv.java, EntityRepository.java, etc.)
  • No Python files are modified
  • The error is about AWS STS API version mismatch (boto3/botocore version issue)
  • This is an infrastructure/CI environment issue

Status: ⚠️ Infrastructure issue - NOT blocking (not caused by PR changes)

CRITICAL FINDING: CSV Import Regression Confirmed at ALL Java/Frontend Test Levels

Backend Integration Tests:

InvalidRequest: Table not found: postgresService_[...].constraint_test_schema.source_table Updated via CSV import.user_ref

Frontend E2E Tests:

  • Same CSV bulk import operations failing through the UI
  • Tests: Database service, Database, Database Schema
  • All involve CSV export and re-import flows

This confirms the CSV import regression affects BOTH the backend API layer AND the frontend UI layer, indicating a fundamental bug in the CSV import/export logic.

Analysis

Why did commit c417d98 fail to fix the issue?

The commit added storeEntities() methods across repositories for batch storage. However, the root cause is in the entity tracking and preparation logic in EntityCsv.java, not in the repository storage methods.

The actual problem:

  1. Entities are being pre-tracked in dryRunCreatedEntities BEFORE prepareInternal() completes
  2. This causes entities to be tracked with incorrect/incomplete fully qualified names
  3. When foreign key constraints try to resolve references, they can't find the entities because the FQNs don't match
  4. The "Updated via CSV import" suffix in the error message suggests the entity was updated, but the constraint resolver is looking for it with that suffix embedded in the FQN

Impact on E2E Tests:

The E2E tests exercise the full CSV import/export workflow through the UI:

  1. Export entities to CSV
  2. Potentially modify the CSV
  3. Re-import the CSV
  4. Verify the import succeeded

Since the backend CSV import logic is broken, the E2E tests fail when attempting to re-import the CSV files.

What c417d98 did:

  • Added batch storage methods (storeEntities()) to repositories
  • This helps with performance and batch operations
  • But it doesn't fix the entity tracking issue during CSV preparation
  • May have introduced a new search indexing timing issue (DashboardResourceIT timeout)

What's still broken:

  • Entity FQN tracking during CSV import
  • Foreign key constraint resolution finding entities by FQN
  • CSV import workflow through both API and UI
  • Potentially: Search indexing for batch-stored entities

Solution Required

The fix should address the entity tracking issue (point 1 is sketched after this list):

  1. Ensure entities are tracked with correct FQNs after prepareInternal() completes successfully
  2. Fix constraint resolution to properly find entities during CSV import (handle FQN variations, suffixes, etc.)
  3. Review the entity preparation flow in EntityCsv.java to ensure entities are findable by their references
  4. Investigate search indexing for batch-stored entities - ensure proper index updates are triggered
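
A minimal sketch of point 1, reusing names from the review excerpts above; the prepareInternal signature and the reordering are assumptions drawn from this analysis, not the current code:

// Sketch only: finalize the FQN before tracking, so constraint resolution
// later in the import can find the entity under its real name.
repository.prepareInternal(entity, false); // prepare assigns the final fully qualified name
dryRunCreatedEntities.put(entity.getFullyQualifiedName(), entity); // track only once the FQN is final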

The batch storage improvements in c417d98 are good for performance, but they don't address the core issue with entity tracking and constraint resolution.

Test Results Summary

Integration Tests (Java):

PostgreSQL OpenSearch (61823228366):

  • ✅ 8459 tests run
  • ⏭️ 433 tests skipped
  • ❌ 3 tests failed
    • 1 CRITICAL regression (DatabaseSchemaResourceIT - Table not found)
    • 2 pre-existing limitations (testCase CSV not supported)

MySQL Elasticsearch (61823228381):

  • ✅ 8459 tests run
  • ⏭️ 433 tests skipped
  • ❌ 4 tests failed
    • 1 CRITICAL regression (DatabaseSchemaResourceIT - Table not found)
    • 2 pre-existing limitations (testCase CSV not supported)
    • 1 NEW search index timeout (DashboardResourceIT)

E2E Tests (Playwright):

Shard 1 (61823228345):

  • ✅ PASSED (36 minutes)

Shard 2 (61823228355):

  • ❌ FAILED (1h 27m)
  • 3 CSV Bulk Import Export tests failed (Database service, Database, Database Schema)
  • 1 Test Case validation test failed

Shard 4 (61823228347):

  • ❌ FAILED (1h 26m)
  • Multiple generic test timeouts/element not visible

Shard 5 (61823228351):

  • ✅ PASSED (1h 22m)

Shard 6 (61823228346):

  • ❌ FAILED (1h 27m)
  • Multiple generic test timeouts/element not visible

Python Tests:

Python 3.10 (61823228333):

  • ✅ 532 tests passed
  • ⏭️ 21 tests skipped
  • ❌ 1 test failed (⚠️ Infrastructure: AWS STS API version mismatch - NOT blocking)

Impact

The PR remains CRITICALLY BLOCKED by CSV import regression:

  • Core CSV import regression NOT FIXED by commit c417d98
  • The "Table not found" error is confirmed across ALL database configurations (PostgreSQL/OpenSearch and MySQL/Elasticsearch)
  • E2E tests confirm the bug affects the UI layer - users cannot successfully import CSV files through the web interface
  • This is a fundamental bug in the CSV import logic that affects both backend API and frontend UI
  • The regression blocks all CSV import/export workflows including Database service, Database, and Database Schema entities
  • Potential new issue with search indexing for batch operations

Python S3 test failure is NOT blocking - it's an infrastructure issue (AWS STS API version mismatch) unrelated to the PR changes (no Python files modified).

Code Review: ⚠️ Changes requested (0 resolved / 4 findings)

Large batch import refactoring with storeEntities implementations across 50+ repositories. Previous code quality issues remain unaddressed: duplicate comment/assignment lines and a silent data-loss edge case for missing originalEntity.

⚠️ Bug: Redundant processRecord check in overridden createEntity

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TeamRepository.java:1082

In the overridden createEntity method, processRecord is checked again, but the check is almost certainly redundant:

@Override
protected void createEntity(CSVPrinter resultsPrinter, CSVRecord csvRecord, Team entity)
    throws IOException {

  // Validate hierarchy now that entity is pre-tracked
  if (processRecord) {  // This check is redundant/suspicious
    TeamRepository repository = (TeamRepository) Entity.getEntityRepository(TEAM);

The createEntity method is only called when processRecord is already true (from the createRecord method). Checking processRecord again inside createEntity is redundant and could be confusing. Additionally, if processRecord is modified elsewhere between the call and this check, it could lead to inconsistent state.

Consider removing the redundant check or documenting why it's necessary.

⚠️ Edge Case: Missing originalEntity in update silently drops operation

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1241

When an update operation has a null originalEntity, the code logs a warning but doesn't actually handle the entity:

} else {
  // Verify we have the original entity for update
  if (op.originalEntity != null) {
    toUpdate.add(op.entity);
    originals.add(op.originalEntity);
  } else {
    // Should not happen if createEntity logic is correct, but fallback safely
    LOG.warn(
        "Missing original entity for update operation: {}",
        op.entity.getFullyQualifiedName());
    // Treat as potential create or individual fallback?
    // Safest is to let it fail or try individual update fallback
  }
}

The entity is silently dropped from processing - no exception is thrown, no fallback is executed, and no failure is recorded to the CSV import results. This could lead to data loss where users think their update succeeded (since no error is reported in the import results) but the entity was never actually updated.

Consider either:

  1. Adding the entity to a separate fallback list for individual processing
  2. Recording an import failure for this row
  3. Throwing an exception if this is truly unexpected

💡 Bug: Duplicate responseStatus assignment in dry-run branch

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1130

In the createEntity method overload, there's a duplicate line:

responseStatus = exists ? Response.Status.OK : Response.Status.CREATED;
responseStatus = exists ? Response.Status.OK : Response.Status.CREATED;

The same assignment appears twice consecutively. This is a copy-paste error that should be removed.

💡 Bug: Duplicate comment lines in createEntity method

📄 openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java:1062

There's a duplicate comment in the dry-run branch:

// Track the dryRun created entities, as they may be referred by other entities being created
// during import
// Track the dryRun created entities, as they may be referred by other entities being created
// during import
dryRunCreatedEntities.put(entity.getFullyQualifiedName(), entity);

The same comment appears twice consecutively. This is a copy-paste error that should be cleaned up.


@yan-3005 yan-3005 changed the title Fix tag clearing and circular dependency detection in batch CSV imports Optimised Internal methods for adding relationships for import, fixed circular dependency, generated changeEvents Jan 29, 2026
@yan-3005 yan-3005 merged commit eb28a7b into import-export-improvements Jan 29, 2026
10 of 21 checks passed
@yan-3005 yan-3005 deleted the ram-sonika/import-export-improvements branch January 29, 2026 06:00