
Conversation

@harshach commented Jan 27, 2026

Describe your changes:

Fixes

I worked on ... because ...


Summary by Gitar

This PR implements batched CSV import/export operations to significantly improve performance for large datasets:

  • Batch Processing: Queues database operations and flushes every 100 records using insertMany()/updateMany() instead of individual INSERT/UPDATE statements (see the sketch after this list)
  • Bulk Search Indexing: Replaces individual Elasticsearch updates with bulk API calls via updateEntitiesBulk()
  • Optimized Table Updates: Batches multiple column updates into a single PATCH operation per table instead of one per column
  • Progress Reporting: Added WebSocket-based real-time progress callbacks with UI progress bars for both import and export operations
  • Graceful Degradation: Falls back to individual operations if batch operations fail, ensuring reliability
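
As a rough sketch of the queue-and-flush pattern described above (the insertMany()/updateMany() calls, the 100-record batch size, and the fallback behavior come from this summary; the surrounding class, field names, and dao are hypothetical):

  private static final int BATCH_SIZE = 100;
  private final List<EntityInterface> pendingCreates = new ArrayList<>();
  private final List<EntityInterface> pendingUpdates = new ArrayList<>();

  // Queue a row instead of issuing an INSERT immediately; flush once the batch fills.
  void queueCreate(EntityInterface entity) {
    pendingCreates.add(entity);
    if (pendingCreates.size() + pendingUpdates.size() >= BATCH_SIZE) {
      flushPendingEntityOperations();
    }
  }

  void flushPendingEntityOperations() {
    try {
      if (!pendingCreates.isEmpty()) dao.insertMany(pendingCreates); // one bulk INSERT
      if (!pendingUpdates.isEmpty()) dao.updateMany(pendingUpdates); // one bulk UPDATE
    } catch (Exception e) {
      fallbackToIndividualOperations(); // graceful degradation: retry row by row (hypothetical helper)
    } finally {
      pendingCreates.clear();
      pendingUpdates.clear();
    }
  }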

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

… circular dependency, generated changeEvents (#25582)

* Fix tag clearing and circular dependency detection in batch CSV imports

  - **Tag clearing fix**: Add deleteTagsByTarget before applying new tags in batch imports to match single-entity import behavior, ensuring empty CSV fields properly clear existing tags (sketched below)
  - **Circular dependency detection fix**: Pre-track entities in dryRunCreatedEntities before parent resolution to enable proper circular-reference validation during CSV team imports
  - Resolves test failures in TeamResourceIT.test_importCsv_circularDependency_trueRun and tag-related import issues
  - Maintains batch import performance while restoring pre-batch-import validation contracts
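
A minimal sketch of the tag-clearing order (deleteTagsByTarget is named in the commit message; the DAO accessor and the applyTags call are assumptions):

  // Delete existing tag usages first so an empty CSV tags field clears them,
  // matching the single-entity import path.
  daoCollection.tagUsageDAO().deleteTagsByTarget(entity.getFullyQualifiedName());
  if (!nullOrEmpty(csvTags)) {
    applyTags(csvTags, entity.getFullyQualifiedName()); // then apply the tags from the CSV row
  }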

* Improve storeRelationshipsInternal and related internal methods - make them truly batched operations

* - Add storeEntities override to all repositories (57 repos)
  - Add batch lock check to HierarchicalLockManager
  - Add batch cache write to EntityRepository
  - Fix createManyEntitiesForImport with batched operations
  - Fix updateManyEntitiesForImport with batched operations
  - Add change event creation in flushPendingEntityOperations

---------

Co-authored-by: sonika-shah <[email protected]>
gitar-bot commented Jan 30, 2026

🔍 CI failure analysis for 8486d24: Maven SonarCloud CI (MySQL) shows 4 failing tests (99.9% pass rate) - the same infrastructure issues as the PostgreSQL CI. The Test Report job failed as a downstream consequence. The critical TeamResourceTest bug remains fixed on both database backends.

Issue

Maven SonarCloud CI (job 62003021906, MySQL backend) shows 1 failure and 3 errors out of 7836 tests (99.9% pass rate: 7831 passed, 1 failure, 3 errors, 701 skipped).

Root Cause

Maven Test Failures (Same as PostgreSQL CI):

  1. AppsResourceTest.post_trigger_app_200 - Polling timeout (1 failure)
  2. AwsCredentialsUtilTest - AWS credentials not configured (3 errors)
    • testBuildCredentialsProviderWithOnlyAccessKey
    • testBuildCredentialsProviderWithNoCredentials
    • testBuildCredentialsProviderWithEmptyCredentials

All failures are infrastructure/configuration issues unrelated to CSV batching changes.

Details

Consistency Across Databases:

  • PostgreSQL CI (job 62003022199): 1 failure, 5 errors (includes 2 WorkflowDefinitionResourceTest errors)
  • MySQL CI (job 62003021906): 1 failure, 3 errors
  • Common failures: AppsResourceTest (1) + AwsCredentialsUtilTest (3)
  • Both backends: 99.9% pass rate

Critical Success Confirmed: The TeamResourceTest.testTeamImportExport bug is not in the failure list for either database backend, confirming commit 8486d24 successfully fixed the hierarchical entity resolution issue across both MySQL and PostgreSQL.

Test Report (job 62025034391): Failed as downstream consequence of Maven test failures. This is a reporting/aggregation job, not a separate test suite.

Complete CI Status:

  1. ✅ Maven (both backends): 99.9% pass rate - infrastructure failures only
  2. ✅ Python: 99.8% pass rate - S3 infrastructure issue
  3. ⚠️ UI Coverage: 99.9% - test maintenance needed
  4. ⚠️ E2E: 98.9% - 2 timeouts, 4 unrelated

Impact: All Maven failures are infrastructure issues, not blocking for CSV batching functionality.

Code Review 👍 Approved with suggestions (4 resolved / 5 findings)

Solid batched CSV import/export implementation. The previously identified unused batchNumber parameter in CsvImportProgressCallback remains unresolved - the parameter is captured in the lambda but not passed to sendCsvImportProgressNotification.

💡 Edge Case: Unused batchNumber parameter in CsvImportProgressCallback

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/EntityResource.java:887

The CsvImportProgressCallback interface defines:

void onProgress(int rowsProcessed, int totalRows, int batchNumber, String message);

However, in EntityResource.java, when the callback is created (line ~892), the batchNumber parameter is ignored:

CsvImportProgressCallback progressCallback =
    (rowsProcessed, totalRows, batchNumber, message) ->
        WebsocketNotificationHandler.sendCsvImportProgressNotification(
            jobId, securityContext, rowsProcessed, totalRows, message);  // batchNumber not passed

This means the batch number information is lost when sending WebSocket notifications, even though it's available.

Suggested fix: Either pass batchNumber to the WebSocket notification handler if useful for the UI, or simplify the callback interface to remove the unused parameter.
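
The first option would be a one-line change, assuming sendCsvImportProgressNotification is given an overload that accepts the batch number (hypothetical signature):

CsvImportProgressCallback progressCallback =
    (rowsProcessed, totalRows, batchNumber, message) ->
        WebsocketNotificationHandler.sendCsvImportProgressNotification(
            jobId, securityContext, rowsProcessed, totalRows, batchNumber, message);  // batchNumber now forwarded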

✅ 4 resolved

Bug: PendingEntityOperation records CSV failures incorrectly

📄 openmetadata-csv/src/main/java/org/openmetadata/csv/EntityCsv.java:533
In flushPendingTableUpdates(), when a batch patch fails, the code updates import statistics:

for (CSVRecord record : context.csvRecords) {
  importResult.withNumberOfRowsPassed(importResult.getNumberOfRowsPassed() - 1);
  importResult.withNumberOfRowsFailed(importResult.getNumberOfRowsFailed() + 1);
}

However, importSuccess() was already called for these records earlier (line ~479), meaning each record was already counted as "passed". When the batch fails, this code decrements numberOfRowsPassed, but the original success message was already written to the CSV results.

This creates inconsistency:

  1. The CSV output shows success for rows that actually failed
  2. The summary numbers will be inconsistent with the actual CSV results output

Suggested fix: Either defer calling importSuccess() until after flushPendingTableUpdates() succeeds, or update the result CSV when failures occur in flush.
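
A sketch of the deferred option (the context fields and the patchTable call are assumptions; importSuccess/importFailure mirror the existing per-row helpers in EntityCsv):

// In flushPendingTableUpdates(): mark rows only after the batch PATCH settles.
try {
  patchTable(context.table, context.pendingColumnUpdates); // single PATCH per table
  for (CSVRecord record : context.csvRecords) {
    importSuccess(printer, record, ENTITY_UPDATED); // counted as passed only on success
  }
} catch (Exception e) {
  for (CSVRecord record : context.csvRecords) {
    importFailure(printer, e.getMessage(), record); // CSV output and counters stay consistent
  }
}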

Bug: Missing change event generation in batch entity operations

📄 openmetadata-csv/src/main/java/org/openmetadata/csv/EntityCsv.java:202
In flushPendingEntityOperations() (EntityCsv.java), entities are inserted/updated in batches via insertMany() and updateMany(). However, the code only queues entities for search index updates and skips change event generation entirely.

The original code in createEntity() generated change events via createChangeEventAndUpdateInES() for each entity, but the batch path bypasses this. This means:

  1. No ChangeEvent records are persisted for batch-created/updated entities
  2. Event-driven systems relying on change events won't be notified
  3. Audit trails may be incomplete

Suggested fix: Generate and persist change events for batch operations. Either:

  • Generate change events in flushPendingEntityOperations() before queuing for ES
  • Or modify the fallback path to ensure change events are created when createOrUpdate() is called
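
A sketch of the first option, emitting events inside the flush (getChangeEvent and the changeEventDAO call stand in for whatever createChangeEventAndUpdateInES already does per entity; only flushPendingEntityOperations, insertMany, and updateMany are named in the finding):

// In flushPendingEntityOperations(), after insertMany()/updateMany() succeed:
for (EntityInterface entity : flushedCreates) {
  ChangeEvent event = getChangeEvent(entity, EventType.ENTITY_CREATED, entity.getVersion(), null);
  daoCollection.changeEventDAO().insert(JsonUtils.pojoToJson(event)); // persist for event consumers and audit
}
// ... analogous loop over flushedUpdates with EventType.ENTITY_UPDATED,
// then queue everything for bulk search indexing as before.
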
Bug: Entity version not incremented on batch updates

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:1291
In flushPendingEntityOperations(), batch updates call dao.updateMany() directly, but this bypasses the entity versioning logic that normally happens in EntityRepository.createOrUpdate().

Looking at updateManyEntitiesForImport() (EntityRepository.java:1285-1305), the version is simply copied from the original without incrementing:

updated.setVersion(original.getVersion());

This means entities updated via batch import won't have their version incremented, which could cause:

  1. Optimistic locking issues if the entity is later updated normally
  2. Inconsistent version history
  3. Potential data overwrite conflicts

Suggested fix: Increment the version using EntityUtil.nextVersion() or similar logic as done in normal update paths.
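
A sketch of the fix in updateManyEntitiesForImport() (EntityUtil.nextVersion is the helper suggested above; treating every batch update as a minor version bump is an assumption, since a no-op change would normally leave the version untouched):

// Before: version copied unchanged, so batch updates never bump it
// updated.setVersion(original.getVersion());
// After: increment the way the normal update path does
updated.setVersion(EntityUtil.nextVersion(original.getVersion())); // e.g. 0.3 -> 0.4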

Bug: Division by zero possible in progress calculation

📄 openmetadata-ui/src/main/resources/ui/src/pages/EntityImport/BulkEntityImportPage/BulkEntityImportPage.tsx:2396
📄 openmetadata-ui/src/main/resources/ui/src/components/Entity/EntityExportModalProvider/EntityExportModalProvider.component.tsx:319
In the frontend BulkEntityImportPage.tsx, the progress percentage is calculated as:

percent={Math.round(((activeAsyncImportJob.progress ?? 0) / activeAsyncImportJob.total) * 100)}

There is an activeAsyncImportJob.total > 0 check, but it guards only the rendering condition for the Progress component; the percentage expression is still evaluated. If total is 0 or undefined, for example in a race before the job metadata arrives, the division could produce NaN or Infinity.

Similarly, in EntityExportModalProvider.component.tsx:

percent={Math.round((csvExportJob.progress / csvExportJob.total) * 100)}

Suggested fix: Add a safeguard to the calculation itself:

percent={Math.round(((activeAsyncImportJob.progress ?? 0) / Math.max(activeAsyncImportJob.total || 1, 1)) * 100)}

Rules ✅ All requirements met

Gitar Rules

Summary Enhancement: PR description includes comprehensive technical summary with batching details

2 rules not applicable. Show all rules by commenting gitar display:verbose.



Labels

backend, safe to test (label to run secure GitHub workflows on PRs)
