Contributor

@thongdk8 thongdk8 commented May 29, 2025

Description

This PR refactors the import processors in the data loader and removes the import result status map, which is no longer needed.

Previously, the status map was created and persisted for the entire lifetime of the application. This led to continuous memory growth during large imports, potentially causing excessive memory usage or even out-of-memory (OOM) issues. By removing this map, we reduce memory consumption and improve the overall efficiency and stability of the import process.
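The memory pattern described above can be illustrated with a minimal sketch. All class and method names here (`ChunkStatus`, `RetainingTracker`, `StreamingTracker`) are hypothetical and are not taken from the data loader; the second tracker is only one possible way to avoid retention, not necessarily what this PR does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical types for illustration only; not the data loader's actual classes.
final class ChunkStatus {
  final int chunkId;
  final int importedRows;

  ChunkStatus(int chunkId, int importedRows) {
    this.chunkId = chunkId;
    this.importedRows = importedRows;
  }
}

/** Keeps one map entry per chunk for as long as the tracker itself is reachable. */
final class RetainingTracker {
  private final Map<Integer, ChunkStatus> statusMap = new ConcurrentHashMap<>();

  void onChunkCompleted(ChunkStatus status) {
    statusMap.put(status.chunkId, status); // grows with the number of chunks
  }

  int retainedEntries() {
    return statusMap.size();
  }
}

/** Folds each status into running totals and keeps no reference to it afterwards. */
final class StreamingTracker {
  private final AtomicInteger completedChunks = new AtomicInteger();
  private final AtomicLong importedRows = new AtomicLong();

  void onChunkCompleted(ChunkStatus status) {
    completedChunks.incrementAndGet();
    importedRows.addAndGet(status.importedRows);
  }

  int completedChunks() {
    return completedChunks.get();
  }

  long importedRows() {
    return importedRows.get();
  }
}
```

With the retaining variant, the number of live entries equals the number of chunks processed so far, so a large import keeps growing the heap. The streaming variant holds a fixed amount of state regardless of input size.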

Please take a look when you get a chance. Thank you!

Related issues and/or PRs

NA

Changes made

  • Move the process method to the parent class, since its implementation is identical across the import processors
  • Remove the usage of the status map for storing import results
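The two changes above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the `Sketch*` class names, the `emitLineChunks` helper, and every signature are assumptions. Only the idea of a shared `process()` in the parent and a format-specific `readDataChunks` in each subclass comes from this PR.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical parent class; signatures are assumptions made for this sketch.
abstract class SketchImportProcessor {
  /** Shared driver, written once. It returns nothing, so no per-chunk results are retained. */
  public final void process(
      BufferedReader reader, int chunkSize, BiConsumer<Integer, List<String>> handler) {
    int[] nextChunkId = {0};
    try {
      // Each chunk is handed to the caller together with its id, then dropped.
      readDataChunks(reader, chunkSize, rows -> handler.accept(nextChunkId[0]++, rows));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Format-specific part: split the input into chunks and pass each one to the sink. */
  protected abstract void readDataChunks(
      BufferedReader reader, int chunkSize, Consumer<List<String>> sink) throws IOException;

  /** Helper for line-oriented formats: groups non-empty lines into chunks of chunkSize. */
  protected static void emitLineChunks(
      BufferedReader reader, int chunkSize, Consumer<List<String>> sink) throws IOException {
    List<String> chunk = new ArrayList<>();
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.isEmpty()) {
        continue;
      }
      chunk.add(line);
      if (chunk.size() == chunkSize) {
        sink.accept(chunk);
        chunk = new ArrayList<>();
      }
    }
    if (!chunk.isEmpty()) {
      sink.accept(chunk); // final, possibly shorter chunk
    }
  }
}

final class SketchCsvProcessor extends SketchImportProcessor {
  @Override
  protected void readDataChunks(
      BufferedReader reader, int chunkSize, Consumer<List<String>> sink) throws IOException {
    reader.readLine(); // skip the header row
    emitLineChunks(reader, chunkSize, sink);
  }
}

final class SketchJsonLinesProcessor extends SketchImportProcessor {
  @Override
  protected void readDataChunks(
      BufferedReader reader, int chunkSize, Consumer<List<String>> sink) throws IOException {
    emitLineChunks(reader, chunkSize, sink);
  }
}
```

Because `process()` is `final` in the parent, subclasses differ only in how they read chunks, and callers observe results through the handler rather than through a returned map.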

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

NA

Release notes

NA

@thongdk8 thongdk8 requested a review from Copilot May 29, 2025 02:05
@thongdk8 thongdk8 self-assigned this May 29, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the import processors in the data loader and removes the unnecessary usage of the import result status map, addressing potential memory issues when processing large imported datasets.

  • The process() method is moved to the parent ImportProcessor class and its signature is updated to no longer return a status map.
  • Test cases for JSON lines, JSON, and CSV import processors are updated to no longer expect or validate a status map.
  • Unused methods and references to ImportDataChunkStatus (including logging and event listener methods) are removed.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| JsonLinesImportProcessorTest.java | Removed assertions on status map; updated tests to only assert no exception is thrown. |
| JsonImportProcessorTest.java | Similar test updates as with JsonLinesImportProcessorTest. |
| CsvImportProcessorTest.java | Similar test updates removing status map assertions. |
| JsonLinesImportProcessor.java | Removed process() implementation that returned a status map and adjusted readDataChunks method. |
| JsonImportProcessor.java | Removed process() implementation that returned a status map and updated readDataChunks. |
| ImportProcessor.java | Updated process() signature to void and switched to passing dataChunk id instead of objects. |
| CsvImportProcessor.java | Removed process() implementation and related concurrent processing code dealing with status map. |
| Various logger and manager classes | Removed redundant addOrUpdateDataChunkStatus implementations and the status map in ImportManager. |

Comments suppressed due to low confidence (3)

data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessorTest.java:92

  • The test now only verifies that process() does not throw an exception. Consider adding assertions or verifications of side effects (e.g., log outputs or state changes) to ensure expected behavior.
Assertions.assertDoesNotThrow(() -> {

data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessorTest.java:92

  • The removal of status map assertions reduces the explicit validation of processing outcomes. If applicable, include additional verifications to confirm that the processor performs as expected.
Assertions.assertDoesNotThrow(() -> {

data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessorTest.java:92

  • Since the test no longer checks the output status map, it might help to add validations for side effects (e.g., file log summaries or state changes) that indicate correct processing.
Assertions.assertDoesNotThrow(() -> {
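
One way to act on the three suggestions above is to hand the processor a recording test double and assert on what it observed, instead of only checking that no exception is thrown. The sketch below uses plain Java with no test framework. `ChunkListener`, `RecordingListener`, and `FakeProcessor` are hypothetical names, and the listener interface is an assumption rather than the data loader's real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface; the real data loader's listener API may differ.
interface ChunkListener {
  void onChunkCompleted(int chunkId, int rowCount);
}

/** Test double that records every callback so a test can assert on side effects. */
final class RecordingListener implements ChunkListener {
  private final List<String> events = new ArrayList<>();

  @Override
  public synchronized void onChunkCompleted(int chunkId, int rowCount) {
    events.add(chunkId + ":" + rowCount);
  }

  /** Returns a copy so callers cannot mutate the recorded history. */
  synchronized List<String> events() {
    return new ArrayList<>(events);
  }
}

/** Stand-in for a processor under test: splits rows into chunks and notifies the listener. */
final class FakeProcessor {
  static void process(List<String> rows, int chunkSize, ChunkListener listener) {
    int chunkId = 0;
    for (int start = 0; start < rows.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, rows.size());
      listener.onChunkCompleted(chunkId++, end - start);
    }
  }
}
```

A test built this way would call `process` with a `RecordingListener` and compare `events()` against the expected sequence of chunk ids and row counts, which checks behavior that `assertDoesNotThrow` alone cannot see.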

@ypeckstadt ypeckstadt changed the title from "Refactor import processors in the data loader, and remove the unnecessary usage of import result status map" to "Refactor import logic and remove redundant status mapping" May 29, 2025
Contributor

@komamitsu komamitsu left a comment

LGTM! 👍

Collaborator

@brfrn169 brfrn169 left a comment

LGTM! Thank you!

@thongdk8 thongdk8 requested a review from Torch3333 June 2, 2025 03:01
Contributor

@ypeckstadt ypeckstadt left a comment

LGTM. Thank you.

Contributor

@inv-jishnu inv-jishnu left a comment

LGTM!
Thank you!

Contributor

@Torch3333 Torch3333 left a comment

LGTM, thank you!

@ypeckstadt ypeckstadt merged commit 5b268f7 into master Jun 3, 2025
55 checks passed
@ypeckstadt ypeckstadt deleted the data-loader/ref/refactor-import-logic-and-remove-status-map branch June 3, 2025 04:30
feeblefakie pushed a commit that referenced this pull request Jun 3, 2025
6 participants