-
Notifications
You must be signed in to change notification settings - Fork 40
Refactor import logic and remove redundant status mapping #2709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor import logic and remove redundant status mapping #2709
Conversation
…ary usage of import restul status map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the import processors in the data loader and removes the unnecessary usage of the import result status map, addressing potential memory issues when processing large imported datasets.
- The process() method is moved to the parent ImportProcessor class and its signature is updated to no longer return a status map.
- Test cases for JSON lines, JSON, and CSV import processors are updated to no longer expect or validate a status map.
- Unused methods and references to ImportDataChunkStatus (including logging and event listener methods) are removed.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| JsonLinesImportProcessorTest.java | Removed assertions on status map; updated tests to only assert no exception is thrown. |
| JsonImportProcessorTest.java | Similar test updates as with JsonLinesImportProcessorTest. |
| CsvImportProcessorTest.java | Similar test updates removing status map assertions. |
| JsonLinesImportProcessor.java | Removed process() implementation that returned a status map and adjusted readDataChunks method. |
| JsonImportProcessor.java | Removed process() implementation that returned a status map and updated readDataChunks. |
| ImportProcessor.java | Updated process() signature to void and switched to passing dataChunk id instead of objects. |
| CsvImportProcessor.java | Removed process() implementation and related concurrent processing code dealing with status map. |
| Various logger and manager classes | Removed redundant addOrUpdateDataChunkStatus implementations and the status map in ImportManager. |
Comments suppressed due to low confidence (3)
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonLinesImportProcessorTest.java:92
- The test now only verifies that process() does not throw an exception. Consider adding assertions or verifications of side effects (e.g., log outputs or state changes) to ensure expected behavior.
Assertions.assertDoesNotThrow(() -> {
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/JsonImportProcessorTest.java:92
- The removal of status map assertions reduces the explicit validation of processing outcomes. If applicable, include additional verifications to confirm that the processor performs as expected.
Assertions.assertDoesNotThrow(() -> {
data-loader/core/src/test/java/com/scalar/db/dataloader/core/dataimport/processor/CsvImportProcessorTest.java:92
- Since the test no longer checks the output status map, it might help to add validations for side effects (e.g., file log summaries or state changes) that indicate correct processing.
Assertions.assertDoesNotThrow(() -> {
komamitsu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
brfrn169
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
…remove-status-map
ypeckstadt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you.
inv-jishnu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thank you!
…remove-status-map
Torch3333
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Co-authored-by: Peckstadt Yves <[email protected]>
Description
This PR refactors the import processors in the data loader and removes the unnecessary usage of the import result status map, which is no longer needed.
Previously, the status map was created and persisted for the entire lifetime of the application. This led to continuous memory growth during large imports, potentially causing excessive memory usage or even out-of-memory (OOM) issues. By removing this map, we reduce memory consumption and improve the overall efficiency and stability of the import process.
Please take a look when you get a chance. Thank you!
Related issues and/or PRs
NA
Changes made
processmethod to parent class, as they are the same for import processorsChecklist
Additional notes (optional)
NA
Release notes
NA