@voonhous
Member

@voonhous voonhous commented Jan 5, 2026

Describe the issue this Pull Request addresses

The current implementation of HoodieLogFormat.WriterBuilder is convoluted:

  1. Code Smell: The builder is implemented as a heavy, manual inner class within the HoodieLogFormat interface.
  2. Reflection Usage: It uses ReflectionUtils to load the default writer implementation via a String class name, which is brittle and bypasses compile-time checks. This was introduced in #11207 to decouple hudi-common from Hadoop dependencies.
  3. Maintenance: Any new field requires manual updates to both the builder methods and the instantiation logic.

This PR refactors the log writer to use Lombok's @Builder, standardizing the fluent API and improving type safety across the codebase.
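
To illustrate the reflection concern, here is a minimal sketch using plain java.lang.reflect in place of ReflectionUtils. SketchWriter, DefaultSketchWriter, and ReflectionVsDirect are hypothetical names made up for this example and do not exist in Hudi. It only shows that a class name passed as a String fails at runtime, whereas the same mistake in a direct constructor call would not compile.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins; none of these names come from the Hudi codebase.
interface SketchWriter {
    String describe();
}

final class DefaultSketchWriter implements SketchWriter {
    private final String fileId;

    DefaultSketchWriter(String fileId) {
        this.fileId = fileId;
    }

    @Override
    public String describe() {
        return "writer for " + fileId;
    }
}

final class ReflectionVsDirect {
    // Reflection path: the class name is only a String, so a typo or a changed
    // constructor signature is discovered when this method runs, not at compile time.
    static SketchWriter viaReflection(String className, String fileId) {
        try {
            Constructor<?> ctor = Class.forName(className).getDeclaredConstructor(String.class);
            ctor.setAccessible(true);
            return (SketchWriter) ctor.newInstance(fileId);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load writer class " + className, e);
        }
    }

    // Direct path: the compiler checks both the class and the constructor arguments.
    static SketchWriter direct(String fileId) {
        return new DefaultSketchWriter(fileId);
    }

    public static void main(String[] args) {
        System.out.println(direct("file-1").describe());
        System.out.println(viaReflection(DefaultSketchWriter.class.getName(), "file-1").describe());
        try {
            viaReflection("DefaultSketchWritr", "file-1"); // misspelled on purpose
        } catch (IllegalStateException e) {
            System.out.println("runtime failure: " + e.getMessage());
        }
    }
}
```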

Summary and Changelog

This refactor simplifies the construction of HoodieLogFormat writers by leveraging Lombok and improving the class hierarchy.

Key Changes:

  1. Refactored HoodieLogFormat.Writer from an interface to an abstract class to centralize shared fields and construction logic.
  2. Moved validation, default value assignment, and log version computation into the Writer base constructor. This ensures that all writer implementations follow the same versioning and path-generation logic.
  3. Applied Lombok @Builder to the HoodieLogFormatWriter constructor, replacing the manual WriterBuilder.
  4. Eliminated reflection-based instantiation, favoring direct constructor calls for better performance and safety.
  5. Updated call sites across the project (client, spark, flink, utilities) to use the new builder syntax, which standardizes on the with prefix for builder methods.
  6. Updated existing tests and renamed TestHoodieLogWriterBuilder to TestHoodieLogFormatWriterBuilder to reflect the new structure.
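
For reviewers who want the shape of changes 1 to 3 at a glance, here is a rough sketch. It is not the code in this PR: SketchLogWriter and SketchLogFormatWriter are hypothetical names, the field list is cut down, and the defaults, file naming, and version fallback are invented for illustration. The builder is written by hand so the sketch compiles with only the JDK. It approximates what Lombok's @Builder with setterPrefix = "with" would generate, and the use of setterPrefix here is itself an assumption.

```java
import java.util.Objects;

// Hypothetical base class: validation, defaults, and version handling live in one constructor.
abstract class SketchLogWriter {
    static final int BASE_VERSION = 1;              // assumed default, not Hudi's actual logic
    static final String DEFAULT_EXTENSION = ".log"; // assumed default

    protected final String parentPath;
    protected final String logFileId;
    protected final String fileExtension;
    protected final int logVersion;

    protected SketchLogWriter(String parentPath, String logFileId, String fileExtension, Integer logVersion) {
        // Validation of required fields.
        this.parentPath = Objects.requireNonNull(parentPath, "parentPath is required");
        this.logFileId = Objects.requireNonNull(logFileId, "logFileId is required");
        // Default value assignment for optional fields.
        this.fileExtension = fileExtension == null ? DEFAULT_EXTENSION : fileExtension;
        // Stand-in for log version computation: fall back to a base version.
        this.logVersion = logVersion == null ? BASE_VERSION : logVersion;
    }

    // Simplified, invented naming scheme so every subclass derives the path the same way.
    String logFilePath() {
        return parentPath + "/." + logFileId + fileExtension + "." + logVersion;
    }
}

final class SketchLogFormatWriter extends SketchLogWriter {
    private SketchLogFormatWriter(String parentPath, String logFileId, String fileExtension, Integer logVersion) {
        super(parentPath, logFileId, fileExtension, logVersion);
    }

    static Builder builder() {
        return new Builder();
    }

    // Hand-written equivalent of a generated builder with a "with" setter prefix.
    static final class Builder {
        private String parentPath;
        private String logFileId;
        private String fileExtension;
        private Integer logVersion;

        Builder withParentPath(String parentPath) { this.parentPath = parentPath; return this; }
        Builder withLogFileId(String logFileId) { this.logFileId = logFileId; return this; }
        Builder withFileExtension(String fileExtension) { this.fileExtension = fileExtension; return this; }
        Builder withLogVersion(Integer logVersion) { this.logVersion = logVersion; return this; }

        SketchLogFormatWriter build() {
            // Direct constructor call: no reflection and no class-name String.
            return new SketchLogFormatWriter(parentPath, logFileId, fileExtension, logVersion);
        }
    }

    public static void main(String[] args) {
        SketchLogFormatWriter writer = SketchLogFormatWriter.builder()
            .withParentPath("/tmp/table/partition")
            .withLogFileId("file-1")
            .build();
        System.out.println(writer.logFilePath());
    }
}
```

Because the checks sit in the base constructor, a builder call that omits a required field fails in build() with a NullPointerException naming the field, regardless of which subclass is being built.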

Impact

  1. Internal API Change: Developers manually building a log writer will see changes in method names (e.g., onParentPath is now withParentPath).
  2. Type Safety: The builder is now type-aware, preventing runtime failures previously possible with the reflection-based approach.
  3. Codebase Health: Significant reduction in boilerplate code in HoodieLogFormat.

Risk Level

Low. This is a structural refactor. The core logic for log writing and versioning remains unchanged; it is only relocated to the base class constructor.

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Jan 5, 2026
@voonhous voonhous force-pushed the lombokify-builders-HoodieLogFormat branch 2 times, most recently from d21169c to fee094e Compare January 5, 2026 09:57
@apache apache deleted a comment from hudi-bot Jan 5, 2026
@voonhous voonhous force-pushed the lombokify-builders-HoodieLogFormat branch from fee094e to 342a043 Compare January 8, 2026 09:31
@voonhous voonhous force-pushed the lombokify-builders-HoodieLogFormat branch from 342a043 to 857cb70 Compare January 11, 2026 11:20
@voonhous voonhous requested a review from CTTY January 14, 2026 17:17
Contributor

@CTTY CTTY left a comment

Hi @voonhous, thanks for the PR. It mostly looks good! I've left some comments.

) throws IOException {
super(bufferSize, storage, parentPath, logFileId, fileExtension, instantTime, logVersion, logWriteToken,
suffix, fileLen, sizeThreshold, fileCreationCallback, tableVersion);
this.outputStream = outputStream;
Contributor

This doesn't seem right: outputStream should be initialized lazily rather than set in the constructor like this.

Member Author

Oh good catch. Removed this and added a comment.
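
For context on the lazy initialization being asked for, a minimal sketch of the pattern follows. LazyStreamWriterSketch and its Supplier-based factory are hypothetical and are not how HoodieLogFormatWriter opens its stream. The point is only that the constructor leaves the field unset and the first append creates it. The sketch is not thread-safe.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical writer; the stream is opened on first use rather than in the constructor.
final class LazyStreamWriterSketch {
    private final Supplier<OutputStream> streamFactory;
    // Deliberately not assigned in the constructor: created lazily by getOutputStream().
    private OutputStream outputStream;

    LazyStreamWriterSketch(Supplier<OutputStream> streamFactory) {
        this.streamFactory = streamFactory;
    }

    private OutputStream getOutputStream() {
        if (outputStream == null) {
            outputStream = streamFactory.get();
        }
        return outputStream;
    }

    void append(byte[] bytes) {
        try {
            getOutputStream().write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isStreamOpen() {
        return outputStream != null;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        LazyStreamWriterSketch writer = new LazyStreamWriterSketch(() -> sink);
        System.out.println("open before append: " + writer.isStreamOpen());
        writer.append(new byte[] {1, 2, 3});
        System.out.println("open after append: " + writer.isStreamOpen() + ", bytes written: " + sink.size());
    }
}
```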

protected Integer logVersion;
// file len of this log file
- private Long fileLen = 0L;
+ protected Long fileLen = 0L;
Contributor

I think we should use fileSize, not fileLen, since most APIs like HoodieLogFile still refer to it as getFileSize.

Member Author

Yep, we should standardise the naming. In the other hudi-common Lombok refactoring, I've changed them to getFileSize too, so this makes sense.

@hudi-bot
Collaborator

CI report:
