HDDS-14246. Change fsync boundary for FilePerBlockStrategy to block level #9570
ivandika3 wants to merge 2 commits into apache:master
Conversation
Thanks @ivandika3 for the patch!
| in the container happen as sync I/0 or buffered I/O operation. For FilePerBlockStrategy, this
| the sync I/O operation only happens before block file is closed.

Suggested change:

| in the container happen as sync I/O or buffered I/O operation. For FilePerBlockStrategy, this
| sync I/O operation only happens before block file is closed.
@vyalamar Do you wanna take a look at this patch?
@rnblough Do you wanna take a look at this issue?
siddhantsangwan left a comment
@ivandika3 Thanks for working on this. I agree with the overall idea, will do a deeper review soon.
siddhantsangwan left a comment
What about the Ratis streaming writes? Will this change also affect that code path and do we need any handling there? CC @szetszwo
Please add some tests to verify this change.
| if (eob) {
|   chunkManager.finishWriteChunks(kvContainer, blockData);
| }
Do we also need to sync in the else case here, when eob is false? Similar to the else case that you added in handlePutBlock.
FilePerBlockStrategy#finishWriteChunks calls FilePerBlockStrategy.OpenFiles#close, which in turn calls OpenFile#close, which syncs before closing the block file.
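For illustration, a minimal sketch of that close path, assuming the block file is held as a RandomAccessFile (the class name and structure below are simplified, not the actual FilePerBlockStrategy internals):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Simplified stand-in for the OpenFile wrapper described above.
final class OpenFileSketch implements AutoCloseable {
  private final RandomAccessFile file;

  OpenFileSketch(RandomAccessFile file) {
    this.file = file;
  }

  @Override
  public void close() throws IOException {
    // Force dirty pages of the block file to disk before releasing the
    // descriptor, so that "closed" also means "durable".
    file.getChannel().force(true);
    file.close();
  }
}
```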
| in the container happen as sync I/0 or buffered I/O operation. For FilePerBlockStrategy, this
| the sync I/O operation only happens before block file is closed.
Remove "the".
For FilePerBlockStrategy, this the sync
Streaming Write Pipeline sync is triggered by the client, and I have made it configurable in #9533 through the ozone.client.datastream.sync.size configuration. In the future, we might need to revisit this. I expect this requires 1) a way of keeping track of the
Let me think about this. This requires some fault injection to trigger a datanode crash just after PutBlock. Let me check if we can use byteman under ozone-fi for this.
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
HDFS has a sync-on-close option. We may add a similar option to Ozone.
Can I try to induce the failure, verify the PR, and take it from here?
@szetszwo Thanks for the suggestion. That's a good idea.
@vyalamar Thanks, please do. I don't really have the bandwidth to handle this (I just found it while trying to check Ozone's durability guarantees). You can take over if needed.
What changes were proposed in this pull request?
Currently, the datanode has an option to sync writes at the chunk boundary (hdds.container.chunk.write.sync), which is disabled by default since it might affect DN write throughput and latency. However, disabling it means that if the datanode machine goes down suddenly (e.g. power failure, or the process is reaped by the OOM killer), the block file might end up with incomplete data even though PutBlock (the write commit) was successful, which violates our durability guarantee. Although PutBlock triggers FilePerBlockStrategy#finishWriteChunks, which closes the file (RandomAccessFile#close), the buffer cache might not be flushed yet, since closing a file does not imply that its buffer cache has been flushed (see https://man7.org/linux/man-pages/man2/close.2.html). So there is a chance that the user's key is committed, but the data does not exist on the datanodes.
However, flushing on every WriteChunk might cause unnecessary overhead. We might need to consider calling FileChannel#force on PutBlock instead of on every WriteChunk, since the data only becomes visible to users when PutBlock returns successfully (the data is committed), and on failure the client will replace the block (allocate another block). Therefore, we can guarantee that after the user has successfully uploaded the key, the data has been persistently stored on the leader and at least one follower has promised to flush the data (MAJORITY_COMMITTED).
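As a rough sketch of the proposed boundary change, assuming the handler still has the block file's open FileChannel (the class and method names below are illustrative, not the actual handler code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

final class BlockCommitSyncSketch {

  // Invoked when a PutBlock (block commit) is about to be acknowledged.
  // "blockChannel" stands in for the still-open channel of the block file.
  void syncOnPutBlock(FileChannel blockChannel) throws IOException {
    // force(true) also flushes file metadata such as the length of the
    // appended block file; force(false) would flush data only.
    blockChannel.force(true);
  }
}
```

The trade-off is one fsync per block commit instead of one per chunk, so the durability point moves to exactly where the data becomes visible to the user.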
This might still affect write throughput and latency because of the wait for the buffer cache to be flushed to persistent storage (SSD or disk), but it strengthens our data durability guarantee (which should be our priority). Flushing the buffer cache might also reduce the datanode's memory usage.
In the future, we should consider enabling hdds.container.chunk.write.sync by default.
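For completeness, a minimal sketch of enabling that property programmatically (assuming the standard OzoneConfiguration API; the key name is the one quoted in the description above):

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public final class EnableChunkWriteSync {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Sync every WriteChunk to disk, trading some write latency for durability.
    conf.setBoolean("hdds.container.chunk.write.sync", true);
    System.out.println("hdds.container.chunk.write.sync = "
        + conf.getBoolean("hdds.container.chunk.write.sync", false));
  }
}
```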
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14246
How was this patch tested?
CI when sync is enabled (https://github.com/ivandika3/ozone/actions/runs/20535392231)