Skip to content

Conversation

@zhaizhibo
Copy link
Contributor

Descriptions of the changes in this PR:

Optimize BookKeeper startup speed with multi-directory ledger configuration

Main Issue:
image

Motivation

During production upgrades, BookKeeper instances with existing data typically take 3 to 5 minutes to start up.

Changes

Parallelize the initialization of DbLedgerStorage to improve startup speed.

@zhaizhibo zhaizhibo force-pushed the parallel_start_rocksdb branch from 71a0dbd to 0a8c0d4 Compare April 21, 2025 17:24
@zhaizhibo zhaizhibo marked this pull request as ready for review April 21, 2025 17:27
@zhaizhibo zhaizhibo force-pushed the parallel_start_rocksdb branch from 0a8c0d4 to 1974657 Compare April 22, 2025 00:49
@zhaizhibo
Copy link
Contributor Author

after:
image
image

@StevenLuMT
Copy link
Member

I understand that this optimization requires providing the changes before and after the optimization? Is there any verification data available?

Copy link
Contributor

@hangc0276 hangc0276 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a big thread pool just for init storage instance is not a good idea.
How many disks are you configured for one bookie?

@StevenLuMT StevenLuMT requested a review from Copilot June 5, 2025 04:04
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the startup performance of BookKeeper by initializing DbLedgerStorage concurrently across multiple ledger directories.

  • Introduces a thread pool executor with a dynamic thread count based on available processors and ledger directories.
  • Uses CountDownLatch and an AtomicInteger to manage and track initialization failures across threads.
Comments suppressed due to low confidence (1)

bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/DbLedgerStorage.java:265

  • Consider adding more context to the error log (e.g., the directory index or path that failed to initialize) to ease troubleshooting.
log.error("Failed to initialize DbLedgerStorage", e);

Thread.currentThread().interrupt();
log.error("Failed to initialize DbLedgerStorage", e);
} finally {
storageInitExecutor.shutdown();
Copy link

Copilot AI Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After shutting down the executor, consider awaiting its termination to ensure all tasks have completed gracefully before proceeding.

Suggested change
storageInitExecutor.shutdown();
storageInitExecutor.shutdown();
try {
if (!storageInitExecutor.awaitTermination(60, java.util.concurrent.TimeUnit.SECONDS)) {
log.warn("storageInitExecutor did not terminate within the timeout.");
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
log.error("Interrupted while waiting for storageInitExecutor to terminate", e);
}

Copilot uses AI. Check for mistakes.
@zhaizhibo
Copy link
Contributor Author

Create a big thread pool just for init storage instance is not a good idea. How many disks are you configured for one bookie?

The single-threaded configuration cannot fully utilize the performance of SSDs. Each SSD configured for a bookie node is set up with 4 or 8 directories, depending on the disk performance.

@gaozhangmin
Copy link
Contributor

Create a big thread pool just for init storage instance is not a good idea. How many disks are you configured for one bookie?

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants