Skip to content

Conversation

@AnonHxy
Copy link
Contributor

@AnonHxy AnonHxy commented Mar 24, 2024

Descriptions of the changes in this PR:

Motivation

Fix #4238

From the issue above we can see that the core dump could happen when operate Rocksdb after it has been closed. So we should check if closed before operating rocksdb.

Changes

Checking if closed before operation RocksDB.

@AnonHxy AnonHxy force-pushed the operate_db_after_close branch from 8c6e5ff to f362987 Compare March 24, 2024 09:34
Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Mar 5, 2025

@AnonHxy Thanks, I recently also came across this in #4558

@lhotari
Copy link
Member

lhotari commented Mar 5, 2025

@AnonHxy
Another issue that I noticed in closing the RocksDB database is that it's not closing the database properly.

The javadoc of org.rocksdb.RocksDB#close says:
"This will not fsync the WAL files. If syncing is required, the caller must first call syncWal() or write(WriteOptions, WriteBatch) using an empty write batch with WriteOptions.setSync(boolean) set to true."

Would it make sense to also take this into account by adding db.syncWal() just after closed = true;?


@Override
public void put(byte[] key, byte[] value) throws IOException {
readLock.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquiring the read lock for each put/get operation might be quite expensive, since the lock has to maintain the state of threads and the order they try to acquire the lock.

We could use a different approach, like a reference counter on the entire object. When everyone is done using it, then rocksdb is really closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AnonHxy It seems that a StampedLock could be sufficient here since there doesn't seem to be a need for re-entrancy of the lock. @merlimat would using a StampedLock address your concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an alternative PR #4581 which simply handles close and count

zhaizhibo pushed a commit to zhaizhibo/bookkeeper that referenced this pull request Apr 16, 2025
…t() (possibly triggered by Prometheus) after RocksDB has been closed(apache#4243)
zhaizhibo pushed a commit to zhaizhibo/bookkeeper that referenced this pull request Apr 16, 2025
…t() (possibly triggered by Prometheus) after RocksDB has been closed(apache#4243)
lhotari pushed a commit that referenced this pull request Apr 17, 2025
…t after rocksdb has been closed (#4581)

* Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed(#4243)

* fix race when count op in process and db gets closed.

---------

Co-authored-by: zhaizhibo <[email protected]>
lhotari pushed a commit that referenced this pull request Apr 17, 2025
…t after rocksdb has been closed (#4581)

* Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed(#4243)

* fix race when count op in process and db gets closed.

---------

Co-authored-by: zhaizhibo <[email protected]>
(cherry picked from commit 2831ed3)
lhotari pushed a commit that referenced this pull request Apr 17, 2025
…t after rocksdb has been closed (#4581)

* Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed(#4243)

* fix race when count op in process and db gets closed.

---------

Co-authored-by: zhaizhibo <[email protected]>
(cherry picked from commit 2831ed3)
lhotari pushed a commit that referenced this pull request Apr 17, 2025
…t after rocksdb has been closed (#4581)

* Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed(#4243)

* fix race when count op in process and db gets closed.

---------

Co-authored-by: zhaizhibo <[email protected]>
(cherry picked from commit 2831ed3)
@hangc0276
Copy link
Contributor

Since #4581 has fixed this issue, close this PR. @AnonHxy If you encountered this issue again, feel free the reopen it.

@hangc0276 hangc0276 closed this Jun 4, 2025
priyanshu-ctds pushed a commit to datastax/bookkeeper that referenced this pull request Jul 11, 2025
…t after rocksdb has been closed (apache#4581)

* Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed(apache#4243)

* fix race when count op in process and db gets closed.

---------

Co-authored-by: zhaizhibo <[email protected]>
(cherry picked from commit 2831ed3)
(cherry picked from commit 9d067f4)
sandeep-ctds pushed a commit to datastax/bookkeeper that referenced this pull request Jul 22, 2025
…t after rocksdb has been closed (apache#4581)

* Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed(apache#4243)

* fix race when count op in process and db gets closed.

---------

Co-authored-by: zhaizhibo <[email protected]>
(cherry picked from commit 2831ed3)
(cherry picked from commit 9d067f4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bookie could coredump when it is shutting down

5 participants