48 changes: 48 additions & 0 deletions src/server/pegasus_server_impl_init.cpp
@@ -588,6 +588,43 @@ DSN_DEFINE_uint64(pegasus.server,
600, // 600 is the default value in RocksDB.
"If not zero, dump rocksdb.stats to RocksDB every stats_persist_period_sec");

/* RocksDB BlobDB for key-value separation.
 * For more information, see: https://github.com/facebook/rocksdb/wiki/BlobDB */
DSN_DEFINE_bool(pegasus.server,
rocksdb_enable_blob_files,
false,
"switch of the key-value separation function");
Member

Use the description from RocksDB, i.e. "When set, large values (blobs) are written to separate blob files, and only pointers to them are stored in SST files. This can reduce write amplification for large-value use cases at the cost of introducing a level of indirection for reads."

It provides more information and avoids misunderstanding or ambiguity.

The same applies to the other options.


DSN_DEFINE_uint32(pegasus.server,
rocksdb_min_blob_size,
4 * 1024, // 4KB
"minimum value size (in bytes) to trigger blob file writing");

DSN_DEFINE_uint64(pegasus.server,
rocksdb_blob_file_size,
256 * 1024 * 1024,
"maximum size (in bytes) of a blob file");

DSN_DEFINE_bool(pegasus.server,
rocksdb_enable_blob_garbage_collection,
true,
"whether to enable blob file garbage collection");

DSN_DEFINE_double(pegasus.server,
rocksdb_blob_garbage_collection_age_cutoff,
0.25,
"age cutoff of oldest blob files (as a fraction) to be considered in GC");

DSN_DEFINE_double(pegasus.server,
rocksdb_blob_garbage_collection_force_threshold,
0.60,
"threshold of garbage ratio in old blob files to force GC");

DSN_DEFINE_int32(pegasus.server,
rocksdb_blob_file_starting_level,
2,
"the lowest LSM tree level at which blob files can be created");

namespace dsn {
namespace replication {
class replica;
@@ -693,6 +730,17 @@ pegasus_server_impl::pegasus_server_impl(dsn::replication::replica *r)
_data_cf_opts.max_bytes_for_level_base = FLAGS_rocksdb_max_bytes_for_level_base;
_data_cf_opts.max_bytes_for_level_multiplier = FLAGS_rocksdb_max_bytes_for_level_multiplier;

// open db with key-value separation option (rocksdb blobdb)
_data_cf_opts.enable_blob_files = FLAGS_rocksdb_enable_blob_files;
Member

What happens if this switch is enabled on an already running RocksDB instance? Will data be lost?

Contributor Author

Changing enable_blob_files only affects flush and compaction tasks that start after the change; it does not interrupt background tasks that are already running.
If you turn it from 'true' to 'false' while enable_blob_garbage_collection=true:
Background blob GC will keep scanning old blob files. When the garbage ratio of a file exceeds the threshold (blob_garbage_collection_force_threshold), the surviving entries in that file are moved into new SST files (not new blob files), and the original blob file is then deleted.
On the other hand, if enable_blob_garbage_collection=false, or the keys in a blob file remain unchanged for a long time, the blob file will stay in RocksDB for a long time.
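
To illustrate how the two GC knobs in this patch interact, here is a minimal, self-contained sketch. It is not RocksDB code: `BlobFileInfo`, `eligible_file_count` and `should_force_gc` are hypothetical names, and the model assumes that the age cutoff selects the oldest fraction of blob files and that the force threshold is compared against the combined garbage ratio of those files. The real logic inside RocksDB is more involved, so treat this only as a rough mental model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one blob file; this is not a RocksDB type.
struct BlobFileInfo
{
    uint64_t total_bytes;   // bytes of all blobs ever written to the file
    uint64_t garbage_bytes; // bytes of blobs no longer referenced by any SST
};

// Number of oldest blob files that the age cutoff makes eligible for GC.
// Assumption: the cutoff is a fraction of the file count, rounded down.
inline size_t eligible_file_count(size_t file_count, double age_cutoff)
{
    assert(age_cutoff >= 0.0 && age_cutoff <= 1.0);
    return static_cast<size_t>(age_cutoff * static_cast<double>(file_count));
}

// Returns true when the combined garbage ratio of the eligible (oldest)
// files reaches force_threshold. Files must be ordered oldest first.
inline bool should_force_gc(const std::vector<BlobFileInfo> &files_oldest_first,
                            double age_cutoff,
                            double force_threshold)
{
    assert(force_threshold >= 0.0 && force_threshold <= 1.0);
    const size_t n = eligible_file_count(files_oldest_first.size(), age_cutoff);
    uint64_t total = 0;
    uint64_t garbage = 0;
    for (size_t i = 0; i < n; ++i) {
        total += files_oldest_first[i].total_bytes;
        garbage += files_oldest_first[i].garbage_bytes;
    }
    if (total == 0) {
        return false; // nothing eligible, so nothing to force
    }
    return static_cast<double>(garbage) / static_cast<double>(total) >= force_threshold;
}
```

With the defaults proposed here (age cutoff 0.25, force threshold 0.60), this model only looks at the oldest quarter of the blob files, which matches the point above that blob files whose keys rarely change can stay on disk for a long time.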

Member

@ninsmiracle If enable_blob_files is set to true at the beginning and I write k1 -> v1 to a table, then set it to false and restart the replica servers, can I still read the former k1 from the DB? How does it behave if the data is in both SST and blob files?

_data_cf_opts.min_blob_size = FLAGS_rocksdb_min_blob_size;
_data_cf_opts.blob_file_size = FLAGS_rocksdb_blob_file_size;
_data_cf_opts.enable_blob_garbage_collection = FLAGS_rocksdb_enable_blob_garbage_collection;
_data_cf_opts.blob_garbage_collection_age_cutoff =
FLAGS_rocksdb_blob_garbage_collection_age_cutoff;
_data_cf_opts.blob_garbage_collection_force_threshold =
FLAGS_rocksdb_blob_garbage_collection_force_threshold;
_data_cf_opts.blob_file_starting_level = FLAGS_rocksdb_blob_file_starting_level;

// we need set max_compaction_bytes definitely because set_usage_scenario() depends on it.
_data_cf_opts.max_compaction_bytes = _data_cf_opts.target_file_size_base * 25;
_data_cf_opts.level0_file_num_compaction_trigger =