
Commit 047865e

Merge #12495: Increase LevelDB max_open_files
ccedbaf Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

Pull request description:

Currently we set `max_open_files = 64` on all architectures due to concerns about file descriptor exhaustion. This is extremely expensive due to how LevelDB is designed.

When a LevelDB file handle is opened, a bloom filter and block index are decoded, and some CRCs are checked. Bloom filters and block indexes in open table handles can be checked purely in memory. This means that when doing a key lookup, if a given table file may contain a given key, all of the lookup operations can happen completely in RAM until the block itself is fetched. In the common case fetching the block is one disk seek, because the block index stores its physical offset. This is the ideal case, and what we want to happen as often as possible.

If a table file handle is not open in the table cache, then in addition to the regular system calls to open the file, the block index and bloom filter need to be decoded before they can be checked. This is expensive and is something we want to avoid.

The current setting of 64 file handles means that on a synced node, only about 4% of key lookups can be satisfied by table file handles that are actually open and in memory.

The original concerns about file descriptor exhaustion are unwarranted on most systems because:

* On 64-bit POSIX hosts LevelDB will map up to 1000 table files with `mmap()`, and it does not retain an open file descriptor for such files.
* On Windows non-socket files do not interfere with the main network `select()` loop, so the same fd exhaustion issues do not apply there.

This change keeps the default `max_open_files` value (which is 1000) on all systems except 32-bit POSIX hosts (which do not use `mmap()`). Open file handles use about 20 KB of memory (for the block index), so the extra file handles do not cause much memory overhead. At most 1000 will be open, and a fully synced node right now has about 1500 such files.
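The hit-rate and memory figures in the description can be sanity-checked with a small simulation. This is a toy model, not LevelDB's table cache: `ToyTableCache`, `SimulateHitRate`, and `HandleMemoryKb` are hypothetical names, and the sketch assumes every lookup targets a uniformly random table file, which real workloads do not. The ~1500 file count and ~20 KB per handle are taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <random>
#include <unordered_map>

// Toy LRU cache of "open table files" keyed by file number. A hypothetical
// stand-in for LevelDB's table cache that only counts hits and misses.
class ToyTableCache {
public:
    explicit ToyTableCache(std::size_t capacity) : capacity_(capacity) {}

    // Hit: the table is already open, so its bloom filter and block index can
    // be checked in memory. Miss: it would have to be opened and decoded.
    bool Lookup(int file) {
        auto it = index_.find(file);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // mark most recent
            return true;
        }
        if (capacity_ == 0) return false;
        if (lru_.size() == capacity_) {  // evict the least recently used table
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(file);
        index_[file] = lru_.begin();
        return false;
    }

private:
    std::size_t capacity_;
    std::list<int> lru_;
    std::unordered_map<int, std::list<int>::iterator> index_;
};

// Fraction of lookups served by an already-open table, ASSUMING each lookup
// hits a uniformly random table file (real access patterns are not uniform).
double SimulateHitRate(std::size_t capacity, int num_files, int lookups) {
    assert(num_files > 0 && lookups > 0);
    ToyTableCache cache(capacity);
    std::mt19937 rng(12345);  // fixed seed for repeatable results
    std::uniform_int_distribution<int> pick(0, num_files - 1);
    int hits = 0;
    for (int i = 0; i < lookups; ++i) {
        if (cache.Lookup(pick(rng))) ++hits;
    }
    return static_cast<double>(hits) / lookups;
}

// Memory held by open handles, using the description's ~20 KB per handle.
double HandleMemoryKb(int open_handles) { return open_handles * 20.0; }
```

Under the uniform assumption the steady-state hit rate is simply capacity / files, so `SimulateHitRate(64, 1500, 200000)` should land near 0.043 (consistent with the "about 4%" above) and `SimulateHitRate(1000, 1500, 200000)` near 0.67. `HandleMemoryKb(1000)` gives 20000 KB, roughly 20 MB for a full cache of 1000 handles.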
Profile of `loadblk` thread before changes: https://monad.io/maxopenfiles-master.svg

Profile of `loadblk` thread after changes: https://monad.io/maxopenfiles-increase.svg

Tree-SHA512: de54f77d57e9f8999eaf8d12592aab5b02f5877be8fa727a1f42cf02da2693ce25846445eb19eb138ce4e5045d1c65e14054df72faf3ff32c7655c9cfadd27a9
2 parents 082e26c + ccedbaf commit 047865e


2 files changed, +76 -2 lines changed


doc/developer-notes.md

Lines changed: 50 additions & 1 deletion
@@ -543,7 +543,10 @@ its upstream repository.
 Current subtrees include:
 
 - src/leveldb
-  - Upstream at https://github.com/google/leveldb ; Maintained by Google, but open important PRs to Core to avoid delay
+  - Upstream at https://github.com/google/leveldb ; Maintained by Google, but
+    open important PRs to Core to avoid delay.
+  - **Note**: Follow the instructions in [Upgrading LevelDB](#upgrading-leveldb) when
+    merging upstream changes to the leveldb subtree.
 
 - src/libsecp256k1
   - Upstream at https://github.com/bitcoin-core/secp256k1/ ; actively maintained by Core contributors.
@@ -554,6 +557,52 @@ Current subtrees include:
 - src/univalue
   - Upstream at https://github.com/jgarzik/univalue ; report important PRs to Core to avoid delay.
 
+Upgrading LevelDB
+---------------------
+
+Extra care must be taken when upgrading LevelDB. This section explains issues
+you must be aware of.
+
+### File Descriptor Counts
+
+In most configurations we use the default LevelDB value for `max_open_files`,
+which is 1000 at the time of this writing. If LevelDB actually uses this many
+file descriptors it will cause problems with Bitcoin's `select()` loop, because
+it may cause new sockets to be created where the fd value is >= 1024. For this
+reason, on 64-bit Unix systems we rely on an internal LevelDB optimization that
+uses `mmap()` + `close()` to open table files without actually retaining
+references to the table file descriptors. If you are upgrading LevelDB, you must
+sanity check the changes to make sure that this assumption remains valid.
+
+In addition to reviewing the upstream changes in `env_posix.cc`, you can use `lsof` to
+check this. For example, on Linux this command will show open `.ldb` file counts:
+
+```bash
+$ lsof -p $(pidof bitcoind) |\
+awk 'BEGIN { fd=0; mem=0; } /ldb$/ { if ($4 == "mem") mem++; else fd++ } END { printf "mem = %s, fd = %s\n", mem, fd}'
+mem = 119, fd = 0
+```
+
+The `mem` value shows how many files are mmap'ed, and the `fd` value shows you
+how many file descriptors these files are using. You should check that `fd` is a
+small number (usually 0 on 64-bit hosts).
+
+See the notes in the `SetMaxOpenFiles()` function in `dbwrapper.cpp` for more
+details.
+
+### Consensus Compatibility
+
+It is possible for LevelDB changes to inadvertently change consensus
+compatibility between nodes. This happened in Bitcoin 0.8 (when LevelDB was
+first introduced). When upgrading LevelDB you should review the upstream changes
+to check for issues affecting consensus compatibility.
+
+For example, if LevelDB had a bug that accidentally prevented a key from being
+returned in an edge case, and that bug was fixed upstream, the bug "fix" would
+be an incompatible consensus change. In this situation the correct behavior
+would be to revert the upstream fix before applying the updates to Bitcoin's
+copy of LevelDB. In general you should be wary of any upstream changes affecting
+what data is returned from LevelDB queries.
 
 Git and GitHub tips
 ---------------------

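The `mmap()` + `close()` behavior that the developer notes rely on can be demonstrated with plain POSIX calls. This is a minimal sketch, assuming a POSIX system with a writable `/tmp`; `UsableWithSelect`, `MapAndClose`, and `DemoDescriptorIsReleased` are hypothetical helpers, not LevelDB functions. Because `open()` always returns the lowest free descriptor, getting the same number back after a file has been mapped shows that the mapping holds no descriptor, so it cannot push socket descriptors toward `FD_SETSIZE`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/select.h>
#include <unistd.h>

// select() can only watch descriptors below FD_SETSIZE (1024 on common Unix
// builds), which is why fds held open by LevelDB matter for sockets.
bool UsableWithSelect(int fd) { return fd >= 0 && fd < FD_SETSIZE; }

// Maps the first `size` bytes of `path` read-only, then closes the descriptor.
// The mapping stays valid after close(); no fd is retained for the file.
const char* MapAndClose(const std::string& path, std::size_t size, int* fd_used) {
    assert(size > 0);  // mmap() rejects zero-length mappings
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return nullptr;
    void* base = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    *fd_used = fd;
    close(fd);  // the mapping does not need the descriptor any more
    return base == MAP_FAILED ? nullptr : static_cast<const char*>(base);
}

// Writes a temporary "table file", maps it, and checks that the data is still
// readable and that the next open() reuses the same descriptor number.
bool DemoDescriptorIsReleased() {
    char path[] = "/tmp/mmap_close_demoXXXXXX";
    int wfd = mkstemp(path);
    if (wfd < 0) return false;
    const char payload[] = "table-data";
    bool ok = write(wfd, payload, sizeof(payload)) == static_cast<ssize_t>(sizeof(payload));
    close(wfd);

    int mapped_fd = -1;
    const char* data = ok ? MapAndClose(path, sizeof(payload), &mapped_fd) : nullptr;
    ok = data != nullptr && std::memcmp(data, payload, sizeof(payload)) == 0;

    // open() returns the lowest free descriptor, so a match means the
    // descriptor used while mapping was really released.
    int next_fd = open(path, O_RDONLY);
    ok = ok && next_fd == mapped_fd && UsableWithSelect(next_fd);
    if (next_fd >= 0) close(next_fd);
    if (data != nullptr) munmap(const_cast<char*>(data), sizeof(payload));
    unlink(path);
    return ok;
}
```

On a typical Linux host `DemoDescriptorIsReleased()` should return `true`; the same reasoning is why the `lsof` check in the notes expects `fd` to stay near 0 for mapped `.ldb` files.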
src/dbwrapper.cpp

Lines changed: 26 additions & 1 deletion
@@ -71,20 +71,45 @@ class CBitcoinLevelDBLogger : public leveldb::Logger {
     }
 };
 
+static void SetMaxOpenFiles(leveldb::Options *options) {
+    // On most platforms the default setting of max_open_files (which is 1000)
+    // is optimal. On Windows using a large file count is OK because the handles
+    // do not interfere with select() loops. On 64-bit Unix hosts this value is
+    // also OK, because up to that amount LevelDB will use an mmap
+    // implementation that does not use extra file descriptors (the fds are
+    // closed after being mmapped).
+    //
+    // Increasing the value beyond the default is dangerous because LevelDB will
+    // fall back to a non-mmap implementation when the file count is too large.
+    // On 32-bit Unix hosts we should decrease the value because the handles use
+    // up real fds, and we want to avoid fd exhaustion issues.
+    //
+    // See PR #12495 for further discussion.
+
+    int default_open_files = options->max_open_files;
+#ifndef WIN32
+    if (sizeof(void*) < 8) {
+        options->max_open_files = 64;
+    }
+#endif
+    LogPrint(BCLog::LEVELDB, "LevelDB using max_open_files=%d (default=%d)\n",
+             options->max_open_files, default_open_files);
+}
+
 static leveldb::Options GetOptions(size_t nCacheSize)
 {
     leveldb::Options options;
     options.block_cache = leveldb::NewLRUCache(nCacheSize / 2);
     options.write_buffer_size = nCacheSize / 4; // up to two write buffers may be held in memory simultaneously
     options.filter_policy = leveldb::NewBloomFilterPolicy(10);
     options.compression = leveldb::kNoCompression;
-    options.max_open_files = 64;
     options.info_log = new CBitcoinLevelDBLogger();
     if (leveldb::kMajorVersion > 1 || (leveldb::kMajorVersion == 1 && leveldb::kMinorVersion >= 16)) {
         // LevelDB versions before 1.16 consider short writes to be corruption. Only trigger error
         // on corruption in later versions.
         options.paranoid_checks = true;
     }
+    SetMaxOpenFiles(&options);
     return options;
 }
 

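The decision in `SetMaxOpenFiles()` can be exercised on any single machine by passing the platform traits in as arguments instead of checking them at compile time. This is a sketch under stated assumptions: `FakeOptions` and `ChooseMaxOpenFiles` are hypothetical, `FakeOptions` stands in for `leveldb::Options`, and `std::printf` replaces `LogPrint(BCLog::LEVELDB, ...)`. The real function relies on `#ifndef WIN32` and `sizeof(void*)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for leveldb::Options with only the field used here.
struct FakeOptions {
    int max_open_files = 1000;  // LevelDB's default at the time of PR #12495
};

// Same branch as SetMaxOpenFiles(), but the platform traits are parameters
// instead of compile-time checks, so every branch can be tested anywhere.
void ChooseMaxOpenFiles(FakeOptions* options, bool is_windows, std::size_t pointer_size) {
    assert(options != nullptr);
    int default_open_files = options->max_open_files;
    if (!is_windows && pointer_size < 8) {
        // 32-bit POSIX: table files are not mmapped, so each open table holds
        // a real fd; keep the old conservative limit of 64.
        options->max_open_files = 64;
    }
    // Stand-in for LogPrint(BCLog::LEVELDB, ...).
    std::printf("LevelDB using max_open_files=%d (default=%d)\n",
                options->max_open_files, default_open_files);
}
```

Calling `ChooseMaxOpenFiles(&opts, false, sizeof(void*))` mirrors the real behavior on a non-Windows build host; passing `false, 4` models a 32-bit POSIX build and lowers the limit to 64, while Windows keeps the default regardless of pointer size.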