[DocDB] macOS: tserver hits EMFILE at ~32K open files due to stdio FILE* limit in RocksDB #30601

@austenLacy

Jira Link: DB-20480

Description

Summary

On macOS, yb-tserver fails with Too many open files (EMFILE) errors when managing more than ~2,700 tablets (e.g. 2 databases with ~1,500 hash-sharded tables and secondary indexes). This happens even when ulimit -n, kern.maxfilesperproc, and kern.maxfiles are all set well above the failing threshold.

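The root cause is that RocksDB's NewSequentialFile and NewLogger use fopen(). On macOS/FreeBSD, fopen() fails whenever the newly opened descriptor is greater than SHRT_MAX (32,767), because the _file field in the FILE struct is a short. Since descriptors are allocated lowest-first, every fopen() starts failing once the process holds roughly 32K open descriptors of any kind.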

Root Cause

Apple's libc (derived from FreeBSD) stores the file descriptor inside the FILE struct as a short (source). There is an unconditional guard in fopen.c:

/*
 * File descriptors are a full int, but _file is only a short.
 * If we get a valid file descriptor that is greater than
 * SHRT_MAX, then the fd will get sign-extended into an
 * invalid file descriptor.  Handle this case by failing the open.
 */
if (f > SHRT_MAX) {
    fp->_flags = 0;         /* release */
    _close(f);
    errno = EMFILE;
    return (NULL);
}

This check applies even when _DARWIN_UNLIMITED_STREAMS is defined — that flag only changes stream allocation counts via __sfp(), not the fd range check. The same limit exists in fdopen().

The raw open() syscall has no such limit.
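
To make the range problem concrete, here is a minimal sketch of what the guard protects against. The function names are hypothetical and exist only for illustration; they are not part of libc or RocksDB. The sketch assumes a 16-bit short, which is what SHRT_MAX = 32,767 implies.

```cpp
#include <cassert>
#include <climits>

// Mirrors the condition of the guard quoted above: a FILE struct that keeps
// its descriptor in a short can only represent values up to SHRT_MAX.
bool FitsInShortFdField(int fd) {
  return fd >= 0 && fd <= SHRT_MAX;
}

// Shows what storing the descriptor would do without the guard: the int is
// narrowed to short and then widened back to int when it is used.
// Narrowing an out-of-range value is implementation-defined before C++20;
// on two's-complement platforms 32768 typically comes back as -32768.
int RoundTripThroughShort(int fd) {
  assert(fd >= 0);
  short stored = static_cast<short>(fd);
  return stored;
}
```

A descriptor of 32,767 survives the round trip, while 32,768 does not, which is why the libc code refuses the open instead of storing a corrupted value.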

The two fopen() call sites in RocksDB that hit this are:

  • src/yb/rocksdb/util/env_posix.cc: NewSequentialFile (reads OPTIONS, CURRENT, SST metadata during tablet bootstrap)
  • src/yb/rocksdb/util/env_posix.cc: NewLogger (creates per-RocksDB LOG files)

Steps to Reproduce

Requires macOS. Ensure system limits are raised first:

sudo sysctl -w kern.maxvnodes=1048576
sudo launchctl limit maxfiles 1048576 unlimited
ulimit -n 1048576

Start a single-node cluster with 1 tablet per table:

bin/yugabyted start \
  --base_dir /tmp/yb-emfile-repro \
  --listen 127.0.0.1 \
  --tserver_flags "ysql_num_shards_per_tserver=1,yb_num_shards_per_tserver=1,tablet_replicas_per_core_limit=0,tablet_replicas_per_gig_limit=0" \
  --master_flags "ysql_num_shards_per_tserver=1,yb_num_shards_per_tserver=1,replication_factor=1,tablet_replicas_per_core_limit=0,tablet_replicas_per_gig_limit=0"

Create 2 databases, each with 1,500 tables and one secondary index per table:

YSQLSH="bin/ysqlsh -h 127.0.0.1 -U yugabyte"

for db in testdb1 testdb2; do
  $YSQLSH -d yugabyte -c "CREATE DATABASE $db;"
  for batch in $(seq 0 14); do
    SQL=""
    for i in $(seq $((batch*100+1)) $((batch*100+100))); do
      SQL="${SQL}CREATE TABLE t${i}(id INT PRIMARY KEY, val TEXT);"
      SQL="${SQL}CREATE INDEX t${i}_val_idx ON t${i}(val);"
    done
    $YSQLSH -d $db -c "$SQL"
  done
done

During creation of testdb2, the tserver log will show errors like:

E tablet.cc:1224] Failed to open a RocksDB database:
  IO error (env_posix.cc:592): .../CURRENT (num opened files 32770): Too many open files (system error 24)

Table creation will stall or fail with Timed out waiting for table creation.
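
A much smaller probe, which does not need a cluster, might look like the sketch below. It is an illustration rather than part of the original repro: ProbeResult and ProbeStdioAboveShrtMax are hypothetical names. It uses fcntl(F_DUPFD) to obtain a descriptor above SHRT_MAX directly and then hands it to fdopen(), which according to the root-cause analysis above has the same limit as fopen(). It assumes the process's open-file limit is above 32,768; otherwise the fcntl() call itself fails and the probe reports that it could not get a high descriptor.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

enum class ProbeResult { kStreamOpened, kStreamRejected, kCouldNotGetHighFd };

ProbeResult ProbeStdioAboveShrtMax() {
  int low_fd = open("/dev/null", O_RDONLY);
  if (low_fd < 0) return ProbeResult::kCouldNotGetHighFd;

  // F_DUPFD returns the lowest free descriptor >= the requested minimum.
  int high_fd = fcntl(low_fd, F_DUPFD, SHRT_MAX + 1);
  close(low_fd);
  if (high_fd < 0) return ProbeResult::kCouldNotGetHighFd;  // limit too low
  assert(high_fd > SHRT_MAX);

  // Wrap the high descriptor in a stdio stream.
  FILE* stream = fdopen(high_fd, "r");
  if (stream == nullptr) {
    close(high_fd);  // a failed fdopen() leaves the descriptor open
    return ProbeResult::kStreamRejected;
  }
  fclose(stream);  // also closes high_fd
  return ProbeResult::kStreamOpened;
}
```

If the analysis holds, this should return kStreamRejected on an affected macOS build and kStreamOpened on Linux with a sufficiently high ulimit -n.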

Suggested Fix

Replace fopen() / FILE* with raw POSIX open() / read() / write() in NewSequentialFile and NewLogger, and update PosixSequentialFile and PosixLogger accordingly. Both classes already store the raw fd internally (via fileno()), so this is a minimal change. The same pattern is already used by PosixRandomAccessFile and PosixWritableFile, which use open() / pread() / write() directly.

Linux is unaffected — glibc's FILE struct uses a full int for the fd field.
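
The open() / read() / write() pattern suggested above could be sketched roughly as follows. This is not the actual PosixSequentialFile or PosixLogger code: FdSequentialReader and WriteAll are hypothetical names, and error handling is reduced to return values instead of RocksDB Status objects.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical sequential reader that holds a raw fd instead of a FILE*.
class FdSequentialReader {
 public:
  explicit FdSequentialReader(const std::string& path)
      : fd_(open(path.c_str(), O_RDONLY | O_CLOEXEC)) {}
  ~FdSequentialReader() {
    if (fd_ >= 0) close(fd_);
  }
  FdSequentialReader(const FdSequentialReader&) = delete;
  FdSequentialReader& operator=(const FdSequentialReader&) = delete;

  bool ok() const { return fd_ >= 0; }

  // Reads up to n bytes; returns the byte count (short only at EOF) or -1.
  ssize_t Read(char* out, size_t n) {
    assert(ok());
    size_t total = 0;
    while (total < n) {
      ssize_t r = read(fd_, out + total, n - total);
      if (r < 0) {
        if (errno == EINTR) continue;  // interrupted by a signal, retry
        return -1;
      }
      if (r == 0) break;  // end of file
      total += static_cast<size_t>(r);
    }
    return static_cast<ssize_t>(total);
  }

  // Skips n bytes forward, the role fseek() plays for a FILE*.
  bool Skip(off_t n) { return lseek(fd_, n, SEEK_CUR) != -1; }

 private:
  int fd_;
};

// Hypothetical logger-side helper: writes the whole buffer with write().
bool WriteAll(int fd, const char* data, size_t n) {
  while (n > 0) {
    ssize_t w = write(fd, data, n);
    if (w < 0) {
      if (errno == EINTR) continue;
      return false;
    }
    data += w;
    n -= static_cast<size_t>(w);
  }
  return true;
}
```

One trade-off to keep in mind: FILE* streams buffer in user space, so a real replacement for the logger would probably want its own buffering to avoid one write() syscall per log line.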

Environment

  • macOS Sequoia 15.x (Darwin 25.3.0), Apple Silicon
  • YugabyteDB built from source (master branch)
  • Single-node cluster, 1 tablet per table

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
