feat: add parallelExecution mode for direct I/O thread command execution #191
Conversation
I've implemented a high-performance IP query service using the RESP project. The query service is read-only and stateless, so I propose this 'parallel execution mode'. After some performance benchmarking, I found that parallel execution mode improves query performance a lot.
I like the idea, but I have some concerns. I will go into details later; right now I'm busy.
Sure, take your time.
Hi @fanson, thanks for your interest in contributing. My main concern is that after this change, when parallel execution is selected we are going to have an unused scheduler (line 38). And when serial execution is selected we are going to access the state through a ConcurrentHashMap, which adds unnecessary synchronization. So maybe it would be better to instead have a parallelExecution boolean property in config, add the ability to define the number of threads of the thread pool used in the scheduler, and on the other hand pass the Map implementation in the constructor of StateHolder. wdyt?
Address maintainer feedback on PR tonivade#191:

- Replace `boolean serialExecution` with `int numThreads` parameter in RespServerContext and RespServer.Builder
- StateHolder now accepts a Map implementation via constructor: HashMap for single-thread (numThreads=1), ConcurrentHashMap for multi-thread (numThreads>1)
- Scheduler always used (no bypass path); thread pool size matches numThreads: newSingleThreadExecutor for 1, newFixedThreadPool for >1
- Preserve upstream daemon thread naming ("resp-server")
- Remove processCommand() if/else branching: unified scheduler path

This eliminates the two concerns raised:
1. No unused scheduler in any mode
2. No unnecessary ConcurrentHashMap synchronization in single-thread mode

Made-with: Cursor
Force-pushed a6fa0a2 to fdb93bb.
Hi @tonivade, thanks for the great feedback! I've updated the PR to address your concerns; the changes are listed in the commit message above. All tests pass, including new tests for the multi-threaded mode. Let me know if you'd like any further adjustments!
Performance observation with numThreads(N)
| Clients | Serial (ops/s) | Parallel (ops/s) | Speedup |
|---|---|---|---|
| 1 | 75,891 | 71,897 | 0.95x |
| 2 | 64,661 | 59,885 | 0.93x |
| 4 | 126,233 | 137,764 | 1.09x |
| 8 | 135,131 | 162,182 | 1.20x |
| 16 | 111,313 | 41,108 | 0.37x |
Analysis:
The numThreads > 1 path always routes through Observable.fromCallable(...).subscribeOn(scheduler), which introduces per-command overhead:
- RxJava Observable creation + subscribe/dispose lifecycle
- Thread pool task queue submission (BlockingQueue lock contention)
- Context switch from Netty I/O thread → scheduler thread → back
For ultra-fast commands (~25ns compute), this scheduling overhead (~500-2000ns) dominates — it's 20-80x the actual work.
At 16 clients with numThreads = availableProcessors() (~10), the fixed thread pool becomes saturated. 16 Netty I/O threads compete for 10 scheduler threads through the shared BlockingQueue, causing severe lock contention. This explains the 0.37x regression.
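The overhead gap described above can be reproduced in miniature with plain `java.util.concurrent`. This is a standalone sketch, not the project's benchmark: the task count, pool size, and toy command are arbitrary. The same trivial work runs once inline and once through a fixed thread pool, so the pooled path pays queue submission plus park/unpark per task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative micro-comparison of inline vs thread-pool execution for a
// command whose real work takes only a few nanoseconds.
public class OverheadSketch {

  // Stand-in for an ultra-fast, stateless command.
  static long fastCommand(long key) {
    return key * 31;
  }

  static long runInline(int tasks) {
    long sum = 0;
    for (int i = 0; i < tasks; i++) {
      sum += fastCommand(i);           // no queue, no context switch
    }
    return sum;
  }

  static long runPooled(int tasks) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<Long>> futures = new ArrayList<>(tasks);
    for (int i = 0; i < tasks; i++) {
      final long key = i;
      futures.add(pool.submit(() -> fastCommand(key))); // queue offer + wakeup
    }
    long sum = 0;
    for (Future<Long> f : futures) {
      sum += f.get();                  // park/unpark per result
    }
    pool.shutdown();
    return sum;
  }

  public static void main(String[] args) throws Exception {
    int tasks = 100_000;
    long t0 = System.nanoTime();
    long a = runInline(tasks);
    long inlineNs = System.nanoTime() - t0;
    t0 = System.nanoTime();
    long b = runPooled(tasks);
    long pooledNs = System.nanoTime() - t0;
    System.out.println("inline ns/op: " + inlineNs / tasks);
    System.out.println("pooled ns/op: " + pooledNs / tasks);
    System.out.println("sums equal: " + (a == b));
  }
}
```

On a typical JVM the pooled path is typically one to two orders of magnitude slower per operation for work this small, which is consistent with the 20-80x ratio estimated above.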
Controlled benchmark: numThreads(N) vs parallelExecution (direct I/O)
I ran a more rigorous comparison using redis-benchmark (2M requests, pipeline=16, loopback) with a full production dataset (88MB IPv4 with 10.5M IP ranges + 36MB IPv6 with 2.8M ranges). All three configurations share the same I/O-layer optimizations (batch flush, session caching, zero-alloc parsing) to isolate the effect of the command execution strategy.
1. Serial (single-thread scheduler, HashMap state) — upstream default
| Clients | ops/s | p50 latency |
|---|---|---|
| 1 | 85,903 | 0.055ms |
| 2 | 148,943 | 0.063ms |
| 4 | 147,308 | 0.135ms |
| 8 | 154,955 | 0.575ms |
| 16 | 146,231 | 1.439ms |
| 32 | 139,266 | 3.335ms |
| 50 | 143,978 | 5.239ms |
2. numThreads(14) (RxJava FixedThreadPool, ConcurrentHashMap state) — current PR
| Clients | ops/s | vs Serial | p50 |
|---|---|---|---|
| 1 | 81,826 | -5.0% | 0.055ms |
| 2 | 138,927 | -8.1% | 0.071ms |
| 4 | 139,626 | -4.4% | 0.135ms |
| 8 | 144,540 | -6.7% | 0.623ms |
| 16 | 137,052 | -5.7% | 1.527ms |
| 32 | 136,407 | -4.5% | 3.415ms |
| 50 | 136,454 | -2.8% | 5.519ms |
numThreads(14) is consistently slower than serial at every concurrency level. The BlockingQueue contention in the RxJava scheduler adds pure overhead for fast commands.
3. parallelExecution (direct Netty I/O thread, ConcurrentHashMap, no scheduler)
This mode bypasses the RxJava scheduler entirely — commands execute directly on the Netty I/O thread that decoded them.
| Clients | ops/s | vs Serial | p50 | p50 vs Serial |
|---|---|---|---|---|
| 1 | 84,588 | -1.5% | 0.039ms | -29% |
| 2 | 155,678 | +4.5% | 0.055ms | -13% |
| 4 | 156,789 | +6.4% | 0.111ms | -18% |
| 8 | 164,826 | +6.4% | 0.543ms | -6% |
| 16 | 153,586 | +5.0% | 1.367ms | -5% |
| 32 | 145,296 | +4.3% | 3.183ms | -5% |
| 50 | 150,421 | +4.5% | 4.967ms | -5% |
Summary
For commands with sub-microsecond execution time, the RxJava subscribeOn path introduces measurable per-request overhead (Observable.fromCallable() allocation, BlockingQueue.offer()/poll() contention, thread context switch). With numThreads(N), multiple I/O threads compete for the shared queue, making it worse than serial.
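The two dispatch strategies compared above can be sketched in a minimal runnable form. A plain `ExecutorService` stands in for the RxJava single-thread scheduler (the real code path uses `Observable.fromCallable(...).subscribeOn(scheduler)`), and the class and field names are illustrative, not the PR's actual diff.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Sketch of the two dispatch modes: serial (single scheduler thread) vs
// parallel (inline on the calling thread, no scheduler created at all).
public class DispatchSketch {
  private final ExecutorService scheduler;         // null in parallel mode: never created
  private final Function<String, String> command;  // stand-in for command execution

  DispatchSketch(boolean parallelExecution, Function<String, String> command) {
    this.scheduler = parallelExecution ? null : Executors.newSingleThreadExecutor();
    this.command = command;
  }

  String process(String request) throws Exception {
    if (scheduler == null) {
      // parallel mode: run inline on the calling (Netty I/O) thread
      return command.apply(request);
    }
    // serial mode: all commands funnel through the single scheduler thread
    return scheduler.submit(() -> command.apply(request)).get();
  }

  void shutdown() {
    if (scheduler != null) {
      scheduler.shutdown();
    }
  }
}
```

Keying the branch on `scheduler == null` rather than a separate boolean means the flag and the executor can never disagree, and no executor object exists at all in parallel mode.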
Proposal
I'd like to update this PR to offer two clean modes that address your original concerns:
- Default (serial) — single-thread scheduler with `HashMap` (unchanged from upstream)
- `parallelExecution()` — bypass scheduler, execute on I/O threads, use `ConcurrentHashMap`

This cleanly avoids both issues you raised:

- No unused scheduler in parallel mode (it's never created)
- No unnecessary `ConcurrentHashMap` in single-thread mode
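Both modes hinge on the same Map-injection pattern in StateHolder: the caller chooses the map implementation, so serial mode pays no synchronization cost. A sketch of that pattern follows; the factory method names are my illustration, not the PR's exact API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the injected-map design: the constructor decides thread safety.
public class StateHolder {
  private final Map<String, Object> state;

  public StateHolder(Map<String, Object> state) {
    this.state = state;
  }

  // default mode: a single scheduler thread writes, plain HashMap is enough
  public static StateHolder serial() {
    return new StateHolder(new HashMap<>());
  }

  // parallelExecution(): many Netty I/O threads touch the state concurrently
  public static StateHolder parallel() {
    return new StateHolder(new ConcurrentHashMap<>());
  }

  public Object getValue(String key)             { return state.get(key); }
  public void putValue(String key, Object value) { state.put(key, value); }
  public Object removeValue(String key)          { return state.remove(key); }
}
```

In serial mode only the single scheduler thread touches the map, so `HashMap` is safe; in parallel mode every Netty I/O thread may read and write it concurrently, which requires `ConcurrentHashMap`.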
The numThreads(N > 1) option would be removed since it's a net negative for the workloads that motivated this PR. If there's a future need for scheduler-based parallelism (e.g., commands with blocking I/O), it could be revisited as a separate feature.
What do you think?
Force-pushed 3b19532 to 45e360d.
Update: PR code now matches the proposed design

I've updated the PR implementation to match the proposal above. The diff is minimal: 4 files changed, focused solely on the execution mode switch. Ready for review when you have a chance.
Add a `parallelExecution()` builder option that bypasses the RxJava single-thread scheduler and executes commands directly on Netty I/O threads.

Changes:
- RespServerContext: accept boolean parallelExecution flag; when true, skip scheduler and execute commands inline on I/O thread, using ConcurrentHashMap for thread-safe state; when false, use existing single-thread scheduler with HashMap (unchanged default behavior)
- RespServer.Builder: add parallelExecution() method
- StateHolder: accept Map implementation via constructor, allowing HashMap (serial) or ConcurrentHashMap (parallel)

Benchmark results (redis-benchmark, 2M requests, pipeline=16, full dataset):
- Serial baseline: ~155K ops/s peak
- parallelExecution: ~165K ops/s peak (+6.4%), p50 latency -5% to -29%

Made-with: Cursor
StateHolder's Map-based constructor is already exercised indirectly through RespServerContextTest.processCommandParallelExecution. Testing the JDK's ConcurrentHashMap put/get/remove semantics adds no value.

Made-with: Cursor
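For context, the JDK guarantee the parallel mode depends on (and which the commit above deliberately does not re-test) can be shown in isolation: concurrent puts of distinct keys into a `ConcurrentHashMap` never lose entries, while a plain `HashMap` gives no such guarantee under concurrent writes and may even corrupt its internal structure. The thread and key counts in this standalone sketch are arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MapSafetySketch {
  // Eight writer threads each insert 10_000 distinct keys, then we report
  // the resulting size; a thread-safe map always ends up with 80_000 entries.
  static int concurrentPuts(Map<Integer, Integer> map) throws InterruptedException {
    int threads = 8;
    int perThread = 10_000;
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch done = new CountDownLatch(threads);
    for (int t = 0; t < threads; t++) {
      final int base = t * perThread;
      pool.submit(() -> {
        for (int i = 0; i < perThread; i++) {
          map.put(base + i, i);       // distinct keys across all threads
        }
        done.countDown();
      });
    }
    done.await();
    pool.shutdown();
    return map.size();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(concurrentPuts(new ConcurrentHashMap<>()));
  }
}
```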
Force-pushed 6be074c to a934c11.
Thanks a lot @fanson. I see that the scheduler adds overhead, and if we are not going to serialize requests it doesn't make sense to keep using the scheduler. I have one minor concern; I will add a comment directly in the code.
These 4 PRs aim to improve the performance of resp.
tonivade left a comment
Please take a look at these minor changes.
```java
      ex -> LOGGER.error("error executing command: " + request, ex));
} catch (RuntimeException ex) {
  LOGGER.error("error executing command: " + request, ex);
```
```java
if (parallelExecution) {
```

use the condition `scheduler == null`
Summary
Add a `parallelExecution()` builder option to `RespServer` that bypasses the RxJava single-thread scheduler and executes commands directly on Netty I/O threads, improving throughput for fast, stateless commands.

Motivation
The default single-thread executor serializes all commands globally, which is safe but limits throughput for read-heavy, stateless workloads. For such commands, executing directly on the I/O thread that decoded the request eliminates scheduling overhead (Observable allocation, BlockingQueue contention, context switch).
Design
- Default (serial): single-thread scheduler with `HashMap` in `StateHolder` — global serialization, backward compatible, zero synchronization overhead. Unchanged from upstream.
- `parallelExecution()`: bypass scheduler entirely, execute commands on Netty I/O threads with `ConcurrentHashMap` in `StateHolder` — parallel execution with thread-safe state access, no unused scheduler created.

This cleanly addresses both concerns from review:
- No unused scheduler when parallel mode is selected
- No unnecessary `ConcurrentHashMap` when serial mode is selected

Changes
- `RespServer.java`: add `parallelExecution()` builder method (boolean flag)
- `RespServerContext.java`: accept `boolean parallelExecution`; when true, skip RxJava scheduler, execute commands directly on I/O thread, use `ConcurrentHashMap` for state; when false, use existing single-thread scheduler with `HashMap` (default behavior unchanged)
- `StateHolder.java`: accept `Map` implementation via constructor (`HashMap` for serial, `ConcurrentHashMap` for parallel)

Benchmark
redis-benchmark (2M requests, pipeline=16, full production dataset, loopback):
p50 latency improvement: -5% to -29% across concurrency levels.
Test plan