
Make ValkeyClient go brrrr #158


Conversation

fabianfett (Collaborator)

Motivation

ConnectionPool offers a fast mode for single requests. For this to work, we have to leave the paradigms of structured concurrency: it is important that the ConnectionRequest is fulfilled synchronously, without any continuations.

Using the withConnection API, we have the following context switch scenario:

  • On EL, a response has been received
  • On EL, the command's continuation is succeeded
  • JUMP to concurrent executor, EL is now potentially idle
  • On CE (concurrent executor) end of withConnection is reached -> acquire lock -> get next waiting request -> fulfill continuation for connection
  • On CE next withConnection closure is invoked
  • On CE next send is invoked
  • JUMP to EL for actor isolation, we are back on the EL and can run the command

If we don't use structured concurrency here, we can have the following flow:

  • On EL, a response has been received
  • On EL, we can release the connection -> acquire lock -> get next waiting request -> invoke request callback with connection
  • On EL request callback is invoked and can write command right away

Not a single context switch was necessary in this scenario.
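The unstructured flow above can be sketched with a callback-based request. Everything here (names and shapes, including `ConnectionRequestSketch`) is illustrative and not the PR's actual API:

```swift
// Sketch of the unstructured flow described above. All names are
// illustrative; the PR's actual ValkeyConnectionRequest differs.
final class ConnectionRequestSketch<Connection> {
    private let onConnection: (Connection) -> Void

    init(onConnection: @escaping (Connection) -> Void) {
        self.onConnection = onConnection
    }

    // Called on the event loop when a connection is released. The
    // callback runs inline, so the next command can be written right
    // away -- no continuation, no hop to a concurrent executor.
    func succeed(_ connection: Connection) {
        self.onConnection(connection)
    }
}

var log: [String] = []
let request = ConnectionRequestSketch<String> { connection in
    log.append("send GET on \(connection)")
}
request.succeed("conn-1")  // fulfilled synchronously, on the caller's thread
```

Because `succeed` invokes the callback inline, the release of one connection and the write of the next command happen on the same event-loop tick.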

Performance

In local testing I see about a 20% improvement in throughput and wall-clock time across all the client benchmarks.

Changes

  • Add ValkeyConnectionRequest, which uses unstructured callbacks for improved performance

@fabianfett fabianfett requested a review from adam-fowler July 25, 2025 14:58

❌ Benchmark comparison failed with error ❌

Summary
Threshold deviations for `ValkeyBenchmarks:Client: GET benchmark`

| Malloc (total) (K, %) | main | pull_request | Difference % | Threshold % |
|---|---|---|---|---|
| p25 | 79 | 83 | 5 | 5 |
| p50 | 79 | 86 | 8 | 5 |
| p75 | 82 | 86 | 5 | 5 |

Threshold deviations for `ValkeyBenchmarks:Client: GET benchmark | parallel 50 | 20 concurrent connections`

| Malloc (total) (K, %) | main | pull_request | Difference % | Threshold % |
|---|---|---|---|---|
| p25 | 9 | 14 | 52 | 5 |
| p50 | 10 | 16 | 54 | 5 |
| p75 | 11 | 17 | 49 | 5 |

New baseline 'pull_request' is WORSE than the 'main' baseline thresholds.

Full Benchmark Comparison

Comparing results between 'main' and 'pull_request'

Host 'db66eb8a4659' with 4 'x86_64' processors with 15 GB memory, running:
#18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025

ValkeyBenchmarks

Client: GET benchmark metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 78 | 79 | 79 | 82 | 83 | 83 | 83 | 6 |
| pull_request | 82 | 83 | 86 | 86 | 90 | 90 | 90 | 9 |
| Δ | 4 | 4 | 7 | 4 | 7 | 7 | 7 | 3 |
| Improvement % | -5 | -5 | -9 | -5 | -8 | -8 | -8 | 3 |

Client: GET benchmark | parallel 20 | 20 concurrent connections metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 94 | 97 | 99 | 100 | 103 | 111 | 111 | 25 |
| pull_request | 93 | 95 | 96 | 97 | 99 | 101 | 101 | 25 |
| Δ | -1 | -2 | -3 | -3 | -4 | -10 | -10 | 0 |
| Improvement % | 1 | 2 | 3 | 3 | 4 | 9 | 9 | 0 |

Client: GET benchmark | parallel 50 | 20 concurrent connections metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 7 | 9 | 10 | 11 | 15 | 23 | 23 | 41 |
| pull_request | 11 | 14 | 16 | 17 | 19 | 20 | 20 | 49 |
| Δ | 4 | 5 | 6 | 6 | 4 | -3 | -3 | 8 |
| Improvement % | -57 | -56 | -60 | -55 | -27 | 13 | 13 | 8 |

Connection: GET benchmark metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 9 |
| pull_request | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 9 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

HashSlot – {user}.whatever metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

ValkeyCommandEncoder – Command with 7 words metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 755 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 753 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2 |

ValkeyCommandEncoder – Simple GET metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1904 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1923 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |

ValkeyCommandEncoder – Simple MGET 15 keys metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 357 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 362 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |

```swift
let future: EventLoopFuture<Channel>
switch address.value {
case .hostname(let host, let port):
    future = connect.connect(host: host, port: port)
    let socketAddress = try! SocketAddress(ipAddress: host, port: port)
    future = connect.connect(to: socketAddress)
```
Collaborator

I can see why you did this, but can we keep it to another PR? It would be good to still support hostnames.
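One way to keep hostname support (in a follow-up PR, as suggested) would be to take the SocketAddress fast path only for IP literals and fall back to the resolving `connect(host:port:)` otherwise. The helper below is a hypothetical sketch, not code from this PR; it detects IP literals with `inet_pton`:

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Hypothetical helper: true if `host` is an IPv4 or IPv6 literal,
// i.e. safe to turn into a SocketAddress without DNS resolution.
func isIPLiteral(_ host: String) -> Bool {
    var v4 = in_addr()
    var v6 = in6_addr()
    return inet_pton(AF_INET, host, &v4) == 1
        || inet_pton(AF_INET6, host, &v6) == 1
}
```

With such a check, the connect path could branch to `connect.connect(to:)` for IP literals and keep `connect.connect(host:port:)` for real hostnames.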

```swift
@usableFromInline
let continuation: CheckedContinuation<T, any Error>
@usableFromInline
let lock: Mutex<RequestState>
```
Collaborator

You set up `RequestState` to be `AtomicRepresentable`, but are using a `Mutex`.
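For a payload-free state, the `AtomicRepresentable` conformance would let the transition use `Atomic` from the Synchronization module instead of a `Mutex`. This is only a sketch under that assumption (`SketchState` is illustrative); since the PR's state carries a connection payload, which a plain atomic cannot hold, the `Mutex` may be the pragmatic choice:

```swift
import Synchronization

// Sketch: a payload-free state that is AtomicRepresentable via its
// UInt8 raw value. (Not the PR's RequestState, which carries payloads.)
enum SketchState: UInt8, AtomicRepresentable {
    case waiting = 0
    case fulfilled = 1
}

let state = Atomic(SketchState.waiting)

// compareExchange gives the same "transition exactly once" guarantee
// a Mutex-guarded assignment provides, without taking a lock.
let (exchanged, original) = state.compareExchange(
    expected: .waiting,
    desired: .fulfilled,
    ordering: .acquiringAndReleasing
)
```

A second `compareExchange(expected: .waiting, ...)` would fail with `exchanged == false`, which is exactly the once-only semantics a request fulfillment needs.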

```swift
defer { self.connectionPool.releaseConnection(connection) }

return try await operation(connection)
fatalError()
```
Collaborator

And how are you planning to support `withConnection`, now that the connection request is `ValkeyConnectionRequest<RESPToken>`?

```swift
self.lock.withLock { state in
    state = .onConnection(connection)
}
self.onConnection(connection, self)
```
Collaborator

`complete` and `succeed` seem to do different things for a successful result. Which is the correct one?
