
Make ValkeyClient go brrrr #158


Conversation

fabianfett (Collaborator)

Motivation

ConnectionPool offers a fast mode for single requests. For this to work, we have to leave the paradigms of structured concurrency: it is important that the ConnectionRequest is fulfilled synchronously, without any continuations.

Using the withConnection API, we have the following context switch scenario:

  • On EL, a response has been received
  • On EL, the command's continuation is succeeded
  • JUMP to concurrent executor, EL is now potentially idle
  • On CE (concurrent executor) end of withConnection is reached -> acquire lock -> get next waiting request -> fulfill continuation for connection
  • On CE next withConnection closure is invoked
  • On CE next send is invoked
  • JUMP to EL for actor isolation, we are back on the EL and can run the command

If we don't use structured concurrency here, we can have the following flow:

  • On EL, a response has been received
  • On EL, we can release the connection -> acquire lock -> get next waiting request -> invoke request callback with connection
  • On EL request callback is invoked and can write command right away

Not a single context switch was necessary in this scenario.
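The unstructured flow above can be sketched with a callback-based request. Everything here (names and shapes, including `ConnectionRequestSketch`) is illustrative and not the PR's actual API:

```swift
// Sketch of the unstructured flow described above. All names are
// illustrative; the PR's actual ValkeyConnectionRequest differs.
final class ConnectionRequestSketch<Connection> {
    private let onConnection: (Connection) -> Void

    init(onConnection: @escaping (Connection) -> Void) {
        self.onConnection = onConnection
    }

    // Called on the event loop when a connection is released. The
    // callback runs inline, so the next command can be written right
    // away -- no continuation, no hop to a concurrent executor.
    func succeed(_ connection: Connection) {
        self.onConnection(connection)
    }
}

var log: [String] = []
let request = ConnectionRequestSketch<String> { connection in
    log.append("send GET on \(connection)")
}
request.succeed("conn-1")  // fulfilled synchronously, on the caller's thread
```

Because `succeed` invokes the callback inline, the release of one connection and the write of the next command happen on the same event-loop tick.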

Performance

In local testing I see about a 20% improvement in throughput and wall-clock time across all the client benchmarks.

Changes

  • Add ValkeyConnectionRequest, which uses unstructured callbacks for improved performance

@fabianfett fabianfett requested a review from adam-fowler July 25, 2025 14:58

❌ Benchmark comparison failed with error ❌

Summary
Threshold deviations for `ValkeyBenchmarks:Client: GET benchmark`

| Malloc (total) (K, %) | main | pull_request | Difference % | Threshold % |
|---|---|---|---|---|
| p25 | 79 | 83 | 5 | 5 |
| p50 | 79 | 86 | 8 | 5 |
| p75 | 82 | 86 | 5 | 5 |

Threshold deviations for `ValkeyBenchmarks:Client: GET benchmark | parallel 50 | 20 concurrent connections`

| Malloc (total) (K, %) | main | pull_request | Difference % | Threshold % |
|---|---|---|---|---|
| p25 | 9 | 14 | 52 | 5 |
| p50 | 10 | 16 | 54 | 5 |
| p75 | 11 | 17 | 49 | 5 |

New baseline 'pull_request' is WORSE than the 'main' baseline thresholds.

Full Benchmark Comparison

Comparing results between 'main' and 'pull_request'

Host 'db66eb8a4659' with 4 'x86_64' processors with 15 GB memory, running:
#18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025

ValkeyBenchmarks

Client: GET benchmark metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 78 | 79 | 79 | 82 | 83 | 83 | 83 | 6 |
| pull_request | 82 | 83 | 86 | 86 | 90 | 90 | 90 | 9 |
| Δ | 4 | 4 | 7 | 4 | 7 | 7 | 7 | 3 |
| Improvement % | -5 | -5 | -9 | -5 | -8 | -8 | -8 | 3 |

Client: GET benchmark | parallel 20 | 20 concurrent connections metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 94 | 97 | 99 | 100 | 103 | 111 | 111 | 25 |
| pull_request | 93 | 95 | 96 | 97 | 99 | 101 | 101 | 25 |
| Δ | -1 | -2 | -3 | -3 | -4 | -10 | -10 | 0 |
| Improvement % | 1 | 2 | 3 | 3 | 4 | 9 | 9 | 0 |

Client: GET benchmark | parallel 50 | 20 concurrent connections metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 7 | 9 | 10 | 11 | 15 | 23 | 23 | 41 |
| pull_request | 11 | 14 | 16 | 17 | 19 | 20 | 20 | 49 |
| Δ | 4 | 5 | 6 | 6 | 4 | -3 | -3 | 8 |
| Improvement % | -57 | -56 | -60 | -55 | -27 | 13 | 13 | 8 |

Connection: GET benchmark metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 9 |
| pull_request | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 9 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

HashSlot – {user}.whatever metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

ValkeyCommandEncoder – Command with 7 words metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 755 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 753 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2 |

ValkeyCommandEncoder – Simple GET metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1904 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1923 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 |

ValkeyCommandEncoder – Simple MGET 15 keys metrics

Malloc (total): results within specified thresholds, fold down for details.

| Malloc (total) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|---|---|---|---|---|---|---|---|---|
| main | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 357 |
| pull_request | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 362 |
| Δ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Improvement % | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |

```swift
let future: EventLoopFuture<Channel>
switch address.value {
case .hostname(let host, let port):
    future = connect.connect(host: host, port: port)
    let socketAddress = try! SocketAddress(ipAddress: host, port: port)
    future = connect.connect(to: socketAddress)
```
Collaborator

I can see why you did this, but can we keep it to another PR? It would be good to still support hostnames.
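One way to keep hostname support (in a follow-up PR, as suggested) would be to take the SocketAddress fast path only for IP literals and fall back to the resolving `connect(host:port:)` otherwise. The helper below is a hypothetical sketch, not code from this PR; it detects IP literals with `inet_pton`:

```swift
#if canImport(Glibc)
import Glibc
#else
import Darwin
#endif

// Hypothetical helper: true if `host` is an IPv4 or IPv6 literal,
// i.e. safe to turn into a SocketAddress without DNS resolution.
func isIPLiteral(_ host: String) -> Bool {
    var v4 = in_addr()
    var v6 = in6_addr()
    return inet_pton(AF_INET, host, &v4) == 1
        || inet_pton(AF_INET6, host, &v6) == 1
}
```

With such a check, the connect path could branch to `connect.connect(to:)` for IP literals and keep `connect.connect(host:port:)` for real hostnames.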

```swift
@usableFromInline
let continuation: CheckedContinuation<T, any Error>
@usableFromInline
let lock: Mutex<RequestState>
```
Collaborator

You set up `RequestState` to be `AtomicRepresentable`, but are using a `Mutex`.
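For a payload-free state, the `AtomicRepresentable` conformance would let the transition use `Atomic` from the Synchronization module instead of a `Mutex`. This is only a sketch under that assumption (`SketchState` is illustrative); since the PR's state carries a connection payload, which a plain atomic cannot hold, the `Mutex` may be the pragmatic choice:

```swift
import Synchronization

// Sketch: a payload-free state that is AtomicRepresentable via its
// UInt8 raw value. (Not the PR's RequestState, which carries payloads.)
enum SketchState: UInt8, AtomicRepresentable {
    case waiting = 0
    case fulfilled = 1
}

let state = Atomic(SketchState.waiting)

// compareExchange gives the same "transition exactly once" guarantee
// a Mutex-guarded assignment provides, without taking a lock.
let (exchanged, original) = state.compareExchange(
    expected: .waiting,
    desired: .fulfilled,
    ordering: .acquiringAndReleasing
)
```

A second `compareExchange(expected: .waiting, ...)` would fail with `exchanged == false`, which is exactly the once-only semantics a request fulfillment needs.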

```swift
defer { self.connectionPool.releaseConnection(connection) }

return try await operation(connection)
fatalError()
```
Collaborator

And how are you planning to support `withConnection`, now that the connection request is `ValkeyConnectionRequest<RESPToken>`?

```swift
self.lock.withLock { state in
    state = .onConnection(connection)
}
self.onConnection(connection, self)
```
Collaborator

`complete` and `succeed` seem to do different things for a successful result. Which is the correct one?
