
Conversation

@andygrove
Member

@andygrove andygrove commented Jan 17, 2026

Summary

Adds a BallistaClientPool that caches gRPC connections to executors by (host, port). This avoids the overhead of establishing new connections for each partition fetch during shuffle reads.

Changes

  • Add BallistaClientPool struct in client.rs with:
    • get_or_connect(host, port, max_message_size) - returns the cached connection or creates a new one
    • remove(host, port) - removes a connection (useful for error recovery)
    • clear() - clears all cached connections
  • Add global_client_pool() function for shared pool access across all shuffle reads (a usage sketch follows this list)
  • Update fetch_partition_remote() in shuffle_reader.rs to use the connection pool
  • Remove stale connections from pool on fetch errors
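
For illustration, a usage sketch of the API described above. The method names and arguments come from the change list; the module path, parameter types, return type, and the max_message_size value are assumptions rather than the actual signatures in client.rs.

```rust
// Sketch only: paths and types are assumed; method names and arguments are
// taken from the change list above.
use ballista_core::client::global_client_pool;

async fn fetch_from_executor(host: &str, port: u16) -> Result<(), Box<dyn std::error::Error>> {
    // One shared pool for all shuffle reads in this process.
    let pool = global_client_pool();

    // A cache hit returns the existing client; otherwise a new connection is
    // created and cached under (host, port). 16 MiB is an example value.
    let _client = pool.get_or_connect(host, port, 16 * 1024 * 1024).await?;

    // ... issue the partition fetch with the client ...

    // If the fetch fails, evict the possibly stale connection so the next
    // attempt reconnects.
    pool.remove(host, port);
    Ok(())
}
```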

Implementation Details

The pool uses a read-write lock pattern with parking_lot::RwLock:

  • Fast path: read lock to check for existing connection (no contention for cache hits)
  • Slow path: create the connection without holding the lock, then take the write lock to cache it

Race handling: if multiple tasks try to connect to the same host simultaneously, each creates a connection but only one gets cached. The others detect this, return the cached client, and drop their extra connection. This case is rare, and the approach avoids holding a lock across the async connection establishment.
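
A minimal sketch of that locking pattern, using a stand-in Client type instead of the real BallistaClient and illustrative names throughout (the actual implementation lives in client.rs):

```rust
use std::collections::HashMap;
use std::sync::Arc;

use parking_lot::RwLock;

// Stand-in for BallistaClient, which is Clone and wraps an Arc, so cloned
// clients share the underlying connection.
#[derive(Clone)]
struct Client;

impl Client {
    async fn try_new(_host: &str, _port: u16, _max_message_size: usize) -> Result<Self, String> {
        Ok(Client)
    }
}

#[derive(Default)]
struct ClientPool {
    clients: Arc<RwLock<HashMap<(String, u16), Client>>>,
}

impl ClientPool {
    async fn get_or_connect(
        &self,
        host: &str,
        port: u16,
        max_message_size: usize,
    ) -> Result<Client, String> {
        let key = (host.to_owned(), port);

        // Fast path: read lock only, so cache hits do not contend.
        if let Some(client) = self.clients.read().get(&key) {
            return Ok(client.clone());
        }

        // Slow path: establish the connection without holding any lock ...
        let client = Client::try_new(host, port, max_message_size).await?;

        // ... then take the write lock to cache it. If another task connected
        // to the same executor first, keep its entry and drop ours.
        let mut clients = self.clients.write();
        Ok(clients.entry(key).or_insert(client).clone())
    }
}
```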

Error Handling

On FetchFailed or GrpcActionError, the connection is removed from the pool so subsequent requests create fresh connections (see the sketch after this list). This handles cases where:

  • The remote server closes the connection
  • Network issues cause connection problems
  • The executor is restarted
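
Continuing the ClientPool sketch from Implementation Details, the eviction on error could look roughly like this (do_fetch stands in for the actual Arrow Flight request; names are illustrative, not the real fetch_partition_remote() code):

```rust
impl ClientPool {
    // Drop a cached connection so the next get_or_connect() reconnects.
    fn remove(&self, host: &str, port: u16) {
        self.clients.write().remove(&(host.to_owned(), port));
    }
}

// Stand-in for the actual request that streams a shuffle partition.
async fn do_fetch(_client: &Client) -> Result<Vec<u8>, String> {
    Ok(Vec::new())
}

async fn fetch_partition(
    pool: &ClientPool,
    host: &str,
    port: u16,
    max_message_size: usize,
) -> Result<Vec<u8>, String> {
    let client = pool.get_or_connect(host, port, max_message_size).await?;

    match do_fetch(&client).await {
        Ok(data) => Ok(data),
        Err(e) => {
            // The cached connection may be stale (closed by the server,
            // broken by the network, or pointing at a restarted executor),
            // so evict it; the next request establishes a fresh connection.
            pool.remove(host, port);
            Err(e)
        }
    }
}
```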

Motivation

This addresses the TODO comment in shuffle_reader.rs:

// TODO for shuffle client connections, we should avoid creating new connections again and again.
// And we should also avoid to keep alive too many connections for long time.

Connection establishment involves TCP handshake, TLS negotiation (if enabled), and HTTP/2 setup. Reusing connections eliminates this overhead for repeated fetches from the same executor.

Note

This PR was created with AI assistance using Claude Code. All changes were reviewed and approved by a human maintainer.

Test plan

  • Existing shuffle_reader tests pass (cargo test -p ballista-core shuffle_reader)
  • Existing client tests pass (cargo test -p ballista-core client)
  • Manual testing with distributed queries to verify connection reuse

Adds a BallistaClientPool that caches gRPC connections to executors
by (host, port). This avoids the overhead of establishing new
connections for each partition fetch during shuffle reads.

Key changes:
- Add BallistaClientPool struct with get_or_connect() method
- Add global_client_pool() function for shared pool access
- Update fetch_partition_remote() to use the connection pool
- Remove stale connections from pool on fetch errors

The pool uses a read-write lock pattern with parking_lot::RwLock:
- Fast path: read lock to check for existing connection
- Slow path: create connection without holding lock, then
  write lock to cache (handles races by preferring existing)

Error handling: on FetchFailed or GrpcActionError, the connection
is removed from the pool so subsequent requests create fresh
connections. This handles cases where connections become stale
or the remote server closes them.

This addresses the TODO comment in shuffle_reader.rs about avoiding
creating new connections repeatedly.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@andygrove andygrove force-pushed the shuffle-reader-connection-pool branch from 70f145d to 4b28e0a on January 17, 2026 16:27
@andygrove andygrove requested a review from milenkovicm January 17, 2026 16:29
@milenkovicm
Contributor

milenkovicm commented Jan 17, 2026

Is there a chance we could cover #1367 somehow as well (if it is obvious)?

@milenkovicm
Contributor

There is a test

async fn should_force_local_read_with_flight(

which always forces a remote read; not sure if that would cover the last task in your list.

/// access during connection creation. The `BallistaClient` itself is `Clone`
/// (wrapping an `Arc`), so cloned clients share the underlying connection.
#[derive(Default)]
pub struct BallistaClientPool {
Contributor

Just wondering, would it make sense to use something off-the-shelf like deadpool or r2d2 instead of implementing our own?

If I'm not mistaken, the current implementation can, in theory, leak connections: a connection is only removed from the pool when it fails, but an executor could be removed or replaced and its connection never gets a chance to fail.

Member Author

@andygrove andygrove Jan 17, 2026

I pushed a TTL improvement, but moved this to draft for now

@andygrove andygrove marked this pull request as draft January 17, 2026 21:54
Connections in the pool were only removed when they failed. If an
executor was removed/replaced without causing connection errors,
stale connections would leak indefinitely.

This adds a configurable TTL (default 5 minutes) so connections
automatically expire and are replaced with fresh ones on the next
access attempt.
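
A minimal sketch of the expiry check, assuming each cache entry records its creation time (field and type names are illustrative, not the actual implementation):

```rust
use std::time::{Duration, Instant};

// Illustrative stand-in for the cached gRPC client.
#[derive(Clone)]
struct Client;

// Each cache entry remembers when its connection was created.
struct PooledClient {
    client: Client,
    created_at: Instant,
}

impl PooledClient {
    // An entry older than the TTL (e.g. Duration::from_secs(300) for the
    // 5-minute default) is treated as a cache miss, so the caller reconnects
    // and replaces it on the next access.
    fn get_if_fresh(&self, ttl: Duration) -> Option<Client> {
        if self.created_at.elapsed() < ttl {
            Some(self.client.clone())
        } else {
            None
        }
    }
}
```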

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@milenkovicm
Contributor

I guess we can't use r2d2 or deadpool, as they pool instances for a single destination; their interfaces have a get() method that returns a pooled instance, whereas we need get(address).

@milenkovicm
Contributor

Closes #736, I believe.
