Major performance regression in client-v2 #2516

### Summary
After migrating from the ClickHouse Java client v1 to v2, we observed a major performance regression: throughput dropped to less than half. We use ClickHouse as a pre-aggregation layer and run analytical queries that may return 10^6-10^9 rows for later processing. Because our pipeline depends on low-latency, low-overhead reads, even small regressions translate into major throughput losses.

### Reproduction
The benchmark suite in this repository measures general query performance differences between v1 and v2, but it does not cover the most performance-sensitive scenario: retrieving column values through strongly-typed getters (the use case `java.sql.ResultSet#getLong()` exists for). In that scenario, the v2 client takes more than twice as long per operation as v1. This regression affects any user who relies on `getXxx()` methods and migrates to the new ClickHouse JDBC driver.
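
To make the scenario concrete, here is a minimal sketch of the two read styles using only the standard `java.sql.ResultSet` API. The class and method names are hypothetical, and the benchmarks in the PR may be structured differently.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper contrasting typed and untyped column reads over JDBC. */
public final class TypedGetterRead {

    /** Reads column 1 through the primitive getter; the caller never asks for a Long object. */
    static long sumWithTypedGetter(ResultSet rs) throws SQLException {
        long sum = 0;
        while (rs.next()) {
            sum += rs.getLong(1); // JDBC column indexes are 1-based
        }
        return sum;
    }

    /** Reads the same column as an Object and unwraps it, for contrast. */
    static long sumWithGetObject(ResultSet rs) throws SQLException {
        long sum = 0;
        while (rs.next()) {
            sum += ((Number) rs.getObject(1)).longValue();
        }
        return sum;
    }
}
```

Whether the first loop can avoid per-value boxing depends on how the driver implements `getLong(int)` internally, which is what the rest of this issue is about.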

I’ve submitted a [PR](https://github.com/ClickHouse/clickhouse-java/pull/2515) that introduces 2 new benchmarks using strongly-typed getters. Below are the results from running them on my local machine:
- OpenJDK version: 24.0.2
- macOS 15.5, Apple M3 Pro
```
mvn compile exec:exec -Dexec.executable=java -Dexec.args="-classpath %classpath com.clickhouse.benchmark.BenchmarkRunner -m 3 -b q -l 300000"

QueryClient.queryV1                    110  276.252 ± 31.343  ms/op
QueryClient.queryV2                    125  245.277 ± 20.057  ms/op
QueryClient.queryV1WithTypes           144  209.260 ± 17.059  ms/op
QueryClient.queryV2WithTypes           68   454.188 ± 36.329  ms/op
```
As you can see, `QueryClient.queryV1` and `QueryClient.queryV2` perform similarly. However, `QueryClient.queryV2WithTypes` is more than 2x slower than `QueryClient.queryV1WithTypes`.
While the gap between `QueryClient.queryV2` and `QueryClient.queryV1WithTypes` is only around 20%, memory allocations are significantly higher in v2. This increases pressure on the garbage collector, which usually runs concurrently, so part of that cost may not show up in the ms/op figures.
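
For reference, the ratios implied by the mean scores above (ignoring the error margins) work out as follows:

```math
\frac{454.188}{209.260} \approx 2.17, \qquad
\frac{245.277}{209.260} \approx 1.17, \qquad
\frac{209.260}{454.188} \approx 0.46
```

So with typed getters v2 takes about 2.17x as long per operation as v1, which corresponds to roughly 46% of v1's throughput, while untyped v2 takes about 17% longer than typed v1.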

### Root cause
Two main differences in v2 contribute to the regression; neither is present in v1:
- v2 creates a new `Object[]` for every row. Every row read therefore triggers an array allocation, and all primitive values are boxed. This increases GC pressure and hurts data locality.
  In contrast, v1 reuses a single array with mutable wrappers to store values, avoiding these allocations entirely (see `ClickHouseClientOption#REUSE_VALUE_WRAPPER`).
- In v2, reading a primitive value by index, e.g. `getLong(int)`, goes through a chain of unnecessary hash table lookups:
  `com.clickhouse.client.api.metadata.TableSchema#columnIndexToName` -> `nameToIndex` -> `nameToIndex`. `com.clickhouse.jdbc.ResultSetImpl#getLong(int)` adds one more lookup on top of that chain. That is 4 `HashMap.get()` calls for every primitive column value read by index.
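
The two patterns can be illustrated with a simplified, self-contained model. Every class and field name below is hypothetical and none of this is the actual client code; in particular, the lookup chain is reduced to an index -> name -> index round trip (two lookups instead of the four described above).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the two row-access patterns; not the real v1/v2 code. */
public final class RowAccessSketch {

    /** Mutable holder reused across rows, standing in for v1's value wrappers. */
    static final class LongCell {
        long value;
    }

    /** Pattern attributed to v2: fresh Object[] per row, boxed values, map lookups per read. */
    static final class BoxedRowReader {
        private final long[][] source; // stand-in for decoded response data
        private final Map<Integer, String> indexToName = new HashMap<>();
        private final Map<String, Integer> nameToIndex = new HashMap<>();
        private Object[] current;
        private int row = -1;
        int mapLookups = 0; // counts HashMap.get() calls, for illustration only

        BoxedRowReader(long[][] source, String... columns) {
            this.source = source;
            for (int i = 0; i < columns.length; i++) {
                indexToName.put(i + 1, columns[i]); // 1-based column index -> name
                nameToIndex.put(columns[i], i);     // name -> 0-based array position
            }
        }

        boolean next() {
            if (++row >= source.length) {
                return false;
            }
            current = new Object[source[row].length]; // one array allocation per row
            for (int i = 0; i < current.length; i++) {
                current[i] = source[row][i]; // autoboxes long -> Long
            }
            return true;
        }

        long getLong(int columnIndex) {
            String name = indexToName.get(columnIndex); // lookup 1
            int position = nameToIndex.get(name);       // lookup 2
            mapLookups += 2;
            return (Long) current[position];            // unboxes again
        }
    }

    /** Pattern attributed to v1: wrappers allocated once, overwritten in place per row. */
    static final class ReusedRowReader {
        private final long[][] source;
        private final LongCell[] cells;
        private int row = -1;

        ReusedRowReader(long[][] source, int columnCount) {
            this.source = source;
            this.cells = new LongCell[columnCount];
            for (int i = 0; i < columnCount; i++) {
                cells[i] = new LongCell();
            }
        }

        boolean next() {
            if (++row >= source.length) {
                return false;
            }
            for (int i = 0; i < cells.length; i++) {
                cells[i].value = source[row][i]; // no allocation, no boxing
            }
            return true;
        }

        long getLong(int columnIndex) {
            return cells[columnIndex - 1].value; // direct array access by index
        }
    }
}
```

Both readers return the same values; the difference is that the first allocates on every `next()` and performs map lookups on every `getLong(int)`, while the second does neither after construction.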


