Commit a2db614

Feat/idle timeouts (kroxylicious#3046)
Type of change: Enhancement / new feature

Description: Harden the proxy by disconnecting idle connections.

Additional Context: This PR starts to model a KafkaSession, which allows the runtime and filters to understand what stage of its life cycle a connection is in. This is somewhat higher level than the current ProxyChannelStateMachine, but the two do overlap. The KafkaSession model allows this PR to draw a distinction between new and/or anonymous clients and those which have been authenticated, so that we can apply more relaxed timeouts to the authenticated sessions, which are less likely to be problematic and/or malicious.

Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
1 parent 26e4587 commit a2db614

File tree

33 files changed

+1530
-271
lines changed

.idea/inspectionProfiles/Project_Default.xml

Lines changed: 12 additions & 0 deletions

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ Format `<github issue/pr number>: <short description>`.
77

88
## SNAPSHOT
99

10+
* [#3046](https://github.com/kroxylicious/kroxylicious/pull/3046): Add configurable idle connection timeouts for client connections
1011
* [#3242](https://github.com/kroxylicious/kroxylicious/pull/3242): chore: remove deprecated template kek selector brace style
1112
* [#3224](https://github.com/kroxylicious/kroxylicious/pull/3224): Add support for using Secret in `trustAnchorRef` field of the KafkaService and the VirtualKafkaCluster CRs.
1213
* [#3171](https://github.com/kroxylicious/kroxylicious/pull/3171): build(deps): bump io.strimzi:api from 0.48.0 to 0.50.0
@@ -27,6 +28,14 @@ Format `<github issue/pr number>: <short description>`.
2728
* A JSON Web Signature (JWS) Signature validator has been added. WARNING: This validator does NOT include JSON Web Token (JWT) validation (expiration, issuer, etc. are NOT checked).
2829
* Curly-brace style topicName tokens are no longer supported in the Record Encryption TemplateKekSelector template. `template` should use `$(topicName)` instead of `${topicName}`.
2930
This style was deprecated in version 0.11.0.
31+
* Idle connection timeout support added with two optional configuration properties under `network.proxy`:
32+
* `unauthenticatedIdleTimeout` - Applies to connections where authentication cannot be detected
33+
* `authenticatedIdleTimeout` - Applies to connections with established identities
34+
Both properties use Go-style duration format (e.g., `30s`, `5m`, `1h30m`) with supported units: `d`, `h`, `m`, `s`, `ms`, `μs`/`us`, `ns`.
35+
* A new metric `kroxylicious_client_to_proxy_disconnects_total` tracks client-to-proxy disconnections with a `cause` label to distinguish between:
36+
* `idle_timeout` - Connection exceeded the configured idle timeout duration
37+
* `client_closed` - Client initiated the connection close
38+
* `server_closed` - Backend server closed the connection, causing the proxy to close the client connection
3039

3140
## 0.18.0
3241

kroxylicious-api/src/main/java/io/kroxylicious/proxy/authentication/Subject.java

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,10 @@ public <P extends Principal> Optional<P> uniquePrincipalOfType(Class<P> uniquePr
5959
return uniquePrincipalOfType(this.principals, uniquePrincipalType);
6060
}
6161

62+
public boolean isAnonymous() {
63+
return this.principals.isEmpty();
64+
}
65+
6266
private static <P extends Principal> Optional<P> uniquePrincipalOfType(Set<Principal> principals, Class<P> uniquePrincipalType) {
6367
if (uniquePrincipalType.isAnnotationPresent(Unique.class)) {
6468
return principals.stream()

kroxylicious-api/src/test/java/io/kroxylicious/proxy/authentication/SubjectTest.java

Lines changed: 30 additions & 0 deletions
@@ -70,4 +70,34 @@ void canExtractPrincipals() {
7070
Assertions.assertThat(subject2.allPrincipalsOfType(FakeUniquePrincipal.class)).isEmpty();
7171
Assertions.assertThat(Subject.anonymous().allPrincipalsOfType(FakeUniquePrincipal.class)).isEmpty();
7272
}
73+
74+
@Test
75+
void shouldConsiderEmptySetOfPrinciplesAnonymous() {
76+
// Given
77+
Subject emptySubject = new Subject(Set.of());
78+
79+
// When
80+
// Then
81+
Assertions.assertThat(emptySubject.isAnonymous()).isTrue();
82+
}
83+
84+
@Test
85+
void shouldNotConsiderSetOfPrinciplesAnonymous() {
86+
// Given
87+
Subject subject = new Subject(user1, foo);
88+
89+
// When
90+
// Then
91+
Assertions.assertThat(subject.isAnonymous()).isFalse();
92+
}
93+
94+
@Test
95+
void shouldConsiderEmptySetOfPrinciplesEqual() {
96+
// Given
97+
Subject emptySubject = new Subject(Set.of());
98+
99+
// When
100+
// Then
101+
Assertions.assertThat(emptySubject).isEqualTo(Subject.anonymous());
102+
}
73103
}

kroxylicious-docs/docs/_assemblies/assembly-configuring-proxy.adoc

Lines changed: 2 additions & 0 deletions
@@ -17,5 +17,7 @@ include::../_modules/configuring/con-configuring-vc-target-tls.adoc[leveloffset=
1717
include::../_modules/configuring/con-configuring-vc-transport-subject-builder.adoc[leveloffset=+1]
1818
include::../_modules/configuring/con-configuring-vc-other-settings.adoc[leveloffset=+1]
1919
include::../_modules/configuring/con-configuring-toplevel-other-settings.adoc[leveloffset=+1]
20+
include::../_modules/configuring/con-configuring-network-settings.adoc[leveloffset=+1]
21+
include::../_modules/configuring/con-configuring-idle-timeouts.adoc[leveloffset=+1]
2022

2123
include::../_modules/configuring/ref-configuring-proxy-example.adoc[leveloffset=+1]
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
:_mod-docs-content-type: CONCEPT
2+
3+
[id='con-configuring-idle-timeouts-{context}']
4+
= Configuring idle connection timeouts
5+
6+
[role="_abstract"]
7+
The proxy can automatically disconnect idle client connections to reclaim resources.
8+
Idle timeout configuration is completely optional and disabled by default, allowing you to opt in only when needed for your deployment.
9+
10+
== Two-stage timeout mechanism
11+
12+
The proxy supports two independent idle timeout settings that apply at different stages of the connection lifecycle:
13+
14+
* **Unauthenticated timeout** (`unauthenticatedIdleTimeout`) - Applies to connections where the proxy cannot detect the completion of authentication. The proxy considers authentication to be complete if either of the following holds true:
15+
1. A transport subject builder creates a subject with an identity. Usually this would be from a client TLS certificate.
16+
2. A SASL inspection or termination filter has invoked the `io.kroxylicious.proxy.filter.FilterContext.clientSaslAuthenticationSuccess` method.
17+
* **Authenticated timeout** (`authenticatedIdleTimeout`) - Applies to connections where an identity can be established. This timeout applies for the remainder of the connection's lifetime.
18+
19+
Both timeout settings are optional and have no default values.
20+
You can configure one, both, or neither depending on your requirements.
21+
Timeout values use Go-style duration format (for example, `30s` for 30 seconds, `5m` for 5 minutes, `1h` for 1 hour).
22+
Supported units are: `d` (days), `h` (hours), `m` (minutes), `s` (seconds), `ms` (milliseconds), `μs` or `us` (microseconds), and `ns` (nanoseconds).
23+
Units can be combined, for example `1h30m` for 1 hour and 30 minutes.
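As an illustration of the duration format described above, here is a minimal sketch of a parser for such strings. It is not the proxy's own parser: the class name `GoStyleDurationSketch` and its `parse` method are hypothetical, and the sketch assumes whole-number components only.

```java
import java.time.Duration;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Kroxylicious.
final class GoStyleDurationSketch {
    // Nanoseconds per unit for each supported suffix ("\u03bc" is the Greek letter mu).
    private static final Map<String, Long> NANOS_PER_UNIT = Map.of(
            "d", 86_400_000_000_000L,
            "h", 3_600_000_000_000L,
            "m", 60_000_000_000L,
            "s", 1_000_000_000L,
            "ms", 1_000_000L,
            "\u03bcs", 1_000L,
            "us", 1_000L,
            "ns", 1L);

    // Two-letter suffixes come first so that "ms" is not read as "m" followed by "s".
    private static final Pattern COMPONENT = Pattern.compile("(\\d+)(ms|\u03bcs|us|ns|d|h|m|s)");

    static Duration parse(String text) {
        Matcher matcher = COMPONENT.matcher(text);
        Duration total = Duration.ZERO;
        int consumed = 0;
        // Accept only components that follow each other without gaps.
        while (matcher.find() && matcher.start() == consumed) {
            long amount = Long.parseLong(matcher.group(1));
            long nanosPerUnit = NANOS_PER_UNIT.get(matcher.group(2));
            total = total.plusNanos(Math.multiplyExact(amount, nanosPerUnit));
            consumed = matcher.end();
        }
        if (consumed == 0 || consumed != text.length()) {
            throw new IllegalArgumentException("Not a valid duration: " + text);
        }
        return total;
    }
}
```

With this sketch, `parse("1h30m")` and `parse("90m")` yield the same `Duration`, while input such as `1h 30m` (with a space) is rejected.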
24+
25+
== Configuration examples
26+
27+
.Example: Unauthenticated timeout only
28+
[source,yaml]
29+
----
30+
network:
31+
proxy:
32+
unauthenticatedIdleTimeout: 30s # <1>
33+
virtualClusters:
34+
# ...
35+
----
36+
<1> Disconnect unauthenticated connections that are idle for more than 30 seconds.
37+
38+
.Example: Authenticated timeout only
39+
[source,yaml]
40+
----
41+
network:
42+
proxy:
43+
authenticatedIdleTimeout: 5m # <1>
44+
virtualClusters:
45+
# ...
46+
----
47+
<1> Disconnect authenticated connections that are idle for more than 5 minutes.
48+
49+
.Example: Both timeouts configured
50+
[source,yaml]
51+
----
52+
network:
53+
proxy:
54+
unauthenticatedIdleTimeout: 30s # <1>
55+
authenticatedIdleTimeout: 10m # <2>
56+
virtualClusters:
57+
# ...
58+
----
59+
<1> Disconnect unauthenticated connections after 30 seconds of inactivity.
60+
<2> Disconnect authenticated connections after 10 minutes of inactivity.
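The two-stage behavior shown in these examples can be sketched as a small selection rule: the timeout that applies depends on whether an identity has been established, and an unset property means no idle timeout for that stage. The class and method names below are hypothetical and do not reflect the proxy's internal implementation.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical model of the two optional settings under network.proxy.
final class IdleTimeoutSelectionSketch {
    private final Duration unauthenticatedIdleTimeout; // null when not configured
    private final Duration authenticatedIdleTimeout;   // null when not configured

    IdleTimeoutSelectionSketch(Duration unauthenticatedIdleTimeout, Duration authenticatedIdleTimeout) {
        this.unauthenticatedIdleTimeout = unauthenticatedIdleTimeout;
        this.authenticatedIdleTimeout = authenticatedIdleTimeout;
    }

    // Returns the timeout for the connection's current stage, or empty if none is configured.
    Optional<Duration> timeoutFor(boolean identityEstablished) {
        return Optional.ofNullable(identityEstablished ? authenticatedIdleTimeout : unauthenticatedIdleTimeout);
    }
}
```

Under this assumption, a configuration that sets only `unauthenticatedIdleTimeout` leaves authenticated connections without any idle timeout.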
61+
62+
== When to enable idle timeouts
63+
64+
Consider enabling idle timeouts in the following scenarios:
65+
66+
* **Misbehaving clients** - Clients that abandon connections without properly closing them, leaving resources allocated unnecessarily.
67+
* **High-scale deployments** - Environments with many clients where connection resources (memory, file descriptors) are constrained.
68+
* **Connection exhaustion prevention** - Deployments approaching operating system or network limits on concurrent connections.
69+
* **Network infrastructure requirements** - Environments where network infrastructure (firewalls, load balancers) drops idle connections, and you want the proxy to disconnect gracefully first.
70+
* **Different security postures** - Scenarios where unauthenticated connections require stricter timeouts than authenticated connections for security reasons.
71+
72+
== When not to enable idle timeouts
73+
74+
Avoid enabling idle timeouts in the following scenarios:
75+
76+
* **Legitimate idle connections** - Applications where clients maintain long-lived connections with extended idle periods, such as consumers with long poll timeouts or applications using connection pooling.
77+
* **Stable network infrastructure** - Environments with reliable network infrastructure and no issues with idle connection management.
78+
* **Minimal overhead desired** - Deployments where the proxy's monitoring overhead should be kept to an absolute minimum.
79+
* **No resource constraints** - Systems with ample connection resources and no risk of connection exhaustion.
80+
81+
== Monitoring idle disconnects
82+
83+
The proxy tracks idle disconnects using the `kroxylicious_client_to_proxy_disconnects_total` metric with `cause="idle_timeout"`.
84+
This counter is incremented each time a connection is closed due to exceeding the configured idle timeout.
85+
86+
The `kroxylicious_client_to_proxy_disconnects_total` metric also tracks other disconnect scenarios:
87+
88+
* `cause="idle_timeout"` - Connection exceeded the configured idle timeout duration
89+
* `cause="client_closed"` - The downstream client initiated the connection close
90+
* `cause="server_closed"` - The upstream node closed the connection, causing the proxy to close the client connection
91+
92+
For more information about connection metrics, see xref:con-prometheus-metrics-proxy-{context}[Overview of proxy metrics].
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
:_mod-docs-content-type: CONCEPT
2+
3+
[id='con-configuring-network-settings-{context}']
4+
= Configuring network and Netty settings
5+
6+
[role="_abstract"]
7+
The proxy allows configuration of network settings for both the proxy endpoints (client-facing) and management endpoints using the `network.proxy` and `network.management` properties.
8+
These settings control low-level Netty behavior and are optional, with sensible defaults provided.
9+
10+
.Configuration fragment showing network settings
11+
[source,yaml]
12+
----
13+
network:
14+
proxy: # <1>
15+
workerThreadCount: 8 # <2>
16+
shutdownQuietPeriodSeconds: 10 # <3>
17+
management: # <4>
18+
workerThreadCount: 2
19+
shutdownQuietPeriodSeconds: 5
20+
virtualClusters:
21+
# ...
22+
----
23+
<1> Network settings for the proxy endpoints that handle client connections.
24+
<2> Optional: Number of Netty worker threads for handling concurrent connections. Defaults to twice the number of available processors.
25+
<3> Optional: Grace period in seconds during which the proxy continues processing existing connections before shutting down. Defaults to 0.
26+
<4> Network settings for the management HTTP endpoints. Can be configured independently from proxy settings.
27+
28+
* All network settings are optional. The proxy will use sensible defaults if not specified.
29+
* The `workerThreadCount` setting allows tuning for high-concurrency deployments. Increasing this value can improve throughput when handling many simultaneous client connections.
30+
* The `shutdownQuietPeriodSeconds` setting provides a graceful shutdown window, allowing in-flight requests to complete before the proxy terminates.
31+
* Proxy and management endpoints can have different thread pool sizes and shutdown behaviors based on their different workload characteristics.
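The documented default for `workerThreadCount` (twice the number of available processors) can be sketched as follows. The helper below is hypothetical and only illustrates the rule stated in the callouts; it is not the proxy's own code.

```java
// Hypothetical helper, for illustration only.
final class WorkerThreadDefaultSketch {
    // Returns the configured count, or twice the available processors when unset (null).
    static int effectiveWorkerThreads(Integer configuredWorkerThreadCount) {
        if (configuredWorkerThreadCount != null) {
            return configuredWorkerThreadCount;
        }
        return Runtime.getRuntime().availableProcessors() * 2;
    }
}
```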

kroxylicious-docs/docs/_modules/monitoring/con-prometheus-metrics-proxy.adoc

Lines changed: 20 additions & 2 deletions
@@ -34,12 +34,21 @@ For these specific errors, the `virtual_cluster` and `node_id` labels are set to
3434

3535
NOTE: Error conditions signaled _within_ the Kafka protocol response (such as `RESOURCE_NOT_FOUND` or `UNKNOWN_TOPIC_ID`) are not classed as errors by these metrics.
3636

37-
=== Understanding connection counter vs gauge metrics
37+
=== Understanding connection metrics relationships
3838

39-
The proxy provides both counter and gauge metrics for connections:
39+
The proxy provides several related metrics for tracking connections:
4040

4141
* **Connection counters** (`kroxylicious_*_connections_total`) track the total number of connection attempts over time. These values only increase and provide a historical view of connection activity.
4242
* **Active connection gauges** (`kroxylicious_*_active_connections`) show the current number of open connections at any given moment. These values increase when connections are established and decrease when connections are closed.
43+
* **Error counters** (`kroxylicious_*_errors_total`) track connections that closed due to errors.
44+
* **Disconnect counters** (`kroxylicious_client_to_proxy_disconnects_total`) track connections that closed without errors, categorized by cause.
45+
46+
When a connection closes, it increments either the error counter or one of the disconnect counter causes, but never both.
47+
The active connection gauge decreases regardless of whether the closure was due to an error or a clean disconnect.
48+
49+
The following relationship holds:
50+
51+
`Active connections = Connections total - (Errors total + sum of all Disconnect causes)`
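As a worked example of this relationship, the sketch below computes the expected gauge value from sample counter readings. The class name and the sample numbers are made up for illustration; only the formula comes from the text above.

```java
import java.util.Map;

// Hypothetical calculation, for illustration only.
final class ConnectionMetricsSketch {
    // Active connections = connections total - (errors total + sum of all disconnect causes).
    static long activeConnections(long connectionsTotal, long errorsTotal, Map<String, Long> disconnectsByCause) {
        long disconnects = disconnectsByCause.values().stream().mapToLong(Long::longValue).sum();
        return connectionsTotal - (errorsTotal + disconnects);
    }
}
```

For instance, 100 connections in total, 5 errors, and 10 `idle_timeout`, 60 `client_closed`, and 5 `server_closed` disconnects leave 20 active connections.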
4352

4453
.Connection metrics for client and broker interactions
4554
|===
@@ -78,6 +87,15 @@ The proxy provides both counter and gauge metrics for connections:
7887
|`virtual_cluster`, `node_id`
7988
|Shows the current number of active TCP connections from the proxy to servers. +
8089
This gauge reflects real-time connection state and decreases when connections are closed.
90+
91+
|`kroxylicious_client_to_proxy_disconnects_total`
92+
|Counter
93+
|`virtual_cluster`, `node_id`, `cause`
94+
|Incremented by one every time a client connection closes without an error. The `cause` label indicates the reason for the disconnection: +
95+
`idle_timeout` - Connection exceeded the configured idle timeout duration (requires idle timeouts configured via `network.proxy.unauthenticatedIdleTimeout` or `network.proxy.authenticatedIdleTimeout`). +
96+
`client_closed` - Client initiated the connection close. +
97+
`server_closed` - Backend server closed the connection, causing the proxy to close the client connection. +
98+
Note: Error-based disconnects are tracked separately via `kroxylicious_client_to_proxy_errors_total`, not this metric.
8199
|===
82100

83101
== Message metrics

kroxylicious-filters/kroxylicious-sasl-inspection/src/main/java/io/kroxylicious/filter/sasl/inspection/SaslInspectionFilter.java

Lines changed: 8 additions & 1 deletion
@@ -25,6 +25,7 @@
2525
import org.apache.kafka.common.protocol.Errors;
2626
import org.slf4j.Logger;
2727
import org.slf4j.LoggerFactory;
28+
import org.slf4j.spi.LoggingEventBuilder;
2829

2930
import io.kroxylicious.proxy.authentication.ClientSaslContext;
3031
import io.kroxylicious.proxy.authentication.SaslSubjectBuilder;
@@ -315,7 +316,6 @@ private CompletionStage<ResponseFilterResult> processSuccessfulAuthenticateRespo
315316
.setMessage(
316317
"Server has accepted an expired SASL credentials on channel {}. Client must re-authenticate on the next request, or the server will disconnect.")
317318
.addArgument(context::channelDescriptor)
318-
.addArgument(authorizationIdFromClient)
319319
.log();
320320
context.clientSaslAuthenticationFailure(state.saslObserver().mechanismName(), authorizationIdFromClient, new SaslException("expired credential"));
321321
}
@@ -372,6 +372,13 @@ public String authorizationId() {
372372
e = exception;
373373
}
374374
else {
375+
LoggingEventBuilder eventBuilder = LOGGER.atWarn()
376+
.setMessage("Exception caught while trying to build subject (enable debug to see the stacktrace). {}")
377+
.addArgument(throwable.getMessage());
378+
if (LOGGER.isDebugEnabled()) {
379+
eventBuilder = eventBuilder.setCause(throwable);
380+
}
381+
eventBuilder.log();
375382
e = new SubjectBuildingException("SaslSubjectBuilder " + subjectBuilder.getClass() + " threw an unexpected exception", throwable);
376383
}
377384
context.clientSaslAuthenticationFailure(saslObserver.mechanismName(),

kroxylicious-filters/kroxylicious-sasl-inspection/src/main/java/io/kroxylicious/filter/sasl/inspection/State.java

Lines changed: 29 additions & 0 deletions
@@ -78,6 +78,11 @@ private RequiringHandshakeRequest() {
7878
public AwaitingHandshakeResponse nextState(SaslObserver saslObserver) {
7979
return new AwaitingHandshakeResponse(saslObserver, NegotiationType.INITIAL);
8080
}
81+
82+
@Override
83+
public String toString() {
84+
return this.getClass().getSimpleName();
85+
}
8186
}
8287

8388
/** We're waiting for a SASL handshake response from the server. */
@@ -98,6 +103,11 @@ public RequiringAuthenticateRequest nextState() {
98103
return new RequiringAuthenticateRequest(saslObserver(), negotiationType);
99104
}
100105

106+
@Override
107+
public String toString() {
108+
return this.getClass().getSimpleName();
109+
}
110+
101111
}
102112

103113
/**
@@ -123,6 +133,11 @@ public AwaitingAuthenticateResponse nextState(boolean authRequestApiSupportsReau
123133
var clientSupportsReauthentication = negotiationType == NegotiationType.REAUTH || authRequestApiSupportsReauth;
124134
return new AwaitingAuthenticateResponse(saslObserver(), negotiationType, clientSupportsReauthentication);
125135
}
136+
137+
@Override
138+
public String toString() {
139+
return this.getClass().getSimpleName();
140+
}
126141
}
127142

128143
/**
@@ -156,6 +171,11 @@ State nextState(boolean saslFinished) {
156171
return new RequiringAuthenticateRequest(saslObserver(), negotiationType);
157172
}
158173
}
174+
175+
@Override
176+
public String toString() {
177+
return this.getClass().getSimpleName();
178+
}
159179
}
160180

161181
/**
@@ -182,6 +202,10 @@ public AwaitingHandshakeResponse nextState(SaslObserver saslObserver) {
182202
return new AwaitingHandshakeResponse(saslObserver, NegotiationType.REAUTH);
183203
}
184204

205+
@Override
206+
public String toString() {
207+
return this.getClass().getSimpleName();
208+
}
185209
}
186210

187211
/**
@@ -191,6 +215,11 @@ public AwaitingHandshakeResponse nextState(SaslObserver saslObserver) {
191215
final class DisallowingAuthenticateRequest implements State {
192216
private DisallowingAuthenticateRequest() {
193217
}
218+
219+
@Override
220+
public String toString() {
221+
return this.getClass().getSimpleName();
222+
}
194223
}
195224

196225
enum NegotiationType {
