Merged
2 changes: 1 addition & 1 deletion modules/ROOT/nav.adoc
@@ -161,7 +161,7 @@
*** xref:manage:cluster-maintenance/disk-utilization.adoc[]
*** xref:manage:cluster-maintenance/manage-throughput.adoc[Manage Throughput]
*** xref:manage:cluster-maintenance/compaction-settings.adoc[Compaction Settings]
*** xref:manage:cluster-maintenance/configure-availability.adoc[Configure Availability]
*** xref:manage:cluster-maintenance/configure-availability.adoc[]
*** xref:manage:cluster-maintenance/partition-recovery.adoc[Forced Partition Recovery]
*** xref:manage:cluster-maintenance/nodewise-partition-recovery.adoc[Node-wise Partition Recovery]
** xref:manage:security/index.adoc[Security]
@@ -1,6 +1,7 @@
= Configure Client Connections
:description: Guidelines for configuring Redpanda clusters for optimal availability.
:description: Learn about guidelines for configuring client connections in Redpanda clusters for optimal availability.
:page-categories: Management, Networking
// tag::single-source[]

Optimize the availability of your clusters by configuring and tuning properties.

@@ -10,26 +11,68 @@ A malicious Kafka client application may create many network connections to exec

The following Redpanda cluster properties limit the number of connections:

* xref:reference:cluster-properties.adoc#kafka_connections_max[`kafka_connections_max`]: Similar to Kafka's `max.connections`, this sets the maximum number of connections per broker.
* xref:reference:cluster-properties.adoc#kafka_connections_max_per_ip[`kafka_connections_max_per_ip`]: Similar to Kafka's `max.connections.per.ip`, this sets the maximum number of connections accepted per IP address by a broker.
* xref:reference:cluster-properties.adoc#kafka_connections_max_overrides[`kafka_connections_max_overrides`]: A list of IP addresses for which `kafka_connections_max_per_ip` is overridden and doesn't apply.
* xref:reference:properties/cluster-properties.adoc#kafka_connections_max_per_ip[`kafka_connections_max_per_ip`]: Similar to Kafka's `max.connections.per.ip`, this sets the maximum number of connections accepted per IP address by a broker.
* xref:reference:properties/cluster-properties.adoc#kafka_connections_max_overrides[`kafka_connections_max_overrides`]: A list of IP addresses for which `kafka_connections_max_per_ip` is overridden and doesn't apply.
ifndef::env-cloud[]
* xref:reference:properties/cluster-properties.adoc#kafka_connections_max[`kafka_connections_max`]: Similar to Kafka's `max.connections`, this sets the maximum number of connections per broker.
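The count limits above can be pictured as a simple admission check that a broker applies to each new Kafka connection. The following Python sketch is a hypothetical model, not Redpanda's implementation: the `ConnectionLimiter` class is invented for illustration, `None` stands in for a disabled (`null`) property, and `kafka_connections_max_overrides` is deliberately left out.

```python
from collections import Counter
from typing import Optional


class ConnectionLimiter:
    """Hypothetical per-broker model of kafka_connections_max and
    kafka_connections_max_per_ip. None means the property is disabled."""

    def __init__(self, max_total: Optional[int], max_per_ip: Optional[int]):
        self.max_total = max_total
        self.max_per_ip = max_per_ip
        self.per_ip = Counter()  # open connections per client IP

    def try_accept(self, ip: str) -> bool:
        total = sum(self.per_ip.values())
        # Broker-wide limit (kafka_connections_max).
        if self.max_total is not None and total >= self.max_total:
            return False
        # Per-IP limit (kafka_connections_max_per_ip).
        if self.max_per_ip is not None and self.per_ip[ip] >= self.max_per_ip:
            return False
        self.per_ip[ip] += 1
        return True

    def close(self, ip: str) -> None:
        # A closed connection frees one slot for that IP.
        if self.per_ip[ip] > 0:
            self.per_ip[ip] -= 1
```

In this model, whichever limit is reached first rejects the connection, and closing a connection immediately makes room for a new one.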

Redpanda also provides properties to manage the rate of connection creation:

* xref:reference:cluster-properties.adoc#kafka_connection_rate_limit[`kafka_connection_rate_limit`]: This property limits the maximum rate of connections created per second. It applies to each CPU core.
* xref:reference:cluster-properties.adoc#kafka_connection_rate_limit_overrides[`kafka_connection_rate_limit_overrides`]: A list of IP addresses for which `kafka_connection_rate_limit` is overridden and doesn't apply.
* xref:reference:properties/cluster-properties.adoc#kafka_connection_rate_limit[`kafka_connection_rate_limit`]: This property limits the maximum rate of connections created per second. It applies to each CPU core.
* xref:reference:properties/cluster-properties.adoc#kafka_connection_rate_limit_overrides[`kafka_connection_rate_limit_overrides`]: A list of IP addresses for which `kafka_connection_rate_limit` is overridden and doesn't apply.
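Because `kafka_connection_rate_limit` applies to each CPU core, the broker-wide ceiling on new connections scales with the core count. The following Python sketch works through that arithmetic under the simplifying assumption that new connections are spread evenly across cores; in practice, one core can reach its limit before the others. The function names are illustrative.

```python
def max_new_connections_per_second(rate_limit_per_core: int, cores: int) -> int:
    """Upper bound on new Kafka connections per second for one broker,
    assuming the per-core limit applies independently on every core and
    new connections are spread evenly across cores."""
    return rate_limit_per_core * cores


def min_seconds_to_accept(connections: int, rate_limit_per_core: int, cores: int) -> float:
    """Shortest time the broker needs to accept a burst of new connections."""
    return connections / max_new_connections_per_second(rate_limit_per_core, cores)
```

For example, with a limit of 10 on an 8-core broker, the ceiling is 80 new connections per second, so a burst of 1,000 new connections takes at least 12.5 seconds to be accepted. That is worth weighing against the reconnect backoff settings of your clients.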
endif::[]

[NOTE]

@travisdowns is the note below accurate about connection counts? 'Typically two or three' sounds just wrong. Doesn't a client open a connection for each broker it's connected to (or for each partition it's producing to or consuming from)?

It's good to first note that num connections != num clients, but I think the message here is that the max expected # of connections per client is 'on the order of [insert something]'.

cc @micheleRP

Member

Yes, 2 or 3 would be a large underestimate if there are many brokers, and if clients connect to each broker (which is workload dependent).

Here are the full details:

https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/510099463/How+many+connections

That's probably not going to make it into this paragraph, but a conservative estimate is N+2 connections per client where N is the number of brokers.

Contributor Author

@micheleRP micheleRP Sep 17, 2025

Thank you @travisdowns! I changed that bullet (typically 2-3 connections per client) to:
The total number of connections is not equal to the number of clients, because a client can open multiple connections. As a conservative estimate, for a cluster with N brokers, plan for N + 2 connections per client.

Member

LGTM.

====
* These connection limit properties are disabled by default. You must manually enable them.
* Typically, a client opens two or three connections, so the total number of connections is not equal to the number of clients. For example, to support 100 clients, you might set your connection limit to 300.
* The total number of connections is not equal to the number of clients, because a client can open multiple connections. As a conservative estimate, for a cluster with N brokers, plan for N + 2 connections per client.
====
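As a worked example of the conservative estimate in the note above, the following Python sketch computes the expected total number of client connections across the cluster. It only restates the N + 2 rule of thumb; how those connections are distributed across individual brokers depends on the workload, so treat the result as a planning input rather than as a per-broker limit. The function name is illustrative.

```python
def estimate_client_connections(clients: int, brokers: int) -> int:
    """Conservative cluster-wide estimate: plan for N + 2 connections
    per client, where N is the number of brokers."""
    return clients * (brokers + 2)
```

For example, 100 clients on a 3-broker cluster gives 100 × (3 + 2) = 500 connections.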

ifdef::env-cloud[]
=== Configure connection count limit by client IP

Use the `kafka_connections_max_per_ip` property to limit the number of connections from each client IP address.

IMPORTANT: Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through private link endpoints, NAT gateways, or other shared-IP egress, the per-IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client. Similarly, multiple clients running on the same host will share the same IP address, and the limit applies collectively to all those clients.

==== Configure the limit

To configure `kafka_connections_max_per_ip` safely without disrupting legitimate clients, follow these steps:

. Set up your monitoring stack for your cluster. See xref:manage:monitor-cloud.adoc[].

. Monitor current connection patterns using the `redpanda_rpc_active_connections` metric with the `redpanda_server="kafka"` filter:
+
```
redpanda_rpc_active_connections{redpanda_id="CLOUD_CLUSTER_ID", redpanda_server="kafka"}
```

. Analyze the connection data to identify the normal range of connections for each broker during typical traffic cycles. For example, in the following Grafana screenshot, the normal range is around 200-300 connections:
+
image::shared:monitor_connections.png[Range of active connections over time]

. Set the `kafka_connections_max_per_ip` value based on your analysis. Use the upper bound of normal connections from step 3, or use a lower value if you know how many connections per client IP are being opened.

. Continue monitoring the connection metrics after applying the limit to ensure that legitimate clients are not affected and that the problematic client is properly controlled.
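Steps 2 through 4 can be partly scripted. The following Python sketch assumes you export the result of a Prometheus range query (the `/api/v1/query_range` endpoint) for the metric shown in step 2 as JSON, and it reports the highest observed connection count per broker. The name of the label that identifies a broker in your monitoring stack is an assumption (`instance` here), so check your own series labels before relying on it.

```python
import json
from typing import Dict


def upper_bound_per_broker(range_query_json: str, broker_label: str = "instance") -> Dict[str, float]:
    """Return the highest observed active-connection count per broker.

    Expects the JSON body of a Prometheus range query for
    redpanda_rpc_active_connections filtered by redpanda_server="kafka".
    broker_label is an assumption; adjust it to your series labels.
    """
    body = json.loads(range_query_json)
    if body.get("status") != "success" or body["data"]["resultType"] != "matrix":
        raise ValueError("expected a successful range query (matrix) response")
    peaks: Dict[str, float] = {}
    for series in body["data"]["result"]:
        broker = series["metric"].get(broker_label, "unknown")
        # Each sample is [unix_timestamp, "value-as-string"].
        peak = max(float(value) for _, value in series["values"])
        peaks[broker] = max(peak, peaks.get(broker, 0.0))
    return peaks
```

The largest value in the returned mapping corresponds to the upper bound described in step 4. Use a lower value if you know how many connections each client IP opens.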

==== Limitations

* Decreasing the limit does not terminate any currently open Kafka API connections.
* This limit does not apply to Kafka HTTP Proxy connections.
Collaborator

do we normally say Kafka HTTP Proxy or Redpanda HTTP Proxy or simply HTTP Proxy?

Contributor Author

changed to HTTP Proxy

Member

HTTP proxy is too generic, IMO.

* Clients behind NAT gateways or private links share the same IP address as seen by Redpanda brokers.
Collaborator

@paulohtb6 paulohtb6 Sep 16, 2025

that's not necessarily an RP limitation. It's a network quirk that users should already know. I would remove it from that list, especially because you have L66

Member

For private links, it is a limitation because if Redpanda had support for proxy protocol, we could use it and get the client IPs.

Member

* The limit may negatively affect tail latencies across all client connections.
* All clients behind the shared IP are collectively subject to the single `kafka_connections_max_per_ip` limit.
* Connection rejections occur randomly among clients when the limit is reached. For example, suppose `kafka_connections_max_per_ip` is set to 100, but clients behind a NAT gateway collectively need 150 connections. When the limit is reached, clients can open only some of the connections they need while the rest are rejected, which can leave those clients unable to work.
* Redpanda may modify this property during internal operations.
Collaborator

that's kinda scary. Should we expand on that? Because it essentially means that the user doesn't have control over this operation.

Member

Yes, they really don't. In Redpanda Cloud, we manage the cluster and the service. This is a concession we are making for 1 customer because we made a promise. If it were up to me, we wouldn't be exposing this, and I would rather be fixing the underlying issue (which I am doing right now).

* Availability incidents caused by misconfiguring this feature are excluded from the Redpanda Cloud SLA.
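To see why rejections can leave shared-IP clients partially connected, the following Python sketch simulates the NAT gateway example in the list above: three hypothetical clients behind one IP each need 50 connections, and the per-IP limit is 100. It assumes connection attempts interleave in random order and that no connection closes during the burst. It illustrates the failure mode only and does not model Redpanda's connection handling.

```python
import random
from collections import Counter
from typing import Dict


def simulate_shared_ip(clients: Dict[str, int], per_ip_limit: int, seed: int = 0) -> Dict[str, int]:
    """Hypothetical model: clients behind one shared IP open connections in
    random order, and the broker accepts until the per-IP limit is reached.
    Returns the number of accepted connections for each client."""
    rng = random.Random(seed)
    attempts = [name for name, needed in clients.items() for _ in range(needed)]
    rng.shuffle(attempts)  # attempts from different clients interleave unpredictably
    accepted: Counter = Counter()
    for name in attempts:
        if sum(accepted.values()) < per_ip_limit:
            accepted[name] += 1
    return {name: accepted[name] for name in clients}


result = simulate_shared_ip({"app-a": 50, "app-b": 50, "app-c": 50}, per_ip_limit=100)
short = [name for name, count in result.items() if count < 50]
print(result, "not fully connected:", short)
```

Because the shared IP can never hold more than 100 connections, at least one client always ends up short of the 50 connections it needs, and which client that is depends on arrival order.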

endif::env-cloud[]

== Configure client reconnections

You can configure the Kafka client backoff and retry properties to change the default behavior of the clients to suit your failure requirements.

The following Kafka properties let you manage client reconnections:
Set the following Kafka client properties on your application's producer or consumer to manage client reconnections:

* `reconnect.backoff.ms`: Amount of time to wait before attempting to reconnect to the broker. The default is 50 milliseconds.
* `reconnect.backoff.max.ms`: Maximum amount of time in milliseconds to wait when reconnecting to a broker. The backoff increases exponentially for each consecutive connection failure, up to this maximum. The default is 1000 milliseconds (1 second).
@@ -42,21 +85,4 @@ Additionally, you can use Kafka properties to control message retry behavior. De

See also: xref:develop:produce-data/configure-producers.adoc[Configure Producers]
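The following Python sketch shows how `reconnect.backoff.ms` and `reconnect.backoff.max.ms` combine into a delay schedule after consecutive connection failures. It assumes the delay doubles after each failure and ignores random jitter; the exact growth factor and jitter are client-specific, so real delays vary around these values. The function name is illustrative.

```python
from typing import List


def reconnect_delays_ms(backoff_ms: int = 50, max_backoff_ms: int = 1000, failures: int = 8) -> List[int]:
    """Delay before each reconnect attempt after consecutive failures:
    starts at reconnect.backoff.ms, doubles each time (assumed factor),
    and is capped at reconnect.backoff.max.ms. Jitter is ignored."""
    return [min(backoff_ms * 2 ** attempt, max_backoff_ms) for attempt in range(failures)]


print(reconnect_delays_ms())  # [50, 100, 200, 400, 800, 1000, 1000, 1000]
```

With the defaults, the delay reaches the 1-second cap after the fifth failure, and the first eight attempts together wait 4,550 milliseconds.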

== Prevent crash loops

A Redpanda broker may create log segments at startup. If a broker crashes after startup, and if it gets stuck in a crash loop, it could produce progressively more stored state that uses more disk space and takes more time for each restart to process.

To prevent infinite crash loops, the Redpanda broker property xref:reference:node-properties.adoc#crash_loop_limit[`crash_loop_limit`] sets an upper limit on the number of consecutive crashes that can happen within one hour of each other. After it reaches the limit, a broker cannot restart until its internal consecutive crash counter is reset to zero by one of the following conditions:

* The `redpanda.yaml` configuration file is updated.
* The `startup_log` file in the broker's xref:reference:node-properties.adoc#data_directory[data_directory] is manually deleted.
* One hour has elapsed since the last crash.
* The broker is properly shut down. (This is not possible after `crash_loop_limit` has been reached and the broker cannot be restarted.)

[NOTE]
====
* The `crash_loop_limit` property is disabled by default. You must manually enable it by setting it to a non-zero value.
* If the limit is less than two, the broker is blocked from restarting after every crash, until one of the reset conditions is met.
====
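The reset rules above can be summarized as a small state machine. The following Python sketch is a hypothetical model written for illustration, not Redpanda's implementation: the `CrashLoopGuard` class is invented, timestamps are in seconds, and the configuration-change, `startup_log` deletion, and clean-shutdown conditions are collapsed into a single `reset()` call.

```python
from typing import Optional

ONE_HOUR_S = 3600


class CrashLoopGuard:
    """Hypothetical model of the crash_loop_limit behavior described above."""

    def __init__(self, crash_loop_limit: int):
        self.limit = crash_loop_limit  # 0 disables the check
        self.consecutive_crashes = 0
        self.last_crash: Optional[float] = None

    def _expire(self, now: float) -> None:
        # Reset condition: one hour has elapsed since the last crash.
        if self.last_crash is not None and now - self.last_crash >= ONE_HOUR_S:
            self.consecutive_crashes = 0

    def record_crash(self, now: float) -> None:
        self._expire(now)
        self.consecutive_crashes += 1
        self.last_crash = now

    def reset(self) -> None:
        # Stands in for: redpanda.yaml updated, startup_log deleted,
        # or the broker shut down properly.
        self.consecutive_crashes = 0

    def can_start(self, now: float) -> bool:
        self._expire(now)
        return self.limit == 0 or self.consecutive_crashes < self.limit
```

With `crash_loop_limit` set to 1, a single recorded crash is enough to block the next start, which matches the note about limits less than two.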

To facilitate debugging in environments where a broker is stuck in a crash loop, set the xref:reference:properties/broker-properties.adoc#crash_loop_sleep_sec[`crash_loop_sleep_sec` configuration]. This setting determines how long the broker sleeps before terminating the process after reaching the crash loop limit. The window during which the broker remains available allows you to troubleshoot the issue. This setting is most useful when xref:troubleshoot:errors-solutions/k-resolve-errors.adoc[troubleshooting in Kubernetes environments].
// end::single-source[]
11 changes: 11 additions & 0 deletions modules/reference/pages/properties/cluster-properties.adoc
@@ -2862,6 +2862,7 @@ Maximum number of Kafka client connections per broker. If `null`, the property i

---

// tag::kafka_connections_max_overrides[]
=== kafka_connections_max_overrides

A list of IP addresses for which Kafka client connection limits are overridden and don't apply. For example, `['127.0.0.1:90', '50.20.1.1:40']`.
@@ -2872,14 +2873,20 @@ A list of IP addresses for which Kafka client connection limits are overridden a

*Type:* array

ifndef::env-cloud[]
*Default*: `{}` (empty list)
endif::[]

*Related topics*:

* xref:manage:cluster-maintenance/configure-availability.adoc#limit-client-connections[Limit client connections]

---


// end::kafka_connections_max_overrides[]

// tag::kafka_connections_max_per_ip[]
=== kafka_connections_max_per_ip

Maximum number of Kafka client connections per IP address, per broker. If `null`, the property is disabled.
@@ -2892,14 +2899,18 @@ Maximum number of Kafka client connections per IP address, per broker. If `null`

*Accepted values:* [`0`, `4294967295`]

ifndef::env-cloud[]
*Default:* `null`
endif::[]

*Related topics*:

* xref:manage:cluster-maintenance/configure-availability.adoc#limit-client-connections[Limit client connections]

---

// end::kafka_connections_max_per_ip[]

=== kafka_enable_authorization

Flag to require authorization for Kafka connections. If `null`, the property is disabled, and authorization is instead enabled by <<enable_sasl,`enable_sasl`>>.
Binary file added modules/shared/images/monitor_connections.png