
Conversation


@micheleRP micheleRP commented Sep 15, 2025

Description

This PR conditionalizes content for cloud docs. Related to redpanda-data/cloud-docs#412.

Resolves https://redpandadata.atlassian.net/browse/DOC-1643
Review deadline:

Page previews

Configure Client Connections (in Cloud docs)
Configure Client Connections (in SM docs)
Cluster Properties (in Cloud docs)
Cluster Properties (in SM docs)

Checks

  • New feature for Cloud
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@micheleRP micheleRP requested a review from a team as a code owner September 15, 2025 19:30

netlify bot commented Sep 15, 2025

Deploy Preview for redpanda-docs-preview ready!

Name Link
🔨 Latest commit c712d09
🔍 Latest deploy log https://app.netlify.com/projects/redpanda-docs-preview/deploys/68cb07188d5cdc000840611e
😎 Deploy Preview https://deploy-preview-1357--redpanda-docs-preview.netlify.app


coderabbitai bot commented Sep 15, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Documentation changes convert a single-source guide into environment-aware content for cloud vs non-cloud builds. The Configure Client Connections and availability pages were split using env-cloud conditionals: non-cloud builds show kafka_connections_max and defaults, cloud builds show kafka_connections_max_per_ip and kafka_connections_max_overrides with a caveat about shared IPs. Crash-loop, startup_log, and crash_loop_sleep_sec guidance were also gated per environment. Two new cluster properties (kafka_connections_max_per_ip, kafka_connections_max_overrides) were added (non-cloud defaults gated), single-source tag markers were added, and the local Antora playbook branch for cloud-docs was updated.
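The env-cloud gating described in this walkthrough uses standard AsciiDoc preprocessor conditionals. A minimal sketch of the pattern (only the property names come from this PR; the surrounding prose is invented for illustration):

```asciidoc
ifndef::env-cloud[]
// Rendered only in Self-Managed builds
Use the `kafka_connections_max` cluster property to limit the total
number of Kafka client connections per broker.
endif::[]

ifdef::env-cloud[]
// Rendered only in Cloud builds
Use the `kafka_connections_max_per_ip` and `kafka_connections_max_overrides`
cluster properties to control connections per client IP.
endif::[]
```

At build time, the cloud playbook defines the `env-cloud` attribute, so each build renders exactly one of the two blocks from the same source file.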

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Reader
    participant DocsRenderer
    rect rgba(40,120,200,0.06)
      Note right of DocsRenderer: Single-source doc with env-cloud gating
    end
    Reader->>DocsRenderer: Request "Configure Client Connections"
    DocsRenderer->>DocsRenderer: Evaluate env-cloud
    alt env-cloud == true
        DocsRenderer->>Reader: Render cloud content
        Note right of DocsRenderer: show kafka_connections_max_per_ip\nkafka_connections_max_overrides\ncloud-specific crash-loop/startup_log
    else env-cloud == false
        DocsRenderer->>Reader: Render non-cloud content
        Note right of DocsRenderer: show kafka_connections_max\nnon-cloud crash-loop/startup_log\nproperty defaults
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • mattschumpert
  • paulohtb6
  • weeco

Pre-merge checks

❌ Failed checks (1 warning)
Out of Scope Changes Check — ⚠️ Warning
Explanation: In addition to the requested per-IP documentation, the PR conditionalizes and rewords several other sections (crash_loop_limit, crash_loop_sleep_sec, startup_log cleanup steps) and updates the local-antora-playbook.yml cloud-docs branch; those edits are not specified in DOC-1643 and therefore appear out-of-scope relative to the linked issue.
Resolution: Please either split the unrelated documentation changes (crash_loop*, startup_log instructions, and the Antora playbook branch change) into a separate PR, or update the PR description and linked issue to explicitly include and justify these additional modifications and provide rendered previews showing their impact.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The title "DOC-1643 single source client connections in docs" cleanly references the JIRA ticket and accurately summarizes the primary change—creating single-source, environment-aware documentation for client connection controls—so it is concise, specific, and relevant to the changeset.
Linked Issues Check ✅ Passed The changes implement the linked issue's objectives: Configure Client Connections was made single-source with tag wrappers and env-specific Asciidoc blocks, cluster-properties.adoc introduces kafka_connections_max_per_ip and kafka_connections_max_overrides with non-cloud defaults and cloud gating, and a cloud-specific IMPORTANT caveat about per-IP limits and shared-IP/NAT/PrivateLink scenarios was added, all of which align with DOC-1643.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Description Check ✅ Passed The PR description follows the repository template: it includes a Description with the 'Resolves' JIRA link (DOC-1643), a Review deadline field, page preview links for both cloud and self-managed pages, and a populated Checks list marking "New feature for Cloud". It also references the related cloud-docs PR and succinctly summarizes the change and scope. The Review deadline field is present but left blank, which is non-critical for this update.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modules/manage/pages/cluster-maintenance/configure-availability.adoc (1)

25-29: Add prominent warning about NAT/PrivateLink limitations (per PR objective).

Per-IP controls won’t isolate clients behind shared IPs (PrivateLink, NAT, proxies). Add an admonition near this NOTE.

Apply this diff just after Line 29:

+ [IMPORTANT]
+ ====
+ Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through PrivateLink endpoints, NAT gateways, or other shared-IP egress, the per‑IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client.
+ ====
modules/reference/pages/properties/cluster-properties.adoc (1)

2866-2879: Fix type/default and example formatting for kafka_connections_max_overrides.

  • Type is array; default should be [], not {}.
  • Remove stray parentheses and trailing period inside the code example.
  • Clarify that it overrides kafka_connections_max_per_ip.

Apply this diff:

- A list of IP addresses for which Kafka client connection limits are overridden and don't apply. For example, `(['127.0.0.1:90', '50.20.1.1:40']).`.
+ A list of IP addresses for which <<kafka_connections_max_per_ip,`kafka_connections_max_per_ip`>> is overridden and doesn't apply. For example, `['127.0.0.1:90', '50.20.1.1:40']`.
@@
-ifndef::env-cloud[]
-*Default*: `{}` (empty list)
-endif::[]
+ifndef::env-cloud[]
+*Default*: `[]` (empty list)
+endif::[]
🧹 Nitpick comments (2)
modules/manage/pages/cluster-maintenance/configure-availability.adoc (1)

52-66: Consider consistent linking in Cloud blocks.

Cloud variant names crash_loop_limit/startup_log without xrefs. If the cloud doc build includes these pages/anchors, prefer xrefs for parity.

If linking is allowed in Cloud, adjust Lines 57–58 and 65–66 to use the same xref targets as non‑cloud. Otherwise, keep as-is.

modules/reference/pages/properties/cluster-properties.adoc (1)

2843-2863: Gate kafka_connections_max in Cloud or mark as not supported.

This property is excluded from the Cloud path in the “Configure Client Connections” page. Mirror that here to avoid confusing Cloud readers.

Apply this diff:

+ifndef::env-cloud[]
 === kafka_connections_max
@@
 ---
+endif::[]

Alternative: keep visible but add a NOTE: “Not supported in Redpanda Cloud.”

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b01d63f and 4aa065f.

📒 Files selected for processing (2)
  • modules/manage/pages/cluster-maintenance/configure-availability.adoc (4 hunks)
  • modules/reference/pages/properties/cluster-properties.adoc (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Feediver1
PR: redpanda-data/docs#1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (4)
modules/manage/pages/cluster-maintenance/configure-availability.adoc (2)

16-23: Good env gating for broker-wide limit.

Cloud-vs-non-cloud split for kafka_connections_max looks correct.


76-78: Do not change the xref — the anchor exists in broker-properties.adoc.

Search found the "crash_loop_sleep_sec" anchor in modules/reference/pages/properties/broker-properties.adoc at line 238, so the current xref:reference:properties/broker-properties.adoc#crash_loop_sleep_sec is correct.

Likely an incorrect or invalid review comment.

modules/reference/pages/properties/cluster-properties.adoc (2)

2890-2912: Looks good; env‑specific default block for per‑IP limit.

Definition and gating align with usage in the guide.


2915-2936: Verify semantics for kafka_enable_authorization = null.

Statement says: if null, authorization is “disabled, and authorization is instead enabled by enable_sasl”. This blends authN and authZ and may confuse readers. Please confirm intended behavior with platform SMEs and reword accordingly.

Apply this wording if accurate:

- Flag to require authorization for Kafka connections. If `null`, the property is disabled, and authorization is instead enabled by <<enable_sasl,`enable_sasl`>>.
+ Flag to require ACL-based authorization for Kafka connections. If `null`, Redpanda uses legacy behavior: when <<enable_sasl,`enable_sasl`>> is `true`, ACL enforcement defaults to enabled; when `false`, ACLs are not enforced.

If not accurate, supply the exact fallback behavior for null and its interaction with enable_sasl and kafka_api[].authentication_method.

@micheleRP
Contributor Author

@coderabbitai review


coderabbitai bot commented Sep 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
local-antora-playbook.yml (1)

20-20: Revert cloud-docs branch override before merge.

Pointing the local playbook to a feature branch is fine for previews, but it should target main to avoid accidental drift in future local builds.

Apply this diff:

-    branches: 'DOC-1673-single-source-client-connections-in-cloud-docs'
+    branches: main
modules/manage/pages/cluster-maintenance/configure-availability.adoc (4)

25-26: Fix xref paths to cluster properties.

Use the same reference:properties/cluster-properties.adoc path as above for consistency and to avoid broken links.

The diff above already updates both links.


56-63: Use xref in Cloud block for parity.

Non‑cloud uses an xref for crash_loop_limit; Cloud uses inline code. Prefer consistent linking.

Apply this diff:

-ifdef::env-cloud[]
-To prevent infinite crash loops, the Redpanda broker property `crash_loop_limit` sets an upper limit on the number of consecutive crashes that can happen within one hour of each other. After it reaches the limit, a broker cannot restart until its internal consecutive crash counter is reset to zero by one of the following conditions:
+ifdef::env-cloud[]
+To prevent infinite crash loops, the Redpanda broker property xref:reference:node-properties.adoc#crash_loop_limit[`crash_loop_limit`] sets an upper limit on the number of consecutive crashes that can happen within one hour of each other. After it reaches the limit, a broker cannot restart until its internal consecutive crash counter is reset to zero by one of the following conditions:

65-70: Unify data_directory references.

Cloud block uses inline data_directory; link it like non‑cloud for consistency and navigation.

Apply this diff:

-ifdef::env-cloud[]
-* The `startup_log` file in the broker's `data_directory` is manually deleted.
+ifdef::env-cloud[]
+* The `startup_log` file in the broker's xref:reference:node-properties.adoc#data_directory[data_directory] is manually deleted.

80-86: Use xref for crash_loop_sleep_sec in Cloud block.

Mirror the non‑cloud xref so users can jump to the property reference.

Apply this diff:

-ifdef::env-cloud[]
-To facilitate debugging in environments where a broker is stuck in a crash loop, set the `crash_loop_sleep_sec` broker property. This setting determines how long the broker sleeps before terminating the process after reaching the crash loop limit. The window during which the broker remains available allows you to troubleshoot the issue.
+ifdef::env-cloud[]
+To facilitate debugging in environments where a broker is stuck in a crash loop, set the xref:reference:properties/broker-properties.adoc#crash_loop_sleep_sec[`crash_loop_sleep_sec`] broker property. This setting determines how long the broker sleeps before terminating the process after reaching the crash loop limit. The window during which the broker remains available allows you to troubleshoot the issue.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4aa065f and a8115cb.

📒 Files selected for processing (2)
  • local-antora-playbook.yml (1 hunks)
  • modules/manage/pages/cluster-maintenance/configure-availability.adoc (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (2)
modules/manage/pages/cluster-maintenance/configure-availability.adoc (2)

14-15: Good: clear intro of per‑IP and override properties with consistent xrefs.

The wording and xref targets look correct here.


31-33: Defaults statement may vary by environment—confirm accuracy.

“Disabled by default” might not hold in Cloud if defaults are omitted or differ. Consider clarifying per environment.

Would you like me to scan the reference page in this branch to confirm the stated defaults and propose exact wording?

@micheleRP
Contributor Author

@coderabbitai review


coderabbitai bot commented Sep 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

endif::[]

ifdef::env-cloud[]
To prevent infinite crash loops, the Redpanda broker property `crash_loop_limit` sets an upper limit on the number of consecutive crashes that can happen within one hour of each other. After it reaches the limit, a broker cannot restart until its internal consecutive crash counter is reset to zero by one of the following conditions:
@c4milo c4milo (Member) commented Sep 15, 2025

This is not exposed in Redpanda Cloud, we manage it internally. We also don't have plans to expose it; it makes no absolute sense for a managed service like ours.

Member

is this in production?


Yes, I'm surprised to see anything related to crash loop prevention on this page, as it's a completely different topic for admins (not about clients but brokers) and indeed also not planned to be exposed in cloud. I think this whole second section of this page about crash loops should just be removed entirely.

Contributor Author

I've conditionalized out this Prevent crash loops section for Cloud docs. @pgellert: Looks like this was updated with #966. Please see Matt's comment below, and then can you please confirm that this section should remain documented for Self-Managed docs? Is there a better location for this content, some other page?

Contributor

Crash loop tracking is documented in the kubernetes troubleshooting docs as well here: https://docs.redpanda.com/current/troubleshoot/errors-solutions/k-resolve-errors/#crash-loop-backoffs

I think those troubleshooting docs + the detailed description around the cluster configs themselves here are sufficient and we can remove these paragraphs from the "Configure Client Connections" page.

An alternative would be to have them on a separate page under Self-managed > Cluster Maintenance (self-managed only; excluded from cloud docs), but I think the cluster config descriptions are detailed enough that they are sufficient and we don't need a separate page for this.

Contributor Author

thank you @pgellert!


IMPORTANT: Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through PrivateLink endpoints, NAT gateways, or other shared-IP egress, the per-IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client.
endif::[]

[NOTE]


@travisdowns is the note below accurate about connection counts? 'Typically two or three' sounds just wrong. Doesn't a client open a connection for each broker it's connected to (or for each partition it's producing/consuming to/from)?

Its good to first note that num connections != num clients, but I think the message here is the max expected # of connections per client is 'on the order of [insert something]'

cc @micheleRP

Member

Yes, 2 or 3 would be a large underestimate if there are many brokers, and if clients connect to each broker (which is workload dependent).

Here are the full details:

https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/510099463/How+many+connections

That's probably not going to make it into this paragraph, but a conservative estimate is N+2 connections per client where N is the number of brokers.

@micheleRP micheleRP (Contributor Author) commented Sep 17, 2025

Thank you @travisdowns! I changed that bullet (typically 2-3 connections per client) to:
The total number of connections is not equal to the number of clients, because a client can open multiple connections. As a conservative estimate, for a cluster with N brokers, plan for N + 2 connections per client.
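The N + 2 rule of thumb above can be turned into a quick sizing calculation. A minimal sketch (the function names and example numbers are illustrative, not Redpanda defaults):

```python
# Conservative connection-budget estimate from the review thread:
# each client may open roughly one connection per broker plus a couple
# of extra connections, so plan for N + 2 connections per client,
# where N is the number of brokers.

def connections_per_client(num_brokers: int) -> int:
    """Conservative per-client estimate: N brokers + 2 extra connections."""
    return num_brokers + 2

def per_ip_budget(num_brokers: int, clients_behind_ip: int) -> int:
    """Connections to plan for from one source IP when several clients
    share it (NAT gateway, private link, or the same host)."""
    return clients_behind_ip * connections_per_client(num_brokers)

# Example: 3 brokers, 10 clients sharing one NAT gateway IP.
print(connections_per_client(3))  # 5
print(per_ip_budget(3, 10))       # 50
```

The second figure is what a per-IP limit such as `kafka_connections_max_per_ip` would need to accommodate for that shared IP.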

Member

LGTM.

@c4milo c4milo (Member) left a comment

Looks good, thank you @micheleRP!


To configure `kafka_connections_max_per_ip` safely without disrupting legitimate clients, follow these steps:

. Set up metrics scraping into your monitoring stack for the relevant cluster. See xref:manage:monitor-cloud.adoc[].
Collaborator

are the metrics that we mention here different than L44?

Asking because the monitor-cloud section is too big, and this entry is somewhat vague. If we have any targeted metrics that we aim for, we should list them here. Or if we just want to say that the user must set up monitoring in general, we could reword to something like "Set up your monitoring stack for your cluster."

```
redpanda_rpc_active_connections{redpanda_id="CLOUD_CLUSTER_ID", redpanda_server="kafka"}
```

. Analyze the connection data to identify the normal range of connections for each broker during typical traffic cycles.
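One way to establish that baseline is to chart the metric per broker and take the observed peak as the upper bound of "normal". A hedged PromQL sketch (the `instance` label and the 7-day window are assumptions; only the metric name and the `redpanda_server="kafka"` selector appear in this PR):

```promql
# Peak active Kafka connections per broker over the past week.
max_over_time(
  (sum by (instance) (redpanda_rpc_active_connections{redpanda_server="kafka"}))[7d:1m]
)
```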
Collaborator

will the user know what a "normal range" constitutes? is this user-defined?

. Set the `kafka_connections_max_per_ip` value based on your analysis:
** Use the upper bound of normal connections from step 3, OR
** Use a lower value if you know the expected connections per client (typically 2-3 connections per client)
Collaborator

I'm also confused where the 2-3 connections comes from.

Contributor Author

Changed to "Use the upper bound of normal connections from step 3, or use a lower value if you know how many connections per client are being opened."

Member

per client IP

==== Limitations

* Decreasing the limit does not terminate any currently open Kafka API connections.
* This limit does not apply to Kafka HTTP Proxy connections.
Collaborator

do we normally say Kafka HTTP Proxy or Redpanda HTTP Proxy or simply HTTP Proxy?

Contributor Author

changed to HTTP Proxy

Member

HTTP proxy is too generic, IMO.

* Decreasing the limit does not terminate any currently open Kafka API connections.
* This limit does not apply to Kafka HTTP Proxy connections.
* The limit may negatively affect tail latencies across all client connections.
* Clients behind NAT gateways or private links share the same IP address as seen by Redpanda brokers.
@paulohtb6 paulohtb6 (Collaborator) commented Sep 16, 2025

that's not necessarily an RP limitation. It's a network quirk that users should already know. I would remove it from that list, especially because you have L66

Member

For private links, it is a limitation because if Redpanda had support for proxy protocol, we could use it and get the client IPs.


* The limit may negatively affect tail latencies across all client connections.
* Clients behind NAT gateways or private links share the same IP address as seen by Redpanda brokers.
* All clients behind the shared IP are collectively subject to the single `kafka_connections_max_per_ip` limit.
* Connection rejections occur randomly among clients once the limit is reached. For example: If `kafka_connections_max_per_ip` is set to 100, but clients behind a NAT gateway collectively need 150 connections, whichever client attempts the 101st connection gets rejected.
@paulohtb6 paulohtb6 (Collaborator) commented Sep 16, 2025

whichever client attempts the 101st connection gets rejected.

So not random. More like a FIFO approach.

Member

random from the perspective of the clients

Member

It's effectively random since all the clients are racing to connect but most importantly clients try to make several connections and at the limit you will have clients that make only some of the connections they want and the others get rejected, leaving the client in a not-working state.

So it's not like you try to connect 100 clients but 20 hit the limit and get rejected and the 80 keep working, it's more like 60 of them will get a partial connection set and will be left not working. So it is important not to hit the limit at all.

Contributor Author

Changed to: Connection rejections occur randomly among clients when the limit is reached. For example, suppose kafka_connections_max_per_ip is set to 100, but clients behind a NAT gateway collectively need 150 connections. When the limit is reached, clients can make only some of the connections while others get rejected, leaving the client in a not-working state.

* Clients behind NAT gateways or private links share the same IP address as seen by Redpanda brokers.
* All clients behind the shared IP are collectively subject to the single `kafka_connections_max_per_ip` limit.
* Connection rejections occur randomly among clients once the limit is reached. For example: If `kafka_connections_max_per_ip` is set to 100, but clients behind a NAT gateway collectively need 150 connections, whichever client attempts the 101st connection gets rejected.
* Redpanda may modify this property during internal operations.
Collaborator

that's kinda scary. Should we expand on that? Because it essentially means that the user doesn't have control over this operation.

Member

Yes, they really don't. In Redpanda Cloud, we manage the cluster and the service. This is a concession we are making for 1 customer because we made a promise. If it were up to me, we wouldn't be exposing this and I would rather be fixing the underlying issue (which I am right now).


Use the `kafka_connections_max_per_ip` property to limit the number of connections from each client IP address.

IMPORTANT: Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through PrivateLink endpoints, NAT gateways, or other shared-IP egress, the per-IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client.
Member

Maybe an obvious case to mention in this list too is multiple clients on the same host.

Contributor Author

Changed to:
IMPORTANT: Per-IP connection controls require Redpanda to see individual client IPs. If clients connect through private link endpoints, NAT gateways, or other shared-IP egress, the per-IP limit applies to the shared IP, affecting all clients behind it and preventing isolation of a single offending client. Similarly, multiple clients running on the same host will share the same IP address, and the limit applies collectively to all those clients.

@travisdowns travisdowns (Member) left a comment

comments inline

@micheleRP micheleRP requested a review from paulohtb6 September 17, 2025 16:07
@paulohtb6 paulohtb6 (Collaborator) left a comment

Approved. Thanks for the clarifications @c4milo @travisdowns

@micheleRP micheleRP merged commit 1e39d67 into main Sep 17, 2025
7 checks passed
@micheleRP micheleRP deleted the DOC-1643-Document-per-ip-connection-limit-config-settings-with-notice-about-limitations-in-certain-networking-environments branch September 17, 2025 19:13
@coderabbitai coderabbitai bot mentioned this pull request Nov 3, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Nov 11, 2025
4 tasks