Commit d7c4bae

JakeSCahill, hyperlint-ai-deprecated[bot], and asimms41 committed
How to use rpk to analyze partitions and size clusters (#1034)
Co-authored-by: hyperlint-ai[bot] <154288675+hyperlint-ai[bot]@users.noreply.github.com>
Co-authored-by: Angela Simms <[email protected]>
1 parent f87d53b commit d7c4bae

5 files changed: +4703 -130 lines changed

modules/deploy/pages/deployment-option/self-hosted/manual/sizing.adoc

Lines changed: 83 additions & 0 deletions
@@ -153,6 +153,89 @@ The https://github.com/redpanda-data/openmessaging-benchmark[Open Messaging Benc

See also: https://github.com/redpanda-data/openmessaging-benchmark/blob/main/driver-redpanda/README.md[Redpanda Benchmarks^]

== Assess throughput

Use the xref:reference:rpk/rpk-topic-analyze.adoc[`rpk topic analyze`] command to measure how much traffic your Redpanda cluster is handling. The command reports the throughput in bytes per second, the batch rate, and the batch size for each topic. This information helps you decide whether to add brokers or change your configuration.

To see the throughput of all topics in your cluster over the last minute, run:

[source,bash]
----
rpk topic analyze --regex '*' --print-all --time-range -1m:end
----

The arguments are:

* `--regex '*'`: Analyzes all topics.
* `--print-all`: Prints all the metrics.
* `--time-range -1m:end`: Analyzes the last minute of data.

Example output:

[,bash,role="no-copy no-wrap"]
----
SUMMARY
=======
TOPICS                        6
PARTITIONS                    17
TOTAL THROUGHPUT (BYTES/S)    1361.9166666666667
TOTAL BATCH RATE (BATCHES/S)  2.9833333333333334
AVERAGE BATCH SIZE (BYTES)    456.50837988826817

TOPIC SUMMARY
=============
TOPIC                     PARTITIONS  BYTES-PER-SECOND    BATCHES-PER-SECOND  AVERAGE-BYTES-PER-BATCH
_redpanda.audit_log       12          61                  0.1                 610
_redpanda.transform_logs  1           890.2666666666667   0.7833333333333333  1136.5106382978724
_schemas                  1           0                   0                   0
edu-filtered-domains      1           14.283333333333333  0.1                 142.83333333333334
logins                    1           144.61666666666667  1                   144.61666666666667
transactions              1           251.75              1                   251.75

PARTITION BATCH RATE (BATCHES/S)
================================
TOPIC                     P25                   P50                   P75                   P99
_redpanda.audit_log       0.016666666666666666  0.016666666666666666  0.03333333333333333   0.03333333333333333
_redpanda.transform_logs  0.7833333333333333    0.7833333333333333    0.7833333333333333    0.7833333333333333
_schemas                  0                     0                     0                     0
edu-filtered-domains      0.1                   0.1                   0.1                   0.1
logins                    1                     1                     1                     1
transactions              1                     1                     1                     1

PARTITION BATCH SIZE (BYTES)
============================
TOPIC                     P25  P50  P75  P99
_redpanda.audit_log       608  610  610  611
_redpanda.transform_logs  895  895  895  895
_schemas                  0    0    0    0
edu-filtered-domains      141  141  141  141
logins                    144  144  144  144
transactions              255  255  255  255
----

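To work with these numbers in a script, you can save the output to a file and extract one section at a time. The following is a minimal sketch and not part of `rpk`: the `section_rows` helper and the `analyze.txt` file name are hypothetical, and the parsing assumes that the output keeps the layout shown in the example (a title line, a line of `=` characters, the rows, and then a blank line).

[source,bash]
----
# Hypothetical helper: print the rows of one section of saved
# `rpk topic analyze` output, read from stdin.
# Assumes each section is a title line, an underline of "=" characters,
# the rows, and then a blank line, as in the example output.
#   $1 = section title, for example "TOPIC SUMMARY"
section_rows() {
  awk -v name="$1" '
    { line = $0; sub(/[ \t]+$/, "", line) }       # drop trailing whitespace
    !found && line == name { found = 1; next }    # section title
    found && line ~ /^=+$/ { next }               # underline below the title
    found && line == ""    { exit }               # blank line ends the section
    found                  { print line }
  '
}

# Example usage (analyze.txt is a hypothetical file name):
#   rpk topic analyze --regex '*' --print-all --time-range -1m:end > analyze.txt
#   section_rows 'TOPIC SUMMARY' < analyze.txt
----

The title is matched exactly, so `SUMMARY` and `TOPIC SUMMARY` select different sections.
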
The output contains the following information:

* **Total throughput:**
The total amount of data, in bytes per second, that the cluster processed during the time range.

* **Total batch rate:**
The number of message batches processed per second. A higher rate suggests increased activity, which may require more CPU or I/O resources.

* **Average batch size:**
The average size of each message batch, in bytes. Large or inconsistent batch sizes may indicate that you need to adjust producer batching settings or verify storage capacity.

* **Topic and partition summaries:**
The throughput, batch rate, and batch size of each topic. If a single topic is responsible for most of the throughput, it may need optimization or additional resources. In the example output, `_redpanda.transform_logs` accounts for about 65% of the total throughput (890 of 1,362 bytes per second).

* **Percentiles (P25, P50, P75, P99):**
The distribution of batch rate and batch size across the partitions of each topic. Similar values across the percentiles suggest a balanced workload, while a large gap between them may highlight partitions that need rebalancing or capacity adjustments.

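To see which topics dominate, you can compute each topic's share of the total throughput from the `TOPIC SUMMARY` rows. The following is a minimal sketch that assumes the rows are whitespace-separated as in the example output. `throughput_share` is a hypothetical helper name, not an `rpk` command.

[source,bash]
----
# Hypothetical helper: read TOPIC SUMMARY rows (header line first) from
# stdin and print each topic with its percentage of the total bytes per
# second, largest first. Column 3 is BYTES-PER-SECOND in the example output.
throughput_share() {
  awk '
    NR > 1 && NF >= 3 { bytes[$1] = $3; total += $3 }
    END {
      for (topic in bytes)
        printf "%s %.1f\n", topic, (total > 0 ? 100 * bytes[topic] / total : 0)
    }
  ' | sort -k2,2nr -k1,1
}

# Rows copied from the example output.
throughput_share <<'EOF'
TOPIC PARTITIONS BYTES-PER-SECOND BATCHES-PER-SECOND AVERAGE-BYTES-PER-BATCH
_redpanda.audit_log 12 61 0.1 610
_redpanda.transform_logs 1 890.2666666666667 0.7833333333333333 1136.5106382978724
_schemas 1 0 0 0
edu-filtered-domains 1 14.283333333333333 0.1 142.83333333333334
logins 1 144.61666666666667 1 144.61666666666667
transactions 1 251.75 1 251.75
EOF
----

With the example rows, `_redpanda.transform_logs` is listed first, at roughly 65% of the total.
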
=== Plan for capacity

Compare the current throughput and batch rate with your cluster's hardware limits, such as network bandwidth, disk IOPS, and CPU capacity. If usage approaches these limits, consider scaling up (upgrading hardware) or scaling out (adding brokers). Monitor trends over time to anticipate when you need to expand.

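As a rough way to make this comparison, you can express the measured throughput as a percentage of a limit that you determined yourself, for example from a benchmark run. The following is a minimal sketch: the `utilization_pct` helper, the sample numbers, and the 70% warning threshold are illustrative assumptions, not Redpanda recommendations.

[source,bash]
----
# Hypothetical helper: print observed throughput as a percentage of a limit.
#   $1 = observed throughput in bytes per second (for example, the
#        TOTAL THROUGHPUT value reported by `rpk topic analyze`)
#   $2 = limit in bytes per second, taken from your own benchmarks
utilization_pct() {
  awk -v used="$1" -v limit="$2" 'BEGIN {
    if (limit <= 0) exit 1          # a limit of 0 or less is not meaningful
    printf "%.1f\n", 100 * used / limit
  }'
}

# Illustrative numbers only: 75 MB/s observed against a 125 MB/s limit.
pct=$(utilization_pct 75000000 125000000)
echo "Utilization: ${pct}%"

# 70 is an arbitrary example threshold. Choose one that fits your workload.
if awk -v p="$pct" 'BEGIN { exit !(p >= 70) }'; then
  echo "Approaching the limit: consider scaling up or scaling out"
fi
----

Running the same check on a schedule and recording the result is one way to track the trend over time.
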
=== Address bottlenecks

If specific topics or partitions consistently show a higher load than others, the workload may be unevenly distributed. Consider redistributing partitions or adjusting replication factors to balance the load more effectively.

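One simple way to flag uneven distribution is to compare P99 with P50 in the `PARTITION BATCH RATE` rows. The following is a minimal sketch: `skewed_topics` is a hypothetical helper, and the ratio threshold of 1.5 is an arbitrary example value rather than a Redpanda guideline.

[source,bash]
----
# Hypothetical helper: read PARTITION BATCH RATE rows (header line first)
# from stdin and print the topics whose P99 exceeds P50 by more than the
# given ratio, together with that ratio. Topics with a P50 of 0 are skipped.
# Columns in the example output: TOPIC P25 P50 P75 P99.
#   $1 = maximum acceptable P99/P50 ratio
skewed_topics() {
  awk -v max="$1" '
    NR > 1 && $3 > 0 && $5 / $3 > max { printf "%s %.1f\n", $1, $5 / $3 }
  '
}

# Rows copied from the example output; 1.5 is an example threshold.
skewed_topics 1.5 <<'EOF'
TOPIC P25 P50 P75 P99
_redpanda.audit_log 0.016666666666666666 0.016666666666666666 0.03333333333333333 0.03333333333333333
_redpanda.transform_logs 0.7833333333333333 0.7833333333333333 0.7833333333333333 0.7833333333333333
_schemas 0 0 0 0
edu-filtered-domains 0.1 0.1 0.1 0.1
logins 1 1 1 1
transactions 1 1 1 1
EOF
----

With the example rows, only `_redpanda.audit_log` is reported. It is the only topic in the example with more than one partition, and topics with a single partition always show identical percentiles.
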
include::shared:partial$suggested-reading.adoc[]

* https://redpanda.com/blog/sizing-redpanda-cluster-best-practices[Four sizing principles for Redpanda production clusters^]
