diff --git a/_posts/2026-02-02-measuring-debezium-server-performance-mysql-streaming.adoc b/_posts/2026-02-02-measuring-debezium-server-performance-mysql-streaming.adoc
new file mode 100644
index 00000000000..c6681adc841
--- /dev/null
+++ b/_posts/2026-02-02-measuring-debezium-server-performance-mysql-streaming.adoc
@@ -0,0 +1,283 @@
+---
+layout: post
+title: "Measuring Debezium Server performance when streaming MySQL to Kafka"
+date: 2026-02-02
+tags: [debezium-server, performance, streaming, mysql, docker]
+author: alvarvg
+---
+
+Performance is a critical concern when implementing Change Data Capture (CDC) solutions in production environments.
+In this post, I am going to share how we have measured Debezium Server performance (deployed in a Docker container)
+while streaming changes from MySQL to Kafka. All of this with the default configurations for Debezium, MySQL and Kafka.
+
++++<!-- more -->+++
+
+This post does not aim to compare Debezium Server with Kafka Connect nor to identify absolute throughput limits.
+Instead, it focuses on understanding how Debezium Server behaves under realistic workloads using default configurations,
+and whether it can sustain real-time change data capture in lightweight architectures.
+
+It is also important to underline that, in this evaluation, the database is not stressed in a fully real-world scenario.
+Conditions such as long-running transactions or other complex workload patterns are not represented,
+as incorporating them would significantly increase the complexity of the test case.
+
+[id=introduction]
+== Introduction
+
+Measuring software performance is important. It directly affects reliability, scalability, cost efficiency and user trust. It is a prerequisite for making informed engineering decisions.
+Performance measurement transforms software from a black box into an understandable, predictable and optimizable system.
It is well known that:
+
+_"What is not measured cannot be controlled, predicted or reliably improved"_
+
+Going beyond sayings and proverbs, performance measurement provides you with evidence to inform and drive architectural decisions.
+Specifically, performance testing can help you by providing information that enables you to:
+
+- Validate that the system meets requirements
+- Identify bottlenecks and limiting factors
+- Help ensure predictable scalability
+- Reduce operational and infrastructure costs
+- Deliver improvements in reliability and resilience
+- Build trust with users and stakeholders
+
+[id=debezium-server]
+== Debezium Server
+
+Besides performance testing, *Debezium Server* itself may be a great unknown in the CDC community. If you are reading this, you probably already know something about Debezium and CDC,
+so let me introduce you to Debezium Server.
+Debezium Server is a standalone runtime for Change Data Capture that captures database changes using Debezium and delivers them directly to configurable sinks without requiring Kafka or Kafka Connect.
+It addresses a common gap in CDC architectures: situations where Kafka Connect, Kubernetes, or large distributed platforms introduce unnecessary operational complexity.
+It enables teams to adopt CDC with minimal infrastructure while still benefiting from Debezium’s mature and proven connectors.
+
+++++
+
+ Debezium Server Architecture +
+
+++++
+
+Debezium Server is built on top of https://quarkus.io[Quarkus] and the Debezium Engine.
+This combination provides a production-ready, packaged runtime that runs as a single JVM process.
+Although Kafka is not required, it preserves the core CDC guarantees:
+
+- At-least-once offset management (per connector)
+- Ordered event delivery per table (depends on the connector)
+- Snapshotting and streaming of the changes in the database
+- Rich metadata about transactions and schemas
+
+[id=benchmarking]
+== Benchmarking
+
+Now that we are familiar with all the components, let us dive into the technical details.
+
+Starting with the hardware, we have deployed two EC2 machines in AWS of type "c5a.xlarge". Each instance provides 4 vCPUs and 8 GiB of memory, ensuring sufficient resources for both workload generation and CDC processing.
+We have called these two instances _Observed_ and _Observer_ (in order to separate the monitoring load from the machine where Debezium and MySQL are running). In the Observer instance, we have deployed Prometheus, Grafana and YCSB
+(the software piece generating the load). In the Observed instance, we have a single Kafka container (although it is not really necessary and could be substituted with any other target, such as Pulsar, Kinesis, PubSub, or even a custom sink solution),
+Debezium Server, and MySQL.
+
+++++
+
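+++++

For reference, a Debezium Server deployment like the one just described is driven by a single configuration file. The following `application.properties` sketch shows roughly what a MySQL-to-Kafka pipeline looks like; the hostnames, credentials, database/table names and topic prefix below are placeholders for illustration, not the exact values used in the benchmark:

```properties
# Sketch of a Debezium Server configuration for MySQL -> Kafka.
# All concrete values are illustrative placeholders.
debezium.sink.type=kafka
debezium.sink.kafka.producer.bootstrap.servers=kafka:9092
debezium.sink.kafka.producer.key.serializer=org.apache.kafka.common.serialization.StringSerializer
debezium.sink.kafka.producer.value.serializer=org.apache.kafka.common.serialization.StringSerializer

debezium.source.connector.class=io.debezium.connector.mysql.MySqlConnector
debezium.source.database.hostname=mysql
debezium.source.database.port=3306
debezium.source.database.user=debezium
debezium.source.database.password=dbz
debezium.source.database.server.id=184054
debezium.source.topic.prefix=benchmark
debezium.source.table.include.list=ycsb.usertable

# File-based offsets and schema history keep the deployment self-contained.
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.offset.flush.interval.ms=60000
debezium.source.schema.history.internal=io.debezium.storage.file.history.FileSchemaHistory
debezium.source.schema.history.internal.file.filename=data/schema_history.dat
```

The single file configures both ends of the pipeline: `debezium.source.*` properties go to the connector, `debezium.sink.*` properties to the target system.

+++++
+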
+ Benchmarking architecture +
+
+++++
+
+As per the preceding description, we are using several software pieces, most of which are already known, except for the Yahoo! Cloud Serving Benchmark (YCSB). The YCSB tool, which was created by the Yahoo
+team and released as open source on April 23, 2010, was developed to evaluate the performance of different cloud serving stores. This is the link to their GitHub project: https://ycsb.site[https://ycsb.site].
+
+If you want to reproduce this scenario, it is quite simple. Go to the https://github.com/debezium/performance.git[Debezium performance repo], clone it, cd into the infrastructure-automation folder and then run `terraform apply`.
+This will automatically create the EC2 instances, provision them with Ansible (copy files and install Docker) and start the scenario.
+Logging in to Grafana will allow you to see how everything evolves and progresses (inside Grafana you will need to navigate to the MySQL Streaming dashboard).
+
+++++
+
+ Streaming dashboard from a completed scenario +
+
+++++
+
+With the technology stack clearly defined and reproducible, we can now focus on how performance will be measured and which metrics will be considered:
+
+- CPU and RAM percentage usage: We are monitoring the resource usage of the Debezium Server container.
+- Network Input and Output: We are also taking into account the communications performed by the container.
+- Disk Reads and Writes: To check how much disk space/bandwidth is needed.
+- Throughput: This is the main value we are going to focus on (based on this value, we can estimate the operation latency) and we are comparing this to MySQL throughput.
+
+For each of the preceding metrics, we calculate the average value over the duration of the test. Averaging in this way yields a stable figure: short-lived spikes are still included in the data, but they do not distort
+the final result. The following are a few more detailed panels of the dashboard.
+
+++++
+
+ CPU and Memory consumption during one execution +
+
+++++
+
+In the picture above, you can see a few of the panels related to resource consumption. The two on top show the CPU usage, calculated as a percentage of the instance's 4 vCPUs.
+The other panels show the memory consumption in MB throughout the execution. The two on the left represent the evolution over time, and the ones on the right show the all-time average.
+
+++++
+
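+++++

The CPU percentage in these panels can be reproduced with simple arithmetic. The toy Python function below is only an illustration of the calculation (the actual dashboards use Prometheus queries over container metrics; the function name is invented for this post):

```python
# Illustrative only: convert CPU-seconds consumed by a container during a
# scrape interval into a percentage of the instance's 4 vCPUs.
def cpu_percent(cpu_seconds_delta: float, interval_seconds: float, vcpus: int = 4) -> float:
    return 100.0 * cpu_seconds_delta / (interval_seconds * vcpus)

# A container that burned 9 CPU-seconds over a 15-second window:
print(cpu_percent(9.0, 15.0))  # 15.0
```

Saturating all 4 vCPUs for the whole interval would report 100%, which is the ceiling visible in the dashboard panels.

+++++
+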
+ Throughput evolution during one execution +
+
+++++
+
+The preceding example shows the Debezium Server throughput that the test captures during execution. The chart is split into two parts, reflecting the two phases of every YCSB execution:
+an initial load phase that populates the tables, followed by a workload phase that executes the configured mix of operations.
+If during the first phase we observe that the throughput remains consistent with the configured level, we consider the execution successful (OK/1). On the other hand, if something in the process results in variable throughput, we consider that the execution failed (NOK/0).
+
+The image that follows shows a detailed view of the second phase of the execution, which runs 300 updates per second and 300 inserts per second.
+
+++++
+
+ Detailed view of the throughput panel +
+++++ + +[id=results-analysis] +== Results and Analysis + +Let's now examine the results to gain a clear view of how the system behaves under varying levels of load, so that we can identify trends that might not be evident from isolated measurements. If you want to review the detailed analysis, you can view the https://github.com/debezium/performance/tree/e5732ddc3324ec861f0708692f2372bcd25a2e61/_results/debezium_server/streaming_mysql[raw results]. + +++++ +
+ Debezium Server streaming results screenshot for MySQL +
+
+++++
+
+With the preceding image in mind, let's consider the meaning of each column.
+
+- TABLE_RECORDS: Desired table records we are going to work with in the execution.
+- TABLE_TARGET_OPS: Desired operations per second we are going to perform during the execution.
+- TOTAL_RECORDS: Calculated value, the value of TABLE_RECORDS times the value of TABLES.
+- TOTAL_OPS: Calculated value, the value of TABLE_TARGET_OPS times the value of TABLES.
+- RESULT: Indicates if the execution has ended positively or not. 1 means OK and 0 means NOK.
+- DURATION: Total duration of the execution. This includes both phases of YCSB, as explained above.
+- AVG_*: The average (throughout the execution) of the given resource. Resources: CPU, MEM, NET_IN, NET_OUT, DISK_WRITE and DISK_READ.
+Averaging smooths short-lived spikes and highlights sustained behavior, making it suitable for capacity planning but less precise for analyzing tail latency or transient saturation.
+
+There are still a few hidden columns.
+
+- DATABASE: The database picked to run the test against. In this case it is always MySQL.
+- TABLES: The number of different tables we are using in this execution.
+- DISTRIBUTION: The distribution selected for the given execution. The examples cited in this post all assume that the data distribution is set to `uniform`. You can modify the setting in the YCSB options.
+- COMMENTS: A field for recording observations taken while executing the test.
+
+Side note: these are only the values for the Debezium Server container. We are not keeping the data for other containers, although we can see them in the dashboard.
+
+The following analysis shows the results obtained from the performance executions, where the objective is to evaluate the system's behavior under different load
+scenarios and operating conditions. This analysis focuses on identifying trends, bottlenecks, and relevant variations in the key metrics presented earlier.
+Let us look at the charts and analyze the results.
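To make the derived columns and the averaging concrete, here is a small Python sketch. The function names and the sample format are invented for this post; this is not code from the performance repository:

```python
# Illustrative sketch, not code from the performance repository.
def total_columns(table_records: int, table_target_ops: int, tables: int) -> dict:
    """TOTAL_RECORDS and TOTAL_OPS are the per-table values times TABLES."""
    return {
        "TOTAL_RECORDS": table_records * tables,
        "TOTAL_OPS": table_target_ops * tables,
    }

def avg_metric(samples: list) -> float:
    """Time-weighted average of (timestamp_seconds, value) samples.

    Spikes are included in the data but smoothed over the whole run,
    matching how the AVG_* columns are described above.
    """
    if len(samples) < 2:
        return samples[0][1] if samples else 0.0
    total = 0.0
    for (t0, v0), (t1, _v1) in zip(samples, samples[1:]):
        # Weight each value by how long it was observed.
        total += v0 * (t1 - t0)
    return total / (samples[-1][0] - samples[0][0])

print(total_columns(100_000, 300, 4))  # {'TOTAL_RECORDS': 400000, 'TOTAL_OPS': 1200}
print(avg_metric([(0, 10.0), (15, 30.0), (30, 20.0), (45, 20.0)]))  # 20.0
```

A single row in the results sheet is essentially these derived totals plus one such average per monitored resource.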
+
+The chart that follows shows the execution time (measured in hours) based on the number of operations per second and the number of tables involved in the execution.
+The chart reveals that execution time does not follow a single uniform growth pattern driven solely by the number of operations. Although some table groups exhibit an increase in duration as total operations grow,
+this behavior is not consistent across all tables. Several tables show relatively high execution times even at lower operation counts, whereas others maintain low durations despite handling more operations.
+This indicates that execution time is strongly influenced by table-specific factors rather than by workload size alone.
+
+++++
+
+ Duration by operation per second and table amount +
+++++ + +The next chart shows average CPU utilization. Notice that rather than scaling linearly with the number of operations, utilization varies across tables. +In the initial table groups, CPU increases as the workload grows, reaching peak values at a mid-range operation count, after which it stabilizes or even decreases despite higher numbers of operations. +For subsequent tables, CPU usage remains within a relatively narrow range, with only moderate fluctuations, indicating that higher workloads do not necessarily translate into proportionally higher CPU consumption. + +CPU utilization is influenced more by table-specific processing characteristics and execution efficiency than by total operation volume alone. +Other factors - I/O wait, batching or internal optimizations - might play a more significant role in performance at higher workloads. + +++++ +
+ CPU usage by operation per second and table amount +
+++++ + +Memory consumption remains stable across different operation counts and tables, with no clear relationship to the number of operations executed. Most measurements cluster within a similar range, suggesting a consistent baseline memory footprint that is largely independent of workload size. +The following chart suggests that memory usage is dominated by factors related to logical database schema or to configuration, rather than being directly related to the volume of operations. The analysis indicates that memory is unlikely to be the primary scaling constraint. + +++++ +
+ Memory usage by operation per second and table amount +
+
+++++
+
+As we can see in the next figure, network usage varies significantly across tables and operation counts, with output traffic being consistently higher than input traffic in all cases. In the initial table groups, both input and output volumes increase as the number of operations grows, indicating a clear relationship between workload and network utilization.
+However, this pattern becomes less consistent in later tables, where network usage remains high or fluctuates despite changes in total operations. This finding suggests that data transfer is influenced more by table-specific characteristics than by the number of operations.
+
+Overall, we can infer that network utilization is a dominant and variable factor in system behavior, particularly on the output side, and can represent a potential constraint depending on table structure and data change patterns.
+
+++++
+
+ Network usage by operation per second and table amount +
+++++ + +Our next chart shows how disk activity varies with different table and operation counts. The chart indicates that disk write activity remains relatively stable across most tables and operation counts, indicating a consistent write pattern independent of workload size. +By contrast, disk read activity is minimal or negligible in the early table groups, and becomes more prominent only in later tables, where noticeable spikes appear at specific operation counts. +One table in particular exhibits a pronounced write peak compared to all others, suggesting an isolated behavior such as increased flushing or checkpointing. +This behavior is typically observed during log file rotation or log exchanges, where disk activity is driven primarily by the number of log files generated and the overall execution duration, rather than by the volume of operations. + +++++ +
+ Disk usage by operation per second and table amount +
+++++ + +As we can see in the next figure, the maximum throughput that Debezium achieves varies noticeably across tables, indicating the effect that table characteristics can have on upper performance limits. +Tables 1 and 2 reach the highest values, demonstrating a greater capacity to sustain higher operation rates, while subsequent tables show a gradual reduction in maximum achievable throughput. +This decrease is not strictly linear, as some tables outperform others despite their position, but it clearly reflects table-specific constraints, possibly related to data structure, indexing, or processing complexity. + +The next chart highlights that the maximum number of operations per second is primarily driven by table design and access patterns, rather than by uniform system limits, reinforcing the importance of table-level performance evaluation. + +++++ +
+ Maximum operation per second by table amount +
+++++ + +But, *what is Debezium Server's observed upper bound in number of operations per second?* + +In this next chart we compare the relationship between MySQL and Debezium Server throughput. The initial peak observed at the start of the execution matches the value that we are targeting for this execution. +MySQL fails to keep pace at such high values (1500 operations per second in 1 table). On the Debezium side, the throughput closely mirrors MySQL activity. +In the lower panel, you can see that Debezium's internal queue maintains a high and stable capacity throughout the execution, with no indication of saturation or backlog accumulation. +Overall, the chart confirms that Debezium Server processes MySQL changes efficiently and in near real time, and that the throughput behavior is driven by the database write pattern rather than by downstream processing constraints. + +++++ +
+ MySQL vs Debezium Server throughput comparison +
+++++ + +[id=conclusion-recommendations] +== Conclusion and recommendations + +This study demonstrates that Debezium Server is a mature, production-ready CDC solution and a credible alternative to traditional CDC architectures based on Kafka Connect, Kubernetes or complex distributed setups. +Throughout all tested scenarios, Debezium Server showed stable behavior, predictable resource usage and reliable change propagation, preserving CDC guarantees. +The results confirm that Debezium Server is not an experimental or limited runtime, but a well-engineered product built on top of Debezium's proven CDC engines powered by https://quarkus.io[Quarkus]. + +One of the strongest takeaways from these measurements is that Debezium Server is particularly well suited for small-to-medium-sized projects, lightweight architectures and environments where Kafka and Kubernetes are either unnecessary or operationally expensive. +Its single process deployment model, combined with low and stable CPU and memory requirements, makes it an excellent fit for simpler infrastructures, edge deployments and teams seeking to minimize operational overhead without sacrificing correctness or observability. +In these contexts, Debezium Server significantly lowers the entry barrier to CDC adoption. + +From a performance standpoint, the results clearly show that Debezium Server is not the bottleneck in the tested pipelines. Instead, the upper throughput limit is dictated by MySQL's ability to sustain write operations. +Debezium Server consistently mirrors MySQL throughput in near real time without accumulating lag. This confirms that, for the targeted class of projects, Debezium Server is capable of real-time CDC, with end-to-end performance effectively bounded by the source database rather than by the CDC layer itself. + +With these results and conclusions in mind, if I had to make a recommendation to someone who wants to implement CDC in their pipeline, I would say the following: + +1. 
*Start with Debezium Server and default configurations.* For small-to-medium projects, Debezium Server provides an excellent starting point without the operational complexity of Kafka Connect or Kubernetes.
+The results show that default configurations already deliver stable, predictable and near real-time CDC performance, allowing teams to validate their use cases before investing time in advanced tuning or more complex architectures.
+2. *Design tables and schemas with CDC in mind.* Performance is strongly influenced by table-level characteristics rather than raw operation counts. Careful schema design can have a significant impact on CDC throughput and latency.
+Optimizing table structures will often yield greater benefits than tuning Debezium itself.
+3. *Treat the database as the primary throughput limiter.* The benchmark shows that Debezium Server can keep up with MySQL in near real time and that overall throughput is bounded by the database's write capacity.
+Performance efforts should therefore focus first on the database before attempting to optimize the CDC layer.
+4. *Scale architectural complexity only when necessary.* Kafka, Kubernetes and distributed CDC topologies are powerful but introduce non-trivial operational overhead. For lightweight or well-bounded workloads, Debezium Server alone is often sufficient and easier to operate.
+Adopt more complex architectures only if the scalability, availability, or integration requirements clearly justify them.
+
+In summary, Debezium Server emerges as a simple, efficient and high-performance CDC solution that challenges the assumption that Kafka-centric or Kubernetes-based architectures are always required.
+For lightweight projects and streamlined deployments, Debezium Server offers a compelling balance of maturity, simplicity and performance, delivering real-time change data capture while keeping infrastructure complexity to a minimum.
+
+[id=follow-up]
+== What’s next?
+
+While the results presented in this post provide a clear view of Debezium Server behavior under default configurations, they also raise additional questions worth exploring in follow-up experiments, for example:
+
+- Discover Debezium Server limits: Improve the architecture and configuration of this experiment to discover the load limits for Debezium Server.
+- Deploy Debezium with Kafka Connect: Run this experiment with a distributed and more complex architecture, using Kafka Connect to deploy Debezium.
+- Get involved with snapshotting: By omitting snapshots from our experiment, we skipped one of the most important Debezium features. Snapshotting certainly merits further performance investigation.
+- Run different sets of workloads and databases: We ran the present experiment against MySQL and used a uniform workload.
+For a more complete assessment of Debezium's capabilities, we should include different databases, connectors, and workloads in the test environment.
diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/benchmark_architecture_blueprint.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/benchmark_architecture_blueprint.png
new file mode 100644
index 00000000000..50dc4e782bb
Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/benchmark_architecture_blueprint.png differ
diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/benchmark_completed_scenario.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/benchmark_completed_scenario.png
new file mode 100644
index 00000000000..34e32143b04
Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/benchmark_completed_scenario.png differ
diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/cpu_by_ops_table.png
b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/cpu_by_ops_table.png new file mode 100644 index 00000000000..4154c1c5ee1 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/cpu_by_ops_table.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/cpu_memory_consumption.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/cpu_memory_consumption.png new file mode 100644 index 00000000000..d905e0cd740 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/cpu_memory_consumption.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/debezium_server_streaming_mysql_results.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/debezium_server_streaming_mysql_results.png new file mode 100644 index 00000000000..faec84de427 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/debezium_server_streaming_mysql_results.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/disk_by_ops_table.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/disk_by_ops_table.png new file mode 100644 index 00000000000..e4e5a402a50 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/disk_by_ops_table.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/duration_by_ops_table.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/duration_by_ops_table.png new file mode 100644 index 00000000000..9e8e5c42941 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/duration_by_ops_table.png 
differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/max_ops_by_table.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/max_ops_by_table.png new file mode 100644 index 00000000000..799ff031dfa Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/max_ops_by_table.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/memory_by_ops_table.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/memory_by_ops_table.png new file mode 100644 index 00000000000..07c3e9adec9 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/memory_by_ops_table.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/mysql_vs_debezium_server_throughput.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/mysql_vs_debezium_server_throughput.png new file mode 100644 index 00000000000..94ce99ed424 Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/mysql_vs_debezium_server_throughput.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/network_by_ops_table.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/network_by_ops_table.png new file mode 100644 index 00000000000..34469869a7d Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/network_by_ops_table.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/throughput_detailed_view.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/throughput_detailed_view.png new file mode 100644 index 00000000000..0d94425c97d Binary files 
/dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/throughput_detailed_view.png differ diff --git a/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/throughput_time_evolution.png b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/throughput_time_evolution.png new file mode 100644 index 00000000000..b191ca2ad6b Binary files /dev/null and b/assets/images/2026-02-02-measuring-debezium-server-performance-mysql-streaming/throughput_time_evolution.png differ diff --git a/assets/images/debezium-server-architecture.png b/assets/images/debezium-server-architecture.png new file mode 100644 index 00000000000..5fb36967111 Binary files /dev/null and b/assets/images/debezium-server-architecture.png differ