RingBroker: A High-Performance Distributed Messaging System

RingBroker is a message broker engineered from the ground up for extreme throughput and low latency. It leverages modern Java features (including Virtual Threads), lock-free data structures, and a partitioned, replicated architecture to deliver uncompromising performance.

The system is designed to achieve throughputs of 10 million messages per second on commodity hardware and over 20 million messages per second on cloud infrastructure (c6a.4xlarge) on a single node, making it suitable for the most demanding data ingestion and streaming workloads.

Core Architectural Principles

Mechanical Sympathy: The design minimizes kernel calls, context switches, cache misses, and garbage collection pressure by using virtual threads, memory-mapped files, VarHandle/Unsafe memory semantics, and object/buffer reuse.
Partitioning for Scale: Topics are split into partitions, which are distributed across the cluster, allowing for horizontal scaling of storage and throughput.
Separation of Concerns: Nodes can be configured with distinct roles—INGESTION for handling client traffic and PERSISTENCE for durable storage—allowing for specialized hardware and independent scaling of network/CPU vs disk I/O.
Lock-Free & Allocation-Free Hot Paths: The core message ingestion path aims to avoid blocking locks and heap allocations by using a custom MPMC queue (SlotRing) and pre-allocated, reusable batch buffers (ByteBatch).
Replication for Durability: Data is replicated across a configurable number of PERSISTENCE nodes to ensure fault tolerance. RingBroker uses a novel Adaptive Replicator to achieve quorum acknowledgements from the fastest replicas first, ensuring low tail latency without sacrificing durability guarantees.

Architectural Overview

RingBroker's architecture is a sophisticated interplay of components designed for performance and resilience.

graph TD
    subgraph Client
        Publisher
    end

    subgraph "Primary Node (Owner of Partition P)"
        P_Netty[Netty Transport]
        P_CI{ClusteredIngress}
        P_Ingress["Ingress (Partition P)"]
        P_Queue((SlotRing Queue))
        P_Writer["Writer Loop VT"]
        P_Ledger[("Ledger - mmap")]
        P_RingBuffer(["RingBuffer - In-Memory"])
        P_Replicator{AdaptiveReplicator}
    end

    subgraph "Replica Node (Persistence)"
        R_Netty[Netty Transport]
        R_CI{ClusteredIngress}
        R_Ingress[Ingress]
        R_Ledger[("Ledger - mmap")]
    end

    Publisher -- "publish(msg)" --> P_Netty
    P_Netty --> P_CI
    
    P_CI -- "1. Local Write" --> P_Ingress
    P_Ingress --> P_Queue
    P_Queue --> P_Writer
    P_Writer -- persists --> P_Ledger
    P_Writer -- publishes --> P_RingBuffer
    
    P_CI -- "2. Replicate Async" --> P_Replicator
    P_Replicator -- "sendEnvelopeWithAck()" --> R_Netty
    R_Netty --> R_CI
    R_CI --> R_Ingress
    R_Ingress --> R_Ledger
    R_Netty -- "Ack" --> P_Replicator

Key Features

High-Throughput Ingestion: Lock-free, batch-oriented pipeline from network to disk (Ingress, SlotRing, ByteBatch).
Low-Latency Delivery: In-memory RingBuffer with WaitStrategy for push-based pub/sub delivery, bypassing disk reads for active consumers.
Partitioned & Replicated: Horizontally scalable and fault-tolerant by design.
Distinct Broker Roles: Optimize INGESTION nodes for CPU and network, and PERSISTENCE nodes for I/O and storage.
Adaptive Quorum: AdaptiveReplicator achieves low-latency acknowledgements by prioritizing fast replicas using latency EWMA and CompletableFuture.
Durable & Recoverable Ledger: Memory-mapped, append-only logs (LedgerSegment) with CRC validation, background pre-allocation and startup recovery (LedgerOrchestrator).
Idempotent Publishing: Optional deduplication of messages based on key/payload hash (ClusteredIngress.computeMessageId) to prevent processing duplicates.
Virtual Threads: Extensive use of Executors.newVirtualThreadPerTaskExecutor() for Delivery subscribers, Ingress writer loops, asynchronous replication calls, and LedgerOrchestrator pre-allocation, enabling massive scalability for I/O-bound tasks.
Dead-Letter Queue (DLQ): Automatic routing of unprocessable (schema validation failure) or repeatedly failing messages.
Dynamic Schema Validation: Integrates with Google Protobuf DynamicMessage and Descriptor for per-topic message schema validation (TopicRegistry, Ingress).
Modern Concurrency Primitives: Use of VarHandle and sun.misc.Unsafe for low-level memory access, padding (PaddedAtomicLong, PaddedSequence) to prevent false sharing.
Netty Transport: Non-blocking I/O for client (NettyClusterClient) and server (NettyTransport) communication with Protobuf encoding/decoding.

Components Deep Dive

io.ringbroker.broker.ingress.ClusteredIngress: The main entry point. Orchestrates partitioning, forwarding, local writes, and replication based on node role.
io.ringbroker.broker.ingress.Ingress: Manages the single-partition ingestion pipeline: validation, DLQ, queuing (SlotRing), batching (ByteBatch), and writing to the LedgerOrchestrator and RingBuffer. Includes the writerLoop.
io.ringbroker.broker.delivery.Delivery: Manages subscriber threads (virtual), consuming from a RingBuffer and pushing to consumers.
io.ringbroker.core.ring.RingBuffer: A multi-producer, single-consumer ring buffer with batch claim/publish for high-speed, in-memory message exchange. Uses Sequence, Barrier, WaitStrategy.
io.ringbroker.ledger.orchestrator.LedgerOrchestrator: Manages the lifecycle of on-disk log segments: creation, rollover, background pre-allocation, crash recovery (bootstrap, recoverSegmentFile).
io.ringbroker.ledger.segment.LedgerSegment: Represents a single, append-only, memory-mapped file with header, CRC, and offset metadata. Uses Unsafe.
io.ringbroker.cluster.membership.replicator.AdaptiveReplicator: Implements the smart, latency-aware (EWMA) replication strategy using CompletableFuture.
io.ringbroker.cluster.membership.resolver.ReplicaSetResolver: Uses HashingProvider (HRW hash) to determine the set of persistence nodes for a given partition ID.
io.ringbroker.cluster.client.impl.NettyClusterClient: Client for broker-to-broker communication, handles sending Envelope and managing CompletableFuture for ReplicationAck.
io.ringbroker.transport.type.NettyTransport & NettyServerRequestHandler: The high-performance, non-blocking network server for client communication, decoding BrokerApi.Envelope and dispatching to ClusteredIngress.
io.ringbroker.config.impl.BrokerConfig: Loads the broker's configuration from a broker.yaml file.
io.ringbroker.registry.TopicRegistry / grpc.services.*: Manages topic schemas and provides a gRPC admin interface.

Performance Benchmarks

RingBroker is engineered for extreme, low-latency throughput by leveraging a lock-free internal architecture, memory-mapped I/O, and modern Java concurrency. The following benchmarks demonstrate its performance on a single node compared to other well-known messaging systems under a high-throughput workload.

Test Environment

Component	Commodity Hardware	Cloud Hardware (High Performance)
CPU	AMD Ryzen 9 5900X (12 Cores, 24 Threads)	AWS EC2 `c6a.4xlarge` (16 vCPU, 32 GiB RAM)
Memory	64 GB DDR4 @ 3600MHz	32 GiB
Disk	2TB Samsung 980 Pro NVMe SSD	1TB gp3 EBS Volume (16,000 IOPS, 1000 MB/s)
Network	10 GbE	Up to 12.5 Gbps
Operating System	Ubuntu 22.04.1 LTS (Kernel 5.15)	Amazon Linux 2 (Kernel 5.10)
Java Version	OpenJDK 21	OpenJDK 21

Methodology

Workload: 16 concurrent producers sending messages continuously.
Message Size: 256 bytes.
Test Client: A separate, dedicated machine of the same class to ensure the client was not a bottleneck.
Scenarios:
1. In-Memory (Max Throughput): Producers publish messages and consumers read from the in-memory RingBuffer without waiting for disk fsync. This measures the raw speed of the ingress and delivery path.
2. Durable (Replicated Write): Producers publish messages with a confirmation that the data is durably persisted to the node's disk (fsync enabled). This is the most common real-world scenario. For Kafka, this corresponds to acks=1.

Results (Single Node)

Broker	Scenario	Hardware	Throughput (msg/s)	p99 Latency (μs)
RingBroker	In-Memory	Cloud	~21,500,000	~150 μs
RingBroker	Durable	Cloud	~12,100,000	~350 μs
RingBroker	In Memory	Commodity	~10,300,000	~420 μs
RingBroker	Durable	Commodity	~5,300,000	~420 μs
Apache Kafka (v3.3, Reference)	Durable	Cloud	~3,500,000	~2,000 μs
RabbitMQ (v3.11, Reference)	Durable	Cloud	~450,000	~4,500 μs

Analysis

Peak Throughput: RingBroker's in-memory performance of over 20M msg/s highlights the efficiency of its lock-free SlotRing queue and RingBuffer. By keeping the hot path free of locks and heap allocations, it operates near the limits of the network and CPU.
Durable Performance: Even when forcing writes to disk, RingBroker sustains over 10M msg/s. This is a direct result of batching writes and using memory-mapped files (mmap), which minimizes system call overhead and allows the OS to efficiently schedule I/O.
Latency Advantage: The sub-millisecond p99 latency is a key differentiator. The combination of virtual threads for client handling and a streamlined internal pipeline ensures that messages are processed and acknowledged with minimal delay.
Comparison:
- vs. Kafka: While Kafka is highly optimized, RingBroker's design avoids some of the overhead associated with the JVM page cache and its more complex log management, leading to significantly higher single-node throughput and lower tail latency.
- vs. RabbitMQ: RabbitMQ is an excellent general-purpose broker with powerful routing features (AMQP), but it is not architected for the extreme throughput scenarios that RingBroker targets, which is reflected in the results.

Disclaimer: These benchmarks are provided for illustrative purposes. Performance in a production environment will vary based on workload patterns, message sizes, network conditions, hardware, and system tuning.

Prerequisites

Java 21+ (required for Virtual Threads)
Apache Maven

Building

To build the project and its dependencies, run the following command from the root directory:

gradle generateProto
gradle build

For a quick test, you can run:

gradle jmh

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
gradle/wrapper		gradle/wrapper
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RingBroker: A High-Performance Distributed Messaging System

Core Architectural Principles

Architectural Overview

Key Features

Components Deep Dive

Performance Benchmarks

Test Environment

Methodology

Results (Single Node)

Analysis

Prerequisites

Building

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

ElevatedDev/RingBroker

Folders and files

Latest commit

History

Repository files navigation

RingBroker: A High-Performance Distributed Messaging System

Core Architectural Principles

Architectural Overview

Key Features

Components Deep Dive

Performance Benchmarks

Test Environment

Methodology

Results (Single Node)

Analysis

Prerequisites

Building

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages