Decentralized Message Queue (DMQ) Implementation Overview
This document serves as an extension to CIP-137, providing technical insights and implementation guidance for the proposed Decentralized Message Queue (DMQ), primarily supporting Mithril protocol operations.
Significant progress toward enabling Mithril has already been achieved.
Notably, an initial implementation step was merged into the ouroboros-network repository, and a document highlighting decisions and goals demonstrates ongoing progress. Current development progress is actively tracked in this issue.
This document is divided into two parts. The first section outlines the network team's interpretation of CIP-137 requirements, discussing both implementation considerations and expected business logic. This will form the basis for collaboration with the Mithril team. The second section will provide a more detailed implementation plan.
The objective is to leverage Cardano's Network Diffusion layer to distribute various types of information—in this case, signatures—by creating multiple overlay networks. Achieving this goal involves:
- Ensuring Cardano's Network Diffusion layer is reusable;
- Implementing CIP-137.
The first requirement has already been met, with more information available here. The second requirement necessitates implementing three mini-protocols:
- Message Submission Protocol: Node-to-node communication employing a pull-based model for efficient and secure diffusion.
- Local Message Submission Protocol: Enables local clients (e.g., Mithril Signers) to submit messages securely.
- Local Message Notification Protocol: Allows clients to receive notifications of new messages via the network node.
This section shows how signatures flow through the Mithril network architecture.
flowchart RL
    subgraph MPA[Mithril Processes A]
        MSA1[Mithril Signer A1]
        MSA2[Mithril Signer A2]
        MAA1[Mithril Aggregator A1]
        MAA2[Mithril Aggregator A2]
    end
    subgraph MN[Mithril Diffusion Network]
        DMQN1[DMQ Node 1]
        DMQN2[DMQ Node 2]
        DMQN3[DMQ Node 3]
        DMQN1 <-- "N2N Signature Submission Protocol" --> DMQN2
        DMQN1 <-- "N2N Signature Submission Protocol" --> DMQN3
        DMQN2 <-- "N2N Signature Submission Protocol" --> DMQN3
    end
    subgraph MPB[Mithril Processes B]
        MSB1[Mithril Signer B1]
        MSB2[Mithril Signer B2]
        MAB1[Mithril Aggregator B1]
        MAB2[Mithril Aggregator B2]
    end
    MSA1 -- "N2C Local Message Submission Protocol" --> DMQN1
    MSA2 -- "N2C Local Message Submission Protocol" --> DMQN1
    MAA1 <-- "N2C Local Message Notification Protocol" --> DMQN1
    MAA2 <-- "N2C Local Message Notification Protocol" --> DMQN1
    DMQN3 -- "N2C Local Message Submission Protocol" --> MSB1
    DMQN3 -- "N2C Local Message Submission Protocol" --> MSB2
    DMQN3 <-- "N2C Local Message Notification Protocol" --> MAB1
    DMQN3 <-- "N2C Local Message Notification Protocol" --> MAB2
In the diagram above, signatures flow from Signers to Aggregators. Mithril Signers submit messages to their local DMQ node via the N2C Local Message Submission Protocol; these messages are then diffused to other nodes through the N2N Signature Submission Protocol. Mithril Aggregators, connected via the N2C Local Message Notification Protocol, then receive notifications of new signatures.
The Local Message Submission mini-protocol allows local clients, such as Mithril Signers, to submit messages directly to network nodes. Due to mutual trust between local processes, risks like malformed messages, excessive message sizes, or invalid contents are minimal.
Connections are short-lived, reducing complexity in resource management and connection handling.
The Local Message Notification mini-protocol enables local clients (e.g., Mithril Aggregators) to receive timely notifications from network nodes regarding newly received messages.
As currently specified, the protocol uses short-lived connections in which an Aggregator queries the DMQ node for a single message at a time. When multiple messages are available, this forces the Aggregator to poll the DMQ node repeatedly. One option is to have the DMQ node indicate in its response whether more messages are available and, if so, how many; this would reduce redundant connection attempts with minimal added protocol complexity. However, a burst of messages is expected shortly after a signing round opens, with around 3000 messages per round available at once. Given this, it makes more sense to return the available messages in batches (e.g., a list of messages instead of a single message and a flag) and to rely on the client to request messages at a suitable rate.
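The batched approach described above can be illustrated with a minimal sketch. The class and function names, and the `max_batch` cap, are hypothetical and only serve the example; the real protocol exchanges CBOR messages over a local socket rather than Python method calls.

```python
from collections import deque
from typing import Deque, List


class NotificationServer:
    """Toy stand-in for the DMQ node side of the notification protocol."""

    def __init__(self, pending: List[bytes], max_batch: int) -> None:
        self._pending: Deque[bytes] = deque(pending)
        self._max_batch = max_batch  # hypothetical cap on messages per reply

    def next_messages(self) -> List[bytes]:
        """Return up to max_batch unsent messages; an empty list means none are left."""
        count = min(self._max_batch, len(self._pending))
        return [self._pending.popleft() for _ in range(count)]


def drain(server: NotificationServer) -> List[List[bytes]]:
    """Client loop: keep requesting batches until the server has nothing new."""
    batches: List[List[bytes]] = []
    while True:
        batch = server.next_messages()
        if not batch:
            return batches
        batches.append(batch)
```

With a burst of 3000 pending messages and a cap of 1000 per reply, the client needs three requests instead of 3000, and it remains free to pause between requests to control its own intake rate.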
The Message Submission mini-protocol diffuses messages between DMQ nodes using a pull-based strategy. The inbound side explicitly requests new messages from its peers, which keeps resource consumption under its own control and guards against potential DoS attacks.
This protocol closely mirrors the Cardano Transaction Submission Protocol, leveraging existing logic from the ouroboros-network library.
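To make the pull-based idea concrete, here is a heavily simplified sketch in which the inbound side first asks for message identifiers and then fetches only the messages it does not already hold. All names are hypothetical, and the sketch omits features of the real transaction submission protocol such as acknowledgement windows and blocking requests.

```python
from typing import Dict, List, Set


class OutboundPeer:
    """Toy outbound side: it only answers explicit requests from the inbound side."""

    def __init__(self, messages: Dict[bytes, bytes]) -> None:
        self._messages = messages  # message id -> message body
        self._announced = 0        # how many ids have been announced so far

    def request_message_ids(self, limit: int) -> List[bytes]:
        """Announce at most `limit` ids that have not been announced yet."""
        ids = list(self._messages)[self._announced:self._announced + limit]
        self._announced += len(ids)
        return ids

    def request_messages(self, ids: List[bytes]) -> Dict[bytes, bytes]:
        """Return the bodies for the requested ids."""
        return {i: self._messages[i] for i in ids if i in self._messages}


def pull_round(peer: OutboundPeer, known: Set[bytes], limit: int) -> Dict[bytes, bytes]:
    """Inbound side: ask for ids, then fetch only the messages not yet known."""
    ids = peer.request_message_ids(limit)
    wanted = [i for i in ids if i not in known]
    fetched = peer.request_messages(wanted)
    known.update(fetched)
    return fetched
```

Because the inbound side chooses `limit` and decides which identifiers to fetch, a peer cannot push more data than the receiver asked for.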
Before being diffused to other peers, an incoming message must be verified by the receiving node. The message contains almost all the information required to validate it (as explained in the CIP). The only external piece of information needed is a snapshot of the stake distribution, which changes infrequently and can therefore be cached and refreshed once every couple of epochs.
It is possible to perform concurrent/parallel verification of multiple messages since there's no dependency between individual messages.
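As a rough illustration of independent verification, the sketch below fans a batch of messages out to a worker pool. The `verify` callback is a placeholder for the validation defined by the CIP, and `verify_all` and `workers` are hypothetical names. In Python, threads only help if the underlying check releases the interpreter lock, so this shows the structure rather than a performance recipe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def verify_all(messages: Sequence[bytes],
               verify: Callable[[bytes], bool],
               workers: int = 4) -> List[bool]:
    """Verify each message independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(verify, messages))
```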
Messages can be garbage collected after their expected TTL has expired. The TTL should at least cover the duration of a signing round (e.g., 10 minutes). However, DMQ nodes are unaware of Mithril-specific details like signing round durations, so they cannot independently determine an appropriate TTL for messages in their mempool, whereas Signer nodes can.
If Signer nodes include a TTL with each message submitted via the N2C Message Submission protocol, DMQ nodes can use and propagate that value accordingly.
To prevent adversarial nodes from abusing this by assigning excessively long TTLs (potentially leading to DoS attacks), a maximum allowed TTL should be configurable at the protocol level. This lets peers reject connections or messages that exceed the acceptable TTL threshold, protecting against misuse.
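The TTL handling described above could look roughly like the following sketch. The `TtlMempool` class, its method names and the default maximum of 1800 seconds are assumptions made for the example; the document does not fix a concrete maximum.

```python
from typing import Dict, Tuple

MAX_TTL_SECONDS = 1800  # hypothetical configured maximum, not taken from the CIP


class TtlMempool:
    """Toy mempool that rejects oversized TTLs and drops expired messages."""

    def __init__(self, max_ttl: int = MAX_TTL_SECONDS) -> None:
        self._max_ttl = max_ttl
        # message id -> (message body, absolute expiry time in seconds)
        self._entries: Dict[bytes, Tuple[bytes, float]] = {}

    def add(self, message_id: bytes, body: bytes, ttl: int, now: float) -> bool:
        """Store the message unless its TTL exceeds the configured maximum."""
        if ttl > self._max_ttl:
            return False
        self._entries[message_id] = (body, now + ttl)
        return True

    def collect_garbage(self, now: float) -> int:
        """Remove every message whose expiry time has been reached; return the count."""
        expired = [mid for mid, (_, expiry) in self._entries.items() if expiry <= now]
        for mid in expired:
            del self._entries[mid]
        return len(expired)

    def __len__(self) -> int:
        return len(self._entries)
```

Passing `now` explicitly keeps the sketch deterministic; a node would normally read it from a monotonic clock.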
Currently, no specialized network parameters are required beyond the standard NodeToNodeVersionData.
Protocol message size limits as defined by CIP-137 are:
| Message part | Lower bound | Upper bound |
|---|---|---|
| messageId | 32 B | 32 B |
| messageBody | 360 B | 2,000 B |
| blockNumber | 4 B | 4 B |
| ttl (min 0, max 65535s) | 2 B | 2 B |
| kesSignature | 448 B | 448 B |
| operationalCertificate | 304 B | 304 B |
| message part totals | 1,150 B | 2,790 B |
These limits should guide the sizing of protocol tokens.
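One way these limits might be applied is a per-part size check before any further validation. The sketch below uses the bounds for the body, block number, TTL, signature and certificate parts from the table above; the function name and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (lower, upper) bounds in bytes for the parts checked in this sketch.
SIZE_BOUNDS: Dict[str, Tuple[int, int]] = {
    "messageBody": (360, 2000),
    "blockNumber": (4, 4),
    "ttl": (2, 2),
    "kesSignature": (448, 448),
    "operationalCertificate": (304, 304),
}


def size_violations(part_sizes: Dict[str, int]) -> List[str]:
    """Return a description of every part that is missing or out of bounds."""
    problems: List[str] = []
    for name, (lower, upper) in SIZE_BOUNDS.items():
        size = part_sizes.get(name)
        if size is None:
            problems.append(f"{name}: missing")
        elif not lower <= size <= upper:
            problems.append(f"{name}: {size} B outside [{lower}, {upper}] B")
    return problems
```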
The Node-to-Client Message Submission Protocol implementation adheres to the CIP mini-protocol specification. The protocol uses the following concrete CDDL specification:
localMessageSubmissionMessage
  = msgSubmitMessage
  / msgAcceptMessage
  / msgRejectMessage
  / msgDone
msgSubmitMessage = [0, message]
msgAcceptMessage = [1]
msgRejectMessage = [2, reason]
msgDone          = [3]
reason = invalid
       / alreadyReceived
       / ttlTooLarge
       / other
invalid         = [0, tstr]
alreadyReceived = [1]
ttlTooLarge     = [2]
other           = [3, tstr]
messageId    = bstr
messageBody  = bstr
blockNumber  = word32
ttl          = word16
kesSignature = bstr
operationalCertificate = bstr
message = [
  messageId,
  messageBody,
  blockNumber,
  ttl,
  kesSignature,
  operationalCertificate
]
Only the server side of this protocol is relevant for this implementation. Upon establishing a new connection via the local socket, the server processes inbound messages according to this protocol. When receiving a MsgSubmitMessage, the server must validate the message before adding it to the internal mempool. If the message is invalid, has already been received, or carries a TTL above the allowed maximum, the server must reject it by sending a MsgRejectMessage with the relevant reason and immediately close the connection. If the message is valid, the server adds it to the mempool and responds with MsgAcceptMessage. After the response, the protocol concludes with a MsgDone and the connection is closed.
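The server-side decision described above can be sketched as follows. The order of the checks, the `handle_submit` name, the dictionary-based mempool and the `is_valid` callback are assumptions for illustration. Replies are shown as the decoded CDDL arrays rather than as CBOR bytes.

```python
from typing import Callable, Dict, List

# Reply tags follow the CDDL above:
# msgAcceptMessage = [1], msgRejectMessage = [2, reason].
ACCEPT: List = [1]


def reject(reason: List) -> List:
    """Build a msgRejectMessage carrying the given reason."""
    return [2, reason]


def handle_submit(mempool: Dict[bytes, bytes],
                  message_id: bytes,
                  body: bytes,
                  ttl: int,
                  max_ttl: int,
                  is_valid: Callable[[bytes], bool]) -> List:
    """Decide the reply to a msgSubmitMessage and update the mempool on success."""
    if message_id in mempool:
        return reject([1])                       # alreadyReceived
    if ttl > max_ttl:
        return reject([2])                       # ttlTooLarge
    if not is_valid(body):
        return reject([0, "validation failed"])  # invalid, with a text reason
    mempool[message_id] = body
    return ACCEPT
```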
The Node-to-Client Message Notification Protocol also follows the CIP mini-protocol specification (with a very small modification) and uses the following CDDL specification:
localMessageNotificationMessage
  = msgNextMessages
  / msgHasMessages
  / msgNoMessages
  / msgDone
msgNextMessages = [0]
msgHasMessages  = [1, messages]
msgNoMessages   = [2]
msgDone         = [3]
messageId    = bstr
messageBody  = bstr
blockNumber  = word32
ttl          = word16
kesSignature = bstr
operationalCertificate = bstr
message = [
  messageId,
  messageBody,
  blockNumber,
  kesSignature,
  operationalCertificate
]
messages = [* message]
Again, this protocol is concerned primarily with the server side. Upon establishing a connection via the local socket, the server awaits a MsgNextMessages from the client. In response, the server replies with a MsgHasMessages carrying all available unsent messages and marks them as sent. If no new messages are available, the server must respond with a MsgNoMessages. The connection concludes with a MsgDone.
Note: This protocol currently assumes one-to-one client-server connections and does not explicitly support one-to-many scenarios where multiple Aggregator nodes concurrently request messages. To support multiple Aggregators, the server must track which messages have been sent to which clients. A configurable limit on the fan-out (out-degree) of client connections should be introduced to manage resource usage effectively, rejecting additional incoming connections once the configured limit is exceeded. For the first iteration of this protocol, only one-to-one connections are going to be supported.
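For the one-to-many case mentioned in the note, the bookkeeping could be sketched as below: the server remembers which message identifiers each client has already received and refuses new clients beyond a configured limit. The class, its methods and the string client identifiers are hypothetical, and the first iteration of the protocol would only need the single-client path.

```python
from typing import Dict, List, Set


class NotificationState:
    """Toy per-client tracking of which messages have already been delivered."""

    def __init__(self, max_clients: int) -> None:
        self._max_clients = max_clients          # hypothetical fan-out limit
        self._messages: Dict[bytes, bytes] = {}  # message id -> message body
        self._sent: Dict[str, Set[bytes]] = {}   # client -> ids already sent

    def connect(self, client: str) -> bool:
        """Register a client; refuse new clients once the limit is reached."""
        if client not in self._sent and len(self._sent) >= self._max_clients:
            return False
        self._sent.setdefault(client, set())
        return True

    def add_message(self, message_id: bytes, body: bytes) -> None:
        self._messages[message_id] = body

    def next_messages(self, client: str) -> List[bytes]:
        """Return the messages this client has not seen yet and mark them as sent."""
        sent = self._sent[client]
        fresh = [mid for mid in self._messages if mid not in sent]
        sent.update(fresh)
        return [self._messages[mid] for mid in fresh]
```

An empty list from `next_messages` corresponds to replying with MsgNoMessages; a non-empty list corresponds to MsgHasMessages.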
The Node-to-Node Message Submission Protocol adheres to the CIP mini-protocol specification and CDDL specification.
This protocol involves both outbound and inbound sides. Upon establishing a connection, the Handshake protocol runs using NodeToNodeVersionData. Currently, there are no specific protocol parameters to negotiate, so handshake data remains standard and does not directly impact the Message Submission Protocol.
After the handshake, the Message Submission Protocol is initialized exclusively for peers identified as hot by the diffusion layer, indicating high activity or network value.
To facilitate the protocol operations, a shared internal state (the mempool) is maintained. This mempool stores all messages and tracks their status to determine readiness for Aggregator consumption.
Messages enter the mempool through two paths:
- Local Message Submission Protocol (from Signer nodes)
- Message Submission Protocol (from peer DMQ nodes)
Every message entering the mempool undergoes validation as defined by the CIP (see Message Authentication Mechanism). Each message is independent, allowing parallel validation.
Messages are removed from the mempool upon exceeding their TTL. The TTL for each message will either come from the Signer node for locally received messages or from other DMQ nodes, assuming that they respect the max TTL value allowed.
The current design allows for the possibility that some Aggregators might not receive all diffused messages. We consider this acceptable, given that Aggregators do not require this invariant to hold in order to function properly.
Work is ongoing to develop a reusable version of the transaction submission protocol (utilized by cardano-node), which is anticipated to simplify implementation efforts for this protocol.
Message validation requires access to the most recent Peer Stake Distribution data, obtainable from existing mechanisms such as cardano-cli. A dedicated mechanism is required to periodically fetch and refresh the stake distribution snapshot every few epochs. This can be efficiently managed via a lightweight background thread periodically updating a shared mutable state variable.
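A background refresher of this kind might be structured as in the sketch below. The `fetch` callback stands in for whatever mechanism actually retrieves the stake distribution; its shape, the class name and the refresh interval are assumptions for the example, and no particular cardano-cli invocation is implied.

```python
import threading
from typing import Callable, Dict


class StakeDistributionCache:
    """Shared snapshot that a background thread refreshes at a fixed interval."""

    def __init__(self, fetch: Callable[[], Dict[str, int]], interval_seconds: float) -> None:
        self._fetch = fetch               # hypothetical fetch function
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._snapshot = fetch()          # load once so readers never see an empty cache
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Signal the refresher to exit and wait for it; call only after start()."""
        self._stop.set()
        self._thread.join()

    def current(self) -> Dict[str, int]:
        """Return the most recently fetched snapshot."""
        with self._lock:
            return self._snapshot

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            fresh = self._fetch()
            with self._lock:
                self._snapshot = fresh
```

Validation code would call `current()` and never block on the fetch itself, since the snapshot is swapped under the lock only after a new one has been fully retrieved.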