-
Notifications
You must be signed in to change notification settings - Fork 18
Add ADR for large message chunking in MQTT protocol #607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 16 commits
8644e85
6de0c8e
75a767c
e284083
8c29c44
2251ff3
a8a9cc1
4967150
0af2742
cc724bb
11de62b
0bb4de6
18b36fe
833774f
0199e10
448f8e2
aa8e946
8e9c760
e323780
f6894f8
98baf9b
f95fe5c
788dc3d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| # ADR 20: Large Message Chunking in MQTT Protocol | ||
|
|
||
| ## Status | ||
|
|
||
| Proposed | ||
|
|
||
| ## Context | ||
|
|
||
| The MQTT protocol has inherent message size limitations imposed by brokers and network constraints. Azure IoT Operations scenarios often require transmitting payloads that exceed these limits (e.g., firmware updates, large telemetry batches, complex configurations). Without a standardized chunking mechanism, applications must implement their own fragmentation strategies, leading to inconsistent implementations and interoperability issues. | ||
|
|
||
| ## Decision | ||
|
|
||
| We will implement sdk-level message chunking as part of the Protocol layer to transparently handle messages exceeding the MQTT broker's maximum packet size. | ||
|
|
||
| **The chunking mechanism will**: | ||
|
|
||
| - Be enabled/disabled by a configuration setting. | ||
maximsemenov80 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| - Use standardized user properties for chunk metadata: | ||
|
|
||
| - The `__chunk` user property will contain a JSON object with chunking metadata. | ||
| - The JSON structure will include: | ||
|
|
||
| ```json | ||
| { | ||
| "messageId": "unique-id-for-chunked-message", | ||
maximsemenov80 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| "chunkIndex": 0, | ||
| "totalChunks": 5, | ||
| "checksum": "message-hash" | ||
| } | ||
| ``` | ||
|
|
||
| - `messageId, chunkIndex` - present for every chunk; `totalChunks, checksum` - present only for the first chunk. `messageId` is UUID. | ||
|
|
||
| **Chunk size calculation:** | ||
|
|
||
| - Maximum chunk size will be derived from the MQTT CONNECT packet's Maximum Packet Size. | ||
| - A static overhead value will be subtracted from the Maximum Packet Size to account for MQTT packet headers, topic name, user properties, and other metadata. | ||
maximsemenov80 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| - The overhead size will be configurable, large enough to simplify calculations while ensuring we stay under the broker's limit. | ||
|
|
||
| **Chunk Timeout Mechanism** | ||
|
|
||
| > [MQTT-3.3.2-6] | The PUBLISH packet sent to a Client by the Server MUST contain a Message Expiry Interval set to the received value minus the time that the message has been waiting in the Server. | ||
|
|
||
| The receiving client uses the Message Expiry Interval from the first chunk as the timeout period for collecting all remaining chunks of the message. | ||
|
|
||
| Edge case: | ||
| - Since the Message Expiry Interval is specified in seconds, chunked messages may behave differently than single messages when the expiry interval is very short (e.g., 1 second remaining). For a single large message, the QoS flow would complete even if the expiry interval expires during transmission. However, with chunking, if the remaining expiry interval is too short to receive all chunks, the message reassembly will fail due to timeout. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for other message expiry calculations, we always round partial seconds up, never down - I imagine doing something similar here should maintain acceptable behavior. To your statement about
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it means it does not matter if the single message expires during QoS flow execution it will complete (even if it involves whole message resend), but in case of chunks at the very end of expiration interval it is possible some tail chunks would expire before their transfer started thus creating effect of the whole (original) message expires midflight and transfer canceled. |
||
| - In case of QoS 0 and no Message Expiry Interval (forever message) if any of the chunks lost during transmission client will never cleanup assembler buffer for this message, thus use of the chunking should be restricted to QoS 1/2. | ||
|
|
||
| **Checksum Algorithm Options for MQTT Message Chunking** | ||
|
|
||
| SDK will provide user with options to inject their algorithm of choice or use SDK's default SHA-256. | ||
|
|
||
| **Implementation layer:** | ||
|
|
||
| - Sending Process: | ||
| - When a payload exceeds the maximum packet size, the message is split into fixed-size chunks (with potentially smaller last chunk) | ||
| - Each chunk is sent as a separate MQTT message with the same topic but with chunk metadata. | ||
| - Effort should be made to minimize user properties copied over to every chunk: first chunk will have full set of original user properties and the rest only those that are necessary to reassemble original message (ex.: ```$partition``` property to support shared subscriptions). | ||
| - QoS settings are maintained across all chunks. | ||
| - Receiving Process: | ||
| - The Chunking aware client receives messages and identifies chunked messages by the presence of chunk metadata. | ||
| - Chunks are stored in a temporary buffer, indexed by message ID and chunk index. | ||
| - When all chunks for a message ID are received, they are reassembled in order and message checksum verified (see Checksum Algorithm Options for MQTT Message Chunking). | ||
| - The reconstructed message is then processed as a single message by the application. | ||
| - Receiving Failures: | ||
| - Message timeout interval ended before all chunks received. | ||
| - Reassembly buffer size limit reached before all chunks received. | ||
| - Calculated checksum does not match checksumm from chunk metadata. | ||
|
|
||
| **Configuration settings:** | ||
| - Enable/Disable | ||
| - Overhead size | ||
| - Reassembly buffer size limit | ||
maximsemenov80 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| ### Implementation Considerations | ||
|
|
||
| - **Error Handling:** | ||
| - Chunk timeout mechanisms (see Chunk Timeout Mechanism) | ||
| - Error propagation to application code | ||
| - **Performance Optimization:** | ||
| - Concurrent chunk transmission | ||
| - Efficient memory usage during reassembly | ||
| - Maximum reassembly buffer size limits (configurable) | ||
|
|
||
| ### Benefits | ||
|
|
||
| - **Property Preservation:** maintains topic, QoS, and other message properties consistently. | ||
| - **Network Optimized:** allows efficient transmission of large payloads over constrained networks. | ||
|
|
||
| ### Compatibility | ||
|
|
||
| - Non-chunking-aware clients will receive individual chunks as separate messages. Chunks reassembly could be implemented on the application side, given chunking implementation is known. | ||
|
|
||
| ## Appendix | ||
|
|
||
| ### Message Flow Diagram | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| participant Sender as Sending Client | ||
| participant Broker as MQTT Broker | ||
| participant Receiver as Receiving Client | ||
|
|
||
| Note over Sender: Large message (>max size) | ||
| Sender->>Sender: Split into chunks | ||
| Note right of Sender: Calculate chunk size:<br/>MaxPacketSize - Overhead | ||
|
|
||
| loop For each chunk | ||
| Sender->>Broker: MQTT PUBLISH with __chunk metadata | ||
| Note over Broker: No special handling<br/>required by broker | ||
| Broker->>Receiver: Forward chunk | ||
| Note over Receiver: First chunk starts timeout countdown | ||
| Receiver->>Receiver: Store in buffer | ||
| Note over Receiver: Index by:<br/>messageId + chunkIndex | ||
| end | ||
|
|
||
| alt Success Path | ||
| Note over Receiver: All chunks received | ||
| Receiver->>Receiver: Verify checksum | ||
| Note right of Receiver: SHA-256 or<br/>custom algorithm | ||
| Receiver->>Receiver: Reassemble message | ||
| Note over Receiver: Process complete message | ||
| else Failure: Timeout | ||
| Note over Receiver: Message Expiry Interval exceeded | ||
| Receiver->>Receiver: Timeout occurred | ||
| Receiver->>Receiver: Cleanup buffers | ||
| Note over Receiver: Notify application:<br/>ChunkTimeoutError | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think for these error cases, we would just log the error and ignore the received message, similar to any other invalid message that we've received. There's no action the application can take
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we will need some new rpc errors for communicating chunking issues to the invoker in the command response
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (bad example as this one would be past when the invoker is listening, but for the buffer size full there should be some communication) |
||
| else Failure: Buffer Limit | ||
| Note over Receiver: Reassembly buffer full | ||
| Receiver->>Receiver: Reject new chunks | ||
| Receiver->>Receiver: Cleanup oldest incomplete messages | ||
| Note over Receiver: Notify application:<br/>BufferLimitExceededError | ||
| else Failure: Checksum Mismatch | ||
| Note over Receiver: All chunks received | ||
| Receiver->>Receiver: Calculate checksum | ||
| Note over Receiver: Checksum verification failed | ||
| Receiver->>Receiver: Discard reassembled message | ||
| Receiver->>Receiver: Cleanup buffers | ||
| Note over Receiver: Notify application:<br/>ChecksumMismatchError | ||
| end | ||
| ``` | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a key question we need to answer is - when can the client chunk, vs not.
This will depend on whether the receiver is able to understand our chunking protocol.
So we need to enable this only when both sides of the communication pipe use the same mechanism. One example is mRPC. Telemetry can also be applicable, but I suppose telemetry could also be asymmetric?