8644e85
Add ADR for large message chunking in MQTT protocol
maximsemenov80 Mar 17, 2025
6de0c8e
accept suggestion
maximsemenov80 Mar 18, 2025
75a767c
accept suggestion
maximsemenov80 Mar 18, 2025
e284083
Address review comments
maximsemenov80 Mar 19, 2025
8c29c44
progress
maximsemenov80 Mar 24, 2025
2251ff3
progress
maximsemenov80 Mar 24, 2025
a8a9cc1
Address coments to large message chunking implementation details
maximsemenov80 May 13, 2025
4967150
progress
maximsemenov80 May 13, 2025
0af2742
Update doc/dev/adr/0020-large-message-chunking.md
maximsemenov80 May 14, 2025
cc724bb
Address review comments
maximsemenov80 Jun 5, 2025
11de62b
progress
maximsemenov80 Jun 6, 2025
0bb4de6
progress
maximsemenov80 Jun 6, 2025
18b36fe
Enhance large message chunking documentation with failure handling an…
maximsemenov80 Jun 9, 2025
833774f
Clarify handling of message chunks for non-chunking-aware clients in …
maximsemenov80 Jun 9, 2025
0199e10
Update 0020-large-message-chunking.md
maximsemenov80 Jun 9, 2025
448f8e2
Update 0020-large-message-chunking.md
maximsemenov80 Jun 10, 2025
aa8e946
Update 0020-large-message-chunking.md
maximsemenov80 Jun 10, 2025
8e9c760
Update 0020-large-message-chunking.md
maximsemenov80 Jun 11, 2025
e323780
rename file
maximsemenov80 Jul 9, 2025
f6894f8
Update 0023-large-message-chunking.md
maximsemenov80 Jul 9, 2025
98baf9b
Update 0023-large-message-chunking.md
maximsemenov80 Jul 9, 2025
f95fe5c
Update 0023-large-message-chunking.md
maximsemenov80 Jul 9, 2025
788dc3d
Update 0023-large-message-chunking.md
maximsemenov80 Aug 5, 2025
142 changes: 142 additions & 0 deletions doc/dev/adr/0020-large-message-chunking.md

I feel a key question we need to answer is - when can the client chunk, vs not.
This will depend on whether the receiver is able to understand our chunking protocol.
So we need to enable this only when both sides of the communication pipe use the same mechanism. One example is mRPC. Telemetry can also be applicable, but I suppose telemetry could also be asymmetric?

@@ -0,0 +1,142 @@
# ADR 20: Large Message Chunking in MQTT Protocol

## Status

Proposed

## Context

The MQTT protocol has inherent message size limitations imposed by brokers and network constraints. Azure IoT Operations scenarios often require transmitting payloads that exceed these limits (e.g., firmware updates, large telemetry batches, complex configurations). Without a standardized chunking mechanism, applications must implement their own fragmentation strategies, leading to inconsistent implementations and interoperability issues.

## Decision

We will implement SDK-level message chunking as part of the Protocol layer to transparently handle messages exceeding the MQTT broker's maximum packet size.

**The chunking mechanism will**:

- Be enabled/disabled by a configuration setting.
- Use standardized user properties for chunk metadata:

  - The `__chunk` user property will contain a JSON object with chunking metadata.
  - The JSON structure will include:

```json
{
  "messageId": "unique-id-for-chunked-message",
  "chunkIndex": 0,
  "totalChunks": 5,
  "checksum": "message-hash"
}
```

- `messageId` and `chunkIndex` are present in every chunk; `totalChunks` and `checksum` are present only in the first chunk. `messageId` is a UUID.
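The metadata rules above can be sketched as follows. This is a minimal illustration, not the SDK's API: the function name `build_chunk_property` is hypothetical, and it assumes the first chunk is the one with `chunkIndex` 0, as the example JSON suggests.

```python
import json
import uuid
from typing import Optional


def build_chunk_property(message_id: str, chunk_index: int,
                         total_chunks: Optional[int] = None,
                         checksum: Optional[str] = None) -> str:
    """Return the JSON value for the `__chunk` user property (hypothetical helper)."""
    meta = {"messageId": message_id, "chunkIndex": chunk_index}
    if chunk_index == 0:
        # Assumption: the first chunk is index 0 and is the only one that
        # carries totalChunks and checksum.
        if total_chunks is None or checksum is None:
            raise ValueError("first chunk requires totalChunks and checksum")
        meta["totalChunks"] = total_chunks
        meta["checksum"] = checksum
    return json.dumps(meta)


message_id = str(uuid.uuid4())
first = build_chunk_property(message_id, 0, total_chunks=5, checksum="message-hash")
later = build_chunk_property(message_id, 3)
```

Here `first` carries all four fields, while `later` carries only `messageId` and `chunkIndex`.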

**Chunk size calculation:**

- Maximum chunk size will be derived from the MQTT CONNECT packet's Maximum Packet Size.
- A static overhead value will be subtracted from the Maximum Packet Size to account for MQTT packet headers, topic name, user properties, and other metadata.
- The overhead size will be configurable and should be large enough to keep the calculation simple while ensuring each chunk stays under the broker's limit.
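As a rough sketch of this calculation, assuming the negotiated Maximum Packet Size and the configured overhead are already available as integers (the function names and the sample numbers are illustrative only):

```python
def max_chunk_size(max_packet_size: int, overhead: int) -> int:
    """Payload bytes available per chunk: Maximum Packet Size minus static overhead."""
    size = max_packet_size - overhead
    if size <= 0:
        raise ValueError("overhead must be smaller than the maximum packet size")
    return size


def chunk_count(payload_len: int, chunk_size: int) -> int:
    """Number of chunks needed; the last chunk may be smaller than the rest."""
    # Ceiling division without floating point.
    return max(1, -(-payload_len // chunk_size))


size = max_chunk_size(max_packet_size=65536, overhead=1024)
count = chunk_count(payload_len=200_000, chunk_size=size)
```

With these sample values each chunk holds 64512 payload bytes, so a 200000-byte payload needs four chunks.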

**Chunk Timeout Mechanism**

> [MQTT-3.3.2-6] | The PUBLISH packet sent to a Client by the Server MUST contain a Message Expiry Interval set to the received value minus the time that the message has been waiting in the Server.

The receiving client uses the Message Expiry Interval from the first chunk as the timeout period for collecting all remaining chunks of the message.
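One way to picture this is as a deadline computed when the first chunk arrives. The sketch below is only an illustration under assumed names (`reassembly_deadline`, `is_expired`); it treats a missing Message Expiry Interval as "no deadline", which is the case the edge-case notes warn about.

```python
import time
from typing import Optional


def reassembly_deadline(expiry_interval_s: Optional[int], now: float) -> Optional[float]:
    """Deadline for collecting all chunks, based on the first chunk's expiry interval."""
    if expiry_interval_s is None:
        # No Message Expiry Interval: the message never expires, so no deadline.
        return None
    return now + expiry_interval_s


def is_expired(deadline: Optional[float], now: float) -> bool:
    """True once the deadline has passed; a missing deadline never expires."""
    return deadline is not None and now >= deadline


# The first chunk arrives with 30 seconds of expiry interval remaining.
started = time.monotonic()
deadline = reassembly_deadline(30, now=started)
```

A receiver would check `is_expired(deadline, time.monotonic())` before accepting further chunks for that message ID.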

Edge case:
- Since the Message Expiry Interval is specified in seconds, chunked messages may behave differently than single messages when the expiry interval is very short (e.g., 1 second remaining). For a single large message, the QoS flow would complete even if the expiry interval expires during transmission. However, with chunking, if the remaining expiry interval is too short to receive all chunks, the message reassembly will fail due to timeout.
Member

for other message expiry calculations, we always round partial seconds up, never down - I imagine doing something similar here should maintain acceptable behavior.

To your statement about the QoS flow would (not) complete for the chunking scenario, what exactly do you mean by this? Just that the message might not be received by the end application if all chunks aren't delivered in time, or is there some ramification about the QoS flow not completing - we would definitely still need to ack all messages for the chunking scenario

Collaborator Author

It means that if a single message expires while its QoS flow is in progress, the flow still completes (even if that involves resending the whole message). With chunks, near the very end of the expiry interval, some tail chunks may expire before their transfer has started, so the original message effectively expires mid-flight and the transfer is canceled.

- With QoS 0 and no Message Expiry Interval (a message that never expires), if any chunk is lost during transmission the client will never clean up the reassembly buffer for that message; use of chunking should therefore be restricted to QoS 1 and 2.

**Checksum Algorithm Options for MQTT Message Chunking**

The SDK will let users inject a checksum algorithm of their choice or use the SDK's default, SHA-256.
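A pluggable checksum could look roughly like the following. The `ChecksumFn` type, `verify` helper, and the CRC-32 alternative are assumptions made for illustration; only the SHA-256 default comes from the text above.

```python
import hashlib
import zlib
from typing import Callable

# Hypothetical shape of an injectable checksum: bytes in, text digest out.
ChecksumFn = Callable[[bytes], str]


def sha256_hex(data: bytes) -> str:
    """Default algorithm: SHA-256, hex-encoded."""
    return hashlib.sha256(data).hexdigest()


def crc32_hex(data: bytes) -> str:
    """Example of a user-supplied alternative (cheaper, but not collision resistant)."""
    return format(zlib.crc32(data), "08x")


def verify(data: bytes, expected: str, checksum_fn: ChecksumFn = sha256_hex) -> bool:
    """Recompute the checksum over the reassembled payload and compare."""
    return checksum_fn(data) == expected


payload = b"reassembled payload"
ok_default = verify(payload, sha256_hex(payload))
ok_custom = verify(payload, crc32_hex(payload), checksum_fn=crc32_hex)
```

Sender and receiver must agree on the algorithm, since the metadata shown earlier carries only the checksum value.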

**Implementation layer:**

- Sending Process:
  - When a payload exceeds the maximum packet size, the message is split into fixed-size chunks (with a potentially smaller last chunk).
  - Each chunk is sent as a separate MQTT message with the same topic but with chunk metadata.
  - Effort should be made to minimize the user properties copied to every chunk: the first chunk will carry the full set of original user properties, and the rest only those necessary to reassemble the original message (e.g., the `$partition` property to support shared subscriptions).
  - QoS settings are maintained across all chunks.
- Receiving Process:
  - The chunking-aware client receives messages and identifies chunked messages by the presence of chunk metadata.
  - Chunks are stored in a temporary buffer, indexed by message ID and chunk index.
  - When all chunks for a message ID are received, they are reassembled in order and the message checksum is verified (see Checksum Algorithm Options for MQTT Message Chunking).
  - The reconstructed message is then processed as a single message by the application.
- Receiving Failures:
  - Message timeout interval ended before all chunks were received.
  - Reassembly buffer size limit reached before all chunks were received.
  - Calculated checksum does not match the checksum from the chunk metadata.
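The sending and receiving steps above can be sketched in a few lines. This is a simplified illustration, not the SDK implementation: `split_message` and `Reassembler` are hypothetical names, MQTT transport and user-property copying are left out, the timeout failure is omitted, and SHA-256 is hard-coded as the checksum.

```python
import hashlib
import json
import uuid
from typing import Dict, List, Optional, Tuple


def split_message(payload: bytes, chunk_size: int) -> List[Tuple[str, bytes]]:
    """Split a payload into (`__chunk` JSON, chunk payload) pairs."""
    message_id = str(uuid.uuid4())
    parts = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    chunks = []
    for index, part in enumerate(parts):
        meta = {"messageId": message_id, "chunkIndex": index}
        if index == 0:
            # Only the first chunk carries totalChunks and checksum.
            meta["totalChunks"] = len(parts)
            meta["checksum"] = hashlib.sha256(payload).hexdigest()
        chunks.append((json.dumps(meta), part))
    return chunks


class Reassembler:
    """Buffers chunks by message ID and chunk index until a message is complete."""

    def __init__(self, buffer_limit: int) -> None:
        self.buffer_limit = buffer_limit
        self.buffered = 0
        self.parts: Dict[str, Dict[int, bytes]] = {}
        self.first_meta: Dict[str, dict] = {}

    def add(self, chunk_property: str, part: bytes) -> Optional[bytes]:
        """Store one chunk; return the full payload once every chunk has arrived."""
        meta = json.loads(chunk_property)
        message_id, index = meta["messageId"], meta["chunkIndex"]
        if self.buffered + len(part) > self.buffer_limit:
            raise BufferError("reassembly buffer size limit reached")
        slots = self.parts.setdefault(message_id, {})
        if index in slots:
            return None  # duplicate delivery, ignore
        slots[index] = part
        self.buffered += len(part)
        if "totalChunks" in meta:
            self.first_meta[message_id] = meta
        first = self.first_meta.get(message_id)
        if first is None or any(i not in slots for i in range(first["totalChunks"])):
            return None  # still waiting for the first chunk or for missing indices
        # Complete: release the buffer, reassemble in index order, verify checksum.
        del self.parts[message_id]
        del self.first_meta[message_id]
        self.buffered -= sum(len(p) for p in slots.values())
        payload = b"".join(slots[i] for i in range(first["totalChunks"]))
        if hashlib.sha256(payload).hexdigest() != first["checksum"]:
            raise ValueError("checksum mismatch")
        return payload


chunks = split_message(b"abcdefghij", chunk_size=4)
receiver = Reassembler(buffer_limit=1024)
# Deliver out of order to show that reassembly relies on chunkIndex, not arrival order.
results = [receiver.add(prop, part) for prop, part in reversed(chunks)]
```

Because only the first chunk carries `totalChunks`, the receiver cannot tell that a message is complete until that chunk has arrived, even if every other chunk is already buffered.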

**Configuration settings:**
- Enable/Disable
- Overhead size
- Reassembly buffer size limit
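These three settings could be grouped into a single options object. The class name, field names, and default values below are placeholders chosen for illustration; the ADR does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingOptions:
    """Hypothetical container for the chunking configuration settings."""
    enabled: bool = False                             # Enable/Disable
    overhead_size: int = 1024                         # placeholder default, in bytes
    reassembly_buffer_limit: int = 10 * 1024 * 1024   # placeholder default, in bytes

    def __post_init__(self) -> None:
        if self.overhead_size < 0:
            raise ValueError("overhead_size must not be negative")
        if self.reassembly_buffer_limit <= 0:
            raise ValueError("reassembly_buffer_limit must be positive")


options = ChunkingOptions(enabled=True, overhead_size=2048)
```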

### Implementation Considerations

- **Error Handling:**
  - Chunk timeout mechanisms (see Chunk Timeout Mechanism)
  - Error propagation to application code
- **Performance Optimization:**
  - Concurrent chunk transmission
  - Efficient memory usage during reassembly
  - Maximum reassembly buffer size limits (configurable)

### Benefits

- **Property Preservation:** maintains topic, QoS, and other message properties consistently.
- **Network Optimized:** allows efficient transmission of large payloads over constrained networks.

### Compatibility

- Non-chunking-aware clients will receive individual chunks as separate messages. Chunk reassembly could be implemented on the application side, provided the chunking implementation is known.

## Appendix

### Message Flow Diagram

```mermaid
sequenceDiagram
participant Sender as Sending Client
participant Broker as MQTT Broker
participant Receiver as Receiving Client

Note over Sender: Large message (>max size)
Sender->>Sender: Split into chunks
Note right of Sender: Calculate chunk size:<br/>MaxPacketSize - Overhead

loop For each chunk
Sender->>Broker: MQTT PUBLISH with __chunk metadata
Note over Broker: No special handling<br/>required by broker
Broker->>Receiver: Forward chunk
Note over Receiver: First chunk starts timeout countdown
Receiver->>Receiver: Store in buffer
Note over Receiver: Index by:<br/>messageId + chunkIndex
end

alt Success Path
Note over Receiver: All chunks received
Receiver->>Receiver: Verify checksum
Note right of Receiver: SHA-256 or<br/>custom algorithm
Receiver->>Receiver: Reassemble message
Note over Receiver: Process complete message
else Failure: Timeout
Note over Receiver: Message Expiry Interval exceeded
Receiver->>Receiver: Timeout occurred
Receiver->>Receiver: Cleanup buffers
Note over Receiver: Notify application:<br/>ChunkTimeoutError
Member

I think for these error cases, we would just log the error and ignore the received message, similar to any other invalid message that we've received. There's no action the application can take

Member

we will need some new rpc errors for communicating chunking issues to the invoker in the command response

Member

(bad example as this one would be past when the invoker is listening, but for the buffer size full there should be some communication)

else Failure: Buffer Limit
Note over Receiver: Reassembly buffer full
Receiver->>Receiver: Reject new chunks
Receiver->>Receiver: Cleanup oldest incomplete messages
Note over Receiver: Notify application:<br/>BufferLimitExceededError
else Failure: Checksum Mismatch
Note over Receiver: All chunks received
Receiver->>Receiver: Calculate checksum
Note over Receiver: Checksum verification failed
Receiver->>Receiver: Discard reassembled message
Receiver->>Receiver: Cleanup buffers
Note over Receiver: Notify application:<br/>ChecksumMismatchError
end
```