Description
Summary
When writing a fixed‑length UTF‑16LE string to holding registers, ModbusTools prepends a Byte Order Mark (BOM, U+FEFF) to the payload. This consumes the first 16‑bit register, so an 18‑character field only writes 17 actual characters. Fixed‑length Modbus string fields should not include a BOM; the device already knows the encoding/length from its register map.
Environment
ModbusTools (Client), latest release as of Dec 2025
OS: Windows
Protocol: TCP
Addressing: Standard Modbus (1‑based), start at 400001
Field: fixed‑length 18 characters (UTF‑16LE → 36 bytes → 18 holding registers)
Configuration (XML):
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<device>TEST</device>
<address>400001</address>
<format>String</format>
<stringEncoding>UTF-16LE</stringEncoding>
<stringLengthType>FullLength</stringLengthType>
<!-- Fixed-length field intended for 18 UTF-16 code units -->
<value>123456789123456789</value> <!-- 18 BMP chars -->
<variableLength>36</variableLength> <!-- bytes -->
</item>
</items>
Steps to Reproduce
Configure a fixed‑length UTF‑16LE string field of 18 characters starting at 400001.
Enter exactly 18 ASCII digits (BMP characters), e.g. 123456789123456789.
Perform Write Multiple Registers (0x10) from the client and capture the frame (Wireshark for TCP or serial log for RTU).
Observe the payload bytes and the number of registers/bytes sent.
Expected behavior
0x10 PDU:
Starting address: 400001 → 0‑based offset 0x0000
Quantity of registers: 0x0012 (18)
Byte count: 0x24 (36)
Payload: 36 bytes of UTF‑16LE code units (no BOM).
The first 2 bytes must be the first character’s UTF‑16LE code unit (low byte, then high byte), e.g. '1' = 31 00, not a BOM.
Function 10
StartAddr 0000
Quantity 0012
ByteCount 24
Payload: 31 00 32 00 33 00 34 00 ... (18 × 2 bytes)
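For reference, the expected request can be reproduced outside ModbusTools. The Python sketch below builds the 0x10 PDU by hand using only the standard library. `build_write_pdu` is a hypothetical helper name, not part of ModbusTools, and the sketch assumes the UTF‑16LE bytes are copied into the payload unchanged, as in the dump above.

```python
import struct

def build_write_pdu(start_offset: int, text: str) -> bytes:
    """Build a Write Multiple Registers (0x10) PDU carrying text as UTF-16LE, no BOM."""
    # The explicit-endian codec "utf-16-le" never emits a BOM.
    payload = text.encode("utf-16-le")
    quantity = len(payload) // 2  # one register per 16-bit code unit
    # Function code, starting address, quantity of registers, byte count.
    header = struct.pack(">BHHB", 0x10, start_offset, quantity, len(payload))
    return header + payload

pdu = build_write_pdu(0x0000, "123456789123456789")
print(pdu.hex(" "))  # header bytes followed by 36 payload bytes
```

The header fields are big-endian as Modbus requires; only the string payload uses the field's configured UTF‑16LE byte order.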
Actual behavior
The 0x10 PDU shows Quantity=0x0012 and ByteCount=0x24 (correct), but the first 16‑bit word of the payload is 0xFEFF (bytes FE FF), a UTF‑16 BOM.
As a result, only 17 characters follow within the 36‑byte payload, so the 18th character is dropped.
Reading back yields the BOM + 17 characters (or the device trims the BOM), effectively truncating by one character.
Example (annotated payload; first word is BOM):
Payload: FE FF | 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00
39 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00
^ BOM consumes register 400001; only 17 chars remain in 36 bytes
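The truncation can be reconstructed in a few lines of Python. This is only a guess at the mechanism (a BOM prepended, then the result cut to the field size), not ModbusTools' actual code; the BOM bytes are copied from the capture above.

```python
text = "123456789123456789"      # 18 characters
field_bytes = 36                 # 18 registers x 2 bytes

bom = bytes.fromhex("feff")      # BOM bytes exactly as seen in the capture
# Assumed mechanism: BOM is prepended, then the buffer is cut to the field size.
observed = (bom + text.encode("utf-16-le"))[:field_bytes]

# Everything after the first register is what the device can keep.
survived = observed[2:].decode("utf-16-le")
print(len(survived), survived)
```

Under that assumption, the 34 bytes after the BOM hold only the first 17 characters, which matches the payload shown above.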
Why this is incorrect
A BOM (U+FEFF) is an optional signature that indicates endianness in UTF‑16 text files and streams; it is not intended for binary protocols with fixed layouts. In UTF‑16BE it appears as FE FF; in UTF‑16LE as FF FE, so the FE FF seen in the capture is the big‑endian form even though the field is configured as UTF‑16LE. [cnblogs.com], [github.com]
A Modbus device’s register map already defines the string field’s length and encoding; adding a BOM wastes one register and causes off‑by‑one truncation in fixed‑length fields. Strings over Modbus are commonly sent without terminators or BOMs, using padding (zeros or spaces) when shorter. [modbuskit.com]
Proposed fix
Do not insert BOM for stringEncoding=UTF-16LE (or UTF-16BE) when stringLengthType=FullLength.
Encode payload as a sequence of UTF‑16 code units only (per configured endianness) and fill any remaining registers with the specified padding (e.g., 0x0000 or 0x0020).
If a BOM option is desired for some niche scenarios, expose it as a toggle (default: off) and document that BOM is not appropriate for Modbus fixed‑length string fields.
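A minimal Python sketch of the proposed encoding behavior follows. The function name `encode_fixed_utf16` and its parameters (`pad_unit`, `with_bom`) are illustrative assumptions and do not correspond to existing ModbusTools code or settings.

```python
def encode_fixed_utf16(text: str, registers: int, *, little_endian: bool = True,
                       pad_unit: int = 0x0000, with_bom: bool = False) -> bytes:
    """Encode text into exactly `registers` 16-bit units, padding the remainder."""
    codec = "utf-16-le" if little_endian else "utf-16-be"
    byte_order = "little" if little_endian else "big"
    if with_bom:
        # Opt-in only: the BOM then counts against the field length.
        text = "\ufeff" + text
    data = text.encode(codec)  # explicit-endian codecs add no BOM themselves
    size = registers * 2
    if len(data) > size:
        raise ValueError(f"string needs {len(data)} bytes, field holds {size}")
    pad = pad_unit.to_bytes(2, byte_order)
    return data + pad * ((size - len(data)) // 2)

# An 18-character value fills an 18-register field exactly, with no BOM.
payload = encode_fixed_utf16("123456789123456789", 18)
print(len(payload), payload[:4].hex(" "))
```

The sketch raises an error on overflow instead of truncating silently; the tool could equally well truncate, as long as the choice is documented.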
Additional notes / test suggestions
With 17 characters in an 18‑register field, current behavior “looks correct” because BOM + 17 chars = 18 registers—this masks the bug unless exactly full length is used.
Please confirm whether a BOM is also added or stripped on reads (decoding), and ensure symmetry: do not treat U+FEFF in a Modbus string as a BOM, whether it is leading or in the middle; if present, it is just a code point.
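The test suggestions above could be expressed as a small round-trip check. The Python sketch below uses a hypothetical `decode_fixed_utf16le` helper and assumes zero padding; it illustrates the expected symmetry and is not ModbusTools code.

```python
def decode_fixed_utf16le(data: bytes, pad: str = "\x00") -> str:
    """Decode a fixed-length field: strip trailing padding only, never U+FEFF."""
    # The explicit "utf-16-le" codec keeps U+FEFF as an ordinary code point.
    return data.decode("utf-16-le").rstrip(pad)

# Exactly full length: all 18 characters must survive the round trip.
full = "123456789123456789"
assert decode_fixed_utf16le(full.encode("utf-16-le")) == full

# One short of full length: 17 characters plus one padding register.
short = full[:17]
assert decode_fixed_utf16le(short.encode("utf-16-le") + b"\x00\x00") == short

# A U+FEFF stored by the device is returned as data, not silently removed.
with_feff = "\ufeffAB"
assert decode_fixed_utf16le(with_feff.encode("utf-16-le") + b"\x00\x00") == with_feff
```

The full-length case is the one that exposes the reported bug, so it is worth keeping in any regression suite.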