Skip to content

Comments

Add serialization#120

Open
pavel-kirienko wants to merge 28 commits intomasterfrom
dev
Open

Add serialization#120
pavel-kirienko wants to merge 28 commits intomasterfrom
dev

Conversation

@pavel-kirienko
Copy link
Member

@pavel-kirienko pavel-kirienko commented Feb 16, 2026

This PR adds a single simple utility module _serdes.py that implements (de)serialization. This is useful when one needs to build/parse serialized representations without having to generate code beforehand. I considered extracting this into a separate package but upon looking at how small the module is I decided to embed it into PyDSDL. It integrates very cleanly without affecting the rest of the codebase.

This changeset also renames FrontendError into just Error so that we could also use it for serialization. A compatibility redirect is left behind to avoid API breakage. This is the only change to the rest of the codebase here.

A simple demo is added and docs slightly improved.

Later I want to add a simple command-line shim that takes serialized data line-by-line at the input and provides JSON at the output, and the other way around. This is intentionally not included here yet.

- Add bit-level I/O (_BitWriter, _BitReader) with LSB-first ordering
- Implement primitive codec for all UAVCAN types (bool, int, float, void)
- Implement array codec with fixed/variable-length and special UTF-8/byte handling
- Implement composite codec for structures, unions, and delimited types
- Add comprehensive error hierarchy (SerDesError, ArrayLengthError, UnionFieldError, etc.)
- Include 7 inline test functions with 100+ assertions covering all functionality
- Clean up type hints: use modern syntax (| instead of Union, dict instead of Dict)
- Remove unused imports and add type annotations for class attributes
- All tests passing, zero LSP diagnostics errors
- Implement serialize() function with ServiceType rejection and with_delimiter_header handling
- Implement deserialize() function with ServiceType rejection and with_delimiter_header handling
- Handle DelimitedType with and without delimiter header flag
- Add exports to pydsdl/__init__.py for serialize, deserialize, and all error types
- Add _unittest_serdes_api test covering API scenarios and package-level imports
- All 8 tests passing, zero LSP diagnostics errors
- Add type casts for struct.unpack() return values
- Add type casts for bytes() calls with mixed-type lists
- Add type: ignore comments for intentional test type assignments
- Apply black code formatting
- All quality gates now pass: mypy strict, pylint, black, pytest
Replace the manual _unittest_serdes_branch_coverage_tests dispatcher with directly collected _unittest_* tests so pytest executes each case natively. This removes hand-maintained invocation logic and makes failures easier to localize at the test-case level.

Use pytest parameterization for the previously manually looped scenarios (float special values, bool conversion edge cases, width/cast-mode matrices, byte container variants, API str/bytes permutations, and aligned bit roundtrips).

Add focused bit-level tests that cover the remaining _BitWriter overwrite branches and _BitReader unaligned out-of-bounds zero-extension branch, restoring 100% branch coverage for _serdes.py while keeping strict mypy compatibility via a typed parameterization helper.
@pavel-kirienko
Copy link
Member Author

I think it is in order to move the test files outside of the package. It seemed like a nice idea to keep them close but it's not really working. I don't want to do it in this changeset though.

…nds, and truncated-float overflow per Cyphal spec

- Bug 1: Add writer.align_to(schema.alignment_requirement) after UnionType/StructureType serialization
- Bug 1: Add reader.align_to(schema.alignment_requirement) after UnionType/StructureType deserialization
- Bug 2: Enforce _bit_limit in _BitReader.read_bits() with zero-extension guard
- Bug 3: Wrap struct.pack in try/except OverflowError, produce signed infinity for truncated overflow

All 77 existing tests pass. Evidence: .sisyphus/evidence/task-1-*.txt
…bounds, and truncated-float overflow

- Bug 1 tests (5): nested struct/union with sub-byte fields, already-aligned, nested in array, bool inner (exact evidence case)
- Bug 2 tests (4): zero-extension, parent offset preservation, remaining_bits accuracy, delimited short payload (exact evidence case)
- Bug 3 tests (3): truncated overflow to infinity (float16 & float32), NaN preservation, saturated non-regression

All 89 tests pass (77 old + 12 new). Coverage: 99%.
…entation

- Line 1362: Use outer.bit_length_set.min // 8 instead of hardcoded 3
- Lines 1541-1542: Verify exact IEEE 754 float32 max value (3.4028235e+38) instead of just checking finite positive

Addresses F4 scope fidelity review feedback for better test self-documentation.
@pavel-kirienko
Copy link
Member Author

You can ignore the transient CI instability, it is not currently related to the functional changes and I am looking into that.

@thirtytwobits
Copy link
Member

I'm not a huge fan of this. It pollues the purity of pydsdl as a front-end. Why not add this to Nunavut instead?
That also said, stay tuned. I'm about to post a proposal for a very different solution to serialization and deserialization. I have my AI working on the proposal all day today.

The CI failure on Python 3.14 was coming from the lint session running\n with Black 25.x.\n\n was still formatted according to an older Black style,\nso Black requested reformatting of the source file and its generated copies\nunder  and .\n\nReformat the file with Black 25.x so the check is stable across the\n3.14 jobs and no functional behavior changes are introduced.
@pavel-kirienko
Copy link
Member Author

I'm not a huge fan of this. It pollues the purity of pydsdl as a front-end. Why not add this to Nunavut instead?

I also originally wanted to avoid this on the grounds of purity, but then considering that the serialization logic takes only 800 lines to implement, I decided to make PyDSDL a one-stop shop. Nunavut is an option but then it is a code generation library, so you could also argue that it would be somewhat out of its scope there, and adding serialization to nnvg is probably going to be a little awkward considering its current scope.

Moving the implementation there is a no-brainer (Oh My OpenCode can do that autonomously I think) so it's not a question of effort.

A third alternative is just to publish a separate tiny package but then how many is too many :3

- Add _mk_delimited() helper to create DelimitedType instances
- Add _UNSIGNED_WIDTHS and _SIGNED_WIDTHS constants (19 and 18 widths)
- Add _roundtrip() and _roundtrip_assert() helpers for test validation
- All helpers follow existing pattern (_mk_structure/_mk_union style)
- Foundation for Wave 2-4 test expansion (Tasks 2-13)

Task: serdes-test-expansion/Task-1
Add comprehensive tests for DSDL serialization spec compliance:

Task 2 - Implicit zero extension (7 tests):
- Verify missing data reads as zero per spec §3.7.1.1
- Test struct truncation, empty data, multibyte fields, bools
- Test nested structs, fixed arrays, variable arrays

Task 3 - Implicit truncation (6 tests):
- Verify excess data silently ignored per spec
- Test struct/union/nested excess bytes and bits
- Verify first N fields correct despite trailing data

Task 4 - Delimited type compatibility (8 tests):
- Forward/backward compatibility via delimiter headers
- Old data→new schema (zero extension)
- New data→old schema (truncation)
- Nested delimited, union variants, array of delimited
- Header value validation, API roundtrip

Task 5 - Void deserialization semantics (3 tests):
- Non-zero void bits accepted during deserialization
- All void widths {1,2,3,4,5,7,8,16,32,64} tested
- Verify serialization always produces zeros

All 24 new tests pass. Total: 115 tests (91 original + 24 new).
Add comprehensive type coverage tests for DSDL serialization:

Task 6 - Systematic integer bit widths (7 tests, 112 variants):
- All representative unsigned widths (19 values)
- All representative signed widths (18 values)
- Boundary values: 0, 1, -1, min, max, min-1, max+1
- SATURATED and TRUNCATED cast mode verification
- Two's complement encoding validation
- Float-to-int rounding behavior

Task 7 - Float edge cases (11 tests, 31 variants):
- Negative zero (-0.0) sign bit preservation
- Denormalized values (smallest for 16/32/64)
- Max finite values (65504, 3.4e38, 1.8e308)
- Min positive normalized values
- SATURATED: clamp finite, passthrough NaN/inf
- TRUNCATED: overflow to inf, passthrough NaN
- Float16 precision boundary
- Bool→float coercion

Task 8 - UTF-8 multi-byte + byte arrays (9 tests):
- 2/3/4-byte UTF-8 characters (café, 日本語, 😀🎉)
- Empty strings, capacity boundaries
- Mixed ASCII + multi-byte
- Invalid UTF-8 rejection
- Byte arrays: empty, all 0x00-0xFF, capacity

Task 9 - Variable-length array length fields (6 tests):
- 8-bit length field (capacity ≤ 255)
- 16-bit length field (capacity 256-65535)
- 32-bit length field (capacity ≥ 65536)
- Capacity boundaries: 255→256, 65535→65536
- Wire format verification (little-endian)

All 271 tests pass (91 original + 180 new).
Add comprehensive complex scenario tests for DSDL serialization:

Task 10 - Non-byte-aligned primitive arrays (7 tests):
- bool[N] arrays with LSB-first bit packing
- uint3[4], uint5[3] sub-byte arrays
- Variable-length sub-byte arrays
- Mixed sub-byte struct {uint3, bool, uint5}
- Verified bit patterns: bool[8] [T,F,T,F,T,F,T,F] → 0x55

Task 11 - Union variant scaling + tag width (6 tests):
- Union tag width boundaries: 256 variants → 8-bit, 257 → 16-bit
- Tag width formula verification: 2^ceil(log2(max(8, (n-1).bit_length())))
- Roundtrip all variants for small unions {3, 4}
- Roundtrip variant 0 and max for large unions {256, 257}

Task 12 - Complex nested type roundtrips (8 tests):
- Deep nesting: 3-level and 4-level struct→struct→struct
- Arrays of composites: struct[N], union[N]
- Union with struct variants
- Mixed patterns: struct→union→struct, struct→array→struct
- Complex combined nesting

Task 13 - Mixed alignment, defaults, API edge cases (10 tests):
- Mixed alignment: byte + sub-byte fields (bit-packed)
- Alignment rules: primitives=1 (bit-packed), composites=8 (byte-aligned)
- Default value handling: all defaults, partial defaults
- API edge cases: empty struct, single-field struct, single-variant union
- Type coercions: int→float, list→tuple
- Error handling: invalid types, out-of-range values

All 309 tests pass (91 original + 218 new).
Move type: ignore comments to end of line for UnionType constructors
to satisfy pylint formatting requirements.

Verification results:
- All 309 serdes tests pass
- Full suite: 399 tests pass across all modules
- Lint: 9.98/10 rating (all sessions successful)
- mypy: Success, no issues found in 31 source files
- black: All files formatted correctly

Only new lint warning: C0302 (too many lines in module 3790/3000)
This is expected given we added 2195 lines of comprehensive tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants