Fix XML rendering crash with non-printable characters #8806
tautschnig wants to merge 1 commit into diffblue:develop
Codecov Report: ✅ All modified and coverable lines are covered by tests.

| | develop | #8806 | +/- |
|---|---|---|---|
| Coverage | 80.00% | 80.01% | |
| Files | 1700 | 1700 | |
| Lines | 188271 | 188322 | +51 |
| Branches | 73 | 73 | |
| Hits | 150632 | 150689 | +57 |
| Misses | 37639 | 37633 | -6 |
My understanding is that this is still invalid XML, so we either error or produce invalid XML.
To make progress, how about encoding the null character as …
Applied your proposal with small extensions, and backslashes are now also escaped.
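Why backslashes need escaping once `\0`-style sequences are introduced: without it, a real null byte and the literal two-character text `\0` produce the same output. The sketch below illustrates the collision with two deliberately simplified encoders that only handle the null byte. Both function names (`encode_naive`, `encode_with_backslash`) are hypothetical and are not CBMC's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoder: writes a null byte as the two characters "\0"
// but copies backslashes through unchanged.
std::string encode_naive(const std::string &s)
{
  std::string out;
  for(char c : s)
  {
    if(c == '\0')
      out += "\\0";
    else
      out += c;
  }
  return out;
}

// Hypothetical encoder: additionally writes a literal backslash as "\\",
// so every backslash in the output begins an escape sequence.
std::string encode_with_backslash(const std::string &s)
{
  std::string out;
  for(char c : s)
  {
    if(c == '\0')
      out += "\\0";
    else if(c == '\\')
      out += "\\\\";
    else
      out += c;
  }
  return out;
}
```

With `encode_naive`, the input `std::string(1, '\0')` and the input `"\\0"` (backslash followed by `0`) both become `\0`; with `encode_with_backslash`, the second becomes `\\0`, so a reader can always tell them apart.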
Pull request overview
Fixes an XML rendering crash by changing how non-printable characters are emitted, and adds coverage to prevent regressions.
Changes:
- Update XML escaping to encode invalid XML 1.0 control characters as C-style escape sequences (instead of aborting).
- Add unit tests covering escaping behavior for attributes and node data.
- Add a CBMC regression test exercising `--trace --xml-ui` with non-printable characters in traces.
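For reference, which characters XML 1.0 accepts at all is fixed by the `Char` production in section 2.2 of the XML 1.0 specification. A minimal sketch of that check follows; the name `is_xml10_char` is hypothetical and this is not the code in `src/util/xml.cpp`.

```cpp
#include <cassert>

// Returns true if code point c matches the XML 1.0 `Char` production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool is_xml10_char(char32_t c)
{
  return c == 0x9 || c == 0xA || c == 0xD ||
         (c >= 0x20 && c <= 0xD7FF) ||
         (c >= 0xE000 && c <= 0xFFFD) ||
         (c >= 0x10000 && c <= 0x10FFFF);
}
```

Under this production, null and the other C0 control characters (apart from TAB, LF and CR) cannot appear in an XML 1.0 document even as numeric character references, which is why they need some other encoding. DEL (0x7F) does satisfy the production; XML 1.0 only discourages it, so escaping it is a conservative choice rather than a requirement.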
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| unit/util/xml.cpp | Adds unit tests validating escaping rules for non-printable characters and backslashes. |
| src/util/xml.cpp | Changes escaping logic to avoid invariants/aborts on invalid characters and emits escape sequences. |
| regression/cbmc/xml-nonprintable-chars/test.desc | Adds a regression test description for the crash scenario (currently mismatched with new behavior). |
| regression/cbmc/xml-nonprintable-chars/main.c | Adds a small C program to generate a failing trace containing a non-printable byte. |
Replace DATA_INVARIANT crash with proper escaping when non-printable characters appear in XML output.

Characters valid in XML 1.0 (TAB, LF, CR) use numeric character references (e.g., &#9;). Characters invalid in XML 1.0 (null and the other control characters), as well as DEL (which XML 1.0 permits but discourages), use C-style escape sequences (\0, \x01). Backslashes are escaped as \\ to avoid ambiguity. Hex escapes always use two digits to prevent misreading (e.g., \x01 followed by 'f' cannot be confused with \x1f).

The shared escaping logic is extracted into a helper function to avoid duplication between escape() and escape_attribute().

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
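The escaping rules in the commit message can be sketched as a single shared helper used by both an `escape` and an `escape_attribute` function. This is a simplified illustration under stated assumptions, not the PR's code: CBMC's real functions write to a stream, whereas these hypothetical free functions return `std::string`. The helper name `escape_common` is also hypothetical, and applying the numeric references for TAB/LF/CR in node data as well as attributes is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical shared helper: appends the escaped form of `ch` to `out`
// and returns true, or returns false if the caller should handle `ch`.
static bool escape_common(unsigned char ch, std::string &out)
{
  switch(ch)
  {
  case '&': out += "&amp;"; return true;
  case '<': out += "&lt;"; return true;
  case '>': out += "&gt;"; return true;
  case '\\': out += "\\\\"; return true; // backslash becomes "\\"
  case '\t': out += "&#9;"; return true; // valid in XML 1.0: char reference
  case '\n': out += "&#10;"; return true;
  case '\r': out += "&#13;"; return true;
  case '\0': out += "\\0"; return true; // invalid in XML 1.0: C-style escape
  default: break;
  }
  if(ch < 0x20 || ch == 0x7F)
  {
    // Remaining control characters and DEL: always exactly two hex digits,
    // so "\x01" followed by 'f' cannot be read as "\x1f".
    char buf[5]; // "\xNN" plus terminating NUL
    std::snprintf(buf, sizeof(buf), "\\x%02x", static_cast<unsigned>(ch));
    out += buf;
    return true;
  }
  return false; // printable ASCII or UTF-8 byte: caller copies it
}

// Escaping for element text (hypothetical signature).
std::string escape(const std::string &s)
{
  std::string out;
  for(char c : s)
    if(!escape_common(static_cast<unsigned char>(c), out))
      out += c;
  return out;
}

// Escaping for attribute values: the shared rules plus double quotes.
std::string escape_attribute(const std::string &s)
{
  std::string out;
  for(char c : s)
  {
    if(escape_common(static_cast<unsigned char>(c), out))
      continue;
    if(c == '"')
      out += "&quot;";
    else
      out += c;
  }
  return out;
}
```

For example, `escape(std::string("\x01" "f"))` yields the text `\x01f`, which is unambiguous because every hex escape is exactly two digits long. Keeping the shared rules in one helper means the attribute variant only has to add its quote handling.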
Handle non-printable characters in XML attributes by escaping them as numeric character references instead of aborting.
Co-authored-by: Kiro autonomous agent
Fixes: #7073