Fix XML rendering crash with non-printable characters #8806

Open
tautschnig wants to merge 1 commit into diffblue:develop from tautschnig:fix-7073-xml

Conversation

@tautschnig
Collaborator

Handle non-printable characters in XML attributes by escaping them as numeric character references instead of aborting.

Co-authored-by: Kiro autonomous agent

Fixes: #7073

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@codecov

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.01%. Comparing base (7ff43f7) to head (4a22a87).
⚠️ Report is 4 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8806   +/-   ##
========================================
  Coverage    80.00%   80.01%           
========================================
  Files         1700     1700           
  Lines       188271   188322   +51     
  Branches        73       73           
========================================
+ Hits        150632   150689   +57     
+ Misses       37639    37633    -6     


@tautschnig tautschnig marked this pull request as ready for review December 9, 2025 14:34
@kroening
Collaborator

My understanding is that this is still invalid XML. So we either error, or produce invalid XML.
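For context on the validity concern: XML 1.0 restricts which code points may appear in a document at all, and a numeric character reference has to name a character in that same set, so a reference such as `&#1;` is no more well-formed than the raw byte. A minimal sketch of the `Char` production from section 2.2 of the XML 1.0 specification follows; the name `is_xml10_char` is hypothetical and is not taken from CBMC.

```cpp
#include <cassert>

// XML 1.0 "Char" production (section 2.2 of the specification):
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// is_xml10_char is a hypothetical helper name, not part of CBMC.
constexpr bool is_xml10_char(char32_t c)
{
  return c == 0x9 || c == 0xA || c == 0xD ||
         (c >= 0x20 && c <= 0xD7FF) ||
         (c >= 0xE000 && c <= 0xFFFD) ||
         (c >= 0x10000 && c <= 0x10FFFF);
}

// TAB is a legal character, so a reference to it is legal as well.
static_assert(is_xml10_char(0x9), "TAB is allowed");
// NUL and U+0001 are outside Char, so no reference to them is well-formed.
static_assert(!is_xml10_char(0x0), "NUL is never allowed");
static_assert(!is_xml10_char(0x1), "U+0001 is not allowed");
```

Under this production, U+007F (DEL) falls inside the `#x20-#xD7FF` range, so it is permitted by XML 1.0, although the specification discourages its use.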

@kroening
Collaborator

To make progress, how about encoding the null-character as \0? This would be better than dropping the character, and better than aborting. The downside is that it is indistinguishable from the sequence "backslash" "zero" unless we wanted to also escape backslashes.
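The ambiguity described above can be illustrated with a small sketch. `encode_nul` is a hypothetical function written for this illustration only; it is not the code proposed in the PR. Without backslash escaping, a NUL byte and the two-character text backslash-zero produce the same output; with backslash escaping they differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoder for illustration: NUL becomes the two characters
// backslash and '0'. When escape_backslash is set, a literal backslash is
// doubled so that the encoding stays unambiguous.
std::string encode_nul(const std::string &in, bool escape_backslash)
{
  std::string out;
  for(const char ch : in)
  {
    if(ch == '\0')
      out += "\\0";
    else if(ch == '\\' && escape_backslash)
      out += "\\\\";
    else
      out += ch;
  }
  return out;
}
```

With `escape_backslash` set to `false`, both `std::string(1, '\0')` and the literal text `\0` encode to `\0`. With it set to `true`, the NUL byte still encodes to `\0` while the literal text encodes to `\\0`, so a reader can tell the two apart.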

@kroening kroening assigned tautschnig and unassigned kroening Jan 18, 2026
Copilot AI review requested due to automatic review settings March 4, 2026 15:50
@tautschnig
Collaborator Author

> To make progress, how about encoding the null-character as \0? This would be better than dropping the character, and better than aborting. The downside is that it is indistinguishable from the sequence "backslash" "zero" unless we wanted to also escape backslashes.

Applied your proposal with small extensions, and backslashes are now also escaped.


Copilot AI left a comment


Pull request overview

Fixes an XML rendering crash by changing how non-printable characters are emitted, and adds coverage to prevent regressions.

Changes:

  • Update XML escaping to encode invalid XML 1.0 control characters as C-style escape sequences (instead of aborting).
  • Add unit tests covering escaping behavior for attributes and node data.
  • Add a CBMC regression test exercising --trace --xml-ui with non-printable characters in traces.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| unit/util/xml.cpp | Adds unit tests validating escaping rules for non-printable characters and backslashes. |
| src/util/xml.cpp | Changes escaping logic to avoid invariants/aborts on invalid characters and emits escape sequences. |
| regression/cbmc/xml-nonprintable-chars/test.desc | Adds a regression test description for the crash scenario (currently mismatched with new behavior). |
| regression/cbmc/xml-nonprintable-chars/main.c | Adds a small C program to generate a failing trace containing a non-printable byte. |


Replace DATA_INVARIANT crash with proper escaping when non-printable
characters appear in XML output. Characters valid in XML 1.0 (TAB, LF,
CR) use numeric character references (e.g., &#9;). Characters invalid
in XML 1.0 (null, other control chars, DEL) use C-style escape
sequences (\0, \x01). Backslashes are escaped as \\ to avoid
ambiguity. Hex escapes always use two digits to prevent misreading
(e.g., \x01 followed by 'f' cannot be confused with \x1f).

The shared escaping logic is extracted into a helper function to avoid
duplication between escape() and escape_attribute().

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
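
The rules in the commit message above can be sketched roughly as follows. This is an illustration written from the commit message alone, not the code in src/util/xml.cpp: the name `escape_nonprintable`, the use of decimal character references, and the lowercase hex digits are all assumptions, and the usual XML specials (`&`, `<`, `>`, quotes) are left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, NOT the actual code from src/util/xml.cpp.
// It covers only the cases named in the commit message; escaping of
// &, <, > and quotes is assumed to happen elsewhere.
std::string escape_nonprintable(const std::string &in)
{
  static const char hex_digits[] = "0123456789abcdef";
  std::string out;
  for(const char ch : in)
  {
    const unsigned char uch = static_cast<unsigned char>(ch);
    if(ch == '\\')
      out += "\\\\"; // backslash is doubled to keep the encoding unambiguous
    else if(ch == '\t' || ch == '\n' || ch == '\r')
      // legal in XML 1.0: emit a numeric character reference (decimal here)
      out += "&#" + std::to_string(static_cast<int>(uch)) + ";";
    else if(ch == '\0')
      out += "\\0";
    else if(uch < 0x20 || uch == 0x7f)
    {
      // remaining control characters and DEL: always two hex digits
      out += "\\x";
      out += hex_digits[uch >> 4];
      out += hex_digits[uch & 0xf];
    }
    else
      out += ch;
  }
  return out;
}
```

With this sketch, the byte 0x01 followed by the letter `f` yields `\x01f`, while the single byte 0x1f yields `\x1f`; the fixed two-digit width is what keeps the two apart.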

Development

Successfully merging this pull request may close these issues.

xml rendering issue when encountering string "\x01"

4 participants