Skip to content

RFC: Adding progressive XML lint for the specs#726

Draft
ocampana-videotec wants to merge 3 commits intodevelopmentfrom
tc/progressive_xml_lint
Draft

RFC: Adding progressive XML lint for the specs#726
ocampana-videotec wants to merge 3 commits intodevelopmentfrom
tc/progressive_xml_lint

Conversation

@ocampana-videotec
Copy link
Copy Markdown
Collaborator

@ocampana-videotec ocampana-videotec commented Feb 18, 2026

While working on fixing #555 , several issues of indenting (replacing spaces with tabs or vice versa), encoding (UTF8 vs Windows-1252) and xml code quality became evident.

ONVIF will always have open PRs, therefore we cannot have a cleaning moment to reset the status of the doc. With this PR, I am presenting the idea of introducing progressive XML linting in the specs, based on the script .github\scripts\progressive_xml_lint.py and GitHub actions.

How progressive_xml_lint.py works

The script currently operates on files ending with .xml, .xsd and .wsdl. It uses white spaces for intending and at the time of this proposal it indents with four of them.

The script checks the difference with a base reference, and only on the changed lines it applies the following changes:

  • It strips useless spacing between words
  • It fixes the wrong Windows-1252 encoding with the ASCII or UTF8 equivalent
  • It skips indenting lines related to the <programlisting> tag, to avoid messing with code-related text
  • It cuts long lines exceeding a certain threshold (currently 120 characters)
  • It evaluates the indenting depth each row would have if the whole file would be linted and applies the correct spacing

The idea at the bottom of the script is that progressively, we should beautify commit after commit the specs and reach to a point where linting the whole file should not heavily impact our work.

The current configuration of the tool is

CONFIG = {
    "indent_char": " ",       # Use " " for spaces, "\t" for tabs
    "indent_size": 4,         # How many spaces per level?
    "max_line_length": 120,   # Maximum characters per line
    "extensions": ('.xml', '.xsd', '.wsdl'),
    # Map of { Bad_Byte : Replacement_Bytes }
    "replacements": {
        b'\x85': b'...',      # Horizontal ellipsis
        b'\x91': b"'",        # Left single quote
        b'\x92': b"'",        # Right single quote
        b'\x93': b'"',        # Left double quote
        b'\x94': b'"',        # Right double quote
        b'\x95': b'*',        # Bullet
        b'\x96': b'-',        # En dash
        b'\x97': b'-',        # Em dash
        b'\x84': b'"',        # Low double quote
        b'\x99': b'\xe2\x84\xa2' # Trademark (TM)
    }
}

but I do not pretend this would be the definitive one.

How to test it

The easiest way is to checkout locally the tc/progressive_xml_lint branch and to merge in one of your branches

PS C:\Users\ottavio\Projects\specs> git checkout -b tc/progressive_xml_lint origin/tc/progressive_xml_lint
branch 'tc/progressive_xml_lint' set up to track 'origin/tc/progressive_xml_lint'.
Switched to a new branch 'tc/progressive_xml_lint'
PS C:\Users\ottavio\Projects\specs> git checkout -b video/srtp-aes-gcm origin/video/srtp-aes-gcm
branch 'video/srtp-aes-gcm' set up to track 'origin/video/srtp-aes-gcm'.
Switched to a new branch 'video/srtp-aes-gcm'
PS C:\Users\ottavio\Projects\specs> git merge tc/progressive_xml_lint
Merge made by the 'ort' strategy.
 .github/scripts/progressive_lint.py | 211 ++++++++++++++++++++++++++++++++
 1 file changed, 211 insertions(+)
 create mode 100644 .github/scripts/progressive_xml_lint.py
PS C:\Users\ottavio\Projects\specs> python.exe .\.github\scripts\progressive_lint.py development
PS C:\Users\ottavio\Projects\specs> git diff 

Please, let me know what you think of the idea.

Copy link
Copy Markdown
Contributor

@melanconj melanconj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I'd say we should have a github action running automatically in the PRs to block (or auto-fix?) incorrect changes and ensure we properly apply it.
Otherwise, I'm all for it

@ocampana-videotec
Copy link
Copy Markdown
Collaborator Author

The final goal is to invoke this script by a GitHub action. This initial draft PR is meant to gather feedback from other members, both on the idea per se and on the script. @melanconj did you have the time to give it a try?

With this change, we make sure to run progressive_lint.py before the
wsdl checkers.
@ocampana-videotec
Copy link
Copy Markdown
Collaborator Author

@melanconj I just pushed an update of the PR, unifying the execution of the script and the two checkers based on gsoap and dotnet into a unique action. I will now debug it.

@melanconj
Copy link
Copy Markdown
Contributor

Just gave it a run, it seems to have shuffled a lot of files, and broke identation on some
image

I'll let you do the first debugging pass on it first though

@ocampana-videotec
Copy link
Copy Markdown
Collaborator Author

@melanconj I think this is the expected behavior. This is why I wrote

The script checks the difference with a base reference, and only on the changed lines it applies the following changes

The point is that I merged the PR to compile the PTZ specs in the GitHub action, therefore development and your branch increased the difference, but on the development side. I will dig into it, but could you please let me know what branch you are working on?

@ocampana-videotec
Copy link
Copy Markdown
Collaborator Author

@melanconj , I changed the logic. Instead of doing

git diff -U0 base_ref -- file_path

the script now does

git diff -U0 base_ref... -- file_path

Therefore it beautifies not all the different lines between the current branch and base_ref (usually development, which may have additional changes because of other merged PRs) but it beautifies the differences between the current branch and base_ref... which is de facto the moment when the current branch was created from development.

I tried it between development and SNMPV3 and it looks much better now. Do you want to give it a second try?

@melanconj
Copy link
Copy Markdown
Contributor

melanconj commented Feb 25, 2026

Yeah but I don't think "adds 6 indentation layers at once on a random <inlineequation>" is the intended behavior. It did that on quite a few things

@ocampana-videotec
Copy link
Copy Markdown
Collaborator Author

I think it is not on random lines. Did you try the updated version?

In any case, let's review how it works at the next F2F meeting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants