Co-authored-by: nicholas.tindle <nicholas.tindle@agpt.co>
Pull request overview
This PR enhances the parser's robustness by adding explicit validation that rejects raw text outside of XML/HTML-like tags. Previously, such text would be silently ignored, but now it raises a descriptive ValueError with line and column information.
- Adds an `add_text` method to the `List` class that validates and rejects non-whitespace text outside of tags
- Updates the parser to pass full `Token` objects (instead of just values) for better error reporting
- Introduces comprehensive test coverage for the new validation behavior
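The validation described above can be sketched roughly as follows. This is a minimal illustration, not the code from `gravitasml/parser.py`: the `Token` fields (`type`, `value`, `line`, `column`), the `children` attribute, and the error message wording are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical stand-in for the parser's token type; field names are assumed.
    type: str
    value: str
    line: int
    column: int


class List:
    """Hypothetical stand-in for the parser's top-level container."""

    def __init__(self) -> None:
        self.children = []

    def add_text(self, token: Token) -> None:
        # Whitespace between tags (newlines, indentation) is harmless, so skip it.
        if not token.value.strip():
            return
        # Anything else is raw text outside a tag: fail loudly and say where.
        raise ValueError(
            f"Text outside of tags is not supported: {token.value.strip()!r} "
            f"(line {token.line}, column {token.column})"
        )
```

Passing the whole token rather than only its string value is what makes the line and column available at the point where the error is raised.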
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| gravitasml/parser.py | Adds add_text method to List class for validating text outside tags and updates the parse loop to pass Token objects |
| tests/test_parser.py | Adds two new test cases verifying ValueError is raised for text outside root tags and text-only documents |
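The two test cases in the table could look something like the sketch below. To stay self-contained it does not import gravitasml; `check_no_stray_text` is a hypothetical toy checker that only tracks tag depth, and the test class and method names are invented for illustration.

```python
import re
import unittest

# Matches either a tag (<name> or </name>) or a run of text between tags.
TAG_OR_TEXT = re.compile(r"<(/?)([^<>]+)>|([^<]+)")


def check_no_stray_text(markup: str) -> None:
    """Toy stand-in for the parser: raise ValueError for text at depth 0."""
    depth = 0
    for match in TAG_OR_TEXT.finditer(markup):
        closing, name, text = match.groups()
        if name is not None:
            depth += -1 if closing else 1
        elif depth == 0 and text.strip():
            # Report the position where the offending text run starts.
            line = markup.count("\n", 0, match.start()) + 1
            column = match.start() - (markup.rfind("\n", 0, match.start()) + 1) + 1
            raise ValueError(
                f"Text outside of tags at line {line}, column {column}: "
                f"{text.strip()!r}"
            )


class StrayTextTests(unittest.TestCase):
    def test_text_after_root_tag_raises(self):
        with self.assertRaises(ValueError):
            check_no_stray_text("<root>hello</root> trailing")

    def test_text_only_document_raises(self):
        with self.assertRaises(ValueError):
            check_no_stray_text("just some text")

    def test_whitespace_outside_tags_is_allowed(self):
        check_no_stray_text("<root>hello</root>\n")
```

The third case is included because the validation is meant to reject only non-whitespace text, so a trailing newline after the root tag should still parse.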
Pin the backend to gravitasml tag 0.1.4 so the XML parser fix from Significant-Gravitas/gravitasml#15 ships with this PR.
Tests now expect ValueError when self-closing tags result in text outside of proper tags, aligning with PR #15's stricter validation.
* Initial plan
* Implement no_parse filter syntax for preventing recursive parsing
  Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
* Add comprehensive test cases for self-closing tags and unmatched tags with no_parse filter
  Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
* Merge best features from both no_parse filter implementations
  Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
* Apply Black code formatting to fix style issues
  Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
* fix: move re import to module level, improve docstrings
  - Move 'import re' from inline in method to module level
  - Add docstring explaining underscore→space reversal in _reconstruct_original_tag
  - Clarify test docstring for token fallback whitespace behavior
* fix: update self-closing tag tests for new text-outside-tags validation
  Tests now expect ValueError when self-closing tags result in text outside of proper tags, aligning with PR #15's stricter validation.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
Co-authored-by: Nick Tindle <nick@ntindle.com>
This pull request improves the robustness of the parser by adding explicit error handling for unsupported raw text outside of tags, and introduces new tests to verify this behavior.
Parser error handling improvements:
- Adds an `add_text` method to the `List` class in `gravitasml/parser.py` that raises a `ValueError` when raw text is encountered outside of a tag, ensuring unsupported text is flagged early.
- Passes the full `Token` object to `add_text` for better error reporting, instead of just the text value.

Test coverage enhancements:
- Adds tests to `tests/test_parser.py` to confirm that a `ValueError` is raised when parsing documents with text outside of root tags, or with only raw text and no tags.

Note
Adds validation to raise a ValueError for raw text outside tags and updates tests to cover trailing/text-only cases.
- `gravitasml/parser.py`:
  - Adds `List.add_text(token: Token)` to raise `ValueError` for raw text outside tags with line/column info.
  - Updates `Parser.parse` to pass the full `Token` to `add_text` instead of just the string value.
- `tests/test_parser.py`:
  - Adds tests asserting `ValueError` for trailing text after a root tag and for text-only documents.

Written by Cursor Bugbot for commit c5e0cac. This will update automatically on new commits.