Add validation for text outside of tags #15

Merged
ntindle merged 1 commit into main from
cursor/fix-parser-error-for-text-outside-root-gpt-5.1-codex-b3b9
Dec 9, 2025
@ntindle ntindle commented Dec 5, 2025

This pull request improves the robustness of the parser by adding explicit error handling for unsupported raw text outside of tags, and introduces new tests to verify this behavior.

Parser error handling improvements:

  • Added an add_text method to the List class in gravitasml/parser.py that raises a ValueError when raw text is encountered outside of a tag, ensuring unsupported text is flagged early.
  • Updated the parser logic to pass the whole Token object to add_text for better error reporting, instead of just the text value.

Test coverage enhancements:

  • Added two new tests in tests/test_parser.py to confirm that a ValueError is raised when parsing documents with text outside of root tags, or with only raw text and no tags.

Note

Adds validation to raise a ValueError for raw text outside tags and updates tests to cover trailing/text-only cases.

  • Parser (gravitasml/parser.py):
    • Add List.add_text(token: Token) to raise ValueError for raw text outside tags with line/column info.
    • Update Parser.parse to pass the full Token to add_text instead of just the string value.
  • Tests (tests/test_parser.py):
    • Add tests asserting ValueError for trailing text after a root tag and for text-only documents.

Written by Cursor Bugbot for commit c5e0cac. This will update automatically on new commits.

Co-authored-by: nicholas.tindle <nicholas.tindle@agpt.co>
Copilot AI review requested due to automatic review settings December 5, 2025 16:01
@qodo-code-review

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Message Consistency

Ensure the error message format in add_text (including line/column presence) consistently matches expectations across different tokenization scenarios so tests and downstream consumers receive predictable messages.

def add_text(self, token: Token):
    """
    Handles raw text when the parser is not currently inside a node.
    """
    text = token.value
    if not text or text.isspace():
        return
    raise ValueError(
        f"Text outside of a tag is unsupported at line {token.line_num}, column {token.column}"
    )
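
One way to pin down the message format is to exercise add_text in isolation. The sketch below is only illustrative: the Token dataclass is a hypothetical stand-in whose field names (type, value, line_num, column) are taken from the excerpt above, the List class is reduced to the one method under review, and error_for is a helper invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for the project's Token (field names assumed from the excerpt)."""

    type: str
    value: str
    line_num: int
    column: int


class List:
    """Stand-in for the parser's top-level container, reduced to add_text."""

    def add_text(self, token: Token) -> None:
        text = token.value
        # Empty or whitespace-only text between tags is tolerated.
        if not text or text.isspace():
            return
        raise ValueError(
            f"Text outside of a tag is unsupported at line {token.line_num}, column {token.column}"
        )


def error_for(token: Token):
    """Illustrative helper: return the message add_text raises, or None if the text is accepted."""
    try:
        List().add_text(token)
    except ValueError as exc:
        return str(exc)
    return None
```

Asserting on the full string, as opposed to only the exception type, would make any later change to the wording or to the line/column fields show up as a test failure.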

Parser State Coverage

Verify that raw text encountered when current is neither List nor Node (e.g., at initialization or after closing the last tag) is always handled via List.add_text to avoid silent acceptance or inconsistent behavior.

for t in self.tokens:
    if t.type == "TEXT":
        if isinstance(self.current, List):
            self.current.add_text(t)
        elif isinstance(self.current, Node):
            self.current.value += t.value  # type: Node
    elif t.type == "TAG_OPEN":
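
The routing this excerpt depends on can be sketched with a much smaller loop. Everything below is an assumption made for illustration and is not the project's actual parser: it handles only flat (non-nested) tags, the "TAG_CLOSE" token type name is guessed, tokens are built by hand instead of by a tokenizer, and parse and rejects are hypothetical helpers. The point is only that once a tag closes, the current container is the List again, so trailing text reaches add_text.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical token; the real class may carry different fields."""

    type: str  # "TEXT", "TAG_OPEN", or "TAG_CLOSE" (the last name is assumed)
    value: str
    line_num: int = 1
    column: int = 0


@dataclass
class Node:
    tag: str
    value: str = ""


@dataclass
class List:
    nodes: list = field(default_factory=list)

    def add_text(self, token: Token) -> None:
        # Whitespace between tags is fine; anything else is rejected.
        if not token.value or token.value.isspace():
            return
        raise ValueError(
            f"Text outside of a tag is unsupported at line {token.line_num}, column {token.column}"
        )


def parse(tokens):
    """Simplified flat parser: maps each top-level tag name to its text."""
    root = List()
    current = root
    for t in tokens:
        if t.type == "TEXT":
            if isinstance(current, List):
                current.add_text(t)  # text with no enclosing tag
            elif isinstance(current, Node):
                current.value += t.value
        elif t.type == "TAG_OPEN":
            current = Node(tag=t.value)
            root.nodes.append(current)
        elif t.type == "TAG_CLOSE":
            current = root  # back at top level, so later text hits add_text
    return {n.tag: n.value for n in root.nodes}


def rejects(tokens) -> bool:
    """Illustrative helper: True if parsing raises ValueError."""
    try:
        parse(tokens)
    except ValueError:
        return True
    return False
```

In this sketch the two cases the new tests describe (trailing text after a root tag, and a text-only document) both end up in List.add_text, while whitespace after the closing tag is still accepted.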

Copilot AI left a comment

Pull request overview

This PR enhances the parser's robustness by adding explicit validation that rejects raw text outside of XML/HTML-like tags. Previously, such text would be silently ignored, but now it raises a descriptive ValueError with line and column information.

  • Adds an add_text method to the List class that validates and rejects non-whitespace text outside of tags
  • Updates the parser to pass full Token objects (instead of just values) for better error reporting
  • Introduces comprehensive test coverage for the new validation behavior

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| gravitasml/parser.py | Adds add_text method to List class for validating text outside tags and updates the parse loop to pass Token objects |
| tests/test_parser.py | Adds two new test cases verifying ValueError is raised for text outside root tags and text-only documents |

@ntindle ntindle merged commit d17f6ec into main Dec 9, 2025
21 checks passed
@ntindle ntindle deleted the cursor/fix-parser-error-for-text-outside-root-gpt-5.1-codex-b3b9 branch December 9, 2025 17:58
cursor bot pushed a commit to Significant-Gravitas/AutoGPT that referenced this pull request Dec 18, 2025
Pin the backend to gravitasml tag 0.1.4 so the XML parser fix
from Significant-Gravitas/gravitasml#15 ships with this PR.
ntindle added a commit that referenced this pull request Feb 2, 2026
Tests now expect ValueError when self-closing tags result in text outside
of proper tags, aligning with PR #15's stricter validation.
ntindle added a commit that referenced this pull request Feb 12, 2026
* Initial plan

* Implement no_parse filter syntax for preventing recursive parsing

Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>

* Add comprehensive test cases for self-closing tags and unmatched tags with no_parse filter

Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>

* Merge best features from both no_parse filter implementations

Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>

* Apply Black code formatting to fix style issues

Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>

* fix: move re import to module level, improve docstrings

- Move 'import re' from inline in method to module level
- Add docstring explaining underscore→space reversal in _reconstruct_original_tag
- Clarify test docstring for token fallback whitespace behavior

* fix: update self-closing tag tests for new text-outside-tags validation

Tests now expect ValueError when self-closing tags result in text outside
of proper tags, aligning with PR #15's stricter validation.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ntindle <8845353+ntindle@users.noreply.github.com>
Co-authored-by: Nick Tindle <nick@ntindle.com>