feat(#6): add validate_oscal_content MCP tool#58

Merged
dfkunstler merged 5 commits into awslabs:main from ethanolivertroy:feat/validate-oscal-content
Feb 25, 2026
Conversation

@ethanolivertroy
Contributor

Summary

  • Adds a validate_oscal_content MCP tool that validates OSCAL JSON content through a 4-level pipeline: well-formedness, JSON Schema (Draft-07), trestle semantic checks, and optionally oscal-cli
  • Auto-detects OSCAL model type from the document root key, or accepts an explicit override
  • Gracefully degrades when oscal-cli is not installed or trestle doesn't support the model type (e.g., mapping-collection)

Details

Validation pipeline:

| Level | What it checks | Implementation |
| --- | --- | --- |
| 1. Well-formedness | Valid JSON, is a dict | `json.loads()` |
| 2. JSON Schema | Conforms to NIST OSCAL schema | `jsonschema.Draft7Validator` with bundled schemas |
| 3. Trestle | Semantic checks via Pydantic models | `trestle.oscal.*` model instantiation |
| 4. oscal-cli | Full NIST validation | `subprocess.run()` if on PATH |

Key behaviors:

  • If Level 1 fails, Levels 2-4 are skipped
  • Errors capped at 20 per level to avoid overwhelming output
  • Overall valid: true only when all non-skipped levels pass
  • No new dependencies required (jsonschema is transitive, compliance-trestle is direct)
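The behaviors above can be sketched as a minimal pipeline skeleton. This is an illustrative reconstruction, not the PR's actual code: the `_make_level` helper name is borrowed from a review comment in this thread, and the level-entry shape follows the JSON output posted later in the conversation.

```python
import json

MAX_ERRORS_PER_LEVEL = 20  # cap stated in the PR description


def _make_level(level, valid=True, errors=None, warnings=None,
                skipped=False, skip_reason=None):
    """Build one entry of the 4-entry `levels` list."""
    return {"level": level, "valid": valid, "errors": errors or [],
            "warnings": warnings or [], "skipped": skipped,
            "skip_reason": skip_reason}


def validate_pipeline(content):
    """Hypothetical sketch: if Level 1 fails, Levels 2-4 are skipped."""
    levels = []
    try:
        parsed = json.loads(content) if isinstance(content, str) else None
        if not isinstance(parsed, dict):
            raise ValueError("OSCAL document root must be a JSON object")
        levels.append(_make_level("well_formedness"))
    except (ValueError, TypeError) as exc:
        # Level 1 failed: report it and mark the remaining levels skipped,
        # so every code path returns a consistent 4-entry levels list.
        levels.append(_make_level("well_formedness", valid=False, errors=[str(exc)]))
        for lvl in ("json_schema", "trestle", "oscal_cli"):
            levels.append(_make_level(lvl, skipped=True,
                                      skip_reason="Skipped due to well-formedness failure"))
        return {"valid": False, "model_type": None, "levels": levels}
    # Levels 2-4 (JSON Schema, trestle, oscal-cli) would run here...
    return {"valid": all(lv["valid"] for lv in levels),
            "model_type": None, "levels": levels}
```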

Files changed:

  • src/mcp_server_for_oscal/tools/validate_oscal_content.py — new tool
  • src/mcp_server_for_oscal/tools/utils.py — added ROOT_KEY_TO_MODEL_TYPE + load_oscal_json_schema()
  • src/mcp_server_for_oscal/main.py — import + registration
  • src/mcp_server_for_oscal/tools/README.md — documented new tool
  • tests/tools/test_validate_oscal_content.py — 39 tests across 6 test classes
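The root-key auto-detection added in `utils.py` presumably centers on a mapping like the following. This is a hypothetical sketch: the `root_keys` filtering of `$schema` appears verbatim in a review snippet in this thread, but the mapping entries and model-type strings shown here are assumptions.

```python
# Hypothetical reduced version of ROOT_KEY_TO_MODEL_TYPE; the real
# mapping in tools/utils.py covers all OSCAL model types.
ROOT_KEY_TO_MODEL_TYPE = {
    "catalog": "catalog",
    "profile": "profile",
    "component-definition": "component-definition",
    "system-security-plan": "system-security-plan",
    "assessment-plan": "assessment-plan",
    "assessment-results": "assessment-results",
    "plan-of-action-and-milestones": "plan-of-action-and-milestones",
    "mapping-collection": "mapping-collection",
}


def detect_model_type(parsed: dict):
    """Return the OSCAL model type implied by the document's single
    root key, ignoring a top-level $schema entry; None if undetectable."""
    root_keys = [k for k in parsed if k != "$schema"]
    if len(root_keys) == 1:
        return ROOT_KEY_TO_MODEL_TYPE.get(root_keys[0])
    return None
```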

Test plan

  • 39 new unit tests covering all 4 validation levels, auto-detection, error capping, and end-to-end pipeline
  • Full test suite: 263 passed, 1 skipped (pre-existing), 0 regressions
  • Reviewer: start server, call validate_oscal_content with valid catalog JSON
  • Reviewer: call with malformed JSON, verify levels 2-4 skipped

Closes #6

Adds a multi-level OSCAL content validation tool that runs a 4-stage
pipeline: well-formedness (JSON parsing), JSON Schema conformance
(Draft-07 with bundled NIST schemas), trestle semantic checks
(Pydantic model validation), and optionally oscal-cli full NIST
validation. Auto-detects model type from root key, caps errors at 20
per level, and gracefully skips unavailable levels.

Closes awslabs#6

Copilot AI left a comment


Pull request overview

Adds a new MCP tool to validate OSCAL JSON content end-to-end, with model auto-detection and optional external validator support, to help ensure LLM-generated OSCAL is structurally and semantically valid.

Changes:

  • Introduces validate_oscal_content tool implementing a 4-stage validation pipeline (parse → JSON Schema → trestle semantics → optional oscal-cli).
  • Adds utilities for OSCAL model root-key detection and loading bundled JSON schemas.
  • Registers the new tool in the server and documents it; adds a dedicated unit test suite.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/mcp_server_for_oscal/tools/validate_oscal_content.py | Implements the multi-level validation pipeline and integrations (jsonschema/trestle/oscal-cli). |
| src/mcp_server_for_oscal/tools/utils.py | Adds root-key→model mapping and a helper to load bundled OSCAL JSON schemas. |
| src/mcp_server_for_oscal/main.py | Registers the new tool with the FastMCP server. |
| src/mcp_server_for_oscal/tools/README.md | Documents the new validation tool in the tools list. |
| tests/tools/test_validate_oscal_content.py | Adds unit tests covering detection, each validation level, and end-to-end behavior. |


Comment on lines +269 to +273
```python
        resolved_type = OSCALModelType(model_type)
    except ValueError:
        msg = f"Invalid model_type: '{model_type}'. Use list_oscal_models to see valid types."
        try_notify_client_error(msg, ctx)
        return {"valid": False, "model_type": model_type, "levels": levels, "error": msg}
```

Copilot AI Feb 11, 2026


This branch returns early on invalid model_type, but the response then contains only the Level 1 result in levels. Elsewhere (e.g., well-formedness failure) you always return a 4-entry levels list, which is much easier for clients to consume. Consider appending Levels 2–4 as skipped with an appropriate skip_reason before returning.

Contributor Author


Fixed in a96aa34 — Added skipped Levels 2–4 with skip_reason on the invalid model_type early-exit path, so all code paths now return a consistent 4-entry levels list.

```python
    else:
        root_keys = [k for k in parsed if k != "$schema"]
        msg = f"Cannot detect OSCAL model type from root keys: {root_keys}"
        try_notify_client_error(msg, ctx)
```

Copilot AI Feb 11, 2026

When model type auto-detection fails, the tool returns early with only the Level 1 result in levels. For API consistency (and to match the documented 4-level pipeline), consider adding Level 2–4 entries marked as skipped (with a skip_reason like "Skipped due to undetectable model type") before returning.

Suggested change:

```python
try_notify_client_error(msg, ctx)
# Model type could not be detected; skip remaining levels for API consistency
for lvl in ("json_schema", "trestle", "oscal_cli"):
    levels.append(
        _make_level(lvl, skipped=True, skip_reason="Skipped due to undetectable model type")
    )
```

Contributor Author


Fixed in a96aa34 — Added skipped Levels 2–4 with skip_reason="Skipped due to undetectable model type" before the early return, matching the well-formedness failure pattern.

Comment on lines +104 to +106
```python
validator = jsonschema.Draft7Validator(schema)
raw_errors = list(validator.iter_errors(data))
```

Copilot AI Feb 11, 2026


Even though you cap reported schema errors to MAX_ERRORS_PER_LEVEL, this still materializes all validation errors via list(validator.iter_errors(...)), which can be expensive for large inputs. Consider iterating the generator and stopping after MAX_ERRORS_PER_LEVEL + 1 (enough to know whether to add the "showing N of M" warning) to avoid unnecessary work.

Contributor Author


Fixed in a96aa34 — Replaced list(validator.iter_errors(data)) with list(itertools.islice(validator.iter_errors(data), MAX_ERRORS_PER_LEVEL + 1)). The warning message now says "more may exist" since we no longer know the exact total.
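The islice-based capping described in that fix can be sketched like this (hypothetical function name; `MAX_ERRORS_PER_LEVEL` is the cap of 20 from the PR description). Taking `MAX_ERRORS_PER_LEVEL + 1` items is just enough to know whether more errors exist without materializing all of them:

```python
import itertools

import jsonschema

MAX_ERRORS_PER_LEVEL = 20


def collect_schema_errors(schema, data):
    """Collect at most MAX_ERRORS_PER_LEVEL schema errors; the one extra
    item pulled from the generator only signals that more errors exist."""
    validator = jsonschema.Draft7Validator(schema)
    raw_errors = list(itertools.islice(validator.iter_errors(data),
                                       MAX_ERRORS_PER_LEVEL + 1))
    errors = [e.message for e in raw_errors[:MAX_ERRORS_PER_LEVEL]]
    warnings = []
    if len(raw_errors) > MAX_ERRORS_PER_LEVEL:
        # Exact total is unknown since we stopped iterating early.
        warnings.append(f"Showing {MAX_ERRORS_PER_LEVEL} errors; more may exist.")
    return errors, warnings
```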

```python
    Returns:
        dict: Structured validation results with per-level detail
    """
    logger.debug("validate_oscal_content(model_type=%s, content_length=%d)", model_type, len(content))
```

Copilot AI Feb 11, 2026


len(content) in the debug log will raise a TypeError if the caller passes null/None for content (possible via JSON-RPC), which prevents Level 1 from returning a structured well-formedness error. Consider guarding the length computation (or moving this log after _validate_well_formedness succeeds) so invalid types are handled by the validation pipeline instead of crashing.

Suggested change:

```python
content_length = len(content) if isinstance(content, str) else None
logger.debug("validate_oscal_content(model_type=%s, content_length=%s)", model_type, content_length)
```

Contributor Author


Fixed in a96aa34 — Applied the suggested guard exactly: content_length = len(content) if isinstance(content, str) else None. Added test_none_content to confirm None input produces a well-formedness failure instead of crashing.

"""

import json
from unittest.mock import AsyncMock, Mock, patch, MagicMock
Copy link

Copilot AI Feb 11, 2026


Import of 'Mock' is not used.

Suggested change:

```python
from unittest.mock import AsyncMock, patch, MagicMock
```

Contributor Author


Fixed in a96aa34 — Removed the unused Mock import.

Adapt to upstream refactor: move validate_oscal_content import and
registration into _setup_tools(), and integrate tool docs into the
rewritten README format.
@dfkunstler
Collaborator

@ethanolivertroy ack. I aim to review and merge this week. Thank you!

- Always return 4 level entries on early exit (invalid/undetectable model type)
- Guard debug logging against non-string content to prevent TypeError
- Optimize Level 2 schema error collection with itertools.islice
- Remove unused Mock import in tests
- Add test_none_content and strengthen existing test assertions
- Fix .gitignore newline corruption and add .firecrawl/ entry
@dfkunstler
Collaborator

Code and tests look good to me so far, but I'm having trouble with the functionality as implemented. Specifically, my chat application (kiro-cli) can't find a way to encode the contents of this OSCAL file such that it is both safe to transmit and valid. Claude needs to escape the content before it can send it to the MCP server, which results in an error like this:

Error executing tool validate_oscal_content: bad escape \p at position 2

This is a string escaping issue on my end when passing the content to the tool — the \p in the remarks field (...backwards compatible. Our goals...) was being interpreted as an escape sequence rather than literal text. It's not an issue with your OSCAL document itself.

I get a similar error if I just paste the content into MCP Inspector and invoke the tool manually. If I wrap the content in double quotes, that error goes away, but validation fails:

```json
{
  "valid": false,
  "model_type": null,
  "levels": [
    {
      "level": "well_formedness",
      "valid": false,
      "errors": [
        "Invalid control character at: line 1 column 3 (char 2)"
      ],
      "warnings": [],
      "skipped": false,
      "skip_reason": null
    },
    {
      "level": "json_schema",
      "valid": true,
      "errors": [],
      "warnings": [],
      "skipped": true,
      "skip_reason": "Skipped due to well-formedness failure"
    },
    {
      "level": "trestle",
      "valid": true,
      "errors": [],
      "warnings": [],
      "skipped": true,
      "skip_reason": "Skipped due to well-formedness failure"
    },
    {
      "level": "oscal_cli",
      "valid": true,
      "errors": [],
      "warnings": [],
      "skipped": true,
      "skip_reason": "Skipped due to well-formedness failure"
    }
  ]
}
```

An additional consideration here is the potential size of OSCAL files, which can get very large. Have you considered taking a path to a file as input and loading it within the tool? That would prevent the encoding problem and preserve space in the context window. Thoughts? @ethanolivertroy

@ethanolivertroy
Contributor Author

Started digging into this last night with kiro-cli. I think the problem is in how Claude Code CLI handles things differently from other harnesses. I had similar problems with some of my MCP-based tools, originally developed for Claude Code, when I passed them to Kimi 2.5-thinking and they broke. Let me dig into this some more.

@dfkunstler dfkunstler self-assigned this Feb 12, 2026
…gex support (awslabs#6)

- Add validate_oscal_file tool to validate OSCAL files from local or remote URIs
- Implement _pattern_safe validator using regex module for ECMA-262 Unicode property escape support
- Add _build_safe_validator to extend Draft7Validator with safe pattern handling
- Add regex>=2024.0.0 dependency to pyproject.toml
- Fix subprocess.run call to use check=False parameter for proper error handling
Collaborator

@dfkunstler dfkunstler left a comment


The previously discussed error "bad escape \p at position 2" is due to an incompatibility between Python's built-in re module and the OSCAL JSON schemas. I've addressed it by adding a custom validator function in validate_oscal_content.py. Validation tools now work as expected when invoked from kiro-cli or MCP Inspector.
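The fix described above can be sketched as follows. The helper names `_pattern_safe` and `_build_safe_validator` come from the commit message in this thread; their bodies here are assumptions. The idea is that the third-party `regex` module understands ECMA-262 Unicode property escapes like `\p{L}` (used in the NIST OSCAL schemas), which Python's built-in `re` rejects with "bad escape \p":

```python
import jsonschema
import regex  # third-party (pip install regex); supports \p{...} escapes
from jsonschema import validators


def _pattern_safe(validator, patt, instance, schema):
    r"""Replacement check for the 'pattern' keyword, using the regex module
    so schemas containing \p{...} don't raise 'bad escape \p'."""
    if validator.is_type(instance, "string") and not regex.search(patt, instance):
        yield jsonschema.ValidationError(f"{instance!r} does not match {patt!r}")


def _build_safe_validator():
    """Extend Draft7Validator with the regex-backed pattern handler."""
    return validators.extend(jsonschema.Draft7Validator, {"pattern": _pattern_safe})


SafeDraft7Validator = _build_safe_validator()
```

With this in place, a schema pattern such as `^\p{L}+$` validates strings of Unicode letters instead of crashing the validator.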

@dfkunstler dfkunstler merged commit d7d38af into awslabs:main Feb 25, 2026
1 check passed
@dfkunstler dfkunstler mentioned this pull request Feb 25, 2026


Development

Successfully merging this pull request may close these issues.

new tool: validate_oscal_content
