Skip to content

Conversation

@ben-edna
Copy link
Contributor

@ben-edna ben-edna commented Jan 7, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Manifest validation now disallows NULL and control characters in string fields.
  • Documentation

    • Updated manifest docs to note strings must not contain NULL or control characters.
  • Tests

    • Added fuzzing tests to improve manifest validation coverage.
  • Chores

    • Added development dependency for fuzzing and ignored local fuzzing state in version control.
  • Changelog

    • Added unreleased entries for fuzzing and manifest validation.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 7, 2026

Walkthrough

Adds Hypothesis-based fuzz tests for manifest handling, introduces a SAFE_STR regex to forbid NULL/control characters in manifest string fields, adds Hypothesis as a dev dependency, updates documentation and changelog, and adds .hypothesis to .gitignore.

Changes

Cohort / File(s) Summary
Schema Validation
dfetch/manifest/schema.py, doc/manifest.rst
Add public constant SAFE_STR = Regex(r"^[^\x00-\x1F\x7F-\x9F]*$"); replace Str() with SAFE_STR for REMOTE_SCHEMA (name, url-base) and for fields in PROJECT_SCHEMA (name, optional dst, branch, tag, revision, url, repo-path, remote, patch, added optional src, and ignore as Seq(SAFE_STR)). Doc note added explaining no NULL/control chars.
Fuzzing Test Infrastructure
.gitignore, pyproject.toml, tests/test_fuzzing.py
Add .hypothesis to .gitignore; add hypothesis==6.150.0 to dev optional deps; add tests/test_fuzzing.py containing Hypothesis manifest generation strategy, validators, multiple tests (test_data_conforms_to_schema, test_manifest_can_be_created, test_check, test_update), Hypothesis profiles, and helper utilities.
Release Notes
CHANGELOG.rst
Add two unreleased entries under Release 0.12.0: "Add Fuzzing (#819)" and "Don't allow NULL or control characters in manifest (#114)".

Sequence Diagram

sequenceDiagram
    actor Hypothesis
    participant Strategy as Manifest Strategy
    participant Validator as StrictYAML Validator
    participant Manifest as Manifest Model
    participant CLI as dfetch Commands

    Hypothesis->>Strategy: generate manifest-like data
    Strategy-->>Hypothesis: return candidate data
    Hypothesis->>Validator: serialize/validate against schema
    alt Valid
        Validator-->>Hypothesis: validation passed
        Hypothesis->>Manifest: instantiate Manifest (may raise KeyError)
        alt Instantiation success
            Manifest-->>Hypothesis: instance created
            Hypothesis->>CLI: run check/update (in temp dir)
            CLI-->>Hypothesis: success or DfetchFatalException (caught)
        else KeyError
            Hypothesis-->>Hypothesis: tolerate/ignore KeyError
        end
    else Invalid
        Validator-->>Hypothesis: fail -> trigger shrinking
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Switch from pykwalify to StrictYAML #922: Modifies dfetch/manifest/schema.py and REMOTE/PROJECT schema definitions — directly related to the SAFE_STR/schema changes.
  • Fix #891 - Update docs #892: Alters manifest schema for projects.src and related string validation — related to the added src optional field and tightened string validation.

Suggested labels

development, enhancement

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add fuzzing' directly and clearly describes the main change in the pull request: introduction of a new fuzzing test suite using the Hypothesis framework.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3915afe and 6ccc317.

📒 Files selected for processing (2)
  • dfetch/manifest/schema.py
  • tests/test_fuzzing.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: build / build (windows-latest)
  • GitHub Check: build / build (ubuntu-latest)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: run / run (windows-latest, 3.11)
  • GitHub Check: run / run (macos-latest, 3.11)
  • GitHub Check: run / run (windows-latest, 3.9)
  • GitHub Check: run / run (windows-latest, 3.12)
  • GitHub Check: run / run (windows-latest, 3.14)
  • GitHub Check: run / run (macos-latest, 3.10)
  • GitHub Check: run / run (windows-latest, 3.13)
  • GitHub Check: run / run (macos-latest, 3.14)
  • GitHub Check: run / run (macos-latest, 3.12)
  • GitHub Check: run / run (windows-latest, 3.10)
  • GitHub Check: run / run (macos-latest, 3.9)
  • GitHub Check: run / test-cygwin
  • GitHub Check: test / test
  • GitHub Check: DevContainer Build & Test
🔇 Additional comments (12)
dfetch/manifest/schema.py (2)

7-8: LGTM! Regex correctly blocks control characters.

The SAFE_STR regex now properly excludes all control characters (0x00-0x1F and 0x7F-0x9F) as stated in the comment. This addresses the previous critical issue where only NULL bytes were blocked.


10-33: LGTM! Schema updates are comprehensive and consistent.

All string fields have been systematically updated to use SAFE_STR, providing consistent validation across the manifest schema. The changes maintain the existing structure while adding the control-character protection.

tests/test_fuzzing.py (10)

20-39: LGTM! Hypothesis profiles are well-configured.

The profile setup appropriately balances test thoroughness with execution time across different environments (CI, dev, manual). The deadline=None setting prevents flaky timeout failures during fuzzing.


42-50: LGTM! SAFE_TEXT strategy aligns with SAFE_STR validation.

The character generation strategy correctly excludes control characters (0x00-0x1F via min_codepoint, 0x7F-0x9F via blacklist), matching the SAFE_STR regex in the schema. The surrogate character exclusion prevents Unicode encoding issues.


53-61: LGTM! Number and optional string helpers are well-designed.

SAFE_NUMBER appropriately excludes NaN and infinity for practical testing, matching the schema's NUMBER definition. The opt_str() helper provides a clean abstraction for optional text fields.


64-77: LGTM! Remote entry strategy correctly models the schema.

The strategy appropriately generates remote entries with required non-empty name and url-base fields, plus optional default flag. The dict comprehension cleanly filters out None values.


79-114: LGTM! Project entry strategy comprehensively models the schema.

The project_entry strategy correctly generates all fields from PROJECT_SCHEMA. Line 81's fix to make ignore_list optional (st.none() | st.lists(...)) properly addresses the previous review concern, ensuring the ignore field can be omitted from generated manifests.


116-130: LGTM! Manifest strategy correctly composes the complete structure.

The manifest_strategy appropriately combines version, optional remotes, and required projects into the MANIFEST_SCHEMA structure. The conditional inclusion of remotes ensures the field is omitted when None, matching schema expectations.


133-145: LGTM! Schema conformance test is effectively designed.

The test leverages StrictYAML's validation to catch schema violations, allowing Hypothesis to shrink to minimal counterexamples. This provides strong assurance that the fuzzing strategy generates only schema-compliant data.


148-154: LGTM! Manifest construction test appropriately handles expected exceptions.

The test verifies that Manifest construction doesn't crash unexpectedly, while tolerating KeyError for cases where the Manifest class enforces additional constraints beyond the schema. This is appropriate for fuzzing-level validation.


157-176: LGTM! Integration tests provide valuable command-level fuzzing.

These tests effectively verify that the check and update commands handle fuzzed manifests gracefully without unexpected crashes. Suppressing DfetchFatalException is appropriate since some generated manifests may legitimately fail validation.


179-196: LGTM! Main block provides excellent developer tooling.

The script execution mode enables manual testing and debugging by generating examples, demonstrating YAML serialization, and validating round-trip parsing. This enhances the developer experience when working with the fuzzing infrastructure.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

ignore list should never be empty
@spoorcc spoorcc marked this pull request as ready for review January 7, 2026 18:32
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @dfetch/manifest/schema.py:
- Around line 7-8: The SAFE_STR regex currently only blocks NUL but the comment
claims it blocks all control chars; update the SAFE_STR pattern so it also
rejects ASCII control ranges 0x00–0x1F and 0x7F–0x9F (i.e., disallow bytes with
hex values 00..1F and 7F..9F) and adjust the comment to match, or if the
intended behavior is to only block NUL, change the comment to state that only
NUL bytes are disallowed.

In @tests/test_fuzzing.py:
- Line 79: The generated ignore_list is always present because it's defined as
st.lists(..., min_size=1); make it optional by allowing None like the other
optional fields: replace the current ignore_list assignment with
st.one_of(st.none(), st.lists(SAFE_TEXT, min_size=1, max_size=5)) so the dict
comprehension can filter out None and omit the ignore key; apply the same change
for the other occurrence of ignore_list referenced in the comment.
🧹 Nitpick comments (1)
tests/test_fuzzing.py (1)

146-152: Consider broadening exception handling for robustness.

The test only catches KeyError, but other exceptions like ValueError, TypeError, or strictyaml validation errors may be raised during Manifest construction. If these are expected failures for invalid fuzzed input, they should also be suppressed.

Suggested improvement
 @given(manifest_strategy)
 def test_manifest_can_be_created(data):
     """Validate by attempting to construct a Manifest."""
-    try:
-        Manifest(data)
-    except KeyError:
-        pass
+    with suppress(KeyError, ValueError, TypeError):
+        Manifest(data)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ac79aa and 3915afe.

📒 Files selected for processing (6)
  • .gitignore
  • CHANGELOG.rst
  • dfetch/manifest/schema.py
  • doc/manifest.rst
  • pyproject.toml
  • tests/test_fuzzing.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_fuzzing.py (5)
dfetch/__main__.py (1)
  • DfetchFatalException (25-26)
dfetch/manifest/manifest.py (3)
  • Manifest (103-356)
  • remotes (281-283)
  • projects (260-262)
dfetch/util/util.py (1)
  • in_directory (66-75)
dfetch/manifest/remote.py (1)
  • url (78-80)
dfetch/manifest/project.py (1)
  • vcs (432-434)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build / build (windows-latest)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (12)
.gitignore (1)

17-17: LGTM!

The .hypothesis directory is correctly ignored. This is a standard practice for projects using Hypothesis, as it stores test data and examples that should not be committed to version control.

CHANGELOG.rst (1)

7-8: LGTM!

The changelog entries clearly document the new fuzzing capability and the stricter manifest validation. The issue references (#819, #114) provide good traceability.

dfetch/manifest/schema.py (2)

10-33: Systematic replacement approach is good.

The replacement of Str() with SAFE_STR across all string fields in both REMOTE_SCHEMA and PROJECT_SCHEMA is thorough and consistent. This provides centralized validation once the regex pattern is corrected.


3-3: LGTM!

The import changes correctly reflect the shift from Str() to Regex-based validation.

doc/manifest.rst (1)

20-22: Documentation is correct, but implementation is incomplete.

The documentation correctly states that strings should not contain NULL or control characters. However, the current SAFE_STR regex in dfetch/manifest/schema.py only blocks NULL bytes, not control characters. The implementation must be updated to match this documentation (see comment on schema.py).

pyproject.toml (1)

89-89: No action required—Hypothesis 6.150.0 is the current stable release and has no known security vulnerabilities.

Version 6.150.0 is the latest release on PyPI (released January 6, 2026) with no public security advisories or high-severity CVEs. The pinned version is current and secure.

tests/test_fuzzing.py (6)

1-18: LGTM!

The imports are well-organized and appropriate for the fuzzing test suite.


20-39: LGTM!

The Hypothesis profile configuration is well-structured for different environments. The deadline=None setting is appropriate given the filesystem and subprocess operations in these tests.


117-128: LGTM!

The manifest strategy structure aligns well with the expected manifest schema format.


131-136: LGTM!

Clean helper function that leverages StrictYAML's validation.


155-174: LGTM!

Both test functions follow a consistent and appropriate pattern for fuzzing CLI commands with generated manifests. The use of suppress(DfetchFatalException) correctly handles expected failures from invalid manifest configurations.


177-194: LGTM!

The main block provides useful utilities for manual testing and debugging. The round-trip demonstration validates the strategy produces valid data.

One minor note: manifest_strategy.example() (line 181) bypasses Hypothesis's shrinking and database, but this is acceptable here for demonstration purposes.

This was linked to issues Jan 7, 2026
@spoorcc spoorcc merged commit a8fb6be into main Jan 7, 2026
41 checks passed
@spoorcc spoorcc deleted the fuzzing branch January 7, 2026 19:53
@coderabbitai coderabbitai bot mentioned this pull request Jan 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add fuzzing Check manifest string parsing

3 participants