Add fuzzing #928

ben-edna · 2026-01-07T17:40:45Z

Summary by CodeRabbit

Bug Fixes
- Manifest validation now disallows NULL and control characters in string fields.
Documentation
- Updated manifest docs to note strings must not contain NULL or control characters.
Tests
- Added fuzzing tests to improve manifest validation coverage.
Chores
- Added development dependency for fuzzing and ignored local fuzzing state in version control.
Changelog
- Added unreleased entries for fuzzing and manifest validation.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-07T17:40:51Z

Walkthrough

Adds Hypothesis-based fuzz tests for manifest handling, introduces a SAFE_STR regex to forbid NULL/control characters in manifest string fields, adds Hypothesis as a dev dependency, updates documentation and changelog, and adds .hypothesis to .gitignore.

Changes

Cohort / File(s)	Summary
Schema Validation `dfetch/manifest/schema.py`, `doc/manifest.rst`	Add public constant `SAFE_STR = Regex(r"^[^\x00-\x1F\x7F-\x9F]*$")`; replace `Str()` with `SAFE_STR` for `REMOTE_SCHEMA` (`name`, `url-base`) and for fields in `PROJECT_SCHEMA` (`name`, optional `dst`, `branch`, `tag`, `revision`, `url`, `repo-path`, `remote`, `patch`, added optional `src`, and `ignore` as `Seq(SAFE_STR)`). Doc note added explaining no NULL/control chars.
Fuzzing Test Infrastructure `.gitignore`, `pyproject.toml`, `tests/test_fuzzing.py`	Add `.hypothesis` to `.gitignore`; add `hypothesis==6.150.0` to dev optional deps; add `tests/test_fuzzing.py` containing Hypothesis manifest generation strategy, validators, multiple tests (`test_data_conforms_to_schema`, `test_manifest_can_be_created`, `test_check`, `test_update`), Hypothesis profiles, and helper utilities.
Release Notes `CHANGELOG.rst`	Add two unreleased entries under Release 0.12.0: "Add Fuzzing (`#819`)" and "Don't allow NULL or control characters in manifest (`#114`)".

Sequence Diagram

sequenceDiagram
    actor Hypothesis
    participant Strategy as Manifest Strategy
    participant Validator as StrictYAML Validator
    participant Manifest as Manifest Model
    participant CLI as dfetch Commands

    Hypothesis->>Strategy: generate manifest-like data
    Strategy-->>Hypothesis: return candidate data
    Hypothesis->>Validator: serialize/validate against schema
    alt Valid
        Validator-->>Hypothesis: validation passed
        Hypothesis->>Manifest: instantiate Manifest (may raise KeyError)
        alt Instantiation success
            Manifest-->>Hypothesis: instance created
            Hypothesis->>CLI: run check/update (in temp dir)
            CLI-->>Hypothesis: success or DfetchFatalException (caught)
        else KeyError
            Hypothesis-->>Hypothesis: tolerate/ignore KeyError
        end
    else Invalid
        Validator-->>Hypothesis: fail -> trigger shrinking
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Switch from pykwalify to StrictYAML #922: Modifies dfetch/manifest/schema.py and REMOTE/PROJECT schema definitions — directly related to the SAFE_STR/schema changes.
Fix #891 - Update docs #892: Alters manifest schema for projects.src and related string validation — related to the added src optional field and tightened string validation.

Suggested labels

development, enhancement

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add fuzzing' directly and clearly describes the main change in the pull request: introduction of a new fuzzing test suite using the Hypothesis framework.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3915afe and 6ccc317.

📒 Files selected for processing (2)

dfetch/manifest/schema.py
tests/test_fuzzing.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)

GitHub Check: build / build (windows-latest)
GitHub Check: build / build (ubuntu-latest)
GitHub Check: Codacy Static Code Analysis
GitHub Check: run / run (windows-latest, 3.11)
GitHub Check: run / run (macos-latest, 3.11)
GitHub Check: run / run (windows-latest, 3.9)
GitHub Check: run / run (windows-latest, 3.12)
GitHub Check: run / run (windows-latest, 3.14)
GitHub Check: run / run (macos-latest, 3.10)
GitHub Check: run / run (windows-latest, 3.13)
GitHub Check: run / run (macos-latest, 3.14)
GitHub Check: run / run (macos-latest, 3.12)
GitHub Check: run / run (windows-latest, 3.10)
GitHub Check: run / run (macos-latest, 3.9)
GitHub Check: run / test-cygwin
GitHub Check: test / test
GitHub Check: DevContainer Build & Test

🔇 Additional comments (12)

dfetch/manifest/schema.py (2)

7-8: LGTM! Regex correctly blocks control characters.

The SAFE_STR regex now properly excludes all control characters (0x00-0x1F and 0x7F-0x9F) as stated in the comment. This addresses the previous critical issue where only NULL bytes were blocked.

10-33: LGTM! Schema updates are comprehensive and consistent.

All string fields have been systematically updated to use SAFE_STR, providing consistent validation across the manifest schema. The changes maintain the existing structure while adding the control-character protection.

tests/test_fuzzing.py (10)

20-39: LGTM! Hypothesis profiles are well-configured.

The profile setup appropriately balances test thoroughness with execution time across different environments (CI, dev, manual). The deadline=None setting prevents flaky timeout failures during fuzzing.

42-50: LGTM! SAFE_TEXT strategy aligns with SAFE_STR validation.

The character generation strategy correctly excludes control characters (0x00-0x1F via min_codepoint, 0x7F-0x9F via blacklist), matching the SAFE_STR regex in the schema. The surrogate character exclusion prevents Unicode encoding issues.

53-61: LGTM! Number and optional string helpers are well-designed.

SAFE_NUMBER appropriately excludes NaN and infinity for practical testing, matching the schema's NUMBER definition. The opt_str() helper provides a clean abstraction for optional text fields.

64-77: LGTM! Remote entry strategy correctly models the schema.

The strategy appropriately generates remote entries with required non-empty name and url-base fields, plus optional default flag. The dict comprehension cleanly filters out None values.

79-114: LGTM! Project entry strategy comprehensively models the schema.

The project_entry strategy correctly generates all fields from PROJECT_SCHEMA. Line 81's fix to make ignore_list optional (st.none() | st.lists(...)) properly addresses the previous review concern, ensuring the ignore field can be omitted from generated manifests.

116-130: LGTM! Manifest strategy correctly composes the complete structure.

The manifest_strategy appropriately combines version, optional remotes, and required projects into the MANIFEST_SCHEMA structure. The conditional inclusion of remotes ensures the field is omitted when None, matching schema expectations.

133-145: LGTM! Schema conformance test is effectively designed.

The test leverages StrictYAML's validation to catch schema violations, allowing Hypothesis to shrink to minimal counterexamples. This provides strong assurance that the fuzzing strategy generates only schema-compliant data.

148-154: LGTM! Manifest construction test appropriately handles expected exceptions.

The test verifies that Manifest construction doesn't crash unexpectedly, while tolerating KeyError for cases where the Manifest class enforces additional constraints beyond the schema. This is appropriate for fuzzing-level validation.

157-176: LGTM! Integration tests provide valuable command-level fuzzing.

These tests effectively verify that the check and update commands handle fuzzed manifests gracefully without unexpected crashes. Suppressing DfetchFatalException is appropriate since some generated manifests may legitimately fail validation.

179-196: LGTM! Main block provides excellent developer tooling.

The script execution mode enables manual testing and debugging by generating examples, demonstrating YAML serialization, and validating round-trip parsing. This enhances the developer experience when working with the fuzzing infrastructure.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

fix typo

ignore list should never be empty

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @dfetch/manifest/schema.py:
- Around line 7-8: The SAFE_STR regex currently only blocks NUL but the comment
claims it blocks all control chars; update the SAFE_STR pattern so it also
rejects ASCII control ranges 0x00–0x1F and 0x7F–0x9F (i.e., disallow bytes with
hex values 00..1F and 7F..9F) and adjust the comment to match, or if the
intended behavior is to only block NUL, change the comment to state that only
NUL bytes are disallowed.

In @tests/test_fuzzing.py:
- Line 79: The generated ignore_list is always present because it's defined as
st.lists(..., min_size=1); make it optional by allowing None like the other
optional fields: replace the current ignore_list assignment with
st.one_of(st.none(), st.lists(SAFE_TEXT, min_size=1, max_size=5)) so the dict
comprehension can filter out None and omit the ignore key; apply the same change
for the other occurrence of ignore_list referenced in the comment.

🧹 Nitpick comments (1)

tests/test_fuzzing.py (1)
146-152: Consider broadening exception handling for robustness.

The test only catches KeyError, but other exceptions like ValueError, TypeError, or strictyaml validation errors may be raised during Manifest construction. If these are expected failures for invalid fuzzed input, they should also be suppressed.
Suggested improvement
 @given(manifest_strategy)
 def test_manifest_can_be_created(data):
     """Validate by attempting to construct a Manifest."""
-    try:
-        Manifest(data)
-    except KeyError:
-        pass
+    with suppress(KeyError, ValueError, TypeError):
+        Manifest(data)

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ac79aa and 3915afe.

📒 Files selected for processing (6)

.gitignore
CHANGELOG.rst
dfetch/manifest/schema.py
doc/manifest.rst
pyproject.toml
tests/test_fuzzing.py

🧰 Additional context used

🧬 Code graph analysis (1)

tests/test_fuzzing.py (5)

dfetch/__main__.py (1)

DfetchFatalException (25-26)

dfetch/manifest/manifest.py (3)

Manifest (103-356)

remotes (281-283)

projects (260-262)

dfetch/util/util.py (1)

in_directory (66-75)

dfetch/manifest/remote.py (1)

url (78-80)

dfetch/manifest/project.py (1)

vcs (432-434)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: build / build (windows-latest)
GitHub Check: Codacy Static Code Analysis

🔇 Additional comments (12)

.gitignore (1)

17-17: LGTM!

The .hypothesis directory is correctly ignored. This is a standard practice for projects using Hypothesis, as it stores test data and examples that should not be committed to version control.

CHANGELOG.rst (1)

7-8: LGTM!

The changelog entries clearly document the new fuzzing capability and the stricter manifest validation. The issue references (#819, #114) provide good traceability.

dfetch/manifest/schema.py (2)

10-33: Systematic replacement approach is good.

The replacement of Str() with SAFE_STR across all string fields in both REMOTE_SCHEMA and PROJECT_SCHEMA is thorough and consistent. This provides centralized validation once the regex pattern is corrected.

3-3: LGTM!

The import changes correctly reflect the shift from Str() to Regex-based validation.

doc/manifest.rst (1)

20-22: Documentation is correct, but implementation is incomplete.

The documentation correctly states that strings should not contain NULL or control characters. However, the current SAFE_STR regex in dfetch/manifest/schema.py only blocks NULL bytes, not control characters. The implementation must be updated to match this documentation (see comment on schema.py).

pyproject.toml (1)

89-89: No action required—Hypothesis 6.150.0 is the current stable release and has no known security vulnerabilities.

Version 6.150.0 is the latest release on PyPI (released January 6, 2026) with no public security advisories or high-severity CVEs. The pinned version is current and secure.

tests/test_fuzzing.py (6)

1-18: LGTM!

The imports are well-organized and appropriate for the fuzzing test suite.

20-39: LGTM!

The Hypothesis profile configuration is well-structured for different environments. The deadline=None setting is appropriate given the filesystem and subprocess operations in these tests.

117-128: LGTM!

The manifest strategy structure aligns well with the expected manifest schema format.

131-136: LGTM!

Clean helper function that leverages StrictYAML's validation.

155-174: LGTM!

Both test functions follow a consistent and appropriate pattern for fuzzing CLI commands with generated manifests. The use of suppress(DfetchFatalException) correctly handles expected failures from invalid manifest configurations.

177-194: LGTM!

The main block provides useful utilities for manual testing and debugging. The round-trip demonstration validates the strategy produces valid data.

One minor note: manifest_strategy.example() (line 181) bypasses Hypothesis's shrinking and database, but this is acceptable here for demonstration purposes.

dfetch/manifest/schema.py

tests/test_fuzzing.py

ben-edna added 3 commits January 7, 2026 17:33

Add fuzzing

c3e0dea

Only accept safe strings (no strange chars)

d6b0cd9

Use hardcoded schema

49cda94

ben-edna temporarily deployed to testpypi January 7, 2026 17:41 — with GitHub Actions Inactive

Update test_fuzzing.py

c128ed5

fix typo

spoorcc temporarily deployed to testpypi January 7, 2026 17:47 — with GitHub Actions Inactive

Update test_fuzzing.py

3915afe

ignore list should never be empty

spoorcc temporarily deployed to testpypi January 7, 2026 18:23 — with GitHub Actions Inactive

spoorcc marked this pull request as ready for review January 7, 2026 18:32

coderabbitai bot reviewed Jan 7, 2026

View reviewed changes

dfetch/manifest/schema.py Outdated Show resolved Hide resolved

tests/test_fuzzing.py Outdated Show resolved Hide resolved

Review comments

6ccc317

spoorcc temporarily deployed to testpypi January 7, 2026 19:16 — with GitHub Actions Inactive

This was linked to issues Jan 7, 2026

Check manifest string parsing #114

Closed

Add fuzzing #819

Closed

spoorcc merged commit a8fb6be into main Jan 7, 2026
41 checks passed

spoorcc deleted the fuzzing branch January 7, 2026 19:53

coderabbitai bot mentioned this pull request Jan 7, 2026

Allow multiple patches in manifest (fixes #897) #931

Merged

coderabbitai bot mentioned this pull request Jan 14, 2026

Various cleanups & fixes #956

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fuzzing #928

Add fuzzing #928

Uh oh!

ben-edna commented Jan 7, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 7, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add fuzzing #928

Add fuzzing #928

Uh oh!

Conversation

ben-edna commented Jan 7, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ben-edna commented Jan 7, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 7, 2026 •

edited

Loading