Skip to content

Move to a typed choice sequence mutator#53

Merged
Liam-DeVoe merged 8 commits intoZac-HD:masterfrom
Liam-DeVoe:tcs-mutator
Feb 6, 2025
Merged

Move to a typed choice sequence mutator#53
Liam-DeVoe merged 8 commits intoZac-HD:masterfrom
Liam-DeVoe:tcs-mutator

Conversation

@Liam-DeVoe
Copy link
Copy Markdown
Collaborator

Ditches the buffer mutator in favor of an (extremely simple) tcs mutator.

TBH I have relatively low confidence that the logic in corpus.py is correct after this change. I see more test writing in my future...

@Liam-DeVoe
Copy link
Copy Markdown
Collaborator Author

yeah in fact adding hypothesis to the tox -e check job dependencies turns up several bugs in my implementation - mostly mixing args of types IRNode and ChoiceT in cov.py. I will add some real tests here before considering merging

@Liam-DeVoe Liam-DeVoe force-pushed the tcs-mutator branch 2 times, most recently from af84f1a to 5dc390c Compare February 5, 2025 18:51
@Liam-DeVoe
Copy link
Copy Markdown
Collaborator Author

OK, I'm much more confident in these changes now that we have some tests. Still have some type hints to take care of.

Copy link
Copy Markdown
Owner

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Comment on lines 58 to +61
is_hypothesis_file = belongs_to(hypothesis)
is_hypofuzz_file = belongs_to(hypofuzz)
is_pytest_file = belongs_to(pytest)
is__pytest_file = belongs_to(_pytest)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we should check that belongs_to() memoizes so that the returned functions share a cache.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each belongs_to has its own cache, not shared. Would sharing help? Knowing that an fpath: str does or doesn't belong to pytest doesn't tell us anything about whether it belongs to _pytest, right?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we have is_hypothesis_file = belongs_to(hypothesis) defined in both Hypothesis and HypoFuzz, and iirc those are cached separately. Not a big deal either way, of course.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Yeah, we probably should make belongs_to share a cache, and make it lru while we're at it.

return getattr(cd.provider, f"draw_{choice_type}")(**kwargs)


class HypofuzzProvider(PrimitiveProvider):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that #36 will involve going notably further, in putting much of the corpus logic etc. into the provider, and setting lifetime = "test_function". That shouldn't be in this PR, just want to avoid any confusion for future readers.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll see how much we can move into the provider...Hypofuzz wants control over e.g. intercepting interesting results, and how many examples are generated, so I'm not sure we can fully move to settings(backend="hypofuzz")(test_function)(**extra_kw). I think we either give up control over control flow from hypofuzz, or add provider hooks that alter control flow (on_failing_example_found which can raise a control flow exception "actually, I don't want you to shrink this"). I'd prefer not to do the latter; I think provider should be almost entirely read-only, not explicitly writing back to hypothesis control flow.

Fully moving to a provider could be ok if we accept the flow is "when a process fuzzes f, run a constant 1k examples and then defer back to our process pool, and any failures are shrunk immediately".

...except now that I write this, I don't know how we would run a distill phase with this proposed structure. We'd need to keep a reference to the underlying function/kwargs around, and avoiding that is kinda the point of the backend.

@Liam-DeVoe Liam-DeVoe merged commit 412015d into Zac-HD:master Feb 6, 2025
13 checks passed
@Liam-DeVoe Liam-DeVoe deleted the tcs-mutator branch February 6, 2025 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants