-
Notifications
You must be signed in to change notification settings - Fork 587
Reproduce testcase locally #4976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job overall! Thanks for making the code very robust and thoroughly adding types and docstrings.
I'm still a bit worried about running untrusted code locally though, but it is something to discuss in further steps.
os.environ['CONFIG_DIR_OVERRIDE'] = os.path.abspath(args.config_dir) | ||
local_config.ProjectConfig().set_environment() | ||
environment.set_bot_environment() | ||
os.environ['LOG_TO_CONSOLE'] = 'True' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be better to create a method in environment module to set these two env vars (e.g., set_local_log_only
).
Since this is not done in any script currently, I think we should set it in butler.py (or run.py) possibly with an argument to let the logs go to GCP it if the user wants to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could pass an argument to
Line 432 in fb94beb
def _setup(): |
And set it there, what do you think? Something like
def _setup(args):
"""Set up configs and import paths."""
# ...
if args.local_logging:
from clusterfuzz._internal.system import environment
environment.set_local_log_only()
I've addressed this in #4988
Args: | ||
args: Parsed command-line arguments. | ||
""" | ||
os.environ['CONFIG_DIR_OVERRIDE'] = os.path.abspath(args.config_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it would be nice to have all these in a setup method (e.g., _setup_reproduce
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Addressing in #4988
from clusterfuzz._internal.datastore import data_handler | ||
from clusterfuzz._internal.datastore import data_types | ||
from clusterfuzz._internal.datastore import ndb_init | ||
from clusterfuzz._internal.datastore.data_types import Fuzzer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually don't import classes, just modules. (from ...datastore import data_type
).
go/pystyle#imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Addressing in #4988
Returns: | ||
True if setup was successful, False otherwise. | ||
""" | ||
fuzzer: Optional[Fuzzer] = data_types.Fuzzer.query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for typing the code! However, for internal variables, we try to type it only when it is very difficult to infer its type (here the type is in the query itself), as typing all vars would hinder readability and python's flexibility.
go/pystyle#typing-variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for Python 3.10+ it is recommended to use the union type |
instead of the Optional
syntax.
go/pystyle#none-type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Addressing in #4988
_EXECUTABLE_PERMISSIONS = 0o750 | ||
|
||
|
||
def _setup_fuzzer(fuzzer_name: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be oversimplifying things, but I wonder if we could have used methods already available for this. It seems that we are "reinventing the wheel" having to create another setup_fuzzer method, which I imagine copies a lot of the work done in other setup methods.
Maybe, as this is a local thing, we could create a centralized setup_local_fuzzer
in some other utils/fuzzer module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, there is significant overlap with the existing logic in setup.py
.
The key reason for not reusing it directly is that its flow is designed for untrusted tasks and ultimately downloads things from a signed URL:
if not storage.download_signed_url_to_file(update_input.fuzzer_download_url, |
My approach instead follows the pattern used by trusted tasks (like unpack_task
), which use the blobs
module for direct downloads.
I agree that centralizing this logic into a setup_local_fuzzer
is a great idea, and I can refactor it into a shared utils module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
try: | ||
_, testcase_file_path = setup._get_testcase_file_and_path(testcase) | ||
if not blobs.read_blob_to_disk(testcase.fuzzed_keys, testcase_file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but what about trying to use the minimized_keys
instead of the fuzzed_keys
? How does this work for other tasks, such as progression task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses
def _get_testcase_key_and_archive_status(testcase): |
which checks whether the testcase has minimized keys or not, if so, returns the minimized key. I will reproduce this behavior here. Thanks!
Args: | ||
args: Parsed command-line arguments. | ||
""" | ||
testcase: Optional[Testcase] = data_handler.get_testcase_by_id( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
logs.error(f'Testcase with ID {args.testcase_id} not found.') | ||
return | ||
|
||
job: Optional[Job] = data_types.Job.query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
logs.error(f'Failed to setup fuzzer {testcase.fuzzer_name}. Exiting.') | ||
return | ||
|
||
ok, testcase_file_path = _setup_testcase_locally(testcase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ok
is not a very insightful variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Addressing in #4988
logs.error(f'Fuzzer {fuzzer_name} not found.') | ||
return False | ||
|
||
environment.set_value('UNTRUSTED_CONTENT', fuzzer.untrusted_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a warning to users about running untrusted code on their machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Addressing in #4988
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this feature! :)
+1 to @ViniciustCosta comments.
shell.clear_testcase_directories() | ||
except Exception as e: | ||
logs.error(f'Error clearing testcase directories: {e}') | ||
return False, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, using a separate ok
boolean in a return, such as (ok, value)
, is generally unneeded. It complicates the return type hint and is redundant. Returning None
to signal failure is idiomatic in Python. If value can be None
to indicate an issue, the ok
boolean is superfluous because value is None
already conveys the failure state. For truly exceptional failure conditions, raising custom exceptions is often the most Pythonic approach, as this allows the caller to handle different failure types distinctly.
Also, we don't use this "status return pattern" in the codebase, so this would make it non-standard. Please fix this in your follow-up refactor. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Fixed in #4988
self.mock.open.return_value.__enter__.return_value = self.mock_archive_reader | ||
|
||
# Common mock fuzzer object | ||
self.mock_fuzzer = mock.MagicMock(spec=Fuzzer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always use mock.create_autospec
with instance=True
and spec_set=True
. This makes mocks safer by ensuring they mimic the original object's API, catching errors if methods are misspelled or called with incorrect signatures. This applies generally in the file.
go/python-tips/049
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge thanks for the tip! Applying that on #4988
@@ -0,0 +1,391 @@ | |||
# Copyright 2025 Google LLC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing your code’s dependencies with mocks can make unit tests easier to write and faster to run. However, among other problems, using mocks can lead to tests that are less effective at catching bugs. go/tott/697
It's okay to use mocks, but always reflect on whether there is a better option: go/choose-test-double
Writing this comment as a general advice! :)
This PR adds a new
reproduce
command to thebutler
script, enabling local reproduction of a ClusterFuzz testcase. This functionality is a foundational component that will be migrated to a new, standalone CLI tool in the future. The current implementation requires a local clone of the ClusterFuzz repository with all dependencies configured.Changes Made
A new command was added to the
butler
script that reproduces a testcase. Given atestcase-id
and aconfig-dir
, the command performs the following actions:test_for_crash_with_retries
to check for a crash.test_for_reproducibility
and displays the final output.Note: Support for testcase exceptions (e.g., data bundles and launch scripts) is out of scope for this PR and will be addressed in follow-up work.
How to Test Manually
1. Authentication
First, you must authenticate using Google Cloud Application Default Credentials. Execute the following command in your terminal:
gcloud auth application-default login
2. Usage
Once authenticated, run the
reproduce
command with the following format:python butler.py reproduce --config-dir=<path_to_config> --testcase-id=<testcase_id>
Important: Please ensure the testcase you are using does not require a launch script, as this is not yet supported.
Testing
Unit tests for this feature have been added in
/src/clusterfuzz/_internal/tests/core/local/butler/reproduce_test.py
.Future Work
The following improvements are planned as follow-ups to this PR: