-
Notifications
You must be signed in to change notification settings - Fork 162
feat: dictionary provider for Strings and Integrals #975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
958d54a to
d164e3c
Compare
5eac7ec to
5069fb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds dictionary provider support to the mutation framework, enabling users to provide custom dictionary values for fuzzing through the @DictionaryProvider annotation. The implementation:
- Introduces
MutatorRuntimeclass to provide runtime information (including the fuzz test method) to mutators - Adds
@DictionaryProviderannotation that references static methods returningStream<?>of dictionary values - Updates all
MutatorFactoryimplementations to acceptMutatorRuntimeparameter - Implements dictionary support for String and integral mutators using weighted sampling
- Adds
SamplingUtilswith Vose's alias method for efficient O(1) weighted sampling - Includes comprehensive tests for the new functionality
Reviewed Changes
Copilot reviewed 85 out of 85 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
DictionaryProvider.java |
New annotation for specifying dictionary provider methods with probability control |
MutatorRuntime.java |
New runtime info class providing fuzz test method to mutators |
DictionaryProviderSupport.java |
Helper methods to extract dictionary values from provider methods |
IgnoreRecursiveConflicts.java |
Meta-annotation to allow duplicate annotations during type hierarchy propagation |
SamplingUtils.java |
Weighted sampling utilities using Vose's alias method |
StringMutatorFactory.java |
Implements dictionary support for String mutators |
IntegralMutatorFactory.java |
Implements dictionary support for integral type mutators |
MutatorFactory.java |
Updated interface to accept MutatorRuntime parameter |
ArgumentsMutator.java |
Forwards method-level @DictionaryProvider annotations to parameters |
TestSupport.java |
Adds helper methods for creating dummy MutatorRuntime in tests |
| All other factory files | Updated to pass through MutatorRuntime parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/main/java/com/code_intelligence/jazzer/mutation/utils/IgnoreRecursiveConflicts.java
Outdated
Show resolved
Hide resolved
src/main/java/com/code_intelligence/jazzer/mutation/utils/IgnoreRecursiveConflicts.java
Outdated
Show resolved
Hide resolved
tests/src/test/java/com/example/DictionaryProviderFuzzerLongString.java
Outdated
Show resolved
Hide resolved
...in/java/com/code_intelligence/jazzer/mutation/mutator/libfuzzer/LibFuzzerMutatorFactory.java
Outdated
Show resolved
Hide resolved
87f2aba to
41633c8
Compare
In addition to primitive arrays, these types are now also supported: - List<Integer> [] - List<Integer> [][][]
Allow annotations to be inherited at multiple levels of the type hierarchy, enabling both broad and specific configuration of mutators. Use case: Configure mutators that share common types. For example, annotate a fuzz test method to apply default settings to all String mutators, while still allowing individual String parameters to override those settings with different values. Without this feature, an annotation could only appear once in the inheritance chain, preventing this layered configuration approach.
Enables easy tweaking of probabilities for indidual mutation functions in the future.
For now it only stores the fuzz test method
41633c8 to
0044fb9
Compare
This is just the enabling work. Methods and types annotated by @DictionaryProvider recursively propagate this annotation down the type hierarchy by default (can set to be for the annotated type only). Any mutator can now be adapted to use the user-provided values this annotation points to.
The StringMutatorFactory now extracts applicable Strings from the @DictionaryProvider and uses them during mutation according to the pInv of the last @DictionaryProvider annotation it found on this type.
After adding @DictionaryProvider to IntegralMutatorFactory, the selection of mutation functions now does an addition step that runs through weightedSampler, that selects whether to stay in the selection or do an additional step and select the alias.
Some tests have too strict expectations on mutator output and are way off from their true probabilities, and simply running the stress test for more iterations, or with a different seed will result in failed tests due to variance.
Changing the usage of PRNG in the mutators can affect duration of some tests. Slow GH runners are especially affected.
0044fb9 to
84707f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool feature!
I'm not entirely sure that the ability to have different dictionaries per field warrants the added complexity & state in the MutatorRuntime. Is this an essential part of the custom dictionary in your opinion? Having a single @DictionaryProvider per fuzz test could simplify the mutator instantiation and maybe even make it simpler to use for users.
Otherwise my main concern would be the duplication with DictionaryEntry and similar annotations (see other comment).
| * The values don't need to match the type of the annotated method or parameter exactly. The | ||
| * mutation framework will extract only the values that are compatible with the target type. | ||
| */ | ||
| String[] value() default {""}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message could be better if we remove the default value which forces users to specify value.
| * This {@code DictionaryProvider} will be used with probability {@code 1/p} by the mutator | ||
| * responsible for fitting types. Not all mutators respect this probability. | ||
| */ | ||
| int pInv() default 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this make sense as a float setting?
| DictionaryProvider[] providers = | ||
| Arrays.stream(type.getAnnotations()) | ||
| .filter(a -> a instanceof DictionaryProvider) | ||
| .map(a -> (DictionaryProvider) a) | ||
| .toArray(DictionaryProvider[]::new); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the getAnnotationsByType work here as well?
| Arrays.stream(providers) | ||
| .map(DictionaryProvider::value) | ||
| .flatMap(Arrays::stream) | ||
| .filter(name -> !name.isEmpty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Why filter out empty strings? IMO an error message that no such method is found would be cleaner.
| if (providerMethodNames.isEmpty()) { | ||
| return Optional.empty(); | ||
| } | ||
| Map<String, Method> fuzzTestMethods = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this variable threw me off when reading this. Could something like dictionaryProviderMethods or similar work? fuzzTestMethods sounds like the method annotated with @FuzzTest.
| throw validationError; | ||
| } | ||
| MutationRuntime.fuzzTestMethod = method; | ||
| DictionaryProvider[] typeDictionaries = method.getAnnotationsByType(DictionaryProvider.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: This probably belongs to the next commit.
| .collect(Collectors.toCollection(HashSet::new)); | ||
| for (Annotation annotation : extraAnnotations) { | ||
| boolean added = existingAnnotationTypes.add(annotation.annotationType()); | ||
| if (annotation.annotationType().isAnnotationPresent(IgnoreRecursiveConflicts.class)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the entire purpose of this function seems to be checking for the recursive conflict maybe we can exit early in the first lines of this function if IgnoreRecursiveConflicts.class is set?
| .map( | ||
| elementMutator -> | ||
| new ArrayMutator<>(elementMutator, propagatedElementClazz, minLength, maxLength)); | ||
| Type rawType = propagatedElementType.getType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change relevant to the DictionaryProvider? If not, it could make sense to split it from this PR.
| import java.lang.annotation.Target; | ||
|
|
||
| /** | ||
| * Provides dictionary values to user-selected mutator types. Currently supported mutators are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that users might be confused on when to use the DictionaryProvider and when to use other dictionary annotations like DictionaryEntry, DictionaryFile.
"user-selected mutator types" might also not be descriptive enough and end with users annotating a myFuzzTest(byte[] data) fuzz test with @DictionaryProvider without realizing that the values are not used.
Would it make sense to add the @DictionaryProvider values in addition to the libFuzzer dictionary file? Otherwise I could foresee situations where one wants to specify the same dictionary entries with @DictionaryProvider and @DictionaryEntry.
| return Optional.empty(); | ||
| } | ||
| Map<String, Method> fuzzTestMethods = | ||
| Arrays.stream(MutationRuntime.fuzzTestMethod.getDeclaringClass().getDeclaredMethods()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accessing global state in an unsuspecting *Support function is quite surprising IMO. Could we perform the "collect all possible dictionary provider methods" once before constructing the mutator and then pass those in (e.g. via mutator factory constructor)? Maybe even a dedicated mutator in the chain of possible mutators that only handles the custom dictionary and then delegates to an underlying type mutator?
This allows the users to annotate fuzz tests methods or any type by
@DictionaryProviderand provide values directly to the type mutators. Currently, onlyStringand Integral mutators make use of this feature, but this feature makes sense for other types as well.The main motivation is to work around libFuzzer's TORC (table of recent compares) limitation of 64 bytes. libFuzzer dictionaries suffer from the same limitation. But with this PR, the issue below is found in no time.