
@oetr
Contributor

@oetr oetr commented Oct 10, 2025

This allows users to annotate fuzz test methods, or any type, with @DictionaryProvider and to provide values directly to the type mutators. Currently, only the String and integral mutators make use of this feature, but it would make sense for other types as well.

The main motivation is to work around the 64-byte limit of libFuzzer's TORC (table of recent compares). libFuzzer dictionaries suffer from the same limitation. With this PR, however, the issue below is found almost immediately.

  public static Stream<?> myDict() {
    return Stream.of(
        "0123456789abcdef".repeat(50),
        "sitting duck suprime".repeat(53),
        // We can mix all kinds of values in the same dictionary.
        // Each mutator only takes the values it can use.
        123);
  }

  @FuzzTest
  // Just propagate the dictionary to all types of the fuzz test method that can use it.
  @DictionaryProvider(
      value = {"myDict"},
      // Don't want to wait, force String mutators to use dictionary values every other time.
      pInv = 2)
  public static void fuzzerTestOneInput(
      @NotNull @WithUtf8Length(max = 10000) String data,
      @NotNull @WithUtf8Length(max = 10000) String data2) {
    /*
     * libFuzzer's table of recent compares only allows 64 bytes, so asking the fuzzer to construct
     * these long strings would run for a very very long time without finding them. With a
     * DictionaryProvider this problem is trivial, because we can directly provide these long strings to
     * the fuzzer, and also force that they are used more often by setting pInv to a low value.
     */
    if (data.equals("0123456789abcdef".repeat(50))
        && data2.equals("sitting duck suprime".repeat(53))) {
      throw new FuzzerSecurityIssueLow("Found the long string!");
    }
  }

@oetr oetr force-pushed the CIF-1785-dictionary-provider branch 15 times, most recently from 958d54a to d164e3c October 16, 2025 23:50
@oetr oetr force-pushed the CIF-1785-dictionary-provider branch 3 times, most recently from 5eac7ec to 5069fb3 October 28, 2025 19:20
@oetr oetr changed the title from "feat: dictionary provider for selected types" to "feat: dictionary provider for Strings and Integrals" Oct 28, 2025
@oetr oetr marked this pull request as ready for review October 28, 2025 19:25
Copilot AI review requested due to automatic review settings October 28, 2025 19:25

Copilot AI left a comment


Pull Request Overview

This PR adds dictionary provider support to the mutation framework, enabling users to provide custom dictionary values for fuzzing through the @DictionaryProvider annotation. The implementation:

  • Introduces MutatorRuntime class to provide runtime information (including the fuzz test method) to mutators
  • Adds @DictionaryProvider annotation that references static methods returning Stream<?> of dictionary values
  • Updates all MutatorFactory implementations to accept MutatorRuntime parameter
  • Implements dictionary support for String and integral mutators using weighted sampling
  • Adds SamplingUtils with Vose's alias method for efficient O(1) weighted sampling
  • Includes comprehensive tests for the new functionality
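The overview says providers are static methods returning Stream<?>, and the annotation's Javadoc (quoted further down) says only values compatible with the target type are used. Below is a minimal sketch of how that per-type filtering could work with plain reflection. It is not the PR's DictionaryProviderSupport. `DictionarySketch` and `valuesFor` are hypothetical names, and the provider is assumed to take no arguments.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DictionarySketch {
  // Example provider that mixes Strings and an Integer, like myDict() in the PR description.
  static Stream<?> myDict() {
    return Stream.of("0123456789abcdef".repeat(2), "sitting duck", 123);
  }

  // Hypothetical helper: invokes a static, no-arg provider method on `owner`
  // and keeps only the values that are instances of `target`.
  static <T> List<T> valuesFor(Class<?> owner, String providerName, Class<T> target) {
    try {
      Method provider = owner.getDeclaredMethod(providerName);
      provider.setAccessible(true);
      Stream<?> values = (Stream<?>) provider.invoke(null);
      return values.filter(target::isInstance).map(target::cast).collect(Collectors.toList());
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot read dictionary provider " + providerName, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(valuesFor(DictionarySketch.class, "myDict", String.class).size()); // 2
    System.out.println(valuesFor(DictionarySketch.class, "myDict", Integer.class)); // [123]
  }
}
```

A mutator for a primitive `int` would have to test against the boxed class (`Integer`), because the provider's values arrive boxed.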

Reviewed Changes

Copilot reviewed 85 out of 85 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| DictionaryProvider.java | New annotation for specifying dictionary provider methods with probability control |
| MutatorRuntime.java | New runtime info class providing the fuzz test method to mutators |
| DictionaryProviderSupport.java | Helper methods to extract dictionary values from provider methods |
| IgnoreRecursiveConflicts.java | Meta-annotation to allow duplicate annotations during type hierarchy propagation |
| SamplingUtils.java | Weighted sampling utilities using Vose's alias method |
| StringMutatorFactory.java | Implements dictionary support for String mutators |
| IntegralMutatorFactory.java | Implements dictionary support for integral type mutators |
| MutatorFactory.java | Updated interface to accept a MutatorRuntime parameter |
| ArgumentsMutator.java | Forwards method-level @DictionaryProvider annotations to parameters |
| TestSupport.java | Adds helper methods for creating a dummy MutatorRuntime in tests |
| All other factory files | Updated to pass through the MutatorRuntime parameter |


@oetr oetr force-pushed the CIF-1785-dictionary-provider branch 3 times, most recently from 87f2aba to 41633c8 October 30, 2025 12:29
oetr added 5 commits October 30, 2025 13:30
In addition to primitive arrays, these types are now also supported:
- List<Integer>[]
- List<Integer>[][][]

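The commit message does not show how generic arrays are detected. Standard Java reflection represents a parameter of type `List<Integer>[]` as a `java.lang.reflect.GenericArrayType` rather than a `Class`, so it has to be unwrapped one dimension at a time. The minimal sketch below shows that unwrapping. `GenericArraySketch` and its methods are hypothetical and are not taken from the PR.

```java
import java.lang.reflect.GenericArrayType;
import java.lang.reflect.Method;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GenericArraySketch {
  // Parameter types from the commit message; the method body is irrelevant.
  static void target(List<Integer>[] one, List<Integer>[][][] three) {}

  // Unwraps one GenericArrayType per dimension and reports the innermost element type.
  static String describe(Type type) {
    int dimensions = 0;
    while (type instanceof GenericArrayType) {
      type = ((GenericArrayType) type).getGenericComponentType();
      dimensions++;
    }
    return type.getTypeName() + " x" + dimensions;
  }

  static List<String> describeTargetParameters() {
    Method target =
        Arrays.stream(GenericArraySketch.class.getDeclaredMethods())
            .filter(m -> m.getName().equals("target"))
            .findFirst()
            .orElseThrow(IllegalStateException::new);
    List<String> result = new ArrayList<>();
    for (Type parameterType : target.getGenericParameterTypes()) {
      result.add(describe(parameterType));
    }
    return result;
  }

  public static void main(String[] args) {
    // java.util.List<java.lang.Integer> x1
    // java.util.List<java.lang.Integer> x3
    describeTargetParameters().forEach(System.out::println);
  }
}
```

A primitive array such as `int[]` comes back as a plain `Class`, for which `describe` would report `int[] x0`.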
Allow annotations to be inherited at multiple levels of the type hierarchy,
enabling both broad and specific configuration of mutators.

Use case: Configure mutators that share common types. For example, annotate
a fuzz test method to apply default settings to all String mutators, while
still allowing individual String parameters to override those settings with
different values.

Without this feature, an annotation could only appear once in the inheritance
chain, preventing this layered configuration approach.
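In the layered configuration this commit describes, a method-level setting acts as the default and a more specific annotation overrides it. The PR propagates annotations through the annotated-type hierarchy. The minimal sketch below shows the same "most specific wins" rule with ordinary method and parameter annotations instead. `@Tuning`, `effectivePInv` and `DEFAULT_P_INV` are hypothetical stand-ins, not the PR's API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.Arrays;

class LayeredConfigSketch {
  // Hypothetical stand-in for a setting such as pInv.
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.METHOD, ElementType.PARAMETER})
  @interface Tuning {
    int pInv();
  }

  // Used when neither level is annotated (mirrors the pInv default of 10 shown in the review).
  static final int DEFAULT_P_INV = 10;

  // The method-level setting applies to all parameters; data2 overrides it.
  @Tuning(pInv = 5)
  static void fuzzTest(String data, @Tuning(pInv = 2) String data2) {}

  // Most specific wins: the parameter first, then the method, then the default.
  static int effectivePInv(Method method, Parameter parameter) {
    Tuning own = parameter.getAnnotation(Tuning.class);
    if (own != null) {
      return own.pInv();
    }
    Tuning inherited = method.getAnnotation(Tuning.class);
    return inherited != null ? inherited.pInv() : DEFAULT_P_INV;
  }

  static int[] effectivePInvs() {
    Method method =
        Arrays.stream(LayeredConfigSketch.class.getDeclaredMethods())
            .filter(m -> m.getName().equals("fuzzTest"))
            .findFirst()
            .orElseThrow(IllegalStateException::new);
    return Arrays.stream(method.getParameters()).mapToInt(p -> effectivePInv(method, p)).toArray();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(effectivePInvs())); // [5, 2]
  }
}
```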

Enables easy tweaking of probabilities for individual mutation functions
in the future.

For now, it only stores the fuzz test method.
@oetr oetr force-pushed the CIF-1785-dictionary-provider branch from 41633c8 to 0044fb9 October 30, 2025 12:30
oetr added 6 commits October 30, 2025 13:42
This is just the enabling work. Methods and types annotated with
@DictionaryProvider recursively propagate this annotation down the
type hierarchy by default (this can be restricted to the annotated type only).

Any mutator can now be adapted to use the user-provided values
this annotation points to.

The StringMutatorFactory now extracts applicable Strings from the
@DictionaryProvider and uses them during mutation according to
the pInv of the last @DictionaryProvider annotation it found on this type.
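The commit describes the behavior but not the code. Following the annotation's Javadoc ("used with probability 1/p"), the minimal sketch below shows one way a mutator could honor `pInv`: with probability 1/pInv it returns a uniformly chosen dictionary value, and otherwise it falls back to its regular mutation. `PInvSketch`, `mutate` and `dictionaryHits` are hypothetical names. The PR's StringMutatorFactory may combine dictionary values with other mutations differently.

```java
import java.util.List;
import java.util.SplittableRandom;
import java.util.function.UnaryOperator;

final class PInvSketch {
  // With probability 1/pInv, replace the value with a uniformly chosen dictionary entry;
  // otherwise apply the regular mutation. pInv must be >= 1.
  static String mutate(
      String value,
      List<String> dictionary,
      int pInv,
      SplittableRandom prng,
      UnaryOperator<String> regularMutation) {
    if (pInv < 1) {
      throw new IllegalArgumentException("pInv must be >= 1");
    }
    if (!dictionary.isEmpty() && prng.nextInt(pInv) == 0) {
      return dictionary.get(prng.nextInt(dictionary.size()));
    }
    return regularMutation.apply(value);
  }

  // Counts how often a dictionary value was picked in `rounds` calls.
  static int dictionaryHits(int pInv, int rounds, long seed) {
    SplittableRandom prng = new SplittableRandom(seed);
    List<String> dictionary = List.of("DICT");
    int hits = 0;
    for (int i = 0; i < rounds; i++) {
      if (mutate("seed", dictionary, pInv, prng, s -> s + "!").equals("DICT")) {
        hits++;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // pInv = 2 as in the PR example: roughly every other call uses the dictionary.
    System.out.println(dictionaryHits(2, 10_000, 42));
  }
}
```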

After adding @DictionaryProvider to IntegralMutatorFactory, selecting a
mutation function now takes an additional step through the weightedSampler,
which decides whether to keep the selected function or to take its alias instead.
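The "keep or take the alias" step is the sampling half of Vose's alias method, which the overview names for SamplingUtils. Below is a minimal, self-contained sketch of the technique. `AliasSampler` is a hypothetical class and is not the PR's weightedSampler. Construction takes O(n). Each draw picks a column uniformly, then keeps it with probability `prob[column]` or takes `alias[column]` otherwise, which makes sampling O(1).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.SplittableRandom;

final class AliasSampler {
  private final double[] prob; // Probability of keeping column i.
  private final int[] alias; // Column taken instead of i otherwise.

  AliasSampler(double[] weights) {
    int n = weights.length;
    if (n == 0) throw new IllegalArgumentException("no weights");
    double total = 0;
    for (double w : weights) {
      if (w < 0) throw new IllegalArgumentException("negative weight");
      total += w;
    }
    if (total <= 0) throw new IllegalArgumentException("weights sum to zero");
    prob = new double[n];
    alias = new int[n];
    double[] scaled = new double[n];
    Deque<Integer> small = new ArrayDeque<>();
    Deque<Integer> large = new ArrayDeque<>();
    for (int i = 0; i < n; i++) {
      scaled[i] = weights[i] * n / total;
      (scaled[i] < 1.0 ? small : large).push(i);
    }
    // Pair each under-full column with an over-full one that tops it up to 1.
    while (!small.isEmpty() && !large.isEmpty()) {
      int s = small.pop();
      int l = large.pop();
      prob[s] = scaled[s];
      alias[s] = l;
      scaled[l] = (scaled[l] + scaled[s]) - 1.0;
      (scaled[l] < 1.0 ? small : large).push(l);
    }
    // Remaining columns are full (exactly, or up to rounding error).
    while (!large.isEmpty()) prob[large.pop()] = 1.0;
    while (!small.isEmpty()) prob[small.pop()] = 1.0;
  }

  // O(1): pick a column uniformly, then keep it or take its alias.
  int sample(SplittableRandom prng) {
    int column = prng.nextInt(prob.length);
    return prng.nextDouble() < prob[column] ? column : alias[column];
  }

  public static void main(String[] args) {
    AliasSampler sampler = new AliasSampler(new double[] {1, 3});
    SplittableRandom prng = new SplittableRandom(42);
    int[] counts = new int[2];
    for (int i = 0; i < 100_000; i++) counts[sampler.sample(prng)]++;
    System.out.println(counts[0] + " vs " + counts[1]); // roughly 1:3
  }
}
```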

Some tests have overly strict expectations on mutator output that are far
from the true probabilities: simply running the stress test for more
iterations, or with a different seed, makes them fail due to variance.
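The PR does not show how the stress tests were relaxed. A common alternative to fixed expected counts is a tolerance that scales with the binomial standard deviation. The minimal sketch below uses hypothetical names (`ToleranceSketch`, `withinSigma`, `countHits`) and asserts that the observed count lies within k standard deviations of n·p, with a seeded PRNG simulating a decision that fires with probability 1/pInv. A 4-sigma check still fails by chance in roughly 6 of 100,000 runs under the normal approximation, but a correct mutator no longer drifts out of bounds just because the iteration count grows.

```java
import java.util.SplittableRandom;

final class ToleranceSketch {
  // True if `observed` successes out of `trials` lie within k standard deviations
  // of the binomial mean trials * p.
  static boolean withinSigma(long observed, long trials, double p, double k) {
    double mean = trials * p;
    double sd = Math.sqrt(trials * p * (1 - p));
    return Math.abs(observed - mean) <= k * sd;
  }

  // Simulates a mutator decision that should fire with probability 1/pInv.
  static long countHits(int pInv, long trials, long seed) {
    SplittableRandom prng = new SplittableRandom(seed);
    long hits = 0;
    for (long i = 0; i < trials; i++) {
      if (prng.nextInt(pInv) == 0) hits++;
    }
    return hits;
  }

  public static void main(String[] args) {
    for (long trials : new long[] {1_000, 100_000, 1_000_000}) {
      long hits = countHits(10, trials, 7);
      System.out.println(trials + " trials: " + hits + " hits, ok=" + withinSigma(hits, trials, 0.1, 4));
    }
  }
}
```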

Changing how the mutators use the PRNG can affect the duration of some
tests. Slow GitHub runners are especially affected.
@oetr oetr force-pushed the CIF-1785-dictionary-provider branch from 0044fb9 to 84707f2 October 30, 2025 12:42
Contributor

@simonresch simonresch left a comment


Cool feature!

I'm not entirely sure that the ability to have different dictionaries per field warrants the added complexity and state in the MutatorRuntime. Is this an essential part of the custom dictionary in your opinion? Having a single @DictionaryProvider per fuzz test could simplify the mutator instantiation and maybe even make it easier for users.
Otherwise my main concern would be the duplication with DictionaryEntry and similar annotations (see other comment).

* The values don't need to match the type of the annotated method or parameter exactly. The
* mutation framework will extract only the values that are compatible with the target type.
*/
String[] value() default {""};

The error message could be better if we removed the default value, which would force users to specify value.

* This {@code DictionaryProvider} will be used with probability {@code 1/p} by the mutator
* responsible for fitting types. Not all mutators respect this probability.
*/
int pInv() default 10;

Would this make sense as a float setting?

Comment on lines +58 to +62
DictionaryProvider[] providers =
Arrays.stream(type.getAnnotations())
.filter(a -> a instanceof DictionaryProvider)
.map(a -> (DictionaryProvider) a)
.toArray(DictionaryProvider[]::new);

Wouldn't getAnnotationsByType work here as well?
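For a non-repeatable annotation, filtering `getAnnotations()` by `instanceof` and calling `getAnnotationsByType` return the same result. They differ when the annotation is `@Repeatable`. In that case the compiler wraps repeated uses in a container annotation, which only `getAnnotationsByType` looks through. The minimal sketch below uses hypothetical `@Provider`/`@Providers` annotations on classes. `AnnotatedType`, which the reviewed snippet uses, is also an `AnnotatedElement` and has the same method.

```java
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

class AnnotationsByTypeSketch {
  @Retention(RetentionPolicy.RUNTIME)
  @Repeatable(Providers.class)
  @interface Provider {
    String value();
  }

  // Container the compiler uses when @Provider appears more than once.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Providers {
    Provider[] value();
  }

  @Provider("a")
  @Provider("b")
  static class Twice {}

  @Provider("a")
  static class Once {}

  // Same shape as the reviewed snippet: filter getAnnotations() by instanceof.
  static int viaFilter(Class<?> type) {
    return (int) Arrays.stream(type.getAnnotations()).filter(a -> a instanceof Provider).count();
  }

  static int viaByType(Class<?> type) {
    return type.getAnnotationsByType(Provider.class).length;
  }

  public static void main(String[] args) {
    System.out.println(viaFilter(Once.class) + " " + viaByType(Once.class)); // 1 1
    System.out.println(viaFilter(Twice.class) + " " + viaByType(Twice.class)); // 0 2
  }
}
```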

Arrays.stream(providers)
.map(DictionaryProvider::value)
.flatMap(Arrays::stream)
.filter(name -> !name.isEmpty())

Minor: Why filter out empty strings? IMO an error message that no such method is found would be cleaner.
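A minimal sketch of the reviewer's alternative: resolve every configured name, including the empty string, and fail with a descriptive message when no matching provider exists. `ProviderLookupSketch` and `resolve` are hypothetical. The checks for static, no-argument and Stream-returning methods follow the overview's description of providers as static methods returning Stream<?>.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.stream.Stream;

class ProviderLookupSketch {
  static Stream<?> myDict() {
    return Stream.of("value", 123);
  }

  // Resolves every name, including "", and fails loudly instead of
  // silently dropping names that don't match a provider method.
  static Method resolve(Class<?> owner, String name) {
    return Arrays.stream(owner.getDeclaredMethods())
        .filter(m -> m.getName().equals(name))
        .filter(m -> Modifier.isStatic(m.getModifiers()) && m.getParameterCount() == 0)
        .filter(m -> Stream.class.isAssignableFrom(m.getReturnType()))
        .findFirst()
        .orElseThrow(
            () ->
                new IllegalArgumentException(
                    String.format(
                        "@DictionaryProvider: no static, no-argument method '%s' returning a Stream in %s",
                        name, owner.getName())));
  }

  public static void main(String[] args) {
    System.out.println(resolve(ProviderLookupSketch.class, "myDict").getName()); // myDict
    try {
      resolve(ProviderLookupSketch.class, "");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```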

if (providerMethodNames.isEmpty()) {
return Optional.empty();
}
Map<String, Method> fuzzTestMethods =

The name of this variable threw me off when reading this. Could something like dictionaryProviderMethods or similar work? fuzzTestMethods sounds like the method annotated with @FuzzTest.

throw validationError;
}
MutationRuntime.fuzzTestMethod = method;
DictionaryProvider[] typeDictionaries = method.getAnnotationsByType(DictionaryProvider.class);

Minor: This probably belongs to the next commit.

.collect(Collectors.toCollection(HashSet::new));
for (Annotation annotation : extraAnnotations) {
boolean added = existingAnnotationTypes.add(annotation.annotationType());
if (annotation.annotationType().isAnnotationPresent(IgnoreRecursiveConflicts.class)) {

Since the entire purpose of this function seems to be checking for the recursive conflict maybe we can exit early in the first lines of this function if IgnoreRecursiveConflicts.class is set?

.map(
elementMutator ->
new ArrayMutator<>(elementMutator, propagatedElementClazz, minLength, maxLength));
Type rawType = propagatedElementType.getType();

Is this change relevant to the DictionaryProvider? If not, it could make sense to split it from this PR.

import java.lang.annotation.Target;

/**
* Provides dictionary values to user-selected mutator types. Currently supported mutators are:

I suspect that users might be confused about when to use the DictionaryProvider and when to use other dictionary annotations like DictionaryEntry and DictionaryFile.
"user-selected mutator types" might also not be descriptive enough, and users might end up annotating a myFuzzTest(byte[] data) fuzz test with @DictionaryProvider without realizing that the values are not used.

Would it make sense to add the @DictionaryProvider values in addition to the libFuzzer dictionary file? Otherwise I could foresee situations where one wants to specify the same dictionary entries with @DictionaryProvider and @DictionaryEntry.
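If provider values were also written to the libFuzzer dictionary, each String would need to be encoded in libFuzzer's dictionary syntax: a quoted value, optionally preceded by `name=`, with `\\`, `\"` and `\xAB` escapes. Below is a minimal sketch of that encoding. `LibFuzzerDictSketch` is a hypothetical name. The PR description notes that libFuzzer dictionaries share the 64-byte limit, so long values such as those in the PR example would still depend on the in-mutator path.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class LibFuzzerDictSketch {
  // Encodes one value as a libFuzzer dictionary entry: UTF-8 bytes in double quotes,
  // with backslash and quote escaped and non-printable bytes written as \xAB.
  static String toEntry(String value) {
    StringBuilder entry = new StringBuilder("\"");
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xFF;
      if (c == '\\' || c == '"') {
        entry.append('\\').append((char) c);
      } else if (c >= 0x20 && c < 0x7F) {
        entry.append((char) c);
      } else {
        entry.append(String.format("\\x%02X", c));
      }
    }
    return entry.append('"').toString();
  }

  // One entry per line; libFuzzer allows omitting the "name=" prefix.
  static String toDictionary(List<String> values) {
    return values.stream()
        .map(LibFuzzerDictSketch::toEntry)
        .collect(Collectors.joining("\n", "", "\n"));
  }

  public static void main(String[] args) {
    System.out.print(toDictionary(List.of("sitting duck", "say \"hi\"", "caf\u00e9")));
  }
}
```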

return Optional.empty();
}
Map<String, Method> fuzzTestMethods =
Arrays.stream(MutationRuntime.fuzzTestMethod.getDeclaringClass().getDeclaredMethods())

Accessing global state in an unsuspecting *Support function is quite surprising IMO. Could we perform the "collect all possible dictionary provider methods" once before constructing the mutator and then pass those in (e.g. via mutator factory constructor)? Maybe even a dedicated mutator in the chain of possible mutators that only handles the custom dictionary and then delegates to an underlying type mutator?
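A minimal sketch of the reviewer's suggestion. The provider values are assumed to be collected once, before any mutator is built, and handed to a factory through its constructor. The factory wraps a type's base mutator in a dedicated decorator only when values exist for that type. All names here (`Mutator`, `DictionaryMutator`, `DictionaryMutatorFactory`) are hypothetical and are not the PR's or the mutation framework's API.

```java
import java.util.List;
import java.util.Map;
import java.util.SplittableRandom;
import java.util.stream.Collectors;

final class DictionaryDecoratorSketch {
  // Minimal stand-in for a type mutator.
  interface Mutator<T> {
    T mutate(T value, SplittableRandom prng);
  }

  // Decorator: sees only the values handed to it, never global state.
  static final class DictionaryMutator<T> implements Mutator<T> {
    private final List<T> values;
    private final int pInv;
    private final Mutator<T> delegate;

    DictionaryMutator(List<T> values, int pInv, Mutator<T> delegate) {
      this.values = List.copyOf(values);
      this.pInv = pInv;
      this.delegate = delegate;
    }

    @Override
    public T mutate(T value, SplittableRandom prng) {
      if (prng.nextInt(pInv) == 0) {
        return values.get(prng.nextInt(values.size()));
      }
      return delegate.mutate(value, prng);
    }
  }

  // Built once, after all provider methods have been collected and invoked.
  static final class DictionaryMutatorFactory {
    private final Map<Class<?>, List<?>> valuesByType;
    private final int pInv;

    DictionaryMutatorFactory(Map<Class<?>, List<?>> valuesByType, int pInv) {
      this.valuesByType = Map.copyOf(valuesByType);
      this.pInv = pInv;
    }

    // Wraps `base` only if there are dictionary values for `type`.
    <T> Mutator<T> wrap(Class<T> type, Mutator<T> base) {
      List<?> raw = valuesByType.getOrDefault(type, List.of());
      if (raw.isEmpty()) {
        return base;
      }
      List<T> typed = raw.stream().map(type::cast).collect(Collectors.toList());
      return new DictionaryMutator<>(typed, pInv, base);
    }
  }

  public static void main(String[] args) {
    DictionaryMutatorFactory factory =
        new DictionaryMutatorFactory(
            Map.<Class<?>, List<?>>of(String.class, List.of("0123456789abcdef")), 2);
    Mutator<String> base = (v, r) -> v + "!";
    Mutator<String> withDictionary = factory.wrap(String.class, base);
    SplittableRandom prng = new SplittableRandom(1);
    for (int i = 0; i < 4; i++) {
      System.out.println(withDictionary.mutate("seed", prng));
    }
  }
}
```

With this split, the base String and integral mutators stay unaware of dictionaries, and the only place that reads provider methods is the code that builds the map.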

3 participants