feat: --image-source flag to add noise generation and custom dirs by ajcasagrande · Pull Request #710 · ai-dynamo/aiperf

ajcasagrande · 2026-02-25T16:58:21Z

Move AudioDefaults, ImageDefaults, and VideoDefaults out of the shared config_defaults module and into their respective config classes to improve cohesion. Add ImageSource enum and --image-source CLI option to support noise-based and custom directory image generation modes alongside the existing assets-based approach.

Summary by CodeRabbit

Release Notes

New Features
- Added --image-source CLI option supporting multiple image input modes: bundled assets, dynamically generated noise, or custom directory paths.
- Expanded image generation with flexible source selection and improved defaults for audio, image, and video configurations.
Documentation
- Added documentation for the new image source CLI option.
Bug Fixes
- Enhanced validation to prevent misconfigured media generation settings and provide clearer error guidance.

…lasses Move AudioDefaults, ImageDefaults, and VideoDefaults out of the shared config_defaults module and into their respective config classes to improve cohesion. Add ImageSource enum and --image-source CLI option to support noise-based and custom directory image generation modes alongside the existing assets-based approach. Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

github-actions · 2026-02-25T16:58:34Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8a9f290d5281dbcdfca264bdfa44b3e7bc869a0b

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@8a9f290d5281dbcdfca264bdfa44b3e7bc869a0b

Last updated for commit: 8a9f290 • Browse code

codecov · 2026-02-25T17:06:06Z

Codecov Report

❌ Patch coverage is 94.80519% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/aiperf/common/config/video_config.py	75.00%	1 Missing and 1 partial ⚠️
src/aiperf/dataset/generator/image.py	94.44%	1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

ajcasagrande · 2026-02-25T17:07:08Z

src/aiperf/common/config/__init__.py

 from aiperf.common.config.base_config import BaseConfig
 from aiperf.common.config.cli_parameter import CLIParameter, DisableCLI
 from aiperf.common.config.config_defaults import (
-    AudioDefaults,


I removed the defaults file and inline the defaults directly to reduce cognitive load

I thought we generally like relegating magic numbers into these default classes?

coderabbitai · 2026-02-25T17:07:48Z

Walkthrough

This pull request refactors media generation configuration by removing centralized defaults classes and embedding defaults directly into config fields. It introduces validation methods ensuring generation is only enabled when required options are provided, adds an ImageSource enum supporting multiple image generation modes (ASSETS, NOISE, directory paths), and updates the image generator to dynamically select image sources. Tests are updated to reflect these structural changes.

Changes

Cohort / File(s)	Summary
Configuration Defaults Removal `src/aiperf/common/config/__init__.py`, `src/aiperf/common/config/config_defaults.py`	Removed exports of `AudioDefaults`, `ImageDefaults`, and `VideoDefaults` classes along with associated enum imports (`AudioFormat`, `ImageFormat`, `VideoFormat`, `VideoSynthType`), consolidating defaults into individual config modules.
Audio Configuration `src/aiperf/common/config/audio_config.py`	Added inline field defaults (batch_size, format, depths, sample_rates, num_channels, length.mean, length.stddev), introduced `audio_enabled()` helper and `_validate_audio_options()` validator to enforce that audio options require generation to be enabled.
Image Configuration `src/aiperf/common/config/image_config.py`	Added inline field defaults (batch_size, format, height/width mean/stddev), introduced new `source` field supporting `ImageSource` enum or filesystem `Path`, added `images_enabled()` helper and `_validate_image_options()` validator.
Video Configuration `src/aiperf/common/config/video_config.py`	Added inline field defaults across all public fields (batch_size, duration, fps, dimensions, synth_type, format, codec), introduced `videos_enabled()` helper and `_validate_video_options()` validator; removed `VideoDefaults` dependency.
Enums Expansion `src/aiperf/common/enums/enums.py`, `src/aiperf/common/enums/__init__.py`	Added new `ImageSource` enum with `ASSETS` and `NOISE` members, exported publicly.
Random Generator `src/aiperf/common/random_generator.py`	Updated `integers()` method signature to accept optional `dtype` parameter (default `np.int64`) and forward to NumPy's `Generator.integers()`.
Synthesis and Generation `src/aiperf/dataset/composer/synthetic.py`, `src/aiperf/dataset/generator/image.py`	Refactored generation enables checks to use new helper methods (`images_enabled()`, `audio_enabled()`, `videos_enabled()`); image generator now supports three source modes with dynamic loading via `_load_source_images_from_disk()`, `_create_from_source_images()`, and `_create_from_noise()` methods.
Configuration and Composer Tests `tests/unit/common/config/test_audio_config.py`, `tests/unit/common/config/test_image_config.py`, `tests/unit/dataset/composer/test_synthetic_composer.py`	Updated tests to remove `AudioDefaults` and `ImageDefaults` imports, added test coverage for new `*_enabled()` helpers and validation logic, verified configuration defaults align with inlined values.
Image and Video Generator Tests `tests/unit/dataset/generator/test_image_generator.py`, `tests/unit/dataset/generator/test_video_generator.py`	Expanded image generator tests to cover new source modes (`ImageSource.NOISE`, custom directory paths), replaced `_sample_source_image()` calls with `_create_from_source_images()`, switched video config construction to use `model_construct()` in edge-case test.
Documentation `docs/cli_options.md`	Added CLI documentation for new `--image-source` option describing ASSETS, NOISE, and custom directory path modes with examples.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 No more scattered defaults to find,
Inline they nestle, by design!
ASSETS, NOISE, and paths galore,
Images dance with options in store,
Validation ensures all is right,
Configuration shines crystal bright! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 46.97% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main change: adding a new --image-source flag for noise generation and custom directory support.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (2)

tests/unit/dataset/generator/test_image_generator.py (1)

356-362: Verify that not hasattr(generator, "config") is the intended contract for disabled generators.

This test asserts that ImageGenerator does not set self.config when images are disabled (early return in __init__). This is a somewhat fragile assertion—if BaseGenerator.__init__ or any future refactor sets config, this test breaks. Consider asserting a more explicit disabled-state indicator if one exists, or adding a brief comment documenting why this assertion is appropriate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/dataset/generator/test_image_generator.py` around lines 356 - 362,
The test currently relies on the absence of an attribute by asserting not
hasattr(generator, "config"), which is fragile if BaseGenerator.__init__ or
future refactors add that attribute; update the test in
TestImageGeneratorDisabled.test_disabled_images_skips_init to assert an explicit
disabled-state indicator instead (e.g., assert generator.disabled is True or
assert generator.enabled is False or check generator.initialized is False) using
whatever explicit flag exists on ImageGenerator/ImageConfig, and if no such flag
exists either add a short comment explaining why the hasattr check is the
intended contract or add a small explicit property (e.g., disabled or
initialized) in ImageGenerator.__init__ to signal the early-return state and
assert that instead.

src/aiperf/dataset/composer/synthetic.py (1)

30-39: Pre-existing: validation doesn't account for video-only payloads.

The guard at lines 30-34 raises if prompt, image, and audio are all disabled—but doesn't check self.include_video. If video is the only enabled input, this would incorrectly raise ValueError. This is pre-existing (not introduced by this PR), but worth noting since include_video was just wired up.

Proposed fix

         if (
             not self.include_prompt
             and not self.include_image
             and not self.include_audio
+            and not self.include_video
         ):
             raise ValueError(
                 "All synthetic data are disabled. "
-                "Please enable at least one of prompt, image, or audio by "
+                "Please enable at least one of prompt, image, audio, or video by "
                 "setting the mean to a positive value."
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/dataset/composer/synthetic.py` around lines 30 - 39, The
validation that raises ValueError when all synthetic inputs are disabled
currently checks only self.include_prompt, self.include_image, and
self.include_audio; update that guard in the synthetic data validation block to
also consider self.include_video (i.e., require at least one of
self.include_prompt, self.include_image, self.include_audio, or
self.include_video to be True) so a video-only payload does not incorrectly
trigger the error; modify the conditional that constructs the exception to
include self.include_video or refactor to assert any([self.include_prompt,
self.include_image, self.include_audio, self.include_video]) before raising.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aiperf/common/config/audio_config.py`:
- Around line 155-158: The current check raises whenever audio options were
provided while audio_enabled() is False, which wrongly rejects an explicit
disable via batch_size==0; update the logic around the raise to allow the
explicit disable case by: compute whether only batch_size was provided (e.g.,
only_batch_size_provided = audio_options_set and all other audio option fields
like audio_length_mean are unset/None), then change the condition to raise only
when not self.audio_enabled() and audio_options_set and not (self.batch_size ==
0 and only_batch_size_provided); reference the existing audio_enabled(),
audio_options_set, and self.batch_size symbols when making this change.

In `@src/aiperf/common/config/image_config.py`:
- Around line 167-170: The validator is rejecting an explicit disable via
"--image-batch-size 0"; update the check around images_enabled() so providing
image options is allowed when the user explicitly set batch size to 0.
Concretely, in the block that raises when not self.images_enabled() and
image_options_set, add a condition to skip the error if the batch-size flag is
explicitly zero (e.g. check self.image_batch_size == 0 or the equivalent
attribute/flag used to store --image-batch-size); alternatively implement
images_enabled() to return False when batch size == 0 and ensure the raise only
happens if images are truly enabled. Reference images_enabled(),
image_options_set and the image-batch-size flag/attribute when making the
change.

In `@src/aiperf/common/config/video_config.py`:
- Around line 35-38: The current check raises when video generation is disabled
but any video option was provided, which incorrectly rejects the explicit
disable flag `--video-batch-size 0`; update the conditional in the validator so
it does not raise if the user explicitly set video_batch_size to 0. Concretely,
change the guard using videos_enabled() and video_options_set to also allow the
case self.video_batch_size == 0 (e.g., only raise if not self.videos_enabled()
and video_options_set and self.video_batch_size != 0), referencing
videos_enabled(), video_options_set, and the video_batch_size attribute.

In `@src/aiperf/common/random_generator.py`:
- Around line 122-134: The integers method is missing type annotations for size,
dtype and the return value; update the integers signature in class
RandomGenerator (method integers) to annotate size as Optional[Union[int,
Tuple[int, ...]]], dtype as Optional[np.typing.DTypeLike] (or np.dtype | None),
and the return type as Union[int, np.ndarray]; also add the necessary imports
(Optional, Union, Tuple from typing and np.typing as needed) so the function has
full parameter and return type hints.

In `@src/aiperf/dataset/generator/image.py`:
- Around line 67-77: The code currently collects all directory entries into
image_paths and attempts to open each with Image.open causing failures on
subdirectories or non-image files; update the logic in the generator so
image_paths only includes regular files with image-like extensions or add a safe
open loop that checks Path(path).is_file() and wraps Image.open(path) in a
try/except (catching OSError/UnidentifiedImageError) to skip non-image files,
appending only successfully opened images into the images list; reference
image_paths, source_path, Image.open, and images to locate where to apply the
filtering and error-handling change.
- Around line 34-37: In the ImageGenerator constructor (__init__) do not return
before required instance state is set; instead always initialize self.config and
any source/factory attributes (e.g., image source factory) and set an enabled
flag (e.g., self.images_enabled = config.images_enabled()) so generate() can
safely inspect state; if images are disabled, log via self.debug and early-exit
behavior should rely on the flag rather than leaving members uninitialized so
later calls to generate() won't raise AttributeError.
- Around line 60-79: The synchronous disk I/O in _load_source_images_from_disk
(glob + Image.open + img.copy) must be moved out of __init__ into an
async-friendly lifecycle: create an async method (e.g., async_load_source_images
or load_source_images_async) that performs the work on a threadpool (use
asyncio.to_thread to call the existing _load_source_images_from_disk logic or
move the per-file open/copy into the thread) and update any startup code to
await that method from the appropriate async initialization hook; ensure the
debug(...) call remains after loading and that exceptions
(FileNotFoundError/NotADirectoryError/ValueError) are preserved and raised from
the async loader so callers can handle them.

---

Nitpick comments:
In `@src/aiperf/dataset/composer/synthetic.py`:
- Around line 30-39: The validation that raises ValueError when all synthetic
inputs are disabled currently checks only self.include_prompt,
self.include_image, and self.include_audio; update that guard in the synthetic
data validation block to also consider self.include_video (i.e., require at
least one of self.include_prompt, self.include_image, self.include_audio, or
self.include_video to be True) so a video-only payload does not incorrectly
trigger the error; modify the conditional that constructs the exception to
include self.include_video or refactor to assert any([self.include_prompt,
self.include_image, self.include_audio, self.include_video]) before raising.

In `@tests/unit/dataset/generator/test_image_generator.py`:
- Around line 356-362: The test currently relies on the absence of an attribute
by asserting not hasattr(generator, "config"), which is fragile if
BaseGenerator.__init__ or future refactors add that attribute; update the test
in TestImageGeneratorDisabled.test_disabled_images_skips_init to assert an
explicit disabled-state indicator instead (e.g., assert generator.disabled is
True or assert generator.enabled is False or check generator.initialized is
False) using whatever explicit flag exists on ImageGenerator/ImageConfig, and if
no such flag exists either add a short comment explaining why the hasattr check
is the intended contract or add a small explicit property (e.g., disabled or
initialized) in ImageGenerator.__init__ to signal the early-return state and
assert that instead.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3bc710b and 8a9f290.

📒 Files selected for processing (16)

docs/cli_options.md
src/aiperf/common/config/__init__.py
src/aiperf/common/config/audio_config.py
src/aiperf/common/config/config_defaults.py
src/aiperf/common/config/image_config.py
src/aiperf/common/config/video_config.py
src/aiperf/common/enums/__init__.py
src/aiperf/common/enums/enums.py
src/aiperf/common/random_generator.py
src/aiperf/dataset/composer/synthetic.py
src/aiperf/dataset/generator/image.py
tests/unit/common/config/test_audio_config.py
tests/unit/common/config/test_image_config.py
tests/unit/dataset/composer/test_synthetic_composer.py
tests/unit/dataset/generator/test_image_generator.py
tests/unit/dataset/generator/test_video_generator.py

💤 Files with no reviewable changes (3)

tests/unit/dataset/composer/test_synthetic_composer.py
src/aiperf/common/config/init.py
src/aiperf/common/config/config_defaults.py

coderabbitai · 2026-02-25T17:07:51Z

src/aiperf/common/config/audio_config.py

+        if not self.audio_enabled() and audio_options_set:
+            raise ValueError(
+                "Audio generation is disabled but audio options were provided. Please set `--audio-batch-size` and `--audio-length-mean` to enable audio generation."
+            )


⚠️ Potential issue | 🟠 Major

batch_size=0 should be accepted as an explicit disabled state.

Line 155 treats any explicitly set audio option as invalid when disabled, which also rejects batch_size=0. Since audio_enabled() uses batch_size > 0, this should be a valid explicit disable path.

Proposed fix

`@model_validator`(mode="after") def _validate_audio_options(self) -> Self: """Validate the audio options.""" + if self.batch_size == 0: + return self + audio_options_set = {*self.model_fields_set, *self.length.model_fields_set} if not self.audio_enabled() and audio_options_set: raise ValueError( "Audio generation is disabled but audio options were provided. Please set `--audio-batch-size` and `--audio-length-mean` to enable audio generation." ) return self

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 156-158: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/common/config/audio_config.py` around lines 155 - 158, The current check raises whenever audio options were provided while audio_enabled() is False, which wrongly rejects an explicit disable via batch_size==0; update the logic around the raise to allow the explicit disable case by: compute whether only batch_size was provided (e.g., only_batch_size_provided = audio_options_set and all other audio option fields like audio_length_mean are unset/None), then change the condition to raise only when not self.audio_enabled() and audio_options_set and not (self.batch_size == 0 and only_batch_size_provided); reference the existing audio_enabled(), audio_options_set, and self.batch_size symbols when making this change.

coderabbitai · 2026-02-25T17:07:51Z

src/aiperf/common/config/image_config.py

+        if not self.images_enabled() and image_options_set:
+            raise ValueError(
+                "Image generation is disabled but image options were provided. Please set `--image-width-mean` and `--image-height-mean` to enable image generation."
+            )


⚠️ Potential issue | 🟠 Major

Allow explicit disable via --image-batch-size 0.

Line 167 currently rejects explicitly setting batch_size=0, even though Line 116 documents it as a disable mechanism. This makes a valid disable flow fail at validation time.

Proposed fix

`@model_validator`(mode="after") def _validate_image_options(self) -> Self: """Validate the image options.""" + if self.batch_size == 0: + return self + image_options_set = { *self.model_fields_set, *self.width.model_fields_set, *self.height.model_fields_set, } if not self.images_enabled() and image_options_set: raise ValueError( - "Image generation is disabled but image options were provided. Please set `--image-width-mean` and `--image-height-mean` to enable image generation." + "Image generation is disabled but image options were provided. Please set `--image-batch-size` (>0), `--image-width-mean`, and `--image-height-mean` to enable image generation." ) return self

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 168-170: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/common/config/image_config.py` around lines 167 - 170, The validator is rejecting an explicit disable via "--image-batch-size 0"; update the check around images_enabled() so providing image options is allowed when the user explicitly set batch size to 0. Concretely, in the block that raises when not self.images_enabled() and image_options_set, add a condition to skip the error if the batch-size flag is explicitly zero (e.g. check self.image_batch_size == 0 or the equivalent attribute/flag used to store --image-batch-size); alternatively implement images_enabled() to return False when batch size == 0 and ensure the raise only happens if images are truly enabled. Reference images_enabled(), image_options_set and the image-batch-size flag/attribute when making the change.

coderabbitai · 2026-02-25T17:07:51Z

src/aiperf/common/config/video_config.py

+        if not self.videos_enabled() and video_options_set:
+            raise ValueError(
+                "Video generation is disabled but video options were provided. Please set `--video-width` and `--video-height` to enable video generation."
+            )


⚠️ Potential issue | 🟠 Major

Permit explicit disable with --video-batch-size 0.

Line 35 rejects disabled configs whenever any field was explicitly set. That makes --video-batch-size 0 invalid, despite Line 55 documenting it as the disable path.

Proposed fix

`@model_validator`(mode="after") def _validate_video_options(self) -> Self: """Validate the video options.""" + if self.batch_size == 0: + return self + video_options_set = self.model_fields_set if not self.videos_enabled() and video_options_set: raise ValueError( - "Video generation is disabled but video options were provided. Please set `--video-width` and `--video-height` to enable video generation." + "Video generation is disabled but video options were provided. Please set `--video-batch-size` (>0), `--video-width`, and `--video-height` to enable video generation." ) return self

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 36-38: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/common/config/video_config.py` around lines 35 - 38, The current check raises when video generation is disabled but any video option was provided, which incorrectly rejects the explicit disable flag `--video-batch-size 0`; update the conditional in the validator so it does not raise if the user explicitly set video_batch_size to 0. Concretely, change the guard using videos_enabled() and video_options_set to also allow the case self.video_batch_size == 0 (e.g., only raise if not self.videos_enabled() and video_options_set and self.video_batch_size != 0), referencing videos_enabled(), video_options_set, and the video_batch_size attribute.

coderabbitai · 2026-02-25T17:07:51Z

src/aiperf/common/random_generator.py

+    def integers(self, low: int, high: int | None = None, size=None, dtype=np.int64):
        """Generate random integers from [low, high) using NumPy.

        Args:
            low: Lowest integer (inclusive), or if high is None, then [0, low)
            high: Highest integer (exclusive), optional
            size: Output shape, optional
+            dtype: Desired NumPy dtype for the result (default: np.int64)

        Returns:
            Random integer or array of integers
        """
-        return self._numpy_rng.integers(low, high, size=size)
+        return self._numpy_rng.integers(low, high, size=size, dtype=dtype)


⚠️ Potential issue | 🟠 Major

Add missing type hints to integers signature and return type.

This updated method still leaves size, dtype, and the return type unannotated, which breaks the repo’s Python typing rule.

Proposed fix

+from numpy.typing import DTypeLike + - def integers(self, low: int, high: int | None = None, size=None, dtype=np.int64): + def integers( + self, + low: int, + high: int | None = None, + size: int | tuple[int, ...] | None = None, + dtype: DTypeLike = np.int64, + ) -> int | np.ndarray: """Generate random integers from [low, high) using NumPy.

As per coding guidelines: "Add type hints on ALL functions (params and return)".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def integers(self, low: int, high: int | None = None, size=None, dtype=np.int64):

"""Generate random integers from [low, high) using NumPy.

Args:

low: Lowest integer (inclusive), or if high is None, then [0, low)

high: Highest integer (exclusive), optional

size: Output shape, optional

dtype: Desired NumPy dtype for the result (default: np.int64)

Returns:

Random integer or array of integers

"""

return self._numpy_rng.integers(low, high, size=size)

return self._numpy_rng.integers(low, high, size=size, dtype=dtype)

from numpy.typing import DTypeLike

def integers(

self,

low: int,

high: int | None = None,

size: int | tuple[int, ...] | None = None,

dtype: DTypeLike = np.int64,

) -> int | np.ndarray:

"""Generate random integers from [low, high) using NumPy.

Args:

low: Lowest integer (inclusive), or if high is None, then [0, low)

high: Highest integer (exclusive), optional

size: Output shape, optional

dtype: Desired NumPy dtype for the result (default: np.int64)

Returns:

Random integer or array of integers

"""

return self._numpy_rng.integers(low, high, size=size, dtype=dtype)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/common/random_generator.py` around lines 122 - 134, The integers method is missing type annotations for size, dtype and the return value; update the integers signature in class RandomGenerator (method integers) to annotate size as Optional[Union[int, Tuple[int, ...]]], dtype as Optional[np.typing.DTypeLike] (or np.dtype | None), and the return type as Union[int, np.ndarray]; also add the necessary imports (Optional, Union, Tuple from typing and np.typing as needed) so the function has full parameter and return type hints.

coderabbitai · 2026-02-25T17:07:52Z

src/aiperf/dataset/generator/image.py

+        if not config.images_enabled():
+            self.debug(lambda: "Images are disabled, skipping image generation")
+            return
+


⚠️ Potential issue | 🟠 Major

Avoid returning from __init__ before required state is initialized.

Line 34 exits constructor before key members are guaranteed (self.config, source factory state). If generate() is called later, this can fail with AttributeError instead of a controlled disabled-path error.

Proposed fix

def __init__(self, config: ImageConfig, **kwargs): super().__init__(**kwargs) + self.config = config + self._images_enabled = config.images_enabled() - if not config.images_enabled(): + if not self._images_enabled: self.debug(lambda: "Images are disabled, skipping image generation") return # Separate RNGs for independent concerns self._dimensions_rng = rng.derive("dataset.image.dimensions") self._format_rng = rng.derive("dataset.image.format") - self.config = config

def generate(self, *args, **kwargs) -> str: + if not self._images_enabled: + raise RuntimeError("Image generation is disabled for the current configuration")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/dataset/generator/image.py` around lines 34 - 37, In the ImageGenerator constructor (__init__) do not return before required instance state is set; instead always initialize self.config and any source/factory attributes (e.g., image source factory) and set an enabled flag (e.g., self.images_enabled = config.images_enabled()) so generate() can safely inspect state; if images are disabled, log via self.debug and early-exit behavior should rely on the flag rather than leaving members uninitialized so later calls to generate() won't raise AttributeError.

coderabbitai · 2026-02-25T17:07:52Z

src/aiperf/dataset/generator/image.py

+    def _load_source_images_from_disk(self, source_path: Path) -> list[Image.Image]:
+        """Load source images from the given directory."""
+        if not source_path.exists():
+            raise FileNotFoundError(f"The directory '{source_path}' does not exist.")
+        if not source_path.is_dir():
+            raise NotADirectoryError(f"The path '{source_path}' is not a directory.")
+
+        image_paths = sorted(glob.glob(str(source_path / "*")))
        if not image_paths:
            raise ValueError(
-                f"No source images found in '{source_images_dir}'. "
-                "Please ensure the source_images directory contains at least one image file."
+                f"No source images found in '{source_path}'. "
+                "Please ensure the directory contains at least one image file."
            )

-        self._source_images = []
+        images: list[Image.Image] = []
        for path in image_paths:
            with Image.open(path) as img:
-                self._source_images.append(img.copy())
-        self.debug(
-            lambda: f"Pre-loaded {len(self._source_images)} source images into memory"
-        )
+                images.append(img.copy())
+        self.debug(lambda: f"Pre-loaded {len(images)} source images from disk")
+        return images


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, let's check if the file exists and read the relevant lines if [ -f "src/aiperf/dataset/generator/image.py" ]; then wc -l src/aiperf/dataset/generator/image.py echo "=== Lines 50-90 ===" sed -n '50,90p' src/aiperf/dataset/generator/image.py else echo "File not found, searching for it..." find . -name "image.py" -path "*/dataset/generator/*" fi

Repository: ai-dynamo/aiperf

Length of output: 1998

🏁 Script executed:

#!/bin/bash # Check the full class structure to understand the context if [ -f "src/aiperf/dataset/generator/image.py" ]; then echo "=== Full file (checking size first) ===" wc -l src/aiperf/dataset/generator/image.py echo "=== Class definition and method signatures ===" sed -n '1,100p' src/aiperf/dataset/generator/image.py fi

Repository: ai-dynamo/aiperf

Length of output: 4342

🏁 Script executed:

#!/bin/bash # Search for where _load_source_images_from_disk is called rg "_load_source_images_from_disk" --type py -B 3 -A 3

Repository: ai-dynamo/aiperf

Length of output: 1858

🏁 Script executed:

#!/bin/bash # Find BaseGenerator to understand the class structure fd "base\.py" dataset/generator/ --type f

Repository: ai-dynamo/aiperf

Length of output: 165

🏁 Script executed:

#!/bin/bash # Read the base generator file if [ -f "src/aiperf/dataset/generator/base.py" ]; then wc -l src/aiperf/dataset/generator/base.py cat -n src/aiperf/dataset/generator/base.py fi

Repository: ai-dynamo/aiperf

Length of output: 1261

🏁 Script executed:

#!/bin/bash # Check if there are any async patterns or lifecycle hooks in BaseGenerator or related classes rg "class BaseGenerator|@on_init|@on_start|async def" src/aiperf/dataset/generator/ --type py -B 2 -A 5

Repository: ai-dynamo/aiperf

Length of output: 623

🏁 Script executed:

#!/bin/bash # Also check if ImageGenerator or similar components use AIPerfLifecycleMixin rg "AIPerfLifecycleMixin|BaseComponentService" src/aiperf/dataset/ --type py

Repository: ai-dynamo/aiperf

Length of output: 553

🏁 Script executed:

#!/bin/bash # Search for ImageGenerator usage and instantiation rg "ImageGenerator" --type py -B 3 -A 3

Repository: ai-dynamo/aiperf

Length of output: 18572

🏁 Script executed:

#!/bin/bash # Check DatasetManager to see how it uses generators rg "class DatasetManager|ImageGenerator" src/aiperf/dataset/dataset_manager.py --type py -A 10 -B 2

Repository: ai-dynamo/aiperf

Length of output: 628

🏁 Script executed:

#!/bin/bash # Find where DatasetManager or ImageGenerator is instantiated in async context rg "ImageGenerator\(|DatasetManager" --type py | head -20

Repository: ai-dynamo/aiperf

Length of output: 2193

🏁 Script executed:

#!/bin/bash # Check how BaseComposer is used and if it's instantiated in async context rg "BaseComposer|class.*Composer" src/aiperf/dataset/composer/ --type py -B 2 -A 10 | head -80

Repository: ai-dynamo/aiperf

Length of output: 4205

🏁 Script executed:

#!/bin/bash # Check if DatasetManager instantiation happens in async context rg "class DatasetManager" src/aiperf/dataset/dataset_manager.py -A 30 | head -50

Repository: ai-dynamo/aiperf

Length of output: 1313

🏁 Script executed:

#!/bin/bash # Check if DatasetManager.__init__ is async or if instantiation happens in async lifecycle rg "@on_init|async def __init__|def __init__" src/aiperf/dataset/dataset_manager.py -A 5

Repository: ai-dynamo/aiperf

Length of output: 217

Move image loading out of __init__ to avoid blocking I/O during initialization.

The _load_source_images_from_disk method performs synchronous I/O operations (glob and image decoding) in the constructor. This blocks initialization and violates the requirement to use async/await for all I/O operations. Move image loading to a deferred initialization pattern or lifecycle hook that can properly handle asynchronous operations.

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 63-63: Avoid specifying long messages outside the exception class

(TRY003)

[warning] 65-65: Avoid specifying long messages outside the exception class

(TRY003)

[warning] 69-72: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/dataset/generator/image.py` around lines 60 - 79, The synchronous disk I/O in _load_source_images_from_disk (glob + Image.open + img.copy) must be moved out of __init__ into an async-friendly lifecycle: create an async method (e.g., async_load_source_images or load_source_images_async) that performs the work on a threadpool (use asyncio.to_thread to call the existing _load_source_images_from_disk logic or move the per-file open/copy into the thread) and update any startup code to await that method from the appropriate async initialization hook; ensure the debug(...) call remains after loading and that exceptions (FileNotFoundError/NotADirectoryError/ValueError) are preserved and raised from the async loader so callers can handle them.

coderabbitai · 2026-02-25T17:07:52Z

src/aiperf/dataset/generator/image.py

+        image_paths = sorted(glob.glob(str(source_path / "*")))
        if not image_paths:
            raise ValueError(
-                f"No source images found in '{source_images_dir}'. "
-                "Please ensure the source_images directory contains at least one image file."
+                f"No source images found in '{source_path}'. "
+                "Please ensure the directory contains at least one image file."
            )

-        self._source_images = []
+        images: list[Image.Image] = []
        for path in image_paths:
            with Image.open(path) as img:
-                self._source_images.append(img.copy())
-        self.debug(
-            lambda: f"Pre-loaded {len(self._source_images)} source images into memory"
-        )
+                images.append(img.copy())


⚠️ Potential issue | 🟠 Major

Filter/skip non-image entries for custom source directories.

Line 67 includes every directory entry (*), and Line 76 tries to open each as an image. A custom folder containing subdirectories or non-image files will currently hard-fail generation.

Proposed fix

-from PIL import Image +from PIL import Image, UnidentifiedImageError ... - image_paths = sorted(glob.glob(str(source_path / "*"))) + image_paths = sorted(glob.glob(str(source_path / "*"))) if not image_paths: raise ValueError( f"No source images found in '{source_path}'. " "Please ensure the directory contains at least one image file." ) images: list[Image.Image] = [] for path in image_paths: - with Image.open(path) as img: - images.append(img.copy()) + path_obj = Path(path) + if not path_obj.is_file(): + continue + try: + with Image.open(path_obj) as img: + images.append(img.copy()) + except UnidentifiedImageError: + self.debug(lambda: f"Skipping non-image file '{path_obj}'") + + if not images: + raise ValueError( + f"No readable image files found in '{source_path}'. " + "Please ensure the directory contains valid image files." + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

image_paths = sorted(glob.glob(str(source_path / "*")))

if not image_paths:

raise ValueError(

f"No source images found in '{source_images_dir}'. "

"Please ensure the source_images directory contains at least one image file."

f"No source images found in '{source_path}'. "

"Please ensure the directory contains at least one image file."

)

self._source_images = []

images: list[Image.Image] = []

for path in image_paths:

with Image.open(path) as img:

self._source_images.append(img.copy())

self.debug(

lambda: f"Pre-loaded {len(self._source_images)} source images into memory"

)

images.append(img.copy())

image_paths = sorted(glob.glob(str(source_path / "*")))

if not image_paths:

raise ValueError(

f"No source images found in '{source_path}'. "

"Please ensure the directory contains at least one image file."

)

images: list[Image.Image] = []

for path in image_paths:

path_obj = Path(path)

if not path_obj.is_file():

continue

try:

with Image.open(path_obj) as img:

images.append(img.copy())

except UnidentifiedImageError:

self.debug(lambda: f"Skipping non-image file '{path_obj}'")

if not images:

raise ValueError(

f"No readable image files found in '{source_path}'. "

"Please ensure the directory contains valid image files."

)

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 69-72: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/dataset/generator/image.py` around lines 67 - 77, The code currently collects all directory entries into image_paths and attempts to open each with Image.open causing failures on subdirectories or non-image files; update the logic in the generator so image_paths only includes regular files with image-like extensions or add a safe open loop that checks Path(path).is_file() and wraps Image.open(path) in a try/except (catching OSError/UnidentifiedImageError) to skip non-image files, appending only successfully opened images into the images list; reference image_paths, source_path, Image.open, and images to locate where to apply the filtering and error-handling change.

ajcasagrande · 2026-02-25T17:08:43Z

src/aiperf/common/config/audio_config.py

+    @model_validator(mode="after")
+    def _validate_audio_options(self) -> Self:
+        """Validate the audio options."""
+        audio_options_set = {*self.model_fields_set, *self.length.model_fields_set}
+        if not self.audio_enabled() and audio_options_set:
+            raise ValueError(
+                "Audio generation is disabled but audio options were provided. Please set `--audio-batch-size` and `--audio-length-mean` to enable audio generation."
+            )
+        return self
+
+    def audio_enabled(self) -> bool:
+        """Check if audio is enabled."""
+        return self.length.mean > 0 and self.batch_size > 0


Added basic sanity checks to prevent users from using multi modal options without enabling that modality

matthewkotila · 2026-02-28T00:54:15Z

src/aiperf/common/config/image_config.py

+    ]
+
+    @model_validator(mode="after")
+    def _validate_image_options(self) -> Self:


do we have warnings for if someone supplies a --input-file that, say, has images, and they also try to use some synthetic image options?

matthewkotila

Looks good overall, just some small questions

ajcasagrande requested a review from matthewkotila February 25, 2026 16:58

github-actions bot added the feat label Feb 25, 2026

ajcasagrande commented Feb 25, 2026

View reviewed changes

coderabbitai bot reviewed Feb 25, 2026

View reviewed changes

ajcasagrande commented Feb 25, 2026

View reviewed changes

matthewkotila reviewed Feb 28, 2026

View reviewed changes

Conversation

ajcasagrande commented Feb 25, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

github-actions bot commented Feb 25, 2026

Try out this PR

Uh oh!

codecov bot commented Feb 25, 2026

Codecov Report

Uh oh!

ajcasagrande Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

matthewkotila Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Feb 25, 2026

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

ajcasagrande Feb 25, 2026

Choose a reason for hiding this comment

Uh oh!

matthewkotila Feb 28, 2026

Choose a reason for hiding this comment

Uh oh!

matthewkotila left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ajcasagrande commented Feb 25, 2026 •

edited by coderabbitai bot

Loading