Problem Statement
projectM produces complex audio-reactive visual output through a sophisticated rendering pipeline that processes preset files (shaders and equations), audio input, and texture assets. Changes to any part of this system—shader variable processing (warp, zoom, etc.), audio analysis (FFT, beat detection), rendering algorithms, or preset parsing—can introduce subtle visual regressions that are difficult to detect through manual inspection alone.
Currently, there is no automated way to validate that visual output remains consistent across code changes. With preset collections ranging from 10K (Cream of the Crop) to 130K+ (MegaPack), manual regression testing is impractical—a single code change could break rendering for dozens of presets in ways that only become apparent with specific preset-audio combinations.
Overall Proposal
Create an automated visual regression testing framework that:
Controls inputs: Preset files, audio data (from files), texture assets, and rendering parameters
Captures outputs: Rendered frames or video sequences from the projectM rendering pipeline
Compares results: Validates output against known reference data using visual comparison techniques
Reports differences: Identifies regressions and provides diagnostics for investigation
The framework would support both targeted regression tests (testing specific shader variables or features) and broad compatibility testing (scanning large preset collections for crashes or obvious visual errors).
As projectM is only a library, a dedicated test frontend will likely be needed to drive these tests.
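A minimal sketch of what such a frontend could look like, using the public C API and assuming an offscreen OpenGL context has already been created (e.g., via EGL or a hidden GLFW window); the preset/audio file names and the LoadTestAudio() helper are purely illustrative:

```cpp
// Hypothetical headless test frontend (sketch only, not part of projectM).
// Assumes an offscreen OpenGL context is already current on this thread
// before projectm_create() is called.
#include <projectM-4/projectM.h>

#include <cstddef>
#include <vector>

// Hypothetical stand-in for real audio decoding: 10 seconds of stereo silence.
// A real harness would decode a WAV/PCM test file here.
static std::vector<float> LoadTestAudio(const char* /*path*/)
{
    return std::vector<float>(44100 * 10 * 2, 0.0f);
}

int main()
{
    projectm_handle pm = projectm_create();
    projectm_set_window_size(pm, 1280, 720);

    // File names are illustrative only.
    projectm_load_preset_file(pm, "presets/tests/warp_test.milk", false);
    std::vector<float> audio = LoadTestAudio("test_audio/beat_120bpm.wav");

    const unsigned int samplesPerFrame = 44100 / 60; // assuming 44.1 kHz input rendered at 60 fps
    std::size_t offset = 0;
    for (int frame = 0; frame < 300; ++frame)
    {
        // Feed the next slice of interleaved stereo samples, then render one frame.
        projectm_pcm_add_float(pm, audio.data() + offset * 2, samplesPerFrame, PROJECTM_STEREO);
        offset += samplesPerFrame;

        projectm_opengl_render_frame(pm); // draws into the currently bound framebuffer
        // ...capture the framebuffer here (see "Frame/Video Capture" below)...
    }

    projectm_destroy(pm);
    return 0;
}
```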
Design Principles
Tests may use unambiguous, visually obvious presets designed to exercise specific features. For example:
A "warp test" preset that makes warp distortion the dominant visual element
An "FFT response" preset where frequency data directly drives obvious visual changes
A "beat detection" preset with clear visual pulses synchronized to detected beats
This makes regressions immediately apparent through visual comparison rather than requiring fine-grained pixel analysis.
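For illustration only, a warp-focused test preset might be as small as the sketch below, assuming standard Milkdrop .milk syntax (keys omitted here would fall back to parser defaults; real test presets would need tuning and review):

```
[preset00]
fDecay=0.98
zoom=1.0
rot=0.0
warp=1.0
per_frame_1=warp = 6 + 5*sin(time*2);
```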
Key Capabilities Needed
1. Audio Input from Files
Current State: Already exists — a frontend can read audio from a file and feed the PCM data to the library.
2. Frame/Video Capture
Open Question: Can projectM write rendered frames directly to a file, or does the frontend need to read back the framebuffer itself?
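If direct file output is not available, the test frontend can read the framebuffer back after each rendered frame; a minimal sketch using glReadPixels and a PPM writer, assuming a current GL context and a known viewport size:

```cpp
#include <GL/gl.h> // or whichever GL loader the test frontend already uses

#include <cstdio>
#include <vector>

// Read the current framebuffer and dump it as a binary PPM file.
bool CaptureFrame(const char* path, int width, int height)
{
    std::vector<unsigned char> pixels(static_cast<size_t>(width) * height * 3);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels.data());

    FILE* f = std::fopen(path, "wb");
    if (!f)
        return false;

    std::fprintf(f, "P6\n%d %d\n255\n", width, height);
    // Flip vertically: OpenGL's origin is bottom-left, image formats expect top-left.
    for (int y = height - 1; y >= 0; --y)
        std::fwrite(pixels.data() + static_cast<size_t>(y) * width * 3, 1,
                    static_cast<size_t>(width) * 3, f);

    std::fclose(f);
    return true;
}
```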
3. Batch Preset Processing
What's Needed: Test harness to iterate through preset collections, load each preset, render frames with test audio, and capture output.
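A sketch of the collection/iteration layer using C++17 std::filesystem (the rendering and capture steps would reuse the helpers sketched above):

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Collect all .milk presets under a root directory for batch testing.
std::vector<std::string> CollectPresets(const std::string& root)
{
    std::vector<std::string> presets;
    for (const auto& entry : fs::recursive_directory_iterator(root))
    {
        if (entry.is_regular_file() && entry.path().extension() == ".milk")
            presets.push_back(entry.path().string());
    }
    return presets;
}
```

To keep a single bad preset from aborting an entire batch run, each preset would ideally be rendered in a child process or under a watchdog timeout.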
4. Visual Comparison Methodology
What's Needed: Mechanism to compare captured output against reference images/videos (a minimal comparison sketch follows this list). Options include:
Pixel-perfect comparison (exact match)
Perceptual hashing (tolerates minor differences)
Image similarity metrics (SSIM, MSE)
Hybrid approach (exact for critical tests, tolerance for others)
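To make the simplest option concrete, a mean-squared-error check over two captured RGB buffers with a per-test tolerance is sketched below; SSIM or perceptual hashes could replace it without changing the surrounding harness:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean squared error between two same-sized RGB(A) byte buffers.
double MeanSquaredError(const std::vector<std::uint8_t>& a,
                        const std::vector<std::uint8_t>& b)
{
    if (a.size() != b.size() || a.empty())
        return -1.0; // caller treats mismatched sizes as a failure

    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum / static_cast<double>(a.size());
}

// A test "passes" if the error stays below a per-test tolerance.
bool FramesMatch(const std::vector<std::uint8_t>& actual,
                 const std::vector<std::uint8_t>& reference,
                 double tolerance)
{
    const double mse = MeanSquaredError(actual, reference);
    return mse >= 0.0 && mse <= tolerance;
}
```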
5. Deterministic Execution
Current State: projectM uses time-based random number generation in multiple locations (TimeKeeper, PresetState, shader random values, transition selection, texture selection). The SetFrameTime() API exists to control time values, but random seeds are not controllable.
What's Needed: Ability to produce identical output for identical inputs. This requires controlling all sources of non-determinism (see Research section below).
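A sketch of a deterministic render loop, assuming the C binding of the SetFrameTime() API mentioned above is named projectm_set_frame_time() (name to be confirmed) and that the audio buffer is identical on every run; seed control would still have to be added on top of this:

```cpp
#include <projectM-4/projectM.h>

#include <cstddef>

void RenderDeterministicFrames(projectm_handle pm, const float* audio,
                               unsigned int samplesPerFrame, int frameCount, double fps)
{
    for (int frame = 0; frame < frameCount; ++frame)
    {
        // Fixed, frame-indexed time instead of wall-clock time.
        // Assumes projectm_set_frame_time() is the C binding of SetFrameTime().
        projectm_set_frame_time(pm, static_cast<double>(frame) / fps);

        // Same interleaved stereo audio slice for the same frame index on every run.
        projectm_pcm_add_float(pm,
                               audio + static_cast<std::size_t>(frame) * samplesPerFrame * 2,
                               samplesPerFrame, PROJECTM_STEREO);

        projectm_opengl_render_frame(pm);
    }
}
```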
Test Scenarios
The framework should enable these categories of tests:
Targeted Feature Tests
Create specialized test presets that isolate and emphasize specific functionality:
Shader variable regression: Presets that make warp, zoom, rotation, decay, etc. visually dominant
Audio reactivity: Presets where bass/mid/treble levels or FFT bins directly control visual parameters
Beat detection: Presets with obvious beat-synchronized effects
Preset Compatibility Tests
Scan large preset collections to detect crashes, preset load failures, or obviously broken output such as blank frames.
Integration Tests
End-to-end runs that exercise preset loading, audio input, and rendering together.
Areas Requiring Further Research
The following areas need investigation before implementation:
1. Visual Comparison Techniques
Perceptual hashing: Which algorithm(s) are suitable for rendered visuals? (pHash, dHash, average hash? See the aHash sketch after this list.)
Tolerance thresholds: What level of pixel difference is acceptable for "passing" tests?
Storage efficiency: How to manage reference data for thousands of presets?
Failure visualization: How to present differences to developers for debugging?
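To make the hashing question concrete, an average hash (aHash) over a downscaled 8x8 grayscale frame, compared by Hamming distance, is trivial to prototype and could serve as a baseline against pHash/dHash:

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Average hash over an 8x8 grayscale thumbnail (64 values, 0-255).
// The caller downscales the captured frame to 8x8 beforehand.
std::uint64_t AverageHash(const std::vector<std::uint8_t>& gray8x8)
{
    if (gray8x8.size() != 64)
        return 0;

    double mean = 0.0;
    for (std::uint8_t v : gray8x8)
        mean += v;
    mean /= 64.0;

    std::uint64_t hash = 0;
    for (int i = 0; i < 64; ++i)
        if (gray8x8[i] >= mean)
            hash |= (1ULL << i);
    return hash;
}

// Number of differing bits; small distances mean visually similar frames.
int HammingDistance(std::uint64_t a, std::uint64_t b)
{
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}
```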
2. Achieving Deterministic Output
RNG control: Can all random number generators be seeded deterministically? Research identified 8+ sources of randomness throughout the codebase (TimeKeeper, PresetState, MilkdropShader, PresetTransition, TransitionShaderManager, TextureManager, etc.)
API design: Should seed control be exposed through public API, build-time flag, or environment variable?
GPU variance: Can OpenGL rendering produce pixel-identical output across runs on the same hardware? Across different hardware?
Floating-point precision: Are calculations deterministic at the precision required for visual comparison?
3. Reference Data Management
Storage strategy: Store reference images, hashes, or both?
Versioning: How to update references when intentional visual changes are made?
Baseline generation: Workflow for creating initial reference data
CI/CD integration: Where to store reference data (repository, separate storage, cloud?)
4. Audio Test Data
Format selection: What audio formats for test files? (raw PCM, WAV, compressed?)
Test audio characteristics: Synthetic tones vs. real music? Silence, sine waves, pink noise, beat patterns? (A synthetic-audio sketch follows this list.)
Duration: How long should test clips be? (1 second, 10 seconds, full preset duration?) Could disk space be saved by looping shorter clips?
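If synthetic audio turns out to be sufficient, test clips could even be generated on the fly rather than stored; a sketch producing interleaved stereo floats with a sine tone and a beat-like amplitude pulse (all parameter values are illustrative):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Interleaved stereo float samples: a sine tone whose amplitude pulses
// sharply every beatPeriod seconds, giving obvious beats to detect.
std::vector<float> GenerateTestAudio(double seconds, int sampleRate = 44100,
                                     double toneHz = 220.0, double beatPeriod = 0.5)
{
    const double kPi = 3.14159265358979323846;
    const std::size_t frames = static_cast<std::size_t>(seconds * sampleRate);
    std::vector<float> samples(frames * 2);
    for (std::size_t i = 0; i < frames; ++i)
    {
        const double t = static_cast<double>(i) / sampleRate;
        const double beatPhase = std::fmod(t, beatPeriod) / beatPeriod; // 0..1 within each beat
        const double envelope = std::exp(-6.0 * beatPhase);             // sharp attack, fast decay
        const float value = static_cast<float>(envelope * std::sin(2.0 * kPi * toneHz * t));
        samples[i * 2] = value;     // left
        samples[i * 2 + 1] = value; // right
    }
    return samples;
}
```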
5. Test Preset Design
Coverage: What specific features need dedicated test presets?
Authoring workflow: How to create unambiguous test presets?
Documentation: How to document what each test preset validates?
Maintenance: Who maintains test presets as projectM evolves?
Secondary Benefits
Beyond regression testing, this infrastructure would enable:
Thumbnail Generation
Automated rendering of preset thumbnails for:
Preset browser UIs in frontends
Preset repository websites
Preset quality curation
Performance Benchmarking
Framework could track:
Frame render times across preset collections (see the timing sketch after this list)
Shader compilation times
Memory usage patterns
Performance regressions in new code
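A minimal per-frame timing sketch with std::chrono; wrapping the preset load call the same way would give shader compilation times:

```cpp
#include <projectM-4/projectM.h>
#include <GL/gl.h> // or whichever GL loader the test frontend already uses

#include <chrono>
#include <cstddef>
#include <vector>

// Render frameCount frames and return the per-frame render times in milliseconds.
std::vector<double> MeasureFrameTimes(projectm_handle pm, int frameCount)
{
    std::vector<double> millis;
    millis.reserve(static_cast<std::size_t>(frameCount));
    for (int frame = 0; frame < frameCount; ++frame)
    {
        const auto start = std::chrono::steady_clock::now();
        projectm_opengl_render_frame(pm);
        glFinish(); // include GPU work in the measurement, not just the submit time
        const auto end = std::chrono::steady_clock::now();
        millis.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }
    return millis;
}
```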
Preset Quality Scoring
Analyze rendered output to detect:
Presets that produce blank screens (see the blank-frame check after this list)
Excessively noisy or flickering presets
Audio-unresponsive presets
Duplicate or near-duplicate presets
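The blank-screen case is the cheapest to detect from captured frames; a sketch flagging frames whose mean brightness stays below a small threshold (the threshold value is illustrative):

```cpp
#include <cstdint>
#include <vector>

// True if a captured RGB frame is essentially black.
// threshold is the mean per-channel brightness on a 0-255 scale.
bool IsBlankFrame(const std::vector<std::uint8_t>& rgb, double threshold = 2.0)
{
    if (rgb.empty())
        return true;

    double sum = 0.0;
    for (std::uint8_t v : rgb)
        sum += v;
    return (sum / static_cast<double>(rgb.size())) < threshold;
}
```

A preset would then be flagged when most of its rendered frames come back blank even though audio is playing.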
Video Output Pipeline
While not a primary goal, having frame capture infrastructure could support:
Generating video demonstrations of presets
Machine Learning dataset generation
Integration Points
Existing test infrastructure: Builds on current test framework in tests/ directory
API usage: Uses public C and C++ APIs (no internal library modifications required initially)
Optional enhancements: Deterministic RNG control may require library API additions
Phasing
While this RFC doesn't prescribe specific implementation phases, feedback should address:
Which capabilities are most valuable to implement first?
What are acceptable trade-offs (e.g., determinism vs implementation complexity)?
What level of test coverage is the target (critical paths vs exhaustive)?
Request for Feedback
This RFC seeks community input on:
Feasibility: Are there technical blockers or challenges not addressed here?
Approach: Is this the right overall strategy, or are there better alternatives?
Priorities: Which test scenarios or capabilities are most valuable?
Resources: Who can contribute to design, implementation, or test preset creation?
Use cases: What specific regressions have occurred that this would catch?
Research areas: Insights on perceptual hashing, determinism, or other technical questions?
Please provide feedback through:
projectM Discord server
GitHub Discussions
Links and Notes
Research References
projectM API documentation: src/api/include/projectM-4/
Playlist library: src/playlist/
Rendering pipeline: src/libprojectM/Renderer/
Audio system: src/libprojectM/Audio/
Existing tests: tests/
Preset Collections
Cream of the Crop Pack: ~10,000 presets (projectM default)
Classic projectM Presets: ~4,000 presets
MegaPack: 130,000+ presets (4.08GB)
Test presets: presets/tests/ (43 existing test cases)
Related Discussions
(Add links to relevant GitHub issues, Discord threads, or mailing list discussions)