@GaussianWonder GaussianWonder commented Jan 9, 2026

Notes

After this whole charade, this is the anticlimactic conclusion:

Name                    Time
Original impl +clones   ~900 µs
Original impl -clones   ~400 µs
Scalar                  ~92 µs
SimdFixed               ~122 µs
SimdFit                 ~166 µs

If the scalar implementation is properly inlined and built with appropriate compiler options, the output already appears to be vectorized,
so the benchmark results boil down to:

I barely did as well as the compiler.
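To make the "already vectorized" observation concrete, here is a minimal sketch of the kind of scalar loop that optimizers typically handle well. The function name `threshold_row` and the threshold layout are assumptions for illustration, not the code from this PR. Which compiler options matter is also an assumption: something like `-C opt-level=3` together with `-C target-cpu=native` is a common choice.

```rust
/// Hypothetical scalar ordered-dither kernel for a single row.
/// `thresholds` is one row of a threshold matrix, repeated across the row.
#[inline]
pub fn threshold_row(row: &mut [u8], thresholds: &[u8]) {
    assert!(!thresholds.is_empty(), "threshold row must not be empty");

    // Fixed-size chunks give the optimizer a branch-free inner loop with a
    // known trip count, which is usually what auto-vectorization needs.
    let mut chunks = row.chunks_exact_mut(thresholds.len());
    for chunk in &mut chunks {
        for (pixel, &t) in chunk.iter_mut().zip(thresholds) {
            *pixel = if *pixel > t { 255 } else { 0 };
        }
    }

    // Leftover pixels that do not fill a whole chunk.
    for (pixel, &t) in chunks.into_remainder().iter_mut().zip(thresholds) {
        *pixel = if *pixel > t { 255 } else { 0 };
    }
}
```

Whether a loop like this is actually vectorized depends on the target and flags, so inspecting the generated assembly is the only reliable check.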

SIMD ops are significantly faster when the data is aligned to cache-line or vector-register boundaries,
which probably explains the time difference observed between the sync implementations.

The parallel versions are even closer:

Name           Time
ScalarPar      ~51.5 µs
SimdFixedPar   ~52.5 µs
SimdFitPar     ~53.25 µs

Changelog

rust nightly

  • test feature

    Used for internal benchmarks, not tracked by criterion.

  • portable_simd feature

    Used for SIMD implementations.

xtask pattern

Will be used to generate test/dev assets (and optionally ci/cd).

fix bayer matrices

The BAYER2 and BAYER3 patterns were wrong.
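One way to guard against this kind of mistake is to generate power-of-two Bayer matrices with the usual recursive rule and check that every rank appears exactly once. The sketch below is illustrative only: `bayer_matrix` and `is_permutation` are hypothetical helpers, and the layout convention (row-major, starting from the 2x2 matrix 0 2 / 3 1) is an assumption that may differ from the one used in this crate.

```rust
/// Builds the 2^order x 2^order Bayer index matrix in row-major order.
/// Each doubling step maps every entry v to four quadrants:
/// 4v (top-left), 4v + 2 (top-right), 4v + 3 (bottom-left), 4v + 1 (bottom-right).
pub fn bayer_matrix(order: u32) -> Vec<u32> {
    let mut size = 1usize;
    let mut m = vec![0u32];
    for _ in 0..order {
        let next = size * 2;
        let mut out = vec![0u32; next * next];
        for y in 0..size {
            for x in 0..size {
                let v = 4 * m[y * size + x];
                out[y * next + x] = v;
                out[y * next + x + size] = v + 2;
                out[(y + size) * next + x] = v + 3;
                out[(y + size) * next + x + size] = v + 1;
            }
        }
        m = out;
        size = next;
    }
    m
}

/// Returns true if `m` contains every value in 0..m.len() exactly once.
/// This also works for hand-written matrices whose size is not a power of two.
pub fn is_permutation(m: &[u32]) -> bool {
    let mut seen = vec![false; m.len()];
    for &v in m {
        let idx = v as usize;
        if idx >= m.len() || seen[idx] {
            return false;
        }
        seen[idx] = true;
    }
    true
}
```

A check like `is_permutation` is cheap enough to run in a unit test against every static pattern.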

vscode settings

  • format on save
  • default formatter rust-lang.rust-analyzer

module structure refactor

x_utils.rs -> utils/x.rs

traits

/// Core trait for applying a transform to data
///
/// Generic over the right-hand side type `Rhs`, similar to `Add`, `Sub`, etc.
pub trait Transform<Rhs = ()> {
    /// Apply the transform to the given data
    fn apply(&mut self, rhs: &mut Rhs);
}

See transform.rs and the impls in bayer_transform.rs. Usage examples can be found here:

  1. src\tests\bayer_strategy.rs:60@apply_strategy
  2. benches\bayer_transform_utils.rs:46@benchmark_strategy

Reusing the same transform across calls is currently only exercised in the benchmarks.

Swapping the input arguments between transform calls is supported
and verified by the borrow checker, but no code example is documented yet
(see bayer_transform.rs:30@BayerArgs::replace_input).
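Since no example is documented, here is a rough sketch of what reuse and input swapping could look like. Only the `Transform` trait is taken from this PR. `ThresholdArgs`, `Threshold`, `run_twice`, and the `replace_input` signature shown here are hypothetical stand-ins and do not claim to match `BayerArgs`.

```rust
/// Core trait, copied from the PR description.
pub trait Transform<Rhs = ()> {
    fn apply(&mut self, rhs: &mut Rhs);
}

/// Hypothetical argument bundle that borrows the pixels it operates on.
pub struct ThresholdArgs<'a> {
    input: &'a mut [u8],
}

impl<'a> ThresholdArgs<'a> {
    pub fn new(input: &'a mut [u8]) -> Self {
        Self { input }
    }

    /// Swaps in a new input buffer and hands back the previous borrow.
    /// (Assumed signature, for illustration only.)
    pub fn replace_input(&mut self, input: &'a mut [u8]) -> &'a mut [u8] {
        std::mem::replace(&mut self.input, input)
    }
}

/// Hypothetical transform that keeps state (`calls`) between applications.
pub struct Threshold {
    pub cutoff: u8,
    pub calls: usize,
}

impl<'a> Transform<ThresholdArgs<'a>> for Threshold {
    fn apply(&mut self, args: &mut ThresholdArgs<'a>) {
        for pixel in args.input.iter_mut() {
            *pixel = if *pixel >= self.cutoff { 255 } else { 0 };
        }
        self.calls += 1;
    }
}

/// Applies one transform instance to two buffers, swapping the input in between.
pub fn run_twice(first: &mut [u8], second: &mut [u8]) -> usize {
    let mut transform = Threshold { cutoff: 128, calls: 0 };
    let mut args = ThresholdArgs::new(first);
    transform.apply(&mut args);
    let _previous = args.replace_input(second);
    transform.apply(&mut args);
    transform.calls
}
```

Because the transform takes `&mut self`, any scratch state it owns can be kept across calls, which is the property the benchmarks rely on.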

structs

pub struct Texture<T> {
    width: u32,
    height: u32,
    buffer: Vec<T>,
}
pub struct TextureRef<'a, T> {
    width: u32,
    height: u32,
    buffer: &'a [T],
}
  • Textures can be owned structures or borrowed from other containers.
  • All Texture types implement AsRef<[T]>, a trait that is agnostic of the owning container type.

This allows for fewer allocations and conversions in some scenarios
(e.g. video processing, where references to bytes can be passed around).

There is one more variant: see texture.rs.

This pairs well with Transforms.
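As a sketch of why `AsRef<[T]>` helps, the snippet below writes one function that accepts either texture kind without copying. The struct definitions mirror the ones above. The constructor `new`, the method `as_texture_ref`, and the helper `mean_luma` are hypothetical and only serve the example.

```rust
pub struct Texture<T> {
    width: u32,
    height: u32,
    buffer: Vec<T>,
}

pub struct TextureRef<'a, T> {
    width: u32,
    height: u32,
    buffer: &'a [T],
}

impl<T> Texture<T> {
    /// Hypothetical constructor: rejects buffers that do not match the size.
    pub fn new(width: u32, height: u32, buffer: Vec<T>) -> Option<Self> {
        let expected = (width as usize) * (height as usize);
        (buffer.len() == expected).then(|| Self { width, height, buffer })
    }

    /// Hypothetical borrowed view over the same pixels, with no allocation.
    pub fn as_texture_ref(&self) -> TextureRef<'_, T> {
        TextureRef { width: self.width, height: self.height, buffer: &self.buffer }
    }
}

impl<T> AsRef<[T]> for Texture<T> {
    fn as_ref(&self) -> &[T] {
        &self.buffer
    }
}

impl<'a, T> AsRef<[T]> for TextureRef<'a, T> {
    fn as_ref(&self) -> &[T] {
        self.buffer
    }
}

/// Works for owned and borrowed textures alike.
pub fn mean_luma<P: AsRef<[u8]>>(pixels: &P) -> f32 {
    let data = pixels.as_ref();
    if data.is_empty() {
        return 0.0;
    }
    let sum: u64 = data.iter().map(|&p| u64::from(p)).sum();
    sum as f32 / data.len() as f32
}
```

In a video pipeline, a frame decoded elsewhere could be wrapped in a `TextureRef` and passed to the same function without an intermediate `Vec`.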

iterator things

See utils/iterator.rs.

This will most likely be deleted later: it does not really help and it is not used.

It is an experimental test to check for performance loss when using custom iterators for
index-based parallel iteration versus parallel processing by chunking.
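For reference, chunk-based parallel processing can be sketched with the standard library alone. This is not the code in utils/iterator.rs and it deliberately swaps in `std::thread::scope` for a parallel-iterator crate. The function name `threshold_par_chunks` and its parameters are assumptions for illustration.

```rust
use std::thread;

/// Splits the buffer into roughly equal contiguous chunks, one per worker,
/// and thresholds each chunk on its own scoped thread.
pub fn threshold_par_chunks(pixels: &mut [u8], cutoff: u8, workers: usize) {
    let workers = workers.max(1);
    // Ceiling division; at least 1 so chunks_mut never receives zero.
    let chunk_len = ((pixels.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        for chunk in pixels.chunks_mut(chunk_len) {
            // Each thread owns a disjoint mutable slice, so no locking is needed.
            scope.spawn(move || {
                for pixel in chunk.iter_mut() {
                    *pixel = if *pixel >= cutoff { 255 } else { 0 };
                }
            });
        }
    });
}
```

Each worker walks a contiguous slice here, whereas an index-based parallel iterator hands out individual indices to workers.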

error

src\error.rs contains a flexible DitherpunkerError and an associated Result type.

Implementing From<possible_error_type> for DitherpunkerError allows seamless ? usage with
mixed error types in the same function body.
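The pattern can be sketched as follows. `DemoError`, `DemoResult`, and `parse_width` are hypothetical names, and the variants are assumptions; the actual `DitherpunkerError` may be structured differently.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

/// Hypothetical crate-level error with one variant per underlying error type.
#[derive(Debug)]
pub enum DemoError {
    Utf8(Utf8Error),
    Parse(ParseIntError),
}

/// Hypothetical Result alias, analogous to the one in src\error.rs.
pub type DemoResult<T> = std::result::Result<T, DemoError>;

impl fmt::Display for DemoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DemoError::Utf8(e) => write!(f, "invalid utf-8: {}", e),
            DemoError::Parse(e) => write!(f, "invalid number: {}", e),
        }
    }
}

impl std::error::Error for DemoError {}

impl From<Utf8Error> for DemoError {
    fn from(e: Utf8Error) -> Self {
        DemoError::Utf8(e)
    }
}

impl From<ParseIntError> for DemoError {
    fn from(e: ParseIntError) -> Self {
        DemoError::Parse(e)
    }
}

/// Two different error types flow through `?` in the same function body.
pub fn parse_width(bytes: &[u8]) -> DemoResult<u32> {
    let text = std::str::from_utf8(bytes)?; // Utf8Error -> DemoError
    let width = text.trim().parse::<u32>()?; // ParseIntError -> DemoError
    Ok(width)
}
```

The `?` operator calls `From::from` on the error value, which is why each `From` impl is enough to make the conversion implicit.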

crates

  • itertools

    Not specifically required, but handy

  • multiversion

    Suggests a SIMD width for a given data type, used to autodetect the strategy

  • num-traits

    Used to describe numeric utilities generically, constrained on numeric ops

  • image@GaussianWonder/image

    [patch.crates-io]
    image = { git = "https://github.com/GaussianWonder/image", branch = "pub-enlargable-v0.25.9" }

    Patch: makes the Enlargable trait pub.

    Can be used to generically describe ops available on image::ImageBuffer<..., T> for any Texture<T>

Benches Plots

Sync

bayer_transform_comparison

Par

bayer_transform_par_comparison

TODO WIP

  • try as_simd()

    A safe wrapper around slice::align_to. This is to check whether performance is lost because the SIMD ops are not performed on aligned items within the buffers.

  • remove json, use serde

  • remove unnecessary bayer strategies

  • replace all Result types with error::Result

  • generate blue noise

  • save all assets as textures instead of static arrays.
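Regarding the as_simd() item above: the prefix / aligned middle / suffix split it performs can be previewed on stable Rust with `slice::align_to` directly. The sketch below is only an approximation of that idea. `Lane4`, `split_aligned`, and `sum_split` are hypothetical, and a 16-byte aligned wrapper around `[f32; 4]` stands in for a real SIMD vector type.

```rust
/// Hypothetical stand-in for a 4-lane f32 vector: 16 bytes, 16-byte aligned.
#[derive(Clone, Copy)]
#[repr(C, align(16))]
pub struct Lane4(pub [f32; 4]);

/// Splits `data` into an unaligned head, a 16-byte aligned body, and a tail.
/// Note that `align_to` is allowed to return all data in the head or tail.
pub fn split_aligned(data: &[f32]) -> (&[f32], &[Lane4], &[f32]) {
    // SAFETY: Lane4 is a repr(C) wrapper around [f32; 4] with no padding,
    // and every bit pattern is a valid f32, so reinterpreting is sound.
    unsafe { data.align_to::<Lane4>() }
}

/// Sums the slice, processing the aligned body four lanes at a time.
pub fn sum_split(data: &[f32]) -> f32 {
    let (head, body, tail) = split_aligned(data);

    let mut acc = [0.0f32; 4];
    for lane in body {
        for i in 0..4 {
            acc[i] += lane.0[i];
        }
    }

    let edges: f32 = head.iter().chain(tail).sum();
    edges + acc.iter().sum::<f32>()
}
```

Comparing a kernel written this way against one that loads from arbitrary offsets would show how much of the gap between Scalar and the Simd variants is due to alignment.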
