Skip to content

Error Handling policy violation#726

Merged
sidd-27 merged 13 commits intokornia:mainfrom
namanguptagit:fix/error-handling-policy-violation
Mar 13, 2026
Merged

Error Handling policy violation#726
sidd-27 merged 13 commits intokornia:mainfrom
namanguptagit:fix/error-handling-policy-violation

Conversation

@namanguptagit
Copy link
Contributor

@namanguptagit namanguptagit commented Feb 21, 2026

📝 Description

This PR resolves several critical performance and library safety issues by making interpolate_pixel operations fully infallible, moving validation checks out of the hot paths to eliminate massive per-pixel Result unpacking overhead in routines like resize and warp. It simplifies the parallel processing iteration signatures by removing turbofish constraints (E), and structurally ensures zero panics within the runtime library by replacing unreachable!() macros and removing infallible Default traits holding .expect() wrappers. Additionally, it prevents dynamic string allocations during interpolation failures by passing strongly-typed enums through the ImageError variants directly, and properly propagates start_tracking_depth error states throughout the smolvlm modules.
⚠️ Issue Link Required: This PR must be linked to an approved and assigned issue. See Contributing Guide for details.
#725
Fixes/Relates to: # 725

Important:

  • Ensure you are assigned to the linked issue before submitting this PR
  • This PR should strictly implement what the linked issue describes
  • Do not include changes beyond the scope of the linked issue

🛠️ Changes Made

  • Removed Result overhead from interpolate_pixel hot paths: Made image interpolation completely infallible (crates/kornia-imgproc/src/interpolation/interpolate.rs), eliminating the per-pixel Result generation and unpacking bottleneck to resolve performance regressions. Moved validation of InterpolationMode to the top-level callers (crates/kornia-imgproc/src/resize.rs, crates/kornia-imgproc/src/interpolation/remap.rs, crates/kornia-imgproc/src/warp/affine.rs).
  • Removed overly generic turbofish from Iterators: Removed E generic payload constraints from par_iter_rows_resample and par_iter_rows_spatial_mapping in crates/kornia-imgproc/src/parallel.rs to simplify development and eliminate repetitive turbofish casts Ok::<(), ImageError>(()).
  • String Allocation Avoidance: Evolved UnsupportedInterpolation variant in crates/kornia-image/src/error.rs to accept an InterpolationMode typed struct directly, eradicating costly per-error dynamic string heap allocations.
  • Fixed Silent Error Discards: Replaced let _ = ... pattern with the proper ? operator over start_tracking_depth and stop_tracking_depth across crates/kornia-vlm/src/smolvlm/text_model.rs and crates/kornia-vlm/src/smolvlm/vision_model.rs.
  • DRY Bound Checks: Transformed slice bounds verification inside crates/kornia-imgproc/src/features/responses.rs to construct PixelIndexOutOfBounds errors modularly using internal inline helpers.
  • [BREAKING] Removed Default Traits: Removed Default traits completely for JpegTurboEncoder and JpegTurboDecoder in crates/kornia-io/src/jpegturbo.rs as downstream implementations could fail, enforcing ::new()?to comply with the project library expect()/unwrap() constraints.

🧪 How Was This Tested?

  • Unit Tests: Standard workspace units pass cleanly. Confirmed all image boundaries accurately return correct exceptions on edge boundaries in algorithms like HarrisResponse tests.
  • Manual Verification: Ran pixi run rust-fmt and pixi run rust-clippy. Gated unused module elements across Linux-centric example deployments (examples/orb_detector, examples/ros-z-nodes, examples/foxglove, examples/smol_vlm2_video) via explicit #[cfg(target_os = "linux")] annotations to guarantee cross-OS macOS build checks run spotlessly under -D warnings.
  • Performance/Edge Cases: Verified hot processing operations (resize, warp, remap) immediately short-circuit out on unsupported algorithms rather than delaying exception discovery sequentially over individual array cells.

🕵️ AI Usage Disclosure

Check one of the following:

  • 🟢 No AI used.
  • 🟡 AI-assisted: I used AI for boilerplate/refactoring but have manually reviewed and tested every line.
  • 🔴 AI-generated: (Note: These PRs may be subject to stricter scrutiny or immediate closure if the logic is not explained).

🚦 Checklist

  • I am assigned to the linked issue (required before PR submission)
  • The linked issue has been approved by a maintainer
  • This PR strictly implements what the linked issue describes (no scope creep)
  • I have performed a self-review of my code (no "ghost" variables or hallucinations).
  • My code follows the existing style guidelines of this project.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • (Optional) I have attached screenshots/recordings for UI changes.

💭 Additional Context

All revisions structurally reinforce the library's runtime stability by ensuring unsupported pixel algorithms naturally abort execution prior to tensor iteration, guaranteeing predictable execution speeds under heavy loads.

@namanguptagit
Copy link
Contributor Author

Hey @edgarriba can you take a look at this pr whenever you have some spare time .

Copy link
Collaborator

@sidd-27 sidd-27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: PR #726 — Error Handling Policy Violation

This PR systematically replaces .unwrap(), .expect(), panic!(), and unimplemented!() across 29 files with proper error propagation. The intent aligns perfectly with the project's error handling policy. However, the execution has several issues that need addressing.


🔴 Issues

1. PR description is completely empty — no checklist items filled
The template is untouched: no linked issue, no changes listed, no testing described, no AI disclosure. Per contributing guidelines, the PR description must include what changed, why, and how it was tested. This is a 30-file PR — the description matters.

2. let _ = ctx.text_introspector.start_tracking_depth() silently discards errors (text_model.rs, vision_model.rs)

let _ = ctx.text_introspector.start_tracking_depth();
// ...
let _ = ctx.text_introspector.stop_tracking_depth();

You converted the panics in start_tracking_depth()/stop_tracking_depth() to return Result, but then silently discard the error with let _ =. This is worse than the original panic — at least a panic tells you something went wrong. Either propagate the error with ? or keep the panic if the invariant is truly unrecoverable.

3. Removed Default impls for JpegTurboDecoder/JpegTurboEncoder without migration path (jpegturbo.rs)
The Default implementations were removed entirely. Any downstream code calling JpegTurboDecoder::default() or JpegTurboEncoder::default() will now fail to compile. This is a breaking change that should be documented. Consider keeping Default but having it delegate to new().expect("...") if the intent is to keep a convenience constructor, or explicitly note the breaking change.

4. interpolate_pixel now returns Result for a fundamentally hot-path function (interpolate.rs)

InterpolationMode::Lanczos => Err(kornia_image::ImageError::UnsupportedInterpolation(...)),
InterpolationMode::Bicubic => Err(kornia_image::ImageError::UnsupportedInterpolation(...)),

This is called per-pixel in resize/warp/remap. Wrapping every pixel interpolation in Result adds overhead in tight loops. The unsupported modes should be validated once at the start of the operation (e.g., in resize(), remap(), warp_affine()), not checked per-pixel. Then interpolate_pixel can stay infallible for the supported modes.

5. par_iter_rows_resample signature change is unnecessarily generic (parallel.rs)

pub fn par_iter_rows_resample<const C: usize, A: ImageAllocator, E: Send>(
    ...
    f: impl Fn(&f32, &f32, &mut [f32]) -> Result<(), E> + Send + Sync,
) -> Result<(), E>

Making the error type generic (E: Send) means callers need turbofish annotations (Ok::<(), ImageError>(())) everywhere. Since this is an internal utility and all callers use ImageError, just use ImageError directly.

6. New UnsupportedInterpolation error variant takes a String (error.rs)
This allocates on every error construction. Use the enum directly:

UnsupportedInterpolation(InterpolationMode),

🟡 Suggestions

7. PixelIndexOutOfBounds used for slice bounds errors in HarrisResponse (responses.rs)
The error variant is semantically about pixel indices, but it's being used for internal buffer slice bounds. Consider a more appropriate variant or at least document this usage. The repeated ok_or_else blocks with identical error construction could be extracted into a helper.

8. fps_counter.rs — Good defensive fix
The .unwrap() on front()/back() was replaced with a clean if let pattern. Also added a duration > 0 guard. Well done.

9. partial_cmp().unwrap() to total_cmp() (registration/ops.rs, draw.rs)
Clean fix. total_cmp is the right approach for f32 sorting — handles NaN correctly without panicking.

10. Drop impls now use let _ = self.close() (gstreamer)
Appropriate — panicking in Drop is bad. Silently ignoring close errors in destructors is the standard Rust pattern.

11. distance_transform_vanilla return type change
Good — but this is a test-only function ("NOTE: only for testing"). The .unwrap() was fine here per guidelines. Low priority but not harmful.


✅ What's Good

  • Consistent application of ? propagation across the codebase
  • New error variants are descriptive (MutexPoisoned, JoinThreadError, ImageProcessError)
  • total_cmp replacements are correct and idiomatic
  • VLM preprocessor .expect() to .ok_or_else() with proper error types
  • GStreamer Drop impls fixed correctly

Summary

The direction is right — removing panics from library code is important. But the execution needs work:

  1. Fill out the PR description (linked issue, changes, testing)
  2. Don't silently discard errors with let _ = — propagate or justify
  3. Validate interpolation mode once, not per-pixel
  4. Document the breaking Default removal
  5. Simplify the generic error type in par_iter_rows_resample

Verdict: Needs changes before merge.

@sidd-27
Copy link
Collaborator

sidd-27 commented Mar 6, 2026

@namanguptagit address the review comments, as well as fix the merge errors if youtr intrested in working on this

@namanguptagit
Copy link
Contributor Author

@namanguptagit address the review comments, as well as fix the merge errors if youtr intrested in working on this

Yes will address your comments and fix the merge changes as soon as possible.

@namanguptagit
Copy link
Contributor Author

Hey @sidd-27 ,please take a look whenever you have some bandwidth i have addressed the issues and your comments .Added Documentation as well.

@sidd-27
Copy link
Collaborator

sidd-27 commented Mar 7, 2026

Review - Needs Major Revisions ❌

Good intent, but significant implementation issues need addressing.

⚠️ Critical Issues (Must Fix):

1. Empty PR Description

  • Template completely unfilled - no linked issue, changes, or testing described
  • For a 35-file breaking change, documentation is essential

2. Silent Error Discarding

let _ = ctx.text_introspector.start_tracking_depth(); // ❌ Worse than panic!

Converting panics to Result then ignoring with let _ = defeats the purpose. Either propagate with ? or keep the original panic.

3. Breaking Changes Undocumented

  • Removed Default impls for JpegTurboDecoder/JpegTurboEncoder
  • Any downstream ::default() calls will break
  • Need migration path or explicit breaking change notice

4. Performance Regression

// ❌ Per-pixel Result wrapping in hot path
InterpolationMode::Lanczos => Err(UnsupportedInterpolation(...))

interpolate_pixel is called per-pixel. Validate interpolation mode once at function entry, not per-pixel.

🔧 Technical Issues:

5. Unnecessary Complexity

// ❌ Forces turbofish everywhere
f: impl Fn(...) -> Result<(), E> + Send + Sync

Generic error type E in par_iter_rows_resample is overkill - just use ImageError.

6. String Allocation in Errors

UnsupportedInterpolation(String) // ❌ Allocates per error

Use UnsupportedInterpolation(InterpolationMode) instead.

What's Good:

  • Systematic .unwrap()? propagation
  • Proper total_cmp() for f32 sorting (handles NaN)
  • Good defensive fixes in fps_counter.rs
  • Clean GStreamer Drop implementations

📋 Action Items:

  1. Fill PR description with issue link, changes summary, testing approach
  2. Fix silent error discarding - propagate or justify keeping panic
  3. Document breaking changes or provide migration path
  4. Move interpolation validation to function entry, not per-pixel
  5. Simplify generic error types where unnecessary
  6. Use enum variants instead of String allocation in errors

Verdict: Major revisions needed

The direction (removing panics from library code) is absolutely right, but the execution needs significant refinement. @sidd-27's detailed review covers the technical issues well.

@namanguptagit: Please address the above issues, especially the performance and breaking change concerns.

@namanguptagit namanguptagit requested a review from sidd-27 March 8, 2026 00:01
@namanguptagit
Copy link
Contributor Author

@sidd-27 i have addressed your comments

@namanguptagit
Copy link
Contributor Author

@sidd-27 can you review please

Copy link
Collaborator

@sidd-27 sidd-27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review (Post March 7 Commits)

The author addressed most of the feedback from my previous review. Here's the updated status:

✅ Resolved

  1. PR description — Now comprehensive with changes, testing, AI disclosure, and checklist filled out.
  2. let _ = silently discarding errors — Fixed. start_tracking_depth() / stop_tracking_depth() now return Result and are properly propagated with ? in both text_model.rs and vision_model.rs.
  3. interpolate_pixel Result overhead in hot path — Fixed well. Created an infallible interpolate_pixel_fast for internal use, with validate_interpolation() called once at the top of resize(), remap(), warp_affine(), and warp_perspective().
  4. Generic E in par_iter_rows_resample — Fixed. Removed the generic error type, simplified signatures.
  5. UnsupportedInterpolation takes String — Fixed. Now takes InterpolationMode directly.

🔴 Remaining Issues

1. interpolate_pixel_fast silently returns 0.0 for unsupported modes

InterpolationMode::Lanczos | InterpolationMode::Bicubic => 0.0,

Since this function is only called after validate_interpolation() rejects these modes, this branch should be unreachable. Silently returning 0.0 masks bugs if someone later calls interpolate_pixel_fast without validation. Use unreachable!() or debug_assert! here — the validation guarantee makes a panic acceptable for this invariant violation:

InterpolationMode::Lanczos | InterpolationMode::Bicubic => {
    debug_assert!(false, "unsupported mode should have been caught by validate_interpolation");
    0.0
}

2. Scope creep — #[cfg(target_os = "linux")] changes are unrelated

The cfg gating across foxglove, orb_detector, ros-z-nodes, smol_vlm_video, and smol_vlm2_video examples is a cross-platform build fix, not an error handling policy change. Per project guidelines: "one concern per PR." These should be split into a separate PR.

3. Breaking Default removal — no migration path

The [BREAKING] tag in the description is good, but downstream users calling JpegTurboDecoder::default() or JpegTurboEncoder::default() will hit compile errors with no guidance. Consider adding a note in the doc comments pointing to ::new() as the replacement.

🟢 CI Status

  • Rust lint, clippy, fmt, check — all passing ✅
  • All rust tests (debug + release, x86 + aarch64) — passing ✅
  • Python tests (3.8–3.14) — passing ✅
  • C++ tests — passing ✅
  • "Validate PR Requirements" failure — CI permissions issue (403), not a code problem. The linked issue format # 725 isn't matched by the regex expecting Fixes #725.
  • "py3.14t-linux" failure — Free-threaded Python 3.14 build, likely pre-existing.

Summary

Core error handling changes are solid and well-executed. Address the silent 0.0 fallback and split the cfg(target_os) changes into a separate PR, then this should be good to go pending maintainer re-approval.

@namanguptagit
Copy link
Contributor Author

@sidd-27 addressed these commets as well now .

@sidd-27
Copy link
Collaborator

sidd-27 commented Mar 12, 2026

✅ LGTM - Final Review

Excellent work addressing the feedback! The core error handling improvements are now solid and well-executed.

🎯 Issues Resolved

interpolate_pixel_fast silent fallback fixed perfectly:

debug_assert!(false, "unsupported mode should have been caught by validate_interpolation");

This is exactly what I recommended - catches invariant violations in debug while staying performant in release.

✅ Breaking changes documented clearly:

/// **Note on migration:** The Default trait implementation was removed. Please use JpegTurboDecoder::new() instead.

Downstream users now have a clear migration path.

✅ All major technical concerns from previous reviews addressed:

  • Error propagation with ? instead of silent discarding ✅
  • Hot path performance preserved with validate_interpolation() at entry points ✅
  • Error variants take typed enums instead of String allocations ✅
  • Generic error types simplified ✅

📝 Minor Note

Removing the cfg(target_os = "linux") guards will break Windows/macOS builds for the affected examples, but splitting that to a separate cross-platform PR makes perfect sense to keep this PR focused.

🏆 Summary

The error handling policy violations are comprehensively fixed. The implementation is technically sound, performance-conscious, and maintains good API ergonomics. The author has been responsive to feedback throughout the review process.

Recommendation: Ready for merge on the core functionality. 🚀

@namanguptagit
Copy link
Contributor Author

If everything looks good can we get it merged ?

@sidd-27 sidd-27 requested a review from edgarriba March 13, 2026 05:17
@edgarriba
Copy link
Member

LGTM — @sidd-27 can you merge? Thanks!

@sidd-27 sidd-27 merged commit 8feb094 into kornia:main Mar 13, 2026
19 of 23 checks passed
@namanguptagit
Copy link
Contributor Author

Hi @sidd-27 , as per comments regarding the cross platform build issue i have created a new issue and raised a pr for the same : #814 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants