
Conversation

@vaijira (Contributor) commented Nov 16, 2025

Validate your PR with burn.

It is important that you make sure that you don't introduce any bugs in burn.

Instructions

  • Create a new branch or fork of the burn repo
  • Update the main Cargo.toml with this PR hash.
  • Fix any broken tests or compilation errors in burn.
  • Submit a PR in burn with your fixes and link it here.

@vaijira force-pushed the rhypot branch 2 times, most recently from fb904f3 to 2d49684 on November 29, 2025 11:50
Removing conditional branches when possible.
@vaijira (Contributor, Author) commented Nov 29, 2025

Ready for review. To avoid intermediate underflow/overflow, the algorithms are as follows:

  • Hypot(lhs, rhs)
    let a = abs(lhs);
    let b = abs(rhs);
    let max_val = max(a, b);
    var max_val_safe = max_val;
    if (max_val == 0.0) { max_val_safe = 1.0; }
    let min_val = min(a, b);
    let t = min_val / max_val_safe;
    return max_val * sqrt(fma(t, t, 1.0));

  • Rhypot(lhs, rhs)
    let a = abs(lhs);
    let b = abs(rhs);
    let max_val = max(a, b);
    var max_val_safe = max_val;
    if (max_val == 0.0) { max_val_safe = 1.0; }
    let min_val = min(a, b);
    let t = min_val / max_val_safe;
    return inverseSqrt(fma(t, t, 1.0)) / max_val;

@nathanielsimard (Member) left a comment

I think it would be better to have a cubecl function for that instruction instead of re-implementing it for every compiler. Note that I'm not against having a specialized implementation for a given compiler, but we can use a cubecl function as the default. Here's an example:

/// Returns the value at `index` in tensor within bounds.
#[cube]
pub fn read_tensor_checked<C: CubePrimitive>(
    tensor: Tensor<C>,
    index: u32,
    #[comptime] unroll_factor: u32,
) -> C {
    let len = tensor.buffer_len() * unroll_factor;
    let in_bounds = index < len;
    let index = Min::min(index, len);

    select(in_bounds, tensor.read_unchecked(index), C::cast_from(0u32))
}

Called as follows by compilers:

read_tensor_checked::expand::<Line<NumericExpand<0>>>(
        &mut scope,
        list.into(),
        index.into(),
        op.unroll_factor,
    )
    .expand

@vaijira (Contributor, Author) commented Dec 3, 2025

The problem is that CUDA has built-in instructions for these, so there you really want to call hypotf and rhypotf.

@nathanielsimard (Member) commented

> The problem is that for CUDA that has built-in instructions you really want to call hypotf and rhypotf.

You can use that instruction for CUDA and fall back to the default implementation for other compilers. Other compilers can expand the cubecl function instead of recreating it with their own IR.

@vaijira (Contributor, Author) commented Dec 4, 2025

I think it could be reviewed.

