
Conversation

@shajder (Contributor) commented Sep 15, 2025

Related to #1364

Comment on lines +247 to +255
for (size_t j = 0; j < max_wg_size; j++)
    ref_vals[j] = get_random_float(min_range, max_range, d);

for (size_t i = 0; i < (size_t)n_elems; i += max_wg_size)
{
    size_t work_group_size = std::min(max_wg_size, n_elems - i);
    memcpy(&inptr[i], ref_vals.data(),
           sizeof(Type) * work_group_size);
}
Contributor

I believe this means we'll be using the same input data for all work-groups. While that's not inherently bad, it does mean we're doing redundant testing by running multiple work-groups.

Some options to consider:

  1. Do nothing, these tests don't run for very long so a little redundant testing is fine.
  2. Test with different random data for each work-group instead, but verify against the worst-case max_err (a sketch of this option follows the list).
  3. Only test a single work-group.
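
For illustration, here is a minimal, hedged sketch of option 2, using std::mt19937 in place of the CTS's own random helpers; the function name fill_per_work_group and its signature are assumptions, not the actual test code. Each work-group gets its own random block, which is why verification would then need a worst-case error bound:

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Option 2 (sketch): every work-group gets a fresh random block, so the
// verification phase would have to compare against a worst-case max_err
// rather than a bound computed for one shared reference set.
template <typename Type>
void fill_per_work_group(std::vector<Type> &inptr, size_t n_elems,
                         size_t max_wg_size, std::mt19937 &gen,
                         Type min_range, Type max_range)
{
    std::uniform_real_distribution<Type> dist(min_range, max_range);
    for (size_t i = 0; i < n_elems; i += max_wg_size)
    {
        size_t work_group_size = std::min(max_wg_size, n_elems - i);
        for (size_t j = 0; j < work_group_size; j++)
            inptr[i + j] = dist(gen); // fresh data for this work-group
    }
}

With this layout every chunk differs, so the error bound would have to cover the worst case over all inputs in [min_range, max_range], which is exactly the extra analysis the reply below argues against.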

@shajder (Contributor Author) commented Sep 16, 2025

> Test with different random data for each work-group instead, but verify against the worst-case max_err.

Please keep in mind that max_err may differ for other random sets. That's why I propagated the same data across the other work-groups: it avoids those computations during the verification phase. Since it isn't a performance burden and the code is consistent with the integral types, perhaps it can stay as is.
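
As an illustration of that point, here is a minimal sketch (not the actual CTS verification code; verify_stamped_output, expected and max_err are hypothetical placeholders) of how a single host-computed reference block and error bound can be reused for every work-group when they all consumed the same stamped-out inputs:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Because every work-group consumed the same ref_vals block, one host-side
// expected block and one max_err (computed once for that block) can be
// reused to check every work-group's output.
template <typename Type>
bool verify_stamped_output(const std::vector<Type> &outptr,
                           const std::vector<Type> &expected, // one work-group's worth
                           size_t n_elems, size_t max_wg_size, double max_err)
{
    for (size_t i = 0; i < n_elems; i += max_wg_size)
    {
        size_t work_group_size = std::min(max_wg_size, n_elems - i);
        for (size_t j = 0; j < work_group_size; j++)
        {
            if (std::fabs(double(outptr[i + j]) - double(expected[j])) > max_err)
                return false; // exceeds the bound computed for ref_vals
        }
    }
    return true;
}

If the data differed per work-group, expected and max_err would instead have to be recomputed, or bounded for the worst case, for each block.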

Contributor

I'm leaning towards "do nothing", but I don't think this is consistent with the integral types. As far as I can tell:

  • This code, for the floating-point types, generates a set of random data for the "max work-group size", then stamps out the same data for each work-group, until each of the "n_elems" has a value.
  • The code below, for the integral types, generates a random value for each of the "n_elems", which means that each work-group will have a different set of random data.

Is this correct?

Contributor Author

Yes, your description is correct. In other words, the kernels and CL calls stay the same for the integral and floating-point types, but the data-generation and verification parts are indeed not consistent.
