In particular, when trying to test conversion of the sum reduction in #793, we get a sequence of scalar operations and shuffles rather than the sum itself, because wave pass lowering expands the reduction into that form. We would like to test that we can generate the sum operation when requested. Down the line, we can lower it to gpu.subgroup_reduce, which already has lowerings, assuming no further wave transformations need the scalar+shuffle form.
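
For readers unfamiliar with the scalar+shuffle form, here is a minimal Python model of a butterfly (XOR-shuffle) sum across the lanes of a subgroup. It is only an illustration of the general technique, not the lowering the wave pass actually emits; `shuffle_xor` and `butterfly_sum` are hypothetical names, and the subgroup size is assumed to be a power of two.

```python
def shuffle_xor(lanes, offset):
    """Model of a subgroup shuffle: lane i reads the value held by lane i ^ offset."""
    return [lanes[i ^ offset] for i in range(len(lanes))]


def butterfly_sum(lanes):
    """Sum across lanes using log2(n) shuffle+add steps.

    Assumes len(lanes) is a power of two (hypothetical subgroup size).
    Returns a list in which every lane holds the full sum.
    """
    n = len(lanes)
    assert n > 0 and n & (n - 1) == 0, "subgroup size must be a power of two"
    offset = n // 2
    while offset >= 1:
        # One "shuffle" followed by one scalar add per lane.
        shuffled = shuffle_xor(lanes, offset)
        lanes = [a + b for a, b in zip(lanes, shuffled)]
        offset //= 2
    return lanes


values = list(range(8))
result = butterfly_sum(values)
# Every lane ends up with the same value as a single sum over all lanes.
assert result == [sum(values)] * len(values)
```

A single sum operation carries the same information as the whole loop above, which is why keeping it intact makes the conversion easier to test and leaves a later lowering free to pick the shuffle sequence.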