Add an fma intrinsic #8900
base: main
Changes from 5 commits
@@ -201,6 +201,7 @@ class CodeGen_Vulkan_Dev : public CodeGen_GPU_Dev {
     {"fast_pow_f32", GLSLstd450Pow},
     {"floor_f16", GLSLstd450Floor},
     {"floor_f32", GLSLstd450Floor},
+    {"fma", GLSLstd450Fma},
|
Contributor comment: Seeing this, sitting between the others, makes me wonder: should we have fma_f32, fma_f64, and fma_f16 functions instead of simply "fma"? Halide uses these suffixes pretty much everywhere. Edit: hmm, it seems that this is not the case for the strict_float intrinsics. We didn't do strict_add_f32, and I think that's fine.
     {"log_f16", GLSLstd450Log},
     {"log_f32", GLSLstd450Log},
     {"sin_f16", GLSLstd450Sin},
|
|
@@ -1190,9 +1191,14 @@ void CodeGen_Vulkan_Dev::SPIRV_Emitter::visit(const Call *op) {
             e.accept(this);
         }
     } else if (op->is_strict_float_intrinsic()) {
-        // TODO: Enable/Disable RelaxedPrecision flags?
-        Expr e = unstrictify_float(op);
-        e.accept(this);
+        if (op->is_intrinsic(Call::strict_fma)) {
+            Expr builtin_call = Call::make(op->type, "fma", op->args, Call::PureExtern);
+            builtin_call.accept(this);
+        } else {
+            // TODO: Enable/Disable RelaxedPrecision flags?
+            Expr e = unstrictify_float(op);
+            e.accept(this);
+        }
     } else if (op->is_intrinsic(Call::IntrinsicOp::sorted_avg)) {
        internal_assert(op->args.size() == 2);
        // b > a, so the following works without widening:
|
|
Comment: I'm curious: what's the point of casting? It looks like this would make it accept long double, but actually not respect the required precision (which is hard on SSE fp either way).

Reply: This was for float16 support. It's not quite right doing it in a wider type, though: the rounding on the wider fma might result in a tie when casting back to the narrow type, and that tie may break in a different direction than directly rounding the fma result to the narrow type. Not sure how to handle this. A static assert that T is a double or a float? What should the C backend do if you use a float16 fma call?