ISPC produces performance warnings about divisions. These are valid but harmless, since they only affect the non-bulk 'trailing' part of the loop. Removing --werror prevents them from breaking the build.
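For readers unfamiliar with the bulk/trailing split mentioned above, here is a rough sketch in plain Python (not ISPC). The kernel `scaled_add`, its arguments, and the default width of 8 are hypothetical and chosen only for illustration. The loop is processed in full-width chunks first, and only the short remainder goes through the trailing path.

```python
def scaled_add(b, c, scalar, width=8):
    """Compute a[i] = b[i] + scalar * c[i], split into a bulk part and a tail.

    `width` stands in for the SIMD width; 8 is an assumed value, not one
    taken from this repository.
    """
    n = len(b)
    a = [0.0] * n

    # Largest multiple of `width` that fits in n: everything below this
    # index is handled in full-width chunks (the "bulk" part).
    bulk_end = (n // width) * width

    for start in range(0, bulk_end, width):
        for lane in range(width):  # one iteration per SIMD lane
            i = start + lane
            a[i] = b[i] + scalar * c[i]

    # The "trailing" part: fewer than `width` leftover elements.
    for i in range(bulk_end, n):
        a[i] = b[i] + scalar * c[i]

    return a, bulk_end
```

With n = 19 and width = 8, the bulk covers indices 0..15 and the tail only 16..18, which is why extra cost confined to the tail is negligible for large arrays.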
Cool! Thanks for pushing this. I am happy to merge these changes. Just a couple of questions. No pressure, but it would be great if you wanted to update the README. In that case, should we include the loopy generated ispc code in the repo? Also, I tried to run loopy (via a Not sure what I did wrong. Is it something easy to fix?
You know, it's probably a good time to convert this to git subtree, to avoid this unnecessary paper cut.
OK, will do. Including the loopy code is complicated by it being type-specific. Since the SIMD widths change between double and single, it's not that fixable without major surgery to the generated code (doable, but kind of defeats the purpose). What I'll likely do is wire the loopy generation into the Makefile. LMK if that's an unappealing thought.
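To illustrate why the generated code is type-specific, here is a small sketch of how the lane count depends on the element type. The 256-bit register size is an assumption (the thread does not state the target), and `lanes_per_vector` is a hypothetical helper, not part of loopy or ISPC.

```python
import struct


def lanes_per_vector(type_code, vector_bits=256):
    """Number of elements of the given struct type code per vector register.

    vector_bits=256 is an assumed register size, used only for illustration.
    """
    # "=" selects standard sizes: "f" is 4 bytes, "d" is 8 bytes.
    item_bytes = struct.calcsize("=" + type_code)
    return vector_bits // (8 * item_bytes)


single_lanes = lanes_per_vector("f")  # single precision
double_lanes = lanes_per_vector("d")  # double precision
```

Under that assumption, single precision gets twice as many lanes as double, so code generated for one width cannot simply be reused for the other.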
inducer left a comment:
Here's a first stab. Still missing a few things. If you know how to fix the make thing, I'd be grateful!
2444b27 to b0cbb59
b0cbb59 to ac8c306
Oh yeah, I forgot. It has been a few years.
This is even better!
This looks great. I see a similar performance bump on my laptop running the code.
There's not currently an islpy binary wheel for Python 3.13. This should fix that, after I roll another islpy release: inducer/islpy#154.
ac8c306 to 9574e1e
Alright, put the dtype switch in place for loopy. Doesn't seem to make a difference bandwidth-wise, but it's a useful sanity check at any rate.
Also, should have said: Ready to go from my end.
Thanks for pushing this!
Had the opportunity to dust this off for https://relate.cs.illinois.edu/course/cs598apk-s25/. 🙂
Probably best read commit-by-commit.
The old streaming store logic in loopy was pretty busted, and some implicit/invalid assumptions that went into it were broken in the interim. Should be all better now: inducer/loopy#915. The fix specifically for loopy here is minor.
I've also switched the Makefile over to default to gcc, since icc is no longer a thing, and fixed some warnings in the default/non-streaming-store ispc code.
To do
New results
(My laptop, Raptor Lake.)
GCC:
ISPC as in this repo:
Loopy-generated ISPC, with streaming stores.
I can also update the README to tell more of the story if you like. (cf. #1)