Skip to content

Dust off for 2025#3

Merged
lcw merged 12 commits intolcw:masterfrom
inducer:2025-loopy-compat
Feb 14, 2025
Merged

Dust off for 2025#3
lcw merged 12 commits intolcw:masterfrom
inducer:2025-loopy-compat

Conversation

@inducer
Copy link
Copy Markdown
Contributor

@inducer inducer commented Feb 12, 2025

Had the opportunity to dust this off for https://relate.cs.illinois.edu/course/cs598apk-s25/. 🙂

Probably best read commit-by-commit.

The old streaming store logic in loopy was pretty busted, and some implicit/invalid assumptions that went into it were broken in the interim. Should be all better now: inducer/loopy#915. The fix specifically for loopy here is minor.

I've also switched the Makefile over to default to gcc, since icc is no longer a thing, and fixed some warnings in the default/non-streaming-store ispc code.

To do

  • Implement double/single switch for loopy
  • Fix Makefile inelegance
  • Better structure for README

New results

(My laptop, Raptor Lake.)

GCC:

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           45522.2     0.024955     0.014059     0.048783
Scale:          45675.6     0.026626     0.014012     0.049540
Add:            48929.8     0.031140     0.019620     0.057046
Triad:          48190.2     0.033958     0.019921     0.068733
-------------------------------------------------------------

ISPC as in this repo:

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           49654.3     0.016360     0.012889     0.042988
Scale:          49305.2     0.015706     0.012980     0.036412
Add:            52266.6     0.018986     0.018367     0.022731
Triad:          52407.4     0.018479     0.018318     0.019473
-------------------------------------------------------------

Loopy-generated ISPC, with streaming stores.

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           66312.4     0.012202     0.009651     0.031067
Scale:          66909.8     0.011952     0.009565     0.029441
Add:            64114.7     0.015848     0.014973     0.020580
Triad:          63969.8     0.015343     0.015007     0.016134
-------------------------------------------------------------

I can also update the README to tell more of the story if you like. (cf. #1)

Ispc produces performance warnings about divisions.
These are valid, but harmless, since they only affect
the non-bulk 'trailing' part of the loop.
Removing --werror prevents them from breaking the build.
@lcw
Copy link
Copy Markdown
Owner

lcw commented Feb 12, 2025

Cool! Thanks for pushing this. I am happy to merge these changes. Just a couple of questions.

No pressure, but it would be great if you wanted to update the README. In that case, should we include the loopy generated ispc code in the repo?

Also, I tried to run loopy (via a git clone and pip install .) on my laptop and ran into this error message

.env ❯ python gen-loopy.py
/home/lucas/research/code/c/stream_ispc/.env/lib/python3.12/site-packages/pytools/persistent_dict.py:52: RecommendedHashNotFoundWarning: Unable to import recommended hash 'siphash24.siphash13', falling back to 'hashlib.sha256'. Run 'python3 -m pip install siphash24' to install the recommended hash.
  warn("Unable to import recommended hash 'siphash24.siphash13', "
/home/lucas/research/code/c/stream_ispc/.env/lib/python3.12/site-packages/loopy/kernel/creation.py:2619: LoopyWarning: 'lang_version' was not passed to make_function(). To avoid this warning, pass lang_version=(2018, 2) in this invocation. (Or say 'from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2' in the global scope of the calling frame.)
  tunit = make_function(*args, **kwargs)
Traceback (most recent call last):
  File "/home/lucas/research/code/c/stream_ispc/gen-loopy.py", line 71, in <module>
    main()
  File "/home/lucas/research/code/c/stream_ispc/gen-loopy.py", line 66, in main
    print(gen_code(knl))
          ^^^^^^^^^^^^^
  File "/home/lucas/research/code/c/stream_ispc/gen-loopy.py", line 25, in gen_code
    return lp.generate_code_v2(knl).all_code()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/research/code/c/stream_ispc/.env/lib/python3.12/site-packages/loopy/codegen/__init__.py", line 560, in generate_code_v2
    result = code_gen_cache[input_t_unit]
             ~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/lucas/research/code/c/stream_ispc/.env/lib/python3.12/site-packages/pytools/persistent_dict.py", line 597, in __getitem__
    return self.fetch(key)
           ^^^^^^^^^^^^^^^
  File "/home/lucas/research/code/c/stream_ispc/.env/lib/python3.12/site-packages/pytools/persistent_dict.py", line 750, in fetch
    stored_key, value = self._fetch(keyhash)
                        ^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/research/code/c/stream_ispc/.env/lib/python3.12/site-packages/pytools/persistent_dict.py", line 743, in _fetch_uncached
    key, value = pickle.loads(row[0])
                 ^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'loopy.target.c.compyte'

Not sure what I did wrong. Is it something easy to fix?

@inducer
Copy link
Copy Markdown
Contributor Author

inducer commented Feb 13, 2025

Not sure what I did wrong. Is it something easy to fix?

git clone --recursive or git submodule update --init. 🙂

You know, it's probably a good time to convert this to git subtree, to avoid this unnecessary paper cut.

No pressure, but it would be great if you wanted to update the README. In that case, should we include the loopy generated ispc code in the repo?

OK, will do. Including the loopy code is complicated by it being type-specific. Since the SIMD widths change between double and single, it's not that fixable without major surgery to the generated code (doable, but kind of defeats the purpose). What I'll likely do is wire the loopy generation into the Makefile. LMK if that's an unappealing thought.

@inducer inducer marked this pull request as draft February 14, 2025 00:06
Copy link
Copy Markdown
Contributor Author

@inducer inducer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first stab. Still missing a few things. If you know how to fix the make thing, I'd be grateful!

@lcw lcw force-pushed the 2025-loopy-compat branch from b0cbb59 to ac8c306 Compare February 14, 2025 21:40
@lcw
Copy link
Copy Markdown
Owner

lcw commented Feb 14, 2025

Not sure what I did wrong. Is it something easy to fix?

git clone --recursive or git submodule update --init. 🙂

You know, it's probably a good time to convert this to git subtree, to avoid this unnecessary paper cut.

Oh yea, I forgot. It has been a few years.

OK, will do. Including the loopy code is complicated by it being type-specific. Since the SIMD widths change between double and single, it's not that fixable without major surgery to the generated code (doable, but kind of defeats the purpose). What I'll likely do is wire the loopy generation into the Makefile. LMK if that's an unappealing thought.

This is even better!

@lcw
Copy link
Copy Markdown
Owner

lcw commented Feb 14, 2025

Here's a first stab. Still missing a few things. If you know how to fix the make thing, I'd be grateful!

This looks great. I see a similar performance bump on my laptop running the code.

@inducer
Copy link
Copy Markdown
Contributor Author

inducer commented Feb 14, 2025

There's not currently an islpy binary wheel for Python 3.13. This should fix that, after I roll another islpy release: inducer/islpy#154.

@inducer inducer marked this pull request as ready for review February 14, 2025 22:37
@inducer
Copy link
Copy Markdown
Contributor Author

inducer commented Feb 14, 2025

Alright, put the dtype switch in place for loopy. Doesn't seem to make a difference bandwidth-wise, but it's a useful sanity check at any rate.

@inducer
Copy link
Copy Markdown
Contributor Author

inducer commented Feb 14, 2025

Also, should have said: Ready to go from my end.

@lcw lcw merged commit 20e8b86 into lcw:master Feb 14, 2025
@lcw
Copy link
Copy Markdown
Owner

lcw commented Feb 14, 2025

Thanks for pushing this!

@inducer inducer deleted the 2025-loopy-compat branch February 14, 2025 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants