How to handle different devices / autotune (shared memory sizes) #248
blefaudeux started this conversation in Ideas
I'm wondering about a Tritonic way to handle different classes of devices, P100/V100/A100 for instance. The excellent Matmul example gives a practical illustration of the problem: some of the autotune configurations will OOM on a P100 and produce a useful error message.
It's not a big issue, given that removing the biggest block sizes fixes it, and one could query at runtime which CUDA device is present and adjust the presets accordingly (see the sketch below). I was wondering, though, whether building that into the Triton language would make sense (or maybe it's already there and I missed it).
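For reference, a minimal sketch of the runtime-dispatch workaround, assuming PyTorch is available to query the device. The config table, block sizes, and the `copy_kernel` are hypothetical illustrations (not tuned values and not part of Triton's API); the real Matmul tutorial would use 2D block configs instead:

```python
import torch
import triton
import triton.language as tl

# Hypothetical presets keyed on compute capability; block sizes are
# placeholders, not tuned values.
_CONFIGS_BY_CC = {
    (6, 0): [  # P100: drop the biggest blocks to stay within shared memory
        triton.Config({"BLOCK": 512}, num_warps=4),
    ],
    (7, 0): [  # V100
        triton.Config({"BLOCK": 1024}, num_warps=4),
        triton.Config({"BLOCK": 2048}, num_warps=8),
    ],
    (8, 0): [  # A100
        triton.Config({"BLOCK": 2048}, num_warps=8),
        triton.Config({"BLOCK": 4096}, num_warps=8),
    ],
}

def _pick_configs():
    # Runs at decoration time, so a CUDA device must be visible here.
    cc = torch.cuda.get_device_capability()  # e.g. (8, 0) on an A100
    # Fall back to the most conservative presets for unknown devices.
    return _CONFIGS_BY_CC.get(cc, _CONFIGS_BY_CC[(6, 0)])

@triton.autotune(configs=_pick_configs(), key=["n_elements"])
@triton.jit
def copy_kernel(src_ptr, dst_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    tl.store(dst_ptr + offsets, tl.load(src_ptr + offsets, mask=mask), mask=mask)

# Usage: the grid is derived from whichever BLOCK the autotuner selects.
x = torch.randn(10_000, device="cuda")
y = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK"]),)
copy_kernel[grid](x, y, x.numel())
```

This works, but it means every kernel carries its own per-device table. If autotune could instead prune configurations that exceed the current device's shared memory before benchmarking them, the tables would disappear, which is roughly what I'm asking about.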