After calling tune_kernel("my_kernel", ...), how do I get the compiled function with the best runtime as an object that I can call again? Something like this:
func = tune_kernel("my_kernel", kernel_str, ..., args, ...)
func(args)  # call again with args
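To make the intent concrete, here is a minimal sketch in plain Python of the behavior I have in mind. It does not use Kernel Tuner's API at all: `autotune`, `slow_sum`, and `fast_sum` are hypothetical names made up for illustration, and ordinary Python callables stand in for compiled kernel variants.

```python
import time


def _time_once(func, args):
    """Return the wall-clock time of a single call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def autotune(candidates, args, repeats=3):
    """Time every candidate and return (name, callable) of the fastest one.

    `candidates` maps a name to a callable; this is a hypothetical stand-in
    for the set of compiled kernel configurations.
    """
    if not candidates:
        raise ValueError("no candidates to tune")
    best_name, best_func, best_time = None, None, float("inf")
    for name, func in candidates.items():
        # Take the minimum over a few runs to reduce timing noise.
        elapsed = min(_time_once(func, args) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_func, best_time = name, func, elapsed
    return best_name, best_func


# Two hypothetical variants of the same computation.
def slow_sum(xs):
    time.sleep(0.01)  # artificially slow variant
    return sum(xs)


def fast_sum(xs):
    return sum(xs)


args = ([1, 2, 3],)
best_name, func = autotune({"slow": slow_sum, "fast": fast_sum}, args)
result = func(*args)  # call the best variant again with the same args
```

The point is only the shape of the return value: the tuner hands back something callable, so the winning variant can be reused without tuning again.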
Essentially an autotuner that returns the best compiled version. Thanks!