- Install requirements as
  ```bash
  pip install -r requirements.txt
  ```
Please install the package by running the following command from the root folder:
```bash
pip install -e .
```
Please note that installation does not automatically install dependencies. You must install the dependencies before installing the package.
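Putting the two install commands together, the intended order is (run from the repository root):

```bash
# Install dependencies first, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```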
You can run evaluations in the following two ways:
- Command line run:
  - `geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds tbg` for Tritonbench-G-v1
  - `geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds rocm` for ROCm
- From python script: the following is a bare minimum example; for a detailed example, please see `geak_eval/run.py`.
  ```python
  from geak_eval.evaluators.interface import get_evaluators

  evaluator = get_evaluators["tbg"]()   # for TritonBenchG eval
  evaluator = get_evaluators["rocm"]()  # for ROCm eval

  call_status, exec_status, stdout, stderr = evaluator(generated_code, log_root=PATH_TO_LOG, file_name="kernel.py", atol=1e-5, rtol=1e-2)  # run evaluations
  ```
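For context, a slightly fuller usage sketch of the same interface (the file path and log directory below are placeholders, not code from the repo):

```python
from geak_eval.evaluators.interface import get_evaluators

# Placeholder paths: point these at your generated kernel and a writable log directory
GENERATED_KERNEL = "outputs/generated_kernel.py"
LOG_ROOT = "logs"

with open(GENERATED_KERNEL) as f:
    generated_code = f.read()

evaluator = get_evaluators["rocm"]()  # or get_evaluators["tbg"]() for TritonBench-G
call_status, exec_status, stdout, stderr = evaluator(
    generated_code, log_root=LOG_ROOT, file_name="kernel.py", atol=1e-5, rtol=1e-2
)
print(f"callable: {call_status}, matches ground truth: {exec_status}")
```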
The `1_exec_acc.py` file in TritonBench did not accurately compare the outputs of two Triton files:
- The execution was done purely via subprocess calls for both the generated and ground truth files.
- Seed consistency is violated.
- The outputs of the two Triton runs are compared via stdout string comparison, which is not always correct (see the snippet after this list).
- Around 150 ground truth files do not have a `print(result_gold)` line, hence the eval framework is essentially comparing two null strings.
- Some of the ground truth files (e.g. `context_attn_bloom.py`) do not even have a `result_gold = test_*()` line at the end, so the call accuracy run (`0_call_acc.py`) using such a file just blindly assumes the call was a success.
- 7 kernel files (as originally provided) run into memory access faults; we have fixed them.
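To see why stdout string comparison can pass two different results, note that PyTorch summarizes large tensors when printing. The snippet below is purely illustrative and not code from TritonBench or this repo:

```python
import torch

torch.manual_seed(0)
a = torch.randn(10_000)
b = a.clone()
b[5_000] += 1.0  # perturb an element hidden inside the printed "..." summary

# stdout-style comparison: identical strings even though the tensors differ
print(str(a) == str(b))      # True -- the changed element is elided by the summary
print(torch.allclose(a, b))  # False -- numerical comparison catches the mismatch
```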
To address these issues, we:
- Use `torch.allclose` to compare the two runs (ground truth and generated); a minimal sketch of the resulting ground truth pattern follows this list.
- Fix ground truth files to include `result_gold = test_*()`.
- Ensure a consistent seed across files.
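As an illustration, a fixed ground truth file follows this shape (a hypothetical, simplified test; real files launch Triton kernels, and the seed value here is arbitrary):

```python
import torch

def test_vector_add():
    torch.manual_seed(42)  # consistent seed so ground truth and generated runs see identical inputs
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1024, device=device)
    y = torch.randn(1024, device=device)
    return x + y           # stand-in for the kernel launch; real tests return the kernel's outputs

# Captured at the end of the file so the evaluator can compare the two runs with torch.allclose
result_gold = test_vector_add()
```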
We have also integrated performance measurement into the framework. The kernel evaluation flow is as follows (a minimal sketch follows the list):
- Check if the kernel is callable: run the kernel's test function.
- If the kernel is callable, check whether it matches the ground truth by comparing the outputs of the generated kernel on known tests.
- If the generated kernel is correct, run the performance evaluation.
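In code, the gating logic looks roughly like the following; the helper functions are hypothetical stand-ins, not the actual `geak_eval` internals:

```python
def is_callable(code: str) -> bool:
    return True   # stand-in: actually runs the kernel's test function

def matches_ground_truth(code: str) -> bool:
    return True   # stand-in: actually compares outputs with torch.allclose

def measure_performance(code: str) -> float:
    return 0.0    # stand-in: actually measures kernel runtime

def evaluate(code: str) -> dict:
    if not is_callable(code):             # 1. call check
        return {"call": False, "exec": False, "perf": None}
    if not matches_ground_truth(code):    # 2. correctness check on known tests
        return {"call": True, "exec": False, "perf": None}
    return {"call": True, "exec": True, "perf": measure_performance(code)}  # 3. perf eval
```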
Please raise a GitHub issue or open a PR for any issues, help, or contributions!
You can contribute in the following ways:
- Add new kernels for evaluations:
  - Add the dataset of new kernels under `geak_eval/data`.
  - Add the path of this new dataset in `geak_eval.constants`.
  - Add an evaluator interface for this new dataset in `geak_eval.evaluators.interface`.
  - Add an evaluator to be run by the interface in `geak_eval.evaluators`. The evaluator is a function that only runs as a direct Python call and does not run if imported as a module; the evaluator (e.g. `TB_correctness.py`) is run by its interface (e.g. `interface.TestAllCloseEvaluatorTBG`). See the sketch after this list.
- You can add new metrics for the evaluator to work with in `geak_eval.metrics`.
- You can add new performance eval metrics for your (or an existing) dataset under `geak_eval.perf`.
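A hypothetical skeleton for such an evaluator module (the file name and function below are made up for illustration; the `__main__` guard is what keeps it from running when imported by the interface):

```python
# e.g. geak_eval/evaluators/MyDataset_correctness.py (hypothetical file name)
import sys

def run_evaluation(kernel_path: str) -> int:
    # Placeholder: execute the kernel's test function and compare its outputs
    # against the ground truth with torch.allclose; return 0 on success
    print(f"evaluating {kernel_path}")
    return 0

if __name__ == "__main__":                 # does nothing when imported as a module
    sys.exit(run_evaluation(sys.argv[1]))
```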
- [2025-07-16] Added autotune-compatible ROCm kernels and naive softmax. Use the `-tp` argument with the path to this folder as below:
  ```bash
  geak-eval eval -f PATH_TO_EVAL_FOLDER -o RESULT_NAME -ds rocm -tp geak_eval/data/ROCm/data/ROCm_v1_autotune
  ```
  The `naive_softmax.py` kernel from the ROCm blog is added to this repo.
- Use the `-c` argument to directly run evaluations on Python Triton code file(s)/folder instead of JSON-based parsing.
We have found the following repos helpful: