- Install requirements as
  ```bash
  pip install -r requirements.txt
  ```
Please install the package by running the following command from the root folder:
```bash
pip install -e .
```
Please note that installation does not automatically install dependencies. You must install the dependencies before installing the package.
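Putting the two install commands together, the intended order is (run from the repository root):

```bash
# Install dependencies first, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .
```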
You can run evaluations in the following two ways:
- Command line run:
  - `geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds tbg` for Tritonbench-G-v1
  - `geak-eval -f PATH_TO_FOLDER_OR_FILE -o NAME_OF_OUTPUT_FILE -ds rocm` for ROCm
- From python script: the following is a bare minimum example; for a detailed example, please see `geak_eval/run.py`.
  ```python
  from geak_eval.evaluators.interface import get_evaluators

  evaluator = get_evaluators["tbg"]()   # for TritonBenchG eval
  evaluator = get_evaluators["rocm"]()  # for ROCm eval

  call_status, exec_status, stdout, stderr = evaluator(generated_code, log_root=PATH_TO_LOG, file_name="kernel.py", atol=1e-5, rtol=1e-2)  # run evaluations
  ```
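For context, a slightly fuller usage sketch of the same interface (the file path and log directory below are placeholders, not code from the repo):

```python
from geak_eval.evaluators.interface import get_evaluators

# Placeholder paths: point these at your generated kernel and a writable log directory
GENERATED_KERNEL = "outputs/generated_kernel.py"
LOG_ROOT = "logs"

with open(GENERATED_KERNEL) as f:
    generated_code = f.read()

evaluator = get_evaluators["rocm"]()  # or get_evaluators["tbg"]() for TritonBench-G
call_status, exec_status, stdout, stderr = evaluator(
    generated_code, log_root=LOG_ROOT, file_name="kernel.py", atol=1e-5, rtol=1e-2
)
print(f"callable: {call_status}, matches ground truth: {exec_status}")
```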
The `1_exec_acc.py` file in TritonBench did not accurately compare the outputs of two Triton files:
- The execution was done purely via subprocess calls for both the generated and ground truth files.
- Seed consistency is violated.
- The outputs of the two Triton runs are compared via stdout string comparison, which is not always correct (see the snippet after this list).
- Around 150 ground truth files do not have a `print(result_gold)` line, hence the eval framework is essentially comparing two null strings.
- Some of the ground truth files (e.g. `context_attn_bloom.py`) do not even have a `result_gold = test_*()` line at the end, so the call accuracy run (`0_call_acc.py`) using such a file just blindly assumes the call was a success.
- 7 kernel files (as originally provided) run into memory access faults; we have fixed them.
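To see why stdout string comparison can pass two different results, note that PyTorch summarizes large tensors when printing. The snippet below is purely illustrative and not code from TritonBench or this repo:

```python
import torch

torch.manual_seed(0)
a = torch.randn(10_000)
b = a.clone()
b[5_000] += 1.0  # perturb an element hidden inside the printed "..." summary

# stdout-style comparison: identical strings even though the tensors differ
print(str(a) == str(b))      # True -- the changed element is elided by the summary
print(torch.allclose(a, b))  # False -- numerical comparison catches the mismatch
```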
To address these issues, we:
- Use `torch.allclose` to compare the two runs (ground truth and generated); a minimal sketch of the resulting ground truth pattern follows this list.
- Fix ground truth files to include `result_gold = test_*()`.
- Ensure a consistent seed across files.
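As an illustration, a fixed ground truth file follows this shape (a hypothetical, simplified test; real files launch Triton kernels, and the seed value here is arbitrary):

```python
import torch

def test_vector_add():
    torch.manual_seed(42)  # consistent seed so ground truth and generated runs see identical inputs
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(1024, device=device)
    y = torch.randn(1024, device=device)
    return x + y           # stand-in for the kernel launch; real tests return the kernel's outputs

# Captured at the end of the file so the evaluator can compare the two runs with torch.allclose
result_gold = test_vector_add()
```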
We have also integrated performance measurement into the framework. The kernel evaluation flow is as follows (a minimal sketch follows the list):
- Check if the kernel is callable: run the kernel's test function.
- If the kernel is callable, check whether it matches the ground truth by comparing the outputs of the generated kernel on known tests.
- If the generated kernel is correct, run the performance evaluation.
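In code, the gating logic looks roughly like the following; the helper functions are hypothetical stand-ins, not the actual `geak_eval` internals:

```python
def is_callable(code: str) -> bool:
    return True   # stand-in: actually runs the kernel's test function

def matches_ground_truth(code: str) -> bool:
    return True   # stand-in: actually compares outputs with torch.allclose

def measure_performance(code: str) -> float:
    return 0.0    # stand-in: actually measures kernel runtime

def evaluate(code: str) -> dict:
    if not is_callable(code):             # 1. call check
        return {"call": False, "exec": False, "perf": None}
    if not matches_ground_truth(code):    # 2. correctness check on known tests
        return {"call": True, "exec": False, "perf": None}
    return {"call": True, "exec": True, "perf": measure_performance(code)}  # 3. perf eval
```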
Please raise a GitHub issue or open a PR for any issues, help, or contributions!
You can contribute in the following ways:
- Add new kernels for evaluations:
  - Add the dataset of new kernels under `geak_eval/data`.
  - Add the path of this new dataset in `geak_eval.constants`.
  - Add an evaluator interface for this new dataset in `geak_eval.evaluators.interface`.
  - Add an evaluator to be run by the interface in `geak_eval.evaluators`. The evaluator is a function that only runs as a direct Python call and does not run if imported as a module; the evaluator (e.g. `TB_correctness.py`) is run by its interface (e.g. `interface.TestAllCloseEvaluatorTBG`). See the sketch after this list.
- You can add new metrics for the evaluator to work with in `geak_eval.metrics`.
- You can add new performance eval metrics for your (or an existing) dataset under `geak_eval.perf`.
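A hypothetical skeleton for such an evaluator module (the file name and function below are made up for illustration; the `__main__` guard is what keeps it from running when imported by the interface):

```python
# e.g. geak_eval/evaluators/MyDataset_correctness.py (hypothetical file name)
import sys

def run_evaluation(kernel_path: str) -> int:
    # Placeholder: execute the kernel's test function and compare its outputs
    # against the ground truth with torch.allclose; return 0 on success
    print(f"evaluating {kernel_path}")
    return 0

if __name__ == "__main__":                 # does nothing when imported as a module
    sys.exit(run_evaluation(sys.argv[1]))
```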
- [2025-07-16] Added autotune-compatible ROCm kernels and naive softmax. Use the `-tp` argument with the path to this folder as below:
  ```bash
  geak-eval eval -f PATH_TO_EVAL_FOLDER -o RESULT_NAME -ds rocm -tp geak_eval/data/ROCm/data/ROCm_v1_autotune
  ```
  The `naive_softmax.py` kernel from the ROCm blog is added to this repo.
- Use the `-c` argument to directly run evaluations on Python Triton code file(s)/folder instead of JSON-based parsing.
We have found the following repos helpful: