Fix operation filtering and integrate KernelAgent with BackendBench test cases #111
- Modified KernelAgent backend to use BackendBench test cases when available
- Added a method to convert BackendBench tests to KernelAgent format
- Updated main.py to pass test cases to KernelAgent
- Added a list of 77 core TorchBench ops and a convenient run script
- Fixed the import path for the KernelAgent submodule

This ensures KernelAgent uses real test inputs from BackendBench instead of generating synthetic tests, improving validation quality for the 77 core operations that appear in both PyTorch's core set and TorchBench traces.
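The conversion described above can be sketched as a small adapter. This is an illustrative sketch only; the names `BackendBenchTest` and `to_kernel_agent_tests` are assumptions, not the PR's actual API.

```python
# Hypothetical sketch: adapting BackendBench-style test cases into the
# format a kernel-generation agent consumes, so the agent validates
# against real traced inputs rather than synthetic ones.
from dataclasses import dataclass
from typing import List

@dataclass
class BackendBenchTest:
    op_name: str    # e.g. "aten.relu.default"
    args_repr: str  # serialized args, e.g. "T([8, 128], f16)"

def to_kernel_agent_tests(tests: List[BackendBenchTest]) -> List[dict]:
    """Pass through real BackendBench inputs instead of synthesizing random ones."""
    return [
        {"op": t.op_name, "inputs": t.args_repr, "source": "backendbench"}
        for t in tests
    ]
```

The key design choice is that the agent never invents input shapes or dtypes itself; every test it sees originates from a BackendBench trace.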
- Modified the filter logic in data_loaders.py to extract operation names and do exact matching
- Prevents substring matches (e.g., 'relu' no longer matches 'leaky_relu')
- Applied the fix to all three filter locations: parquet loading, trace file parsing, and trace stream parsing
- Now --ops 'relu' will only match aten.relu.default, not leaky_relu variants

This ensures precise operation selection when running specific ops with KernelAgent.
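The exact-matching fix can be sketched as follows. Function names here are hypothetical; the real change lives in data_loaders.py.

```python
# Illustrative sketch of the exact-match fix. Before the fix, substring
# matching ("relu" in "aten.leaky_relu.default") also selected leaky_relu.
def extract_op_name(qualified: str) -> str:
    """'aten.leaky_relu.default' -> 'leaky_relu'."""
    parts = qualified.split(".")
    return parts[1] if len(parts) >= 2 else parts[0]

def matches(qualified: str, requested: set) -> bool:
    # Compare the extracted base name against the requested set exactly,
    # instead of testing substring containment on the qualified name.
    return extract_op_name(qualified) in requested
```

The same extraction-then-exact-compare pattern would be applied at each of the three filter sites mentioned above.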
- Add run_core_ops.sh: runs KernelAgent on the 77 core TorchBench operators
  - Captures individual operation scores
  - Organizes successful kernels into the DirectoryBackend structure
  - Creates a detailed failure-analysis report
  - Uses timestamped directories to prevent overwriting
- Add run_single_op.sh: test script for running single operations
  - Useful for debugging and quick tests
  - Creates organized output with scores in README files
- Scripts create organized_TIMESTAMP/ directories with:
  - RUN_SUMMARY.md with overall results
  - Individual op directories with a README showing scores
  - Properly named kernels for DirectoryBackend compatibility
Thanks this is great to see :)
A few small things but it's mostly good.
Regarding the output/result format: if you have the time, see if you can modify save_verbose_results to be more useful to you, or to look more like the format you're using. I feel like if BackendBench is showing useful errors to you while developing KernelFalcon, then others will appreciate them as well.
Just ping me if you are doing this, as I'm currently planning some UX changes myself and I'd rather avoid merge conflicts. Also feel free to just tell me to make it look like the format in your script haha.
Regarding the op directory structure, could we reuse the work done here: https://github.com/meta-pytorch/BackendBench/pull/90/files#diff-df945c7d441794a1bfa57450cd0d72f864581ed43a7dacce67284ed34cf29e7b? If that script is not useful, we should fix it. For now the structure looks reasonable. Thank you for fixing a correctness issue with naming as well; I won't comment on style for now and will focus on content once we get the first LLM-generated backend.
- Move scripts to the scripts/ folder as requested by PaliC
- Replace shell scripts with a Python implementation using logging
- Reuse PR #90's clean_op_name_for_directory function
- Keep the TORCHBENCH_CORE_OPS list but document it better
- Remove hardcoded shell scripts in favor of the Python script

This addresses Mark's comment about reusing PR #90's work and PaliC's suggestions for better code organization.
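A helper like PR #90's clean_op_name_for_directory presumably maps a qualified aten op name to a filesystem-safe directory name. The implementation below is an assumption sketching that behavior, not the actual code from PR #90.

```python
import re

def clean_op_name_for_directory(op_name: str) -> str:
    """Assumed behavior of PR #90's helper: strip the aten prefix and
    overload suffix, then sanitize anything unsafe in a directory name."""
    name = op_name
    if name.startswith("aten."):
        name = name[len("aten."):]
    name = name.split(".")[0]  # drop overload suffix like ".default" or ".Tensor"
    return re.sub(r"[^A-Za-z0-9_]", "_", name)
```

Reusing one shared sanitizer keeps every script's output compatible with DirectoryBackend's expected layout.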
- Capture correctness and performance scores from output - Save scores in operation README and global scores.json - Include configuration details in score tracking
…ation

- Create KernelAgentFP16Backend that filters test cases to only FP16/BF16 dtypes
- Add a classification of 143 TorchBench ops into Triton-friendly (85) and problematic (58)
- Move TORCHBENCH_CORE_OPS to constants.py as requested in the PR review
- Replace shell scripts with Python implementations using logging
- Add single-op and batch scripts for KernelAgent testing

This addresses dtype compatibility issues where operations like sub achieved only 0.81 correctness due to int64 and scalar test cases. With FP16/BF16 filtering, we expect near-1.0 correctness for Triton-friendly operations.
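The dtype filtering described above amounts to a one-pass predicate over the test set. This is a minimal sketch with a hypothetical `TestCase` type; the real backend filters BackendBench's own test objects.

```python
# Sketch of FP16/BF16 test-case filtering (TestCase is illustrative).
from dataclasses import dataclass

LOW_PRECISION_DTYPES = {"float16", "bfloat16"}

@dataclass
class TestCase:
    op: str
    dtype: str  # e.g. "float16", "int64"

def filter_fp16_bf16(cases):
    """Keep only FP16/BF16 cases; int64 and scalar-heavy cases were the
    source of scores like sub's 0.81 correctness mentioned above."""
    return [c for c in cases if c.dtype in LOW_PRECISION_DTYPES]
```

Filtering at the test-case level (rather than per-backend) is what later let the experimental kernel_agent_fp16 backend be replaced by a flag.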
…tegration

- Categorize all 143 TorchBench operations into Triton-friendly (88), capable (34), and challenging (21)
- Add FP16/BF16 filtering to eval.py for better Triton compatibility
- Update the KernelAgent backend to use PR #90's directory structure
- Consolidate scripts and move them to BackendBench/scripts/
- Replace print statements with proper logging
- Remove the experimental kernel_agent_fp16 backend in favor of the filtering flag
- Add a comprehensive operation classification based on Triton compiler analysis
…ration

- Implement the reviewer's suggestion to use the serialize_args format
- Replace manual tensor recreation with T(...) -> torch.randn(...) conversion
- Support all tensor dtypes (int, bool, complex, float)
- Remove a redundant import in data_loaders.py
- Run ruff format on all modified files
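The T(...) -> torch.randn(...) conversion can be sketched as a string rewrite over serialized args. This sketch is an assumption: the dtype codes and mapping are illustrative, it handles only float dtypes, and a full version would presumably dispatch int/bool/complex codes to torch.randint and friends rather than randn.

```python
import re

# Dtype codes as they might appear in serialized args; this mapping is an
# assumption, and only float codes are handled in this sketch.
DTYPE_MAP = {"f16": "torch.float16", "bf16": "torch.bfloat16", "f32": "torch.float32"}

def t_to_randn(serialized: str) -> str:
    """Rewrite 'T([8, 128], f16)' into 'torch.randn([8, 128], dtype=torch.float16)'."""
    def repl(m):
        shape, code = m.group(1), m.group(2)
        return f"torch.randn({shape}, dtype={DTYPE_MAP[code]})"
    return re.sub(r"T\((\[[^\]]*\]),\s*(\w+)\)", repl, serialized)
```

Working on the serialized string keeps the conversion in lockstep with serialize_args, instead of re-deriving shapes and dtypes by hand for each tensor.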
# ✅ TRITON-FRIENDLY: Easy wins with good expected performance
# These ops have static tiled loop nests, affine index maps, coalesced access patterns
TRITON_FRIENDLY_OPS = [
@Laurawly @msaroufim do you want me to earmark these specifically in the test set?
# The 77 core TorchBench operators
# This list is derived from analysis of which operators appear most frequently
# in TorchBench workloads and are considered high-priority for optimization
TORCHBENCH_CORE_OPS = [
@Laurawly @msaroufim similarly, is it useful to say what a core op is in the test set?
Generated high-performance Triton kernels using KernelAgent with GPT models. Successfully generated implementations for:

- Unary operations: abs, cos, sin, exp, log2, sqrt, rsqrt, reciprocal, neg, floor, round, erf, sgn
- Activation functions: relu, relu_, sigmoid, sigmoid_, tanh, gelu, elu, silu, silu_, hardtanh, hardtanh_, hardsigmoid, hardswish_, leaky_relu, leaky_relu_
- Binary operations: add, add_, sub, rsub, mul, mul_, div, div_, pow
- Matrix operations: mm, bmm, addmm
- Other operations: _softmax, _log_softmax, _log_softmax_backward_data

Each implementation includes optimized Triton kernels with proper memory access patterns and README documentation.
Clean up generated kernel directories by removing auto-generated README files, keeping only the implementation files and operation summaries.
This PR improves KernelAgent integration with BackendBench by fixing operation filtering to use exact matching (previously "relu" could also match "leaky_relu") and by passing BackendBench test cases to KernelAgent for better test-generation quality. It also adds a KernelAgent run script, which needs review from @msaroufim; once we've aligned on the script, we can remove it.