Fix operation filtering and integrate KernelAgent with BackendBench test cases #111
- Modified KernelAgent backend to use BackendBench test cases when available
- Added a method to convert BackendBench tests to KernelAgent format
- Updated main.py to pass test cases to KernelAgent
- Added a list of 77 core TorchBench ops and a convenient run script
- Fixed the import path for the KernelAgent submodule

This ensures KernelAgent uses real test inputs from BackendBench instead of generating synthetic tests, improving validation quality for the 77 core operations that appear in both PyTorch's core set and TorchBench traces.
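The conversion described above can be sketched as a small adapter. This is an illustrative sketch only; the names `BackendBenchTest` and `to_kernel_agent_tests` are assumptions, not the PR's actual API.

```python
# Hypothetical sketch: adapting BackendBench-style test cases into the
# format a kernel-generation agent consumes, so the agent validates
# against real traced inputs rather than synthetic ones.
from dataclasses import dataclass
from typing import List

@dataclass
class BackendBenchTest:
    op_name: str    # e.g. "aten.relu.default"
    args_repr: str  # serialized args, e.g. "T([8, 128], f16)"

def to_kernel_agent_tests(tests: List[BackendBenchTest]) -> List[dict]:
    """Pass through real BackendBench inputs instead of synthesizing random ones."""
    return [
        {"op": t.op_name, "inputs": t.args_repr, "source": "backendbench"}
        for t in tests
    ]
```

The key design choice is that the agent never invents input shapes or dtypes itself; every test it sees originates from a BackendBench trace.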
- Modified the filter logic in data_loaders.py to extract operation names and do exact matching
- Prevents substring matches (e.g., 'relu' no longer matches 'leaky_relu')
- Applied the fix to all three filter locations: parquet loading, trace file parsing, and trace stream parsing
- Now --ops 'relu' will only match aten.relu.default, not leaky_relu variants

This ensures precise operation selection when running specific ops with KernelAgent.
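The exact-matching fix can be sketched as follows. Function names here are hypothetical; the real change lives in data_loaders.py.

```python
# Illustrative sketch of the exact-match fix. Before the fix, substring
# matching ("relu" in "aten.leaky_relu.default") also selected leaky_relu.
def extract_op_name(qualified: str) -> str:
    """'aten.leaky_relu.default' -> 'leaky_relu'."""
    parts = qualified.split(".")
    return parts[1] if len(parts) >= 2 else parts[0]

def matches(qualified: str, requested: set) -> bool:
    # Compare the extracted base name against the requested set exactly,
    # instead of testing substring containment on the qualified name.
    return extract_op_name(qualified) in requested
```

The same extraction-then-exact-compare pattern would be applied at each of the three filter sites mentioned above.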
- Add run_core_ops.sh: runs KernelAgent on the 77 core TorchBench operators
  - Captures individual operation scores
  - Organizes successful kernels into the DirectoryBackend structure
  - Creates a detailed failure-analysis report
  - Uses timestamped directories to prevent overwriting
- Add run_single_op.sh: test script for running single operations
  - Useful for debugging and quick tests
  - Creates organized output with scores in README files
- Scripts create organized_TIMESTAMP/ directories with:
  - RUN_SUMMARY.md with overall results
  - Individual op directories with a README showing scores
  - Properly named kernels for DirectoryBackend compatibility
Thanks this is great to see :)
A few small things but it's mostly good.
Regarding the output/result format: if you have the time, see if you can modify save_verbose_results to be more useful to you, or to look more like the format you're using. I feel like if BackendBench is showing useful errors to you while developing KernelFalcon, then others will appreciate them as well.
Just ping me if you are doing this, as I'm currently planning some UX changes myself and I'd rather avoid merge conflicts. Also feel free to just tell me to make it look like the format in your script haha.
Regarding the op directory structure, could we reuse the work done here: https://github.com/meta-pytorch/BackendBench/pull/90/files#diff-df945c7d441794a1bfa57450cd0d72f864581ed43a7dacce67284ed34cf29e7b? If that script is not useful, we should fix it. For now the structure looks reasonable. Thank you for fixing a correctness issue with naming as well; I won't comment on style for now and will focus on content once we get the first LLM-generated backend.
- Move scripts to the scripts/ folder as requested by PaliC
- Replace shell scripts with a Python implementation using logging
- Reuse PR #90's clean_op_name_for_directory function
- Keep the TORCHBENCH_CORE_OPS list but document it better
- Remove hardcoded shell scripts in favor of the Python script

This addresses Mark's comment about reusing PR #90's work and PaliC's suggestions for better code organization.
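A helper like PR #90's clean_op_name_for_directory presumably maps a qualified aten op name to a filesystem-safe directory name. The implementation below is an assumption sketching that behavior, not the actual code from PR #90.

```python
import re

def clean_op_name_for_directory(op_name: str) -> str:
    """Assumed behavior of PR #90's helper: strip the aten prefix and
    overload suffix, then sanitize anything unsafe in a directory name."""
    name = op_name
    if name.startswith("aten."):
        name = name[len("aten."):]
    name = name.split(".")[0]  # drop overload suffix like ".default" or ".Tensor"
    return re.sub(r"[^A-Za-z0-9_]", "_", name)
```

Reusing one shared sanitizer keeps every script's output compatible with DirectoryBackend's expected layout.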
- Capture correctness and performance scores from output - Save scores in operation README and global scores.json - Include configuration details in score tracking
…ation

- Create KernelAgentFP16Backend that filters test cases to only FP16/BF16 dtypes
- Add a classification of 143 TorchBench ops into Triton-friendly (85) and problematic (58)
- Move TORCHBENCH_CORE_OPS to constants.py as requested in the PR review
- Replace shell scripts with Python implementations using logging
- Add single-op and batch scripts for KernelAgent testing

This addresses dtype compatibility issues where operations like sub achieved only 0.81 correctness due to int64 and scalar test cases. With FP16/BF16 filtering, we expect near-1.0 correctness for Triton-friendly operations.
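The dtype filtering described above amounts to a one-pass predicate over the test set. This is a minimal sketch with a hypothetical `TestCase` type; the real backend filters BackendBench's own test objects.

```python
# Sketch of FP16/BF16 test-case filtering (TestCase is illustrative).
from dataclasses import dataclass

LOW_PRECISION_DTYPES = {"float16", "bfloat16"}

@dataclass
class TestCase:
    op: str
    dtype: str  # e.g. "float16", "int64"

def filter_fp16_bf16(cases):
    """Keep only FP16/BF16 cases; int64 and scalar-heavy cases were the
    source of scores like sub's 0.81 correctness mentioned above."""
    return [c for c in cases if c.dtype in LOW_PRECISION_DTYPES]
```

Filtering at the test-case level (rather than per-backend) is what later let the experimental kernel_agent_fp16 backend be replaced by a flag.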
…tegration

- Categorize all 143 TorchBench operations into Triton-friendly (88), capable (34), and challenging (21)
- Add FP16/BF16 filtering to eval.py for better Triton compatibility
- Update the KernelAgent backend to use PR #90's directory structure
- Consolidate scripts and move them to BackendBench/scripts/
- Replace print statements with proper logging
- Remove the experimental kernel_agent_fp16 backend in favor of the filtering flag
- Add a comprehensive operation classification based on Triton compiler analysis
…ration

- Implement the reviewer's suggestion to use the serialize_args format
- Replace manual tensor recreation with T(...) -> torch.randn(...) conversion
- Support all tensor dtypes (int, bool, complex, float)
- Remove a redundant import in data_loaders.py
- Run ruff format on all modified files
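The T(...) -> torch.randn(...) conversion can be sketched as a string rewrite over serialized args. This sketch is an assumption: the dtype codes and mapping are illustrative, it handles only float dtypes, and a full version would presumably dispatch int/bool/complex codes to torch.randint and friends rather than randn.

```python
import re

# Dtype codes as they might appear in serialized args; this mapping is an
# assumption, and only float codes are handled in this sketch.
DTYPE_MAP = {"f16": "torch.float16", "bf16": "torch.bfloat16", "f32": "torch.float32"}

def t_to_randn(serialized: str) -> str:
    """Rewrite 'T([8, 128], f16)' into 'torch.randn([8, 128], dtype=torch.float16)'."""
    def repl(m):
        shape, code = m.group(1), m.group(2)
        return f"torch.randn({shape}, dtype={DTYPE_MAP[code]})"
    return re.sub(r"T\((\[[^\]]*\]),\s*(\w+)\)", repl, serialized)
```

Working on the serialized string keeps the conversion in lockstep with serialize_args, instead of re-deriving shapes and dtypes by hand for each tensor.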
# ✅ TRITON-FRIENDLY: Easy wins with good expected performance
# These ops have static tiled loop nests, affine index maps, coalesced access patterns
TRITON_FRIENDLY_OPS = [
@Laurawly @msaroufim do you want me to earmark these specifically in the test set?
# The 77 core TorchBench operators
# This list is derived from analysis of which operators appear most frequently
# in TorchBench workloads and are considered high-priority for optimization
TORCHBENCH_CORE_OPS = [
@Laurawly @msaroufim similarly, is it useful to say what a core op is in the test set?
Generated high-performance Triton kernels using KernelAgent with GPT models. Successfully generated implementations for:

- Unary operations: abs, cos, sin, exp, log2, sqrt, rsqrt, reciprocal, neg, floor, round, erf, sgn
- Activation functions: relu, relu_, sigmoid, sigmoid_, tanh, gelu, elu, silu, silu_, hardtanh, hardtanh_, hardsigmoid, hardswish_, leaky_relu, leaky_relu_
- Binary operations: add, add_, sub, rsub, mul, mul_, div, div_, pow
- Matrix operations: mm, bmm, addmm
- Other operations: _softmax, _log_softmax, _log_softmax_backward_data

Each implementation includes optimized Triton kernels with proper memory access patterns and README documentation.
Clean up generated kernel directories by removing auto-generated README files, keeping only the implementation files and operation summaries.
This PR improves KernelAgent integration with BackendBench by fixing operation filtering to use exact matching (previously "relu" could also match "leaky_relu") and by passing BackendBench test cases to KernelAgent for better test-generation quality. It also adds a KernelAgent run script, which needs review from @msaroufim; once we've aligned on the script, we can remove it.