[Enhancement] Optimize templates for half/bfloat16 #1845
LeiWang1999 merged 4 commits into tile-ai:main
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run …. We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
📝 Walkthrough

Splits scalar bf16/fp16 Min/Max codegen into explicit type-specific branches that use the native `__hmin`/`__hmax` intrinsics with casts, rewrites the 16-bit shuffle and math helpers for `half_t`/`bfloat16_t` to operate on integer bitcasts, and extends the reduction paths to handle max, min, and absmax with proper init/dup/update semantics.
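To illustrate the bitcast approach for 16-bit shuffles, here is a host-side Python model. It is a sketch, not TileLang's CUDA templates, and every helper name in it is hypothetical. It assumes a 32-bit shuffle primitive, modeled by `shfl_xor_u32` in the spirit of CUDA's `__shfl_xor_sync`. bf16 values travel through the shuffle as raw 16-bit patterns zero-extended into 32-bit words and are widened to float only when two values are compared.

```python
import struct


def f32_to_bf16_bits(x):
    """Round a float to bfloat16 (round-to-nearest-even) and return its raw 16 bits.

    NaN handling is omitted for brevity.
    """
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    u += 0x7FFF + ((u >> 16) & 1)  # add half an ulp, breaking ties toward even
    return (u >> 16) & 0xFFFF


def bf16_bits_to_f32(bits):
    """Reinterpret 16 bf16 bits as the upper half of a float32 (exact widening)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return x


def shfl_xor_u32(words, mask):
    """Host model of a 32-bit butterfly shuffle: lane i receives lane (i ^ mask)'s word."""
    return [words[i ^ mask] & 0xFFFFFFFF for i in range(len(words))]


def shfl_xor_bf16(lanes, mask):
    """Shuffle bf16 values as bits: zero-extend 16 -> 32, shuffle, truncate back."""
    words = [b & 0xFFFF for b in lanes]
    return [w & 0xFFFF for w in shfl_xor_u32(words, mask)]


def warp_max_bf16(lanes):
    """Butterfly max over a power-of-two number of lanes; every lane ends with the max."""
    acc = list(lanes)
    mask = len(acc) // 2
    while mask:
        other = shfl_xor_bf16(acc, mask)
        # Only the comparison widens to float; the shuffle itself moved raw bits.
        acc = [a if bf16_bits_to_f32(a) >= bf16_bits_to_f32(b) else b
               for a, b in zip(acc, other)]
        mask //= 2
    return acc
```

Routing the payload as bits means the shuffle only needs a 32-bit integer path, and values are reinterpreted rather than converted on the way through.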
🎯 Estimated code review effort: 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
No actionable comments were generated in the recent review. 🎉
@regression-perf
Performance Regression Test Report
Triggered by: @LJC00118
… Update initialization logic for temporary buffers and improve reduction operations in reduce.cc. Add print statements for debugging in test_tilelang_language_reduce.py.
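The max/min/absmax handling and the temporary-buffer initialization in this commit depend on seeding each temporary accumulator with the identity of its operator. The Python sketch below models that on plain lists; it is not the code in `reduce.cc`. The `REDUCERS` table, `reduce_rows`, and the `clear` flag are hypothetical, and the sketch assumes that when `clear` is false the destination already holds a partial result of the same reduction kind, which is folded in once at the end.

```python
import math

# Hypothetical table: kind -> (identity that seeds the temporary accumulator,
#                             per-element update, combine of two partial results).
REDUCERS = {
    "sum": (0.0, lambda acc, x: acc + x, lambda a, b: a + b),
    "max": (-math.inf, lambda acc, x: max(acc, x), max),
    "min": (math.inf, lambda acc, x: min(acc, x), min),
    # absmax starts at 0 because |x| >= 0; partials are already non-negative,
    # so two partials combine with a plain max.
    "absmax": (0.0, lambda acc, x: max(acc, abs(x)), max),
}


def reduce_rows(src, kind, dst, clear=True):
    """Reduce each row of `src` to one value.

    With clear=True the row result replaces the `dst` entry; with clear=False it
    is combined with the existing `dst` entry (assumed to be a partial result of
    the same kind). Returns a new list; `dst` is not mutated.
    """
    init, update, combine = REDUCERS[kind]
    out = []
    for row, prev in zip(src, dst):
        tmp = init  # the temporary always starts from the identity, never from dst
        for x in row:
            tmp = update(tmp, x)
        out.append(tmp if clear else combine(prev, tmp))
    return out
```

Seeding min with 0, for example, would return 0 for a row of all-positive values, and seeding the temporary from the destination before combining with it again would double-count for sum.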
Summary by CodeRabbit
Bug Fixes
Refactor