flashinfer-ai · Copilot · Sep 19, 2025 · Sep 19, 2025
diff --git a/.github/ISSUE_TEMPLATE/xfails_comprehensive_report.md b/.github/ISSUE_TEMPLATE/xfails_comprehensive_report.md
@@ -0,0 +1,154 @@
+---
+name: "Comprehensive Test Xfails and Skips Report"
+about: "Comprehensive analysis of all test skips and expected failures requiring developer attention"
+title: "[TEST INFRASTRUCTURE] Fix 143 Test Skips and Expected Failures Across FlashInfer Test Suite"
+labels: ["test-infrastructure", "technical-debt", "priority-high"]
+assignees: []
+---
+
+## 🚨 Critical Issue: Test Infrastructure Technical Debt
+
+**Auto-generated issue to track and fix test skips and expected failures in FlashInfer.**
+
+### 📊 Executive Summary
+
+The FlashInfer test suite currently has **143 test skips and expected failures** that represent significant technical debt and limit the project's reliability and hardware compatibility.
+
+| Category | Count | Impact Level |
+|----------|-------|--------------|
+| 🖥️ **Hardware Requirements** | **51** | **CRITICAL** |
+| 🚫 **Feature Unsupported** | **31** | **HIGH** |
+| ⚠️ **Parameter Validation** | **21** | **MEDIUM** |
+| 🔧 **Backend Limitations** | **4** | **MEDIUM** |
+| 🌐 **Environment Issues** | **2** | **LOW** |
+| 📂 **Other** | **34** | **VARIES** |
+
+### 🔥 Most Critical Issues
+
+#### 1. Hardware Compatibility Crisis (51 issues - CRITICAL)
+- **SM90A support failures**: 10 test failures across core functionality
+- **Modern GPU incompatibility**: SM110/120/121 GPUs not supported by TensorRT-LLM integration
+- **Compute capability gaps**: Features require specific GPU generations with no fallbacks
+
+**Business Impact**: Users with newer/older GPU hardware cannot fully utilize FlashInfer.
+
+#### 2. Missing Feature Support (31 issues - HIGH)
+- **FlashAttention 3**: 9 test failures due to incomplete FA3 integration
+- **Sequence length limitations**: Multiple causal attention restrictions for `qo_len > kv_len`
+- **Backend feature gaps**: Inconsistent feature support across CUDNN/Cutlass/TensorRT-LLM
+
+**Business Impact**: Limited functionality compared to competing libraries.
+
+#### 3. Parameter Validation Failures (21 issues - MEDIUM)
+- **Head configuration**: Requirements like `num_qo_heads` divisible by `num_kv_heads`
+- **Vocabulary/sampling**: Invalid parameter combinations cause silent failures
+- **Block sparse configuration**: Insufficient validation of size relationships
+
+**Business Impact**: Poor developer experience with unclear error messages.
+
+### 📋 Detailed Action Plan
+
+#### Phase 1: Hardware Compatibility (Weeks 1-4)
+- [ ] **Audit SM90A support** - Determine why SM90A is marked unsupported
+- [ ] **TensorRT-LLM compatibility** - Work with NVIDIA on SM110/120/121 support
+- [ ] **Hardware abstraction layer** - Implement fallback mechanisms
+- [ ] **Compatibility matrix** - Document supported hardware combinations
+
+#### Phase 2: Feature Implementation (Weeks 5-8)
+- [ ] **Complete FA3 integration** - Fix 9 FlashAttention 3 related failures
+- [ ] **Causal attention with long sequences** - Support `qo_len > kv_len` cases
+- [ ] **Backend feature parity** - Ensure consistent features across backends
+- [ ] **Missing functionality** - Implement high-priority unsupported features
+
+#### Phase 3: Parameter Validation (Weeks 9-10)
+- [ ] **Centralized validation** - Create unified parameter checking system
+- [ ] **Better error messages** - Provide actionable feedback for invalid configs
+- [ ] **Configuration helpers** - Add utilities to validate parameter combinations
+- [ ] **Documentation** - Clear parameter requirement documentation
+
+#### Phase 4: Infrastructure Improvement (Weeks 11-12)
+- [ ] **Test categorization** - Separate hardware-dependent from logic tests
+- [ ] **Conditional execution** - Better hardware detection and graceful degradation
+- [ ] **CI/CD integration** - Automated tracking of xfail reduction
+- [ ] **Monitoring dashboard** - Track progress over time
+
+### 🎯 Success Metrics
+
+**Primary Goals (6 months):**
+- Reduce total xfails from 143 to < 50 (-65%)
+- Achieve 95% test pass rate on supported hardware
+- Zero hardware compatibility failures for supported GPUs
+- Complete FA3 integration (0 FA3-related failures)
+
+**Secondary Goals:**
+- All parameter validation issues resolved
+- Consistent feature support across backends
+- Automated xfail tracking in CI/CD
+- Comprehensive hardware compatibility documentation
+
+### 📁 Resources and Reports
+
+**Generated Reports:**
+- 📄 **[Comprehensive Report](./XFAILS_REPORT.md)** - Detailed analysis of all issues
+- 🔧 **[Generation Script](./scripts/generate_xfails_report.py)** - Automated report creation
+- 📊 **[Tracking Workflow](./.github/workflows/track_xfails.yml)** - CI/CD integration
+- 📖 **[Documentation](./docs/XFAILS_TRACKING.md)** - System usage guide
+
+**Data Formats:**
+- `python scripts/generate_xfails_report.py --format json` - Machine-readable data
+- `python scripts/generate_xfails_report.py --format csv` - Spreadsheet analysis
+
+### 🔄 Progress Tracking
+
+**Weekly Check-ins:**
+- [ ] Week 1: Hardware audit complete
+- [ ] Week 2: SM90A support plan established  
+- [ ] Week 3: TensorRT-LLM compatibility roadmap
+- [ ] Week 4: Hardware abstraction layer design
+- [ ] Week 5: FA3 integration started
+- [ ] Week 6: Causal attention fixes
+- [ ] Week 8: Backend parity assessment
+- [ ] Week 10: Parameter validation overhaul
+- [ ] Week 12: Infrastructure improvements complete
+
+**Monthly Reviews:**
+- Regenerate xfails report to track reduction
+- Update this issue with progress
+- Adjust priorities based on user feedback
+
+### 🚀 Getting Started
+
+**For Contributors:**
+1. Review the [comprehensive report](./XFAILS_REPORT.md)
+2. Pick issues from Phase 1 (hardware compatibility) for maximum impact
+3. Use `python scripts/generate_xfails_report.py` to track progress
+4. Focus on high-count, high-impact categories first
+
+**For Maintainers:**
+1. Assign owners for each phase
+2. Set up weekly progress reviews
+3. Integrate automated tracking into release process
+4. Create sub-issues for major categories
+
+### 💼 Resource Requirements
+
+**Engineering Effort Estimate:**
+- **Hardware compatibility**: 2-3 engineer-months
+- **Feature implementation**: 3-4 engineer-months
+- **Parameter validation**: 1-2 engineer-months
+- **Infrastructure**: 1 engineer-month
+- **Total**: 7-10 engineer-months
+
+**Skills Needed:**
+- CUDA/GPU programming expertise
+- TensorRT-LLM integration experience
+- FlashAttention implementation knowledge
+- Test infrastructure and CI/CD experience
+
+---
+
+**Priority**: HIGH - This technical debt significantly impacts FlashInfer's usability and reliability.
+
+**Auto-generated by**: FlashInfer xfails analysis tool  
+**Last updated**: 2024-12-19  
+**Next review**: Weekly until completion
diff --git a/.github/workflows/track_xfails.yml b/.github/workflows/track_xfails.yml
@@ -0,0 +1,69 @@
+name: Track Test Xfails
+
+on:
+  schedule:
+    # Run weekly on Sundays at 00:00 UTC
+    - cron: '0 0 * * 0'
+  workflow_dispatch:
+    # Allow manual triggering
+
+  push:
+    paths:
+      - 'tests/**/*.py'
+      - 'scripts/generate_xfails_report.py'
+
+jobs:
+  generate-xfails-report:
+    runs-on: ubuntu-latest
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v4
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.9'
+
+    - name: Generate xfails report
+      run: |
+        python scripts/generate_xfails_report.py --output XFAILS_REPORT.md
+
+    - name: Generate JSON report for tracking
+      run: |
+        python scripts/generate_xfails_report.py --format json --output xfails_data.json
+
+    - name: Check for changes
+      id: changes
+      run: |
+        if git diff --quiet XFAILS_REPORT.md; then
+          echo "changed=false" >> $GITHUB_OUTPUT
+        else
+          echo "changed=true" >> $GITHUB_OUTPUT
+        fi
+
+    - name: Commit and push if changed
+      if: steps.changes.outputs.changed == 'true'
+      run: |
+        git config --local user.email "[email protected]"
+        git config --local user.name "GitHub Action"
+        git add XFAILS_REPORT.md xfails_data.json
+        git commit -m "Update xfails report - $(date +'%Y-%m-%d')"
+        git push
+
+    - name: Upload reports as artifacts
+      uses: actions/upload-artifact@v3
+      with:
+        name: xfails-reports
+        path: |
+          XFAILS_REPORT.md
+          xfails_data.json
+
+    - name: Create issue if xfails increased
+      if: steps.changes.outputs.changed == 'true'
+      uses: actions/github-script@v6
+      with:
+        script: |
+          // This would need additional logic to compare with previous report
+          // and create an issue if xfails count increased significantly
+          console.log("Xfails report updated - consider manual review");