Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
154 changes: 154 additions & 0 deletions .github/ISSUE_TEMPLATE/xfails_comprehensive_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
---
name: "Comprehensive Test Xfails and Skips Report"
about: "Comprehensive analysis of all test skips and expected failures requiring developer attention"
title: "[TEST INFRASTRUCTURE] Fix 143 Test Skips and Expected Failures Across FlashInfer Test Suite"
labels: ["test-infrastructure", "technical-debt", "priority-high"]
assignees: []
---

## 🚨 Critical Issue: Test Infrastructure Technical Debt

**Auto-generated issue to track and fix test skips and expected failures in FlashInfer.**

### 📊 Executive Summary

The FlashInfer test suite currently has **143 test skips and expected failures** that represent significant technical debt and limit the project's reliability and hardware compatibility.

| Category | Count | Impact Level |
|----------|-------|--------------|
| 🖥️ **Hardware Requirements** | **51** | **CRITICAL** |
| 🚫 **Feature Unsupported** | **31** | **HIGH** |
| ⚠️ **Parameter Validation** | **21** | **MEDIUM** |
| 🔧 **Backend Limitations** | **4** | **MEDIUM** |
| 🌐 **Environment Issues** | **2** | **LOW** |
| 📂 **Other** | **34** | **VARIES** |

### 🔥 Most Critical Issues

#### 1. Hardware Compatibility Crisis (51 issues - CRITICAL)
- **SM90A support failures**: 10 test failures across core functionality
- **Modern GPU incompatibility**: SM110/120/121 GPUs not supported by TensorRT-LLM integration
- **Compute capability gaps**: Features require specific GPU generations with no fallbacks

**Business Impact**: Users with newer/older GPU hardware cannot fully utilize FlashInfer.

#### 2. Missing Feature Support (31 issues - HIGH)
- **FlashAttention 3**: 9 test failures due to incomplete FA3 integration
- **Sequence length limitations**: Multiple causal attention restrictions for `qo_len > kv_len`
- **Backend feature gaps**: Inconsistent feature support across CUDNN/Cutlass/TensorRT-LLM

**Business Impact**: Limited functionality compared to competing libraries.

#### 3. Parameter Validation Failures (21 issues - MEDIUM)
- **Head configuration**: Requirements like `num_qo_heads` divisible by `num_kv_heads`
- **Vocabulary/sampling**: Invalid parameter combinations cause silent failures
- **Block sparse configuration**: Insufficient validation of size relationships

**Business Impact**: Poor developer experience with unclear error messages.

### 📋 Detailed Action Plan

#### Phase 1: Hardware Compatibility (Weeks 1-4)
- [ ] **Audit SM90A support** - Determine why SM90A is marked unsupported
- [ ] **TensorRT-LLM compatibility** - Work with NVIDIA on SM110/120/121 support
- [ ] **Hardware abstraction layer** - Implement fallback mechanisms
- [ ] **Compatibility matrix** - Document supported hardware combinations

#### Phase 2: Feature Implementation (Weeks 5-8)
- [ ] **Complete FA3 integration** - Fix 9 FlashAttention 3 related failures
- [ ] **Causal attention with long sequences** - Support `qo_len > kv_len` cases
- [ ] **Backend feature parity** - Ensure consistent features across backends
- [ ] **Missing functionality** - Implement high-priority unsupported features

#### Phase 3: Parameter Validation (Weeks 9-10)
- [ ] **Centralized validation** - Create unified parameter checking system
- [ ] **Better error messages** - Provide actionable feedback for invalid configs
- [ ] **Configuration helpers** - Add utilities to validate parameter combinations
- [ ] **Documentation** - Clear parameter requirement documentation

#### Phase 4: Infrastructure Improvement (Weeks 11-12)
- [ ] **Test categorization** - Separate hardware-dependent from logic tests
- [ ] **Conditional execution** - Better hardware detection and graceful degradation
- [ ] **CI/CD integration** - Automated tracking of xfail reduction
- [ ] **Monitoring dashboard** - Track progress over time

### 🎯 Success Metrics

**Primary Goals (6 months):**
- Reduce total xfails from 143 to < 50 (-65%)
- Achieve 95% test pass rate on supported hardware
- Zero hardware compatibility failures for supported GPUs
- Complete FA3 integration (0 FA3-related failures)

**Secondary Goals:**
- All parameter validation issues resolved
- Consistent feature support across backends
- Automated xfail tracking in CI/CD
- Comprehensive hardware compatibility documentation

### 📁 Resources and Reports

**Generated Reports:**
- 📄 **[Comprehensive Report](./XFAILS_REPORT.md)** - Detailed analysis of all issues
- 🔧 **[Generation Script](./scripts/generate_xfails_report.py)** - Automated report creation
- 📊 **[Tracking Workflow](./.github/workflows/track_xfails.yml)** - CI/CD integration
- 📖 **[Documentation](./docs/XFAILS_TRACKING.md)** - System usage guide

**Data Formats:**
- `python scripts/generate_xfails_report.py --format json` - Machine-readable data
- `python scripts/generate_xfails_report.py --format csv` - Spreadsheet analysis

### 🔄 Progress Tracking

**Weekly Check-ins:**
- [ ] Week 1: Hardware audit complete
- [ ] Week 2: SM90A support plan established
- [ ] Week 3: TensorRT-LLM compatibility roadmap
- [ ] Week 4: Hardware abstraction layer design
- [ ] Week 5: FA3 integration started
- [ ] Week 6: Causal attention fixes
- [ ] Week 8: Backend parity assessment
- [ ] Week 10: Parameter validation overhaul
- [ ] Week 12: Infrastructure improvements complete

**Monthly Reviews:**
- Regenerate xfails report to track reduction
- Update this issue with progress
- Adjust priorities based on user feedback

### 🚀 Getting Started

**For Contributors:**
1. Review the [comprehensive report](./XFAILS_REPORT.md)
2. Pick issues from Phase 1 (hardware compatibility) for maximum impact
3. Use `python scripts/generate_xfails_report.py` to track progress
4. Focus on high-count, high-impact categories first

**For Maintainers:**
1. Assign owners for each phase
2. Set up weekly progress reviews
3. Integrate automated tracking into release process
4. Create sub-issues for major categories

### 💼 Resource Requirements

**Engineering Effort Estimate:**
- **Hardware compatibility**: 2-3 engineer-months
- **Feature implementation**: 3-4 engineer-months
- **Parameter validation**: 1-2 engineer-months
- **Infrastructure**: 1 engineer-month
- **Total**: 7-10 engineer-months

**Skills Needed:**
- CUDA/GPU programming expertise
- TensorRT-LLM integration experience
- FlashAttention implementation knowledge
- Test infrastructure and CI/CD experience

---

**Priority**: HIGH - This technical debt significantly impacts FlashInfer's usability and reliability.

**Auto-generated by**: FlashInfer xfails analysis tool
**Last updated**: 2024-12-19
**Next review**: Weekly until completion
69 changes: 69 additions & 0 deletions .github/workflows/track_xfails.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
name: Track Test Xfails

on:
schedule:
# Run weekly on Sundays at 00:00 UTC
- cron: '0 0 * * 0'
workflow_dispatch:
# Allow manual triggering

push:
paths:
- 'tests/**/*.py'
- 'scripts/generate_xfails_report.py'

jobs:
generate-xfails-report:
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.9'

- name: Generate xfails report
run: |
python scripts/generate_xfails_report.py --output XFAILS_REPORT.md

- name: Generate JSON report for tracking
run: |
python scripts/generate_xfails_report.py --format json --output xfails_data.json

- name: Check for changes
id: changes
run: |
if git diff --quiet XFAILS_REPORT.md; then
echo "changed=false" >> $GITHUB_OUTPUT
else
echo "changed=true" >> $GITHUB_OUTPUT
fi

- name: Commit and push if changed
if: steps.changes.outputs.changed == 'true'
run: |
git config --local user.email "[email protected]"
git config --local user.name "GitHub Action"
git add XFAILS_REPORT.md xfails_data.json
git commit -m "Update xfails report - $(date +'%Y-%m-%d')"
git push

- name: Upload reports as artifacts
uses: actions/upload-artifact@v3
with:
name: xfails-reports
path: |
XFAILS_REPORT.md
xfails_data.json

- name: Create issue if xfails increased
if: steps.changes.outputs.changed == 'true'
uses: actions/github-script@v6
with:
script: |
// This would need additional logic to compare with previous report
// and create an issue if xfails count increased significantly
console.log("Xfails report updated - consider manual review");
Loading
Loading