Commit cc7ffe3

Add issue/PR templates; relax mask/bias checks
Introduces standardized issue and pull request templates to streamline bug reports, feature proposals, and performance diagnostics. Relaxes validation in variable-length attention forward by dropping dtype/device checks for mask and bias, enabling optional inputs and avoiding unnecessary failures.
1 parent e3ff84c commit cc7ffe3
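
The relaxed validation described in the commit message can be pictured with a small sketch. The diff for the interface file is not shown on this page, so everything below is an assumption for illustration: `TensorInfo` and `check_varlen_inputs` are hypothetical names, and the stand-in type only carries the metadata a check would read. The point is the shape of the change: query, key, and value are still compared strictly, while mask and bias may be omitted and are no longer compared on dtype or device.

```python
from typing import NamedTuple, Optional


class TensorInfo(NamedTuple):
    """Hypothetical stand-in for a tensor: only the metadata the checks read."""
    dtype: str
    device: str


def check_varlen_inputs(
    q: TensorInfo,
    k: TensorInfo,
    v: TensorInfo,
    mask: Optional[TensorInfo] = None,
    bias: Optional[TensorInfo] = None,
) -> None:
    """Validate q/k/v strictly; treat mask and bias as optional, unchecked inputs."""
    for name, tensor in (("k", k), ("v", v)):
        if tensor.dtype != q.dtype:
            raise TypeError(f"{name} must have the same dtype as q")
        if tensor.device != q.device:
            raise ValueError(f"{name} must be on the same device as q")
    # Relaxed behaviour (assumed): mask and bias may be None, and when they
    # are present their dtype and device are not compared against q.


q = TensorInfo(dtype="float16", device="cuda:0")
check_varlen_inputs(q, q, q)  # no mask, no bias
check_varlen_inputs(q, q, q, mask=TensorInfo("bool", "cuda:0"))  # mask only
```

A stricter variant would reject a boolean mask next to half-precision inputs, which is the kind of unnecessary failure the commit message says it avoids.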

7 files changed: +395 -5 lines changed
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+name: Bug report
+description: Create a report to help us improve Flash-DMA
+title: "[BUG REPORT] "
+labels:
+  - bug
+assignees:
+  - LoserCheems
+  - Evanwu1125
+  - SNHuan
+  - Thanksyy
+  - ftgreat
+  - zacliu2023
+  - juliohsu
+  - wubingheng111
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for taking the time to report an issue. Please fill out the details below so we can reproduce and fix the problem quickly.
+  - type: textarea
+    id: bug-description
+    attributes:
+      label: Describe the bug
+      description: Provide a concise description of the incorrect behaviour.
+      placeholder: Unexpected error when calling flash_dmattn(...)
+    validations:
+      required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Steps to reproduce
+      description: Share the minimal steps or code necessary for us to see the failure.
+      placeholder: |
+        1. Import flash_dmattn
+        2. Run the snippet below
+        3. Observe the error
+      render: python
+    validations:
+      required: true
+  - type: textarea
+    id: expected-behavior
+    attributes:
+      label: Expected behaviour
+      description: Tell us what you expected to happen instead.
+      placeholder: The kernel should return valid attention output without raising an exception.
+    validations:
+      required: true
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment information
+      description: Run the following command and paste the full output.
+      placeholder: |
+        python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name() if torch.cuda.is_available() else \"None\"}')"
+      render: shell
+    validations:
+      required: true
+  - type: textarea
+    id: additional-context
+    attributes:
+      label: Additional context
+      description: Include sequence lengths, batch sizes, or any other details that might help us debug.
+      placeholder: Tested with seq_len=8192, batch=2, head_dim=128...
+  - type: textarea
+    id: traceback
+    attributes:
+      label: Error traceback
+      description: Paste the full traceback if available.
+      render: text
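
The one-line environment command in the template above fails outright when PyTorch is not importable. As a minimal sketch, the same fields can be collected by a short script that degrades gracefully; `environment_report` is a hypothetical helper name, not part of the repository, and it only uses the PyTorch attributes already named in the template.

```python
import platform


def environment_report() -> dict:
    """Collect the fields requested by the template; tolerate a missing PyTorch."""
    report = {
        "Python": platform.python_version(),
        "PyTorch": "not installed",
        "CUDA": "None",
        "GPU": "None",
    }
    try:
        import torch
    except ImportError:
        return report

    report["PyTorch"] = torch.__version__
    report["CUDA"] = str(torch.version.cuda)  # None on CPU-only builds
    if torch.cuda.is_available():
        report["GPU"] = torch.cuda.get_device_name()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the "Environment information" field gives the same information as the one-liner, plus the Python version.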
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+name: Feature request
+description: Suggest an idea for FDMA
+title: "[FEATURE REQUEST] "
+labels:
+  - feature
+assignees:
+  - LoserCheems
+  - Evanwu1125
+  - SNHuan
+  - Thanksyy
+  - ftgreat
+  - zacliu2023
+  - juliohsu
+  - wubingheng111
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Help us understand the feature you are proposing and why it matters for Flash-DMA workflows.
+  - type: textarea
+    id: problem
+    attributes:
+      label: Problem statement
+      description: Explain the problem or limitation that motivates this feature request.
+      placeholder: I am limited by...
+    validations:
+      required: true
+  - type: textarea
+    id: proposed-solution
+    attributes:
+      label: Proposed solution
+      description: Describe the feature or behaviour you would like to see.
+      placeholder: Introduce a kernel path that...
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: List any other approaches you have evaluated and why they are insufficient.
+      placeholder: I tried using...
+  - type: textarea
+    id: implementation
+    attributes:
+      label: Implementation details
+      description: Call out potential CUDA/Python changes, performance implications, or compatibility considerations.
+      placeholder: Requires updates to flash_dmattn_interface and CUDA op...
+  - type: textarea
+    id: use-case
+    attributes:
+      label: Use case
+      description: Describe the workloads or scenarios that would benefit from this feature.
+      placeholder: Long-context code completion with...
+  - type: textarea
+    id: references
+    attributes:
+      label: Related work
+      description: Share links to papers, repositories, or prior art that inspired this request.
+      placeholder: Paper link or repository URL
+  - type: textarea
+    id: additional-context
+    attributes:
+      label: Additional context
+      description: Add any extra information or screenshots that may help us understand the request.
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+name: Performance issue
+description: Report performance problems or optimisation opportunities
+title: "[PERFORMANCE] "
+labels:
+  - performance
+assignees:
+  - LoserCheems
+  - Evanwu1125
+  - SNHuan
+  - Thanksyy
+  - ftgreat
+  - zacliu2023
+  - juliohsu
+  - wubingheng111
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Provide enough detail about performance regressions or optimisation opportunities so we can reproduce and diagnose them.
+  - type: textarea
+    id: issue-description
+    attributes:
+      label: Performance issue description
+      description: Summarise the performance problem.
+      placeholder: Forward latency increases when...
+    validations:
+      required: true
+  - type: textarea
+    id: current-performance
+    attributes:
+      label: Current performance metrics
+      description: Share benchmark numbers and configuration (sequence length, batch size, heads, head dimension, throughput, memory usage).
+      placeholder: |
+        Sequence length: 8192
+        Batch size: 2
+        Heads: 32
+        Head dim: 128
+        Speed: 15.2 ms/iteration
+        Memory: 8.5 GB
+    validations:
+      required: true
+  - type: textarea
+    id: expected-performance
+    attributes:
+      label: Expected performance
+      description: Explain what performance you expect and the baseline you are comparing against.
+      placeholder: Expect <10 ms/iteration based on Flash Attention benchmark...
+  - type: textarea
+    id: environment
+    attributes:
+      label: Environment information
+      description: Run the following command and paste the output.
+      placeholder: |
+        python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name() if torch.cuda.is_available() else \"None\"}')"
+      render: shell
+    validations:
+      required: true
+  - type: textarea
+    id: benchmark-code
+    attributes:
+      label: Benchmark code
+      description: Provide the code snippet or script used to measure performance.
+      render: python
+  - type: textarea
+    id: profiling
+    attributes:
+      label: Profiling information
+      description: Include relevant excerpts from nsys, nvprof, or PyTorch profiler if available.
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System information
+      description: GPU model, compute capability, CPU, RAM, and other hardware details.
+      placeholder: RTX 4090 24GB, compute capability 8.9, Intel i9-14900K, 64GB RAM
+  - type: textarea
+    id: additional-context
+    attributes:
+      label: Additional context
+      description: Mention regressions, different batch sizes, attention patterns, or other observations.
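
The "Benchmark code" field above asks for the script used to measure performance. A minimal sketch of a timing harness that yields a "ms/iteration" figure is given below; `time_ms_per_iteration` and its parameters are hypothetical, and the workload is a placeholder rather than a real attention call. The `sync` hook matters for GPU work: kernels launch asynchronously, so the timer should only stop after the device has finished.

```python
import statistics
import time
from typing import Callable


def time_ms_per_iteration(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 10,
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs are executed but not timed
        fn()
    sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for asynchronous work before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder CPU workload; replace with the call being benchmarked.
    latency = time_ms_per_iteration(lambda: sum(range(100_000)))
    print(f"Speed: {latency:.2f} ms/iteration")
```

For CUDA workloads, passing `sync=torch.cuda.synchronize` keeps the measurement from reporting only the kernel launch time.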
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+name: Bug Fix
+description: Fix a bug with clear reproduction, scope, and tests
+title: "[BUG FIX] "
+labels:
+  - bug
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for contributing a bug fix! Please complete the sections below so reviewers can understand and verify the change quickly.
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: What bug is fixed and what parts of the codebase are impacted?
+      placeholder: Resolves crash when...
+    validations:
+      required: true
+  - type: textarea
+    id: root-cause
+    attributes:
+      label: Root cause
+      description: Briefly describe the underlying issue.
+      placeholder: The kernel assumed...
+  - type: textarea
+    id: changes
+    attributes:
+      label: Changes
+      description: Highlight the notable code-level modifications.
+      placeholder: Updated flash_dmattn_interface to...
+    validations:
+      required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Reproduction steps or MRE
+      description: Provide steps or a minimal snippet that reproduces the original bug.
+      render: python
+  - type: textarea
+    id: tests
+    attributes:
+      label: Tests
+      description: List the tests you added or ran and their results.
+      placeholder: Ran benchmarks/forward_equivalence.py; added unit test...
+    validations:
+      required: true
+  - type: textarea
+    id: compatibility
+    attributes:
+      label: Compatibility
+      description: Note any migration concerns or backwards compatibility considerations.
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: Linked issue provided
+        - label: Adds or updates tests
+        - label: Updates docs if needed
+        - label: No performance regressions observed
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+name: Feature Support
+description: Introduce a new feature with design context and tests
+title: "[FEATURE SUPPORT] "
+labels:
+  - feature
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Share enough detail about the new feature so reviewers can evaluate scope, design, and testing.
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: What feature is being added and why?
+      placeholder: Adds configurable...
+    validations:
+      required: true
+  - type: textarea
+    id: design
+    attributes:
+      label: Design
+      description: Outline the design or architecture and mention alternatives considered.
+      placeholder: Uses new backend selection flow...
+  - type: textarea
+    id: changes
+    attributes:
+      label: Changes
+      description: Describe new or changed public APIs, configuration, or CLI behaviour.
+      placeholder: Adds flash_dmattn.feature_flag...
+    validations:
+      required: true
+  - type: textarea
+    id: implementation-notes
+    attributes:
+      label: Implementation notes
+      description: Highlight tricky parts or noteworthy implementation details.
+  - type: textarea
+    id: tests
+    attributes:
+      label: Tests
+      description: List unit or integration tests you added or updated and how you validated them.
+      placeholder: Ran benchmarks/forward_equivalence.py; added example in...
+    validations:
+      required: true
+  - type: textarea
+    id: docs
+    attributes:
+      label: Documentation
+      description: Mention doc updates or examples that accompany this feature.
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: Linked issue provided
+        - label: API stabilised
+        - label: Tests added or updated
+        - label: Docs added or updated
+        - label: No known performance regressions
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+name: Performance Optimization
+description: Optimise performance with benchmark evidence
+title: "[PERFORMANCE OPTIMIZATION] "
+labels:
+  - performance
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Document the optimisation, methodology, and results so reviewers can validate gains and correctness.
+  - type: textarea
+    id: summary
+    attributes:
+      label: Summary
+      description: What is optimised and why?
+      placeholder: Improves forward latency for...
+    validations:
+      required: true
+  - type: textarea
+    id: baseline
+    attributes:
+      label: Baseline metrics
+      description: Provide the current performance numbers and environment.
+      placeholder: Baseline throughput 150 tok/s on H100 with...
+    validations:
+      required: true
+  - type: textarea
+    id: approach
+    attributes:
+      label: Approach
+      description: Describe the optimisation techniques used.
+      placeholder: Introduced block-wise accumulation...
+  - type: textarea
+    id: results
+    attributes:
+      label: Results
+      description: Share before/after benchmarks and how to reproduce them.
+      placeholder: |
+        Before: 15.2 ms/iteration (benchmark command)
+        After: 9.8 ms/iteration (benchmark command)
+    validations:
+      required: true
+  - type: textarea
+    id: impact
+    attributes:
+      label: Impact
+      description: Note memory, throughput trade-offs, or hardware-specific considerations.
+  - type: textarea
+    id: risks
+    attributes:
+      label: Risks
+      description: Highlight edge cases, correctness risks, or gating tests added.
+  - type: checkboxes
+    id: checklist
+    attributes:
+      label: Checklist
+      options:
+        - label: Linked issue provided
+        - label: Benchmarks included and reproducible
+        - label: No accuracy regression
+        - label: Docs updated where needed
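
The "Results" field above asks for before/after numbers. As a small worked example, the placeholder values from the template (15.2 ms and 9.8 ms per iteration) can be turned into a speedup factor and a relative latency reduction; `summarize_speedup` is a hypothetical helper introduced only for this illustration.

```python
def summarize_speedup(before_ms: float, after_ms: float) -> dict:
    """Compute the speedup factor and the latency reduction in percent."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return {
        "speedup": before_ms / after_ms,
        "latency_reduction_pct": (before_ms - after_ms) / before_ms * 100.0,
    }


summary = summarize_speedup(before_ms=15.2, after_ms=9.8)
print(f"{summary['speedup']:.2f}x faster, "
      f"{summary['latency_reduction_pct']:.1f}% lower latency")
# prints "1.55x faster, 35.5% lower latency"
```

Reporting both figures avoids ambiguity, since a 1.55x speedup and a 35.5% latency reduction describe the same change.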