name: Performance issue
description: Report performance problems or optimisation opportunities
title: "[PERFORMANCE] "
labels:
  - performance
assignees:
  - LoserCheems
  - Evanwu1125
  - SNHuan
  - Thanksyy
  - ftgreat
  - zacliu2023
  - juliohsu
  - wubingheng111
body:
  - type: markdown
    attributes:
      value: |
        Provide enough detail about performance regressions or optimisation opportunities so we can reproduce and diagnose them.
  - type: textarea
    id: issue-description
    attributes:
      label: Performance issue description
      description: Summarise the performance problem.
      placeholder: Forward latency increases when...
    validations:
      required: true
  - type: textarea
    id: current-performance
    attributes:
      label: Current performance metrics
      description: Share benchmark numbers and configuration (sequence length, batch size, heads, head dimension, throughput, memory usage).
      placeholder: |
        Sequence length: 8192
        Batch size: 2
        Heads: 32
        Head dim: 128
        Speed: 15.2 ms/iteration
        Memory: 8.5 GB
    validations:
      required: true
  - type: textarea
    id: expected-performance
    attributes:
      label: Expected performance
      description: Explain what performance you expect and the baseline you are comparing against.
      placeholder: Expect <10 ms/iteration based on Flash Attention benchmark...
  - type: textarea
    id: environment
    attributes:
      label: Environment information
      description: Run the following command and paste the output.
      placeholder: |
        python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.version.cuda}'); print(f'GPU: {torch.cuda.get_device_name() if torch.cuda.is_available() else \"None\"}')"
      render: shell
    validations:
      required: true
  - type: textarea
    id: benchmark-code
    attributes:
      label: Benchmark code
      description: Provide the code snippet or script used to measure performance.
      render: python
  - type: textarea
    id: profiling
    attributes:
      label: Profiling information
      description: Include relevant excerpts from nsys, nvprof, or the PyTorch profiler if available.
  - type: textarea
    id: system-info
    attributes:
      label: System information
      description: GPU model, compute capability, CPU, RAM, and other hardware details.
      placeholder: RTX 4090 24GB, compute capability 8.9, Intel i9-14900K, 64GB RAM
  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      description: Mention regressions, different batch sizes, attention patterns, or other observations.