
Pick the next available challenge number that is NOT already taken by a merged challenge or an open PR. Also avoid creating a challenge on the same topic as any pending PR, even if the number differs.


THEME — REAL-WORLD INFERENCE KERNELS:
Focus on challenges inspired by real-world ML inference workloads. Think about the building blocks of modern neural networks (transformers, diffusion models, LLMs, vision models) and the GPU kernels that make them fast. Good examples:
- Transformer components: multi-head attention, KV-cache updates, rotary positional embeddings (RoPE), RMS normalization, grouped-query attention
- Inference optimizations: flash attention, paged attention, speculative decoding verification, quantized matmul (INT8/INT4), fused MLP blocks
- Diffusion model ops: denoising steps, classifier-free guidance fusion, cross-attention
- Sequence/token operations: top-k/top-p sampling, beam search step, KV-cache rotation, causal masking
- Model architecture blocks: full transformer decoder block (like the existing GPT-2 challenge), mixture-of-experts routing, SwiGLU/GeGLU activations, LoRA forward pass
- Serving primitives: batched inference with variable sequence lengths, continuous batching, prefix caching

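To make the intended bar concrete: RMS normalization, one of the suggested topics, has compact reference semantics that a GPU kernel can be checked against. Below is a minimal NumPy sketch of that reference; the function name, shapes, and epsilon default are illustrative choices for this example, not an existing API in the repo.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reference semantics for RMS normalization over the last axis:
        y = x / sqrt(mean(x^2) + eps) * weight
    A solver's GPU kernel would be validated against a reference like this.
    """
    # Root-mean-square of each row, with eps inside the sqrt for stability.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

# Example: normalize a (batch, hidden) activation tensor.
x = np.random.randn(4, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
y = rms_norm(x, w)
```

Even a "simple" op like this leaves the solver real kernel work (a per-row reduction, then an element-wise rescale), which is what separates these topics from trivial element-wise challenges.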
Look at `challenges/medium/74_gpt2_block/` as the gold standard for this style of challenge. The solver should implement a meaningful, self-contained inference building block — not a toy operation.

HARD RULES:
- Do NOT create trivial element-wise challenges. We have way too many (sigmoid, relu, silu, clipping, etc.). If your idea is just "apply f(x) to every element", pick something else.
- Do NOT duplicate existing challenges — check both the merged challenges in the repo AND the open PRs listed above.