Commit 42a8e7a

PR feedback
1 parent 0939c24 commit 42a8e7a

1 file changed: 3 additions, 5 deletions


.github/workflows/create-challenge.yml

```diff
@@ -75,12 +75,10 @@ jobs:
 
 THEME — REAL-WORLD INFERENCE KERNELS:
 Focus on challenges inspired by real-world ML inference workloads. Think about the building blocks of modern neural networks (transformers, diffusion models, LLMs, vision models) and the GPU kernels that make them fast. Good examples:
-- Transformer components: multi-head attention, KV-cache updates, rotary positional embeddings (RoPE), RMS normalization, grouped-query attention
-- Inference optimizations: flash attention, paged attention, speculative decoding verification, quantized matmul (INT8/INT4), fused MLP blocks
-- Diffusion model ops: denoising steps, classifier-free guidance fusion, cross-attention
+- Fused inference kernels: fused SwiGLU/GeGLU MLP blocks, flash attention, paged attention, speculative decoding verification, quantized matmul (INT8/INT4), fused QKV projection, KV-cache updates
 - Sequence/token operations: top-k/top-p sampling, beam search step, KV-cache rotation, causal masking
-- Model architecture blocks: full transformer decoder block (like the existing GPT-2 challenge), mixture-of-experts routing, SwiGLU/GeGLU activations, LoRA forward pass
-- Serving primitives: batched inference with variable sequence lengths, continuous batching, prefix caching
+- Model architecture blocks: full transformer decoder block (like the existing GPT-2 challenge), mixture-of-experts routing, LoRA forward pass
+- Online/streaming algorithms: online softmax, streaming attention (process new queries without storing entire rows), continuous batching, prefix caching
 
 Look at `challenges/medium/74_gpt2_block/` as the gold standard for this style of challenge. The solver should implement a meaningful, self-contained inference building block — not a toy operation.
 
```
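The "online softmax" named in the added bullet is the single-pass, streaming trick that flash attention builds on: keep a running maximum and a running sum that is rescaled whenever the maximum changes, so each new element can be folded in without storing the whole row. A minimal Python sketch for illustration (not part of the workflow file or the challenge set):

```python
import math

def online_softmax(xs):
    """Streaming softmax over a sequence of scores.

    Processes one score at a time, tracking a running max and a
    rescaled running sum, so the full row never needs to be held
    for the normalization constant. A second pass emits the
    normalized probabilities. (Illustrative sketch only.)
    """
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the old sum into the new max's frame, then add
        # the new term; exp(-inf) == 0.0 handles the first element.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + math.exp(x - new_max))
        running_max = new_max
    return [math.exp(x - running_max) / running_sum for x in xs]
```

Subtracting the running max before exponentiating is what keeps this numerically stable even for large scores, which is the property the streaming-attention bullet relies on.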
