Add challenge 74: GPT-2 Inference (Medium) #194
Conversation
@claude rename challenge to 74 and fix linting issues
Challenge 73 is already taken by All-Pairs Shortest Paths on main. Rename medium/73_gpt2_block → medium/74_gpt2_block to avoid conflict. Also add an explicit zero-input test case (x=zeros) to generate_functional_test() to satisfy CLAUDE.md requirement that functional tests include zero inputs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
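For context, the added zero-input case might look like the sketch below. The real generate_functional_test() signature and case format belong to the platform, so the parameter names and dict shape here are assumptions.

```python
import torch

# Hypothetical sketch of the zero-input functional test case;
# the platform's actual generate_functional_test() format may differ.
def zero_input_case(seq_len=4, d_model=768):
    # All-zeros activations exercise LayerNorm eps and bias-only paths
    x = torch.zeros(seq_len, d_model)
    return {"x": x}

case = zero_input_case()
print(case["x"].shape)  # torch.Size([4, 768])
```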
Review: PR #194 — GPT-2 Transformer Block
Issues Fixed
1. Challenge number conflict (Critical) — challenge 73 was already taken on main; renamed to 74.
2. Missing zero-input test case — CLAUDE.md requires functional tests to include zero inputs; one was added.
Both fixes have been committed and pushed to the PR branch.
Validation
Wrote a CUDA reference solution and ran it through the platform.
Checklist Review
challenge.html ✅
challenge.py ✅
Starter files ✅ — all 6 present (CUDA, PyTorch, Triton, JAX, CuTe, Mojo)
General
…ng, add weight indexing example
- Replace MHA/FFN acronyms with MultiHeadAttn/FeedForward in equations and steps
- Fix LaTeX \_ rendering issue inside \text{} (plain _ works in MathJax/KaTeX)
- Clarify no causal mask is applied in attention
- Fix ambiguous attn_out variable naming across steps (use A, P, F)
- Add concrete weight buffer indexing example (W_qkv[i][j])
- Document LaTeX underscore rule in CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
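The weight indexing example mentioned above might resemble the toy sketch below. The packed-buffer layout, offsets, and the name W_qkv are illustrative assumptions; the challenge's own docs define the real ordering.

```python
import torch

# Purely hypothetical packed-weight layout, to illustrate the kind of
# W_qkv[i][j] indexing the commit refers to; the challenge's real buffer
# offsets and ordering are defined in its own description.
D = 8  # toy model dimension (the real challenge fixes D=768)
weights = torch.arange(D * 3 * D, dtype=torch.float32)  # flat buffer

# Viewing the flat buffer as a (D, 3D) matrix: row i, column j maps to
# flat index i * 3*D + j.
W_qkv = weights.view(D, 3 * D)
i, j = 2, 5
print(W_qkv[i][j] == weights[i * 3 * D + j])  # tensor(True)
```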
atol=1e-03,
rtol=1e-03,
@claude Are these acceptable tolerance levels for this challenge?
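For reference, here is what atol=1e-03 / rtol=1e-03 admits under a torch.allclose-style check. This is a generic illustration of the tolerance formula, not the platform's actual comparison code.

```python
import torch

# allclose passes where |actual - expected| <= atol + rtol * |expected|,
# so the permitted error scales with the magnitude of the value.
expected = torch.tensor([1.0, 100.0])
actual = torch.tensor([1.0015, 100.09])  # abs errors: 1.5e-3 and 9e-2

ok = torch.allclose(actual, expected, atol=1e-03, rtol=1e-03)
print(ok)  # True: per-element tolerances are 2e-3 and 1.01e-1
```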
proj = fc @ W_proj + b_proj

# residual connection 2
output.copy_(hidden + proj)
Nit: To avoid a memcpy, you can do torch.add(hidden, proj, out=output)
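A quick sketch of the difference (toy CPU shapes; the challenge's real tensors live on GPU):

```python
import torch

hidden = torch.randn(4, 768)
proj = torch.randn(4, 768)
output = torch.empty_like(hidden)

# hidden + proj materializes a temporary tensor, which copy_ then
# copies into output: one extra allocation plus a memcpy.
output.copy_(hidden + proj)

# torch.add with out= writes the sum into output directly,
# skipping the temporary.
torch.add(hidden, proj, out=output)

print(torch.equal(output, hidden + proj))  # True
```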
…mment
- Add required <h2>Example</h2> section to challenge.html (was missing; checklist requires Implementation Requirements, Example(s), Constraints)
- Fix starter.jax.py comment: "on the GPU" → "on GPU" to match CLAUDE.md JAX template format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing Examples section to challenge.html
- Add torch.manual_seed(0) to make generate_example_test() deterministic
- Fix starter.jax.py comment: "on the GPU" -> "on GPU" (matches template)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add missing <h2>Example</h2> section to challenge.html
- Fix device assertion to verify CUDA (assert x.device.type == "cuda")
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous bot commits had already added Example sections; consolidate to a single <h2>Example</h2> section after Weight Layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The checklist requires <h2> sections for Implementation Requirements, Example(s), and Constraints. The Example section was missing. Since D=768 is a fixed architecture dimension, exact tensor values cannot be shown, so the example describes input/output shapes for seq_len=4 (matching generate_example_test()). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous commit added an Example section before Weight Layout, but one already existed after Weight Layout. This removes the newly-added duplicate, leaving only the Example section before Constraints. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds challenge 74: GPT-2 Transformer Block (Medium difficulty)
Implements a single GPT-2 124M decoder block: the user receives an input activation tensor and a packed weight buffer and must compute the full pre-norm transformer block — LayerNorm, 12-head self-attention with combined QKV projection, residual connection, LayerNorm, two-layer FFN with GELU activation, and a second residual connection.
This is the first challenge that composes multiple ML primitives (LayerNorm, matmul, softmax, GELU) into a complete architectural building block, rather than implementing them in isolation.
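The block described above can be sketched in plain PyTorch. This is a hedged reference sketch using unpacked weights and illustrative names (W_qkv, W_o, W_fc, W_proj); the actual challenge packs everything into one buffer, fixes D=768 with 12 heads, and applies no causal mask.

```python
import torch
import torch.nn.functional as F

def gpt2_block(x, ln1, ln2, W_qkv, b_qkv, W_o, b_o,
               W_fc, b_fc, W_proj, b_proj, n_head=12):
    """Pre-norm GPT-2 decoder block; ln1/ln2 are (weight, bias) pairs."""
    T, D = x.shape
    hd = D // n_head

    # LayerNorm 1 + combined QKV projection
    h = F.layer_norm(x, (D,), ln1[0], ln1[1])
    q, k, v = (h @ W_qkv + b_qkv).split(D, dim=-1)
    q = q.view(T, n_head, hd).transpose(0, 1)   # (n_head, T, hd)
    k = k.view(T, n_head, hd).transpose(0, 1)
    v = v.view(T, n_head, hd).transpose(0, 1)

    # Scaled dot-product attention; note: no causal mask in this challenge
    P = ((q @ k.transpose(-2, -1)) / hd ** 0.5).softmax(dim=-1)
    A = (P @ v).transpose(0, 1).reshape(T, D)
    hidden = x + A @ W_o + b_o                  # residual connection 1

    # LayerNorm 2 + two-layer FFN with GELU
    h2 = F.layer_norm(hidden, (D,), ln2[0], ln2[1])
    fc = F.gelu(h2 @ W_fc + b_fc)
    proj = fc @ W_proj + b_proj
    return hidden + proj                        # residual connection 2

torch.manual_seed(0)
T, D = 4, 24  # toy sizes; the real challenge fixes D=768
ln = (torch.ones(D), torch.zeros(D))
y = gpt2_block(torch.randn(T, D), ln, ln,
               torch.randn(D, 3 * D) * 0.02, torch.zeros(3 * D),
               torch.randn(D, D) * 0.02, torch.zeros(D),
               torch.randn(D, 4 * D) * 0.02, torch.zeros(4 * D),
               torch.randn(4 * D, D) * 0.02, torch.zeros(D), n_head=12)
print(y.shape)  # torch.Size([4, 24])
```

Shapes are the main thing to check here: the input and output are both (seq_len, D), matching the example test's seq_len=4.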
What this teaches
Test plan