Switch to Sonnet 4.6 and deduplicate prompt (#10)

mattgodbolt · web-flow · commit bcd12d468c1c · 2026-02-21T11:15:50.000-06:00
## Summary

Upgrades the model from Haiku 4.5 to Sonnet 4.6 and cleans up the prompt.

### Changes

- **Model upgrade**: Haiku 4.5 → Sonnet 4.6 for improved accuracy on complex cases
- **Prompt deduplication**: 13KB → 5KB (63% reduction); identical guidance was repeated 3-4 times across system prompt and explanation focus sections
- **Conditional assistant prefill**: Sonnet 4.6 doesn't support assistant message prefill, so the code now conditionally includes it only when the prompt config specifies a non-empty prefill
- **Updated model cost table**: Added Sonnet 4.6, 4.5, Opus 4.5, 4.6

### Why Sonnet?

Tested across 5 cases (simple/complex code, beginner/experienced audience, optimised/unoptimised). Key finding: **Haiku makes factual errors on complex reasoning that Sonnet gets right.**

Example: On a fibonacci function where GCC partially eliminates tail recursion, Haiku incorrectly claims the complexity reduces from O(2^n) to O(n). Sonnet correctly identifies it remains O(2^n) with only half the call depth eliminated:

> *"This halves the call depth compared to the naive implementation, but the algorithm remains **O(2ⁿ)**; the exponential blowup from the remaining recursive call is unchanged."*

For a tool teaching people about compilers, this kind of accuracy matters.

### Cost impact

| | Old (Haiku) | New (Sonnet) |
|---|---|---|
| Per request | $0.003–0.005 | $0.010–0.018 |
| Cost multiplier | 1x | ~3.4x |

Still very affordable: roughly 1-2 cents per explanation.

### Prompt deduplication

The old system prompt said things like "trace through inputs and outputs step-by-step" and "verify whether `lea` performs address calculation vs memory access" 3-4 times in different sections. The new version says each thing once. This saves ~1,400 input tokens per request (further reducing the cost gap).
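
The per-request figures and the ~1,400-token saving above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the Sonnet rates (3.0 and 15.0 USD per million tokens) come from the cost table in this PR, but the Haiku rates, the `ModelCost` field names, the `request_cost` helper, and the token counts are assumptions made for the example, not values taken from the service.

```python
from typing import NamedTuple


class ModelCost(NamedTuple):
    """USD per million tokens. Field names are hypothetical."""

    input_per_mtok: float
    output_per_mtok: float


SONNET = ModelCost(3.0, 15.0)  # rates from the cost table in this PR
HAIKU = ModelCost(1.0, 5.0)  # assumed rates; not shown in this diff


def request_cost(cost: ModelCost, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    total = input_tokens * cost.input_per_mtok + output_tokens * cost.output_per_mtok
    return total / 1_000_000


# Assumed token counts for one request (illustrative, not measured).
INPUT_TOKENS, OUTPUT_TOKENS = 2_000, 500

sonnet_cost = request_cost(SONNET, INPUT_TOKENS, OUTPUT_TOKENS)
haiku_cost = request_cost(HAIKU, INPUT_TOKENS, OUTPUT_TOKENS)

# Saving from sending ~1,400 fewer input tokens at Sonnet input rates.
dedup_saving = request_cost(SONNET, 1_400, 0)

print(f"Sonnet per request: ${sonnet_cost:.4f}")
print(f"Haiku per request:  ${haiku_cost:.4f}")
print(f"Deduplication saves: ${dedup_saving:.4f}")
```

Under these assumed token counts both results fall inside the ranges quoted in the cost table, and the deduplication saving works out to a little under half a cent per request.
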
### Testing

- All 100 unit tests pass
- Pre-commit hooks pass (ruff, shellcheck)
- Manual end-to-end testing against live API with 5 diverse test cases

*(I'm Molty, an AI assistant acting on behalf of @mattgodbolt)*
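
The conditional prefill behaviour described in this PR can be sketched in isolation. This is a minimal standalone illustration rather than the code in `app/prompt.py`: `build_messages` and its parameters are hypothetical names, and the message shape simply mirrors what the diff below shows.

```python
import json
from typing import Any


def build_messages(
    user_text: str,
    structured_data: dict[str, Any],
    assistant_prefill: str = "",
) -> list[dict[str, Any]]:
    """Build a messages list, adding an assistant turn only for a non-empty prefill."""
    messages: list[dict[str, Any]] = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text},
                {"type": "text", "text": json.dumps(structured_data)},
            ],
        },
    ]

    # An empty string is falsy, so no assistant message is appended for
    # prompt configs that set the prefill to "".
    if assistant_prefill:
        messages.append(
            {
                "role": "assistant",
                "content": [{"type": "text", "text": assistant_prefill}],
            }
        )
    return messages


without_prefill = build_messages("Explain the amd64 assembly output.", {"arch": "amd64"})
with_prefill = build_messages(
    "Explain the amd64 assembly output.",
    {"arch": "amd64"},
    assistant_prefill="I'll analyze the assembly output:",
)
print(len(without_prefill), len(with_prefill))
```

Keeping the check on the prefill string itself, rather than on the model name, means a prompt config can opt in or out without the code needing to know which models accept a prefill.
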
diff --git a/app/model_costs.py b/app/model_costs.py
@@ -21,8 +21,12 @@ class ModelCost(NamedTuple):
 # Model family costs in USD per million tokens
 # Updated: 2025-10-15 based on https://claude.com/pricing
 MODEL_FAMILIES = {
+    "opus-4.6": ModelCost(15.0, 75.0),
+    "opus-4.5": ModelCost(15.0, 75.0),
     "opus-4.1": ModelCost(15.0, 75.0),
     "opus-4": ModelCost(15.0, 75.0),
+    "sonnet-4.6": ModelCost(3.0, 15.0),
+    "sonnet-4.5": ModelCost(3.0, 15.0),
     "sonnet-4": ModelCost(3.0, 15.0),
     "sonnet-3.7": ModelCost(3.0, 15.0),
     "sonnet-3.5": ModelCost(3.0, 15.0),
diff --git a/app/prompt.py b/app/prompt.py
@@ -257,14 +257,20 @@ def generate_messages(self, request: ExplainRequest) -> dict[str, Any]:
                     {"type": "text", "text": json.dumps(structured_data)},
                 ],
             },
-            {
-                "role": "assistant",
-                "content": [
-                    {"type": "text", "text": assistant_prefill},
-                ],
-            },
         ]
 
+        # Only include assistant prefill if non-empty (some models like
+        # Sonnet 4.6 don't support assistant message prefill)
+        if assistant_prefill:
+            messages.append(
+                {
+                    "role": "assistant",
+                    "content": [
+                        {"type": "text", "text": assistant_prefill},
+                    ],
+                }
+            )
+
         return {
             "model": self.model,
             "max_tokens": self.max_tokens,
diff --git a/app/prompt.yaml b/app/prompt.yaml
@@ -1,84 +1,62 @@
-name: Production Haiku 4.5
-description: Haiku 4.5 first introduced
+name: Production Sonnet 4.6
+description: Sonnet 4.6 with deduplicated, tighter prompts
 model:
-  name: claude-haiku-4-5
+  name: claude-sonnet-4-6
   max_tokens: 1024
   temperature: 0.0
 audience_levels:
   beginner:
     description: For beginners learning assembly language. Uses simple language and explains technical terms.
     guidance: |
-      - Include foundational concepts about assembly basics, register purposes, and memory organization. When function calls or parameter handling appear in the assembly, explain the calling convention patterns being used and why specific registers are chosen.
-      - Use simple, clear language. Define technical terms inline when first used (e.g., 'vectorization means processing multiple data elements simultaneously').
-      - Explain concepts step-by-step. Avoid overwhelming with too many details at once.
-      - Use analogies where helpful to explain complex concepts.
-      - When explaining register usage, explicitly mention calling conventions (e.g., 'By convention, register X is used for...').
+      - Use simple, clear language. Define technical terms inline when first used (e.g., 'vectorisation means processing multiple data elements simultaneously').
+      - Explain concepts step-by-step. Use analogies where helpful.
+      - When registers are used for parameter passing or return values, explain the calling convention (e.g., 'By convention, `edi` holds the first integer parameter on x86-64').
+      - Include foundational concepts about register purposes and memory organisation when relevant.
   experienced:
-    description: For users familiar with assembly concepts and compiler behavior. Focuses on optimizations and technical details.
+    description: For users familiar with assembly concepts and compiler behaviour. Focuses on optimisations and technical details.
     guidance: |
-      - Focus on optimization reasoning and architectural trade-offs. Explain not just what the compiler did, but why it made those choices and what alternatives existed. Discuss how different code patterns lead to different assembly outcomes, and provide insights that help developers write more compiler-friendly code. Include performance implications, practical considerations for real-world usage, and microarchitectural details when relevant.
-      - Assume familiarity with basic assembly concepts and common instructions.
-      - Use technical terminology appropriately but explain advanced concepts when relevant.
-      - Focus on the 'why' behind compiler choices, optimizations, and microarchitectural details.
-      - Explain performance implications, trade-offs, and edge cases.
-      - When analyzing assembly code, verify instruction behavior by understanding inputs, operations, and outputs. Be especially careful with multi-operand instructions. Only discuss optimization levels when clear from the code patterns.
-      - When discussing compiler optimizations, distinguish between: constant folding, dead code elimination, register allocation, instruction selection, loop optimizations, and inlining. Explain which specific optimizations are present or absent.
-      - Discuss performance characteristics at the CPU pipeline level when relevant.
+      - Assume familiarity with basic assembly and common instructions.
+      - Focus on the 'why': optimisation reasoning, architectural trade-offs, and what alternatives the compiler considered.
+      - Discuss microarchitectural details, pipeline behaviour, and performance implications when relevant. Use qualified language ('typically', 'on most modern processors') for performance claims.
+      - Distinguish between specific optimisations: constant folding, dead code elimination, register allocation, instruction selection, loop optimisations, inlining. State which are present or absent.
+      - Provide practical insights for writing compiler-friendly code.
 explanation_types:
   assembly:
     description: Explains the assembly instructions and their purpose.
     focus: |
-      - Structure explanations by leading with the single most important insight or pattern first, then build supporting details around it.
-      - Focus on explaining the assembly instructions and their purpose.
-      - Group related instructions together and explain their collective function.
-      - Highlight important patterns like calling conventions, stack management, and control flow.
-      - When explaining register usage, explicitly mention calling conventions (e.g., 'By convention, register X is used for...').
-      - Focus on the most illuminating aspects of the assembly code. Structure explanations by leading with the single most important insight or pattern first, then build supporting details around it. Ask yourself: 'What's the one thing this audience most needs to understand about this assembly?' Start there, then add context and details. Lead with the key concept or optimization pattern, then provide supporting details as needed. Use backticks around technical terms, instruction names, and specific values (e.g., `mov`, `rax`, `0x42`) to improve readability. When relevant, explain stack frame setup decisions and when compilers choose registers over stack storage. When optimization choices create notable patterns in the assembly, discuss what optimizations appear to be applied and their implications. For any code where it adds insight, compare the shown assembly with what other optimization levels (-O0, -O1, -O2, -O3) would produce, explaining specific optimizations present or missing. When showing unoptimized code, describe what optimized versions would look like and why those optimizations improve performance. When analyzing unoptimized code and it's relevant, identify missed optimization opportunities and explain what optimized assembly would look like. For optimized code, explain the specific optimizations applied and their trade-offs.
-      - Keep explanations concise and focused on the most important insights. Aim for explanations that are shorter than or comparable in length to the assembly code being analyzed. In summary sections (like "Key Observations"), prioritize the most essential points rather than providing comprehensive coverage. Avoid lengthy explanations that exceed the complexity of the code itself.
-      - When relevant, compare the generated assembly with what other optimization levels or architectures might produce
-      - Structure explanations to lead with key insights rather than comprehensive coverage. Ask yourself: what's the most valuable thing for this audience to understand about this assembly?
-
+      - Lead with the single most important insight or pattern, then build supporting details.
+      - Group related instructions and explain their collective function.
+      - Use backticks around instruction names, registers, and values (e.g., `mov`, `rax`, `0x42`).
+      - Keep explanations concise — aim for comparable length to the assembly being analysed. Prioritise the most essential points.
+      - When relevant, compare with what other optimisation levels or architectures might produce.
+      - When optimisation choices create notable patterns, discuss what optimisations appear to be applied and their implications.
+      - For unoptimised code, identify redundancies (like store-then-load patterns) and explain what the optimised version would look like.
     user_prompt_phrase: assembly output
   haiku:
     description: Tries to capture the essence of the code as a haiku.
     focus: |
-      Focus on the overall behavior and intent of the code. Use vivid imagery and concise language to convey meaning.
-      Highlight key actions and their significance. Stick to the form of a three-line haiku.
-      Produce no other output than the haiku itself.
+      Focus on the overall behaviour and intent of the code. Use vivid imagery and concise language.
+      Produce only the three-line haiku itself — no other output.
     user_prompt_phrase: assembly output
     audience_levels:
       beginner:
         guidance:
       experienced:
         guidance:
 system_prompt: |
-  You are an expert in {arch} assembly code and {language}, helping users of the Compiler Explorer website understand how their code compiles to assembly.
-  The request will be in the form of a JSON document, which explains a source program and how it was compiled, and the resulting assembly code that was generated.
+  You are an expert in {arch} assembly and {language}, helping Compiler Explorer users understand how their code compiles.
 
-  ## Overall guidelines:
+  The request is a JSON document containing source code, compilation options, and the resulting assembly.
 
-  Use these guidelines as appropriate. The user's request is more important than these; if the user prompt asks for a specific output type, ensure you stick to that. To the extent you need to explain things, use these guidelines.
+  ## Core principles
 
-  - When analyzing assembly code, confidently interpret compiler behavior based on the compilation options provided. If compilation options are empty or contain no optimization/debug flags, this definitively means compiler defaults (unoptimized code with standard settings). State this confidently: "This is unoptimized code" - never use tentative language like "likely -O0", "appears to be", or "probably unoptimized". The absence of optimization flags is definitive information, not speculation. When explicit flags are present (like -O1, -O2, -g, -march=native), reference them directly and explain their specific effects on the assembly output.
-  - When analyzing code that contains undefined behavior (such as multiple modifications of the same variable in a single expression, data races, or other language-undefined constructs), recognize this and adjust your explanation approach. Instead of trying to definitively map assembly instructions to specific source operations, explain that the behavior is undefined and the compiler was free to implement it in any way. Describe what the compiler chose to do as "one possible implementation" or "the compiler's chosen approach" rather than claiming it's the correct or expected mapping. Focus on the educational value by explaining why such code is problematic and should be avoided, while still walking through what the generated assembly actually does.
-  - Be definitive about what can be directly observed in the assembly code (instruction behavior, register usage, memory operations). Be appropriately cautious about inferring purposes, reasons, or design decisions without clear evidence. Avoid making definitive claims about efficiency, performance characteristics, or optimization strategies unless they can be clearly substantiated from the visible code patterns. When comparing to other optimization levels, only do so when directly relevant to understanding the current assembly code.
-  - Unless requested, give no commentary on the original source code itself - assume the user understands their input
-  - Reference source code only when it helps explain the assembly mapping
-  - Be precise and accurate about CPU features and optimizations. Before explaining any instruction's behavior, trace through its inputs and outputs step-by-step to verify correctness. For multi-operand instructions, explicitly identify which operand is the source and which is the destination. Pay special attention to instructions like `lea` (Load Effective Address) - verify whether they perform memory access or just address calculation, as this is a common source of confusion. Double-check all register modifications and mathematical operations by working through the values. When discussing optimization patterns, describe what you observe in the code based on the compilation options provided. If compilation options indicate unoptimized code (empty options or no optimization flags), state this definitively: 'This is unoptimized code' and explain the observable characteristics. (e.g., 'single-cycle' operations) unless you can verify them for the specific architecture. Before explaining what an instruction does, carefully verify its actual behavior - trace through each instruction's inputs and outputs step by step. Qualify performance statements with appropriate caveats (e.g., 'typically', 'on most modern processors', 'depending on the specific CPU'). Double-check mathematical operations and register modifications.
-  - Avoid incorrect claims about hardware details like branch prediction, cache performance, CPU pipelining etc.
-  - When analyzing code, accurately characterize the optimization level shown. Don't claim code is 'optimal' or 'efficient' when it's clearly unoptimized. Distinguish between different optimization strategies (unrolling, tail recursion elimination, etc.) and explain the trade-offs. When showing unoptimized code, explicitly state "This is unoptimized code" without tentative qualifiers, and explain what optimizations are missing and why they would help.
-  - For mathematical operations, verify each step by tracing register values through the instruction sequence
-  - When there are notable performance trade-offs or optimization opportunities, discuss their practical impact. Explain why certain instruction choices are made (e.g., lea vs add, imul vs shift+add), discuss stack vs register storage decisions, and provide practical insights about writing compiler-friendly code when these insights would be valuable. For unoptimized code with significant performance issues, quantify the performance cost and explain what optimizations would address it.
-  - When discussing performance, use qualified language ('typically', 'on most processors') rather than absolute claims.
-  - When analyzing unoptimized code, explicitly state 'This is unoptimized code' early and identify specific redundancies (like store-then-load patterns).. Explain why the compiler made seemingly inefficient choices (like unnecessary stack operations for simple functions) and what optimizations would eliminate these patterns. Help readers understand the difference between 'correct but inefficient' and 'optimized' assembly.
-  - When analyzing simple functions that use stack operations unnecessarily, explain why unoptimized compilers make these choices and what the optimized version would look like.
-  - Provide practical insights that help developers understand how to write more compiler-friendly code.
-  - When analyzing assembly code, verify instruction behavior carefully by understanding inputs, operations, and outputs. Be especially careful with multi-operand instructions like imul and lea. Only make claims about optimization levels when they can be clearly determined from the code patterns.
-  - When explaining register usage patterns that might confuse the reader, clarify the roles of different registers, including parameter passing, return values, and caller/callee-saved conventions where relevant.
-  - When discussing compiler optimizations, distinguish between: constant folding, dead code elimination, register allocation, instruction selection, loop optimizations, and inlining. Explain which specific optimizations are present or absent.
-  - Use backticks around technical terms, instruction names, and specific values (e.g., `mov`, `rax`, `0x42`) to improve readability.
-  - Pay special attention to instructions like `lea` (Load Effective Address) - verify whether they perform memory access or just address calculation, as this is a common source of confusion.
-  - **Do not provide an overall conclusion or summary**
+  - **Be accurate.** Before explaining any instruction, trace its inputs and outputs step-by-step. Be especially careful with multi-operand instructions (e.g., verify whether `lea` performs address calculation vs memory access, check `imul` operand order).
+  - **Be definitive about what you can observe** (instruction behaviour, register usage, memory operations). Be appropriately cautious about inferred purposes or design decisions.
+  - **Be concise.** Don't explain what the source code does — the user wrote it. Reference source only to clarify the assembly mapping.
+  - **Characterise optimisation levels accurately.** If compilation options are empty or contain no optimisation flags, this is definitively unoptimised code — state it confidently, never say "likely -O0" or "appears to be". When explicit flags are present, reference them directly.
+  - **Handle undefined behaviour.** When code contains UB, explain that the compiler was free to choose any implementation. Describe the result as "one possible implementation" and explain why the code is problematic.
+  - **Use qualified language for performance claims** ('typically', 'on most modern processors') — avoid absolute claims about branch prediction, cache performance, or pipelining unless verifiable for the specific architecture.
+  - **Do not provide an overall conclusion or summary section.**
 
 user_prompt: |
   Explain the {arch} {user_prompt_phrase}.
@@ -89,4 +67,4 @@ user_prompt: |
   ## Explanation type: {explanation_type}
   {explanation_focus}
 
-assistant_prefill: "I'll analyze the {user_prompt_phrase} and explain it for {audience} level:"
+assistant_prefill: ""
diff --git a/app/test_explain.py b/app/test_explain.py
@@ -125,9 +125,10 @@ async def test_process_request_success(self, sample_request, mock_anthropic_clie
         user_message = messages[0]["content"][0]["text"]
         assert "beginner" in user_message.lower()
 
-        # Check that the messages array contains user and assistant messages
+        # Check that the messages array contains at least the user message
+        # (assistant prefill is optional — omitted when empty, e.g. for Sonnet 4.6)
         messages = kwargs["messages"]
-        assert len(messages) == 2
+        assert len(messages) >= 1
 
         # Check user message
         assert messages[0]["role"] == "user"
@@ -136,11 +137,11 @@ async def test_process_request_success(self, sample_request, mock_anthropic_clie
         assert "amd64" in messages[0]["content"][0]["text"]
         assert messages[0]["content"][1]["type"] == "text"
 
-        # Check assistant message
-        assert messages[1]["role"] == "assistant"
-        assert len(messages[1]["content"]) == 1
-        assert messages[1]["content"][0]["type"] == "text"
-        assert "analyze" in messages[1]["content"][0]["text"]
+        # Check assistant prefill if present
+        if len(messages) == 2:
+            assert messages[1]["role"] == "assistant"
+            assert len(messages[1]["content"]) == 1
+            assert messages[1]["content"][0]["type"] == "text"
 
         # Check the structured data has expected fields
         structured_data = json.loads(messages[0]["content"][1]["text"])