---
title: Some words about models
description: "A concise guide to coding models: where to access them for free, how frontier options differ, and how API/subscription pricing affects spend."
---

Choose models by reliability, workflow fit, and pricing model, then benchmark them on your own real tasks.

## Where to access free models

- MiniMax, Kimi, Qwen, etc. Very cheap. MiniMax M2.5 is very close to the frontier.
- Smaller versions are often open-source.
- Be careful to use only US/EU inference providers; check out [OpenCode Zen](https://opencode.ai/docs/zen/), which hosts all models in the USA, and avoid routing sensitive work through providers in other jurisdictions.

## Model pricing

There are two very different pricing worlds in AI tools: API pricing (pay per token) and app subscriptions (pay a flat monthly fee with usage limits).

### API pricing

On price trackers like [models.dev](https://models.dev/) and [llm-prices.com](https://www.llm-prices.com/), you'll usually see these fields:

- **Input cost**: what you pay for non-cached input tokens sent to the model.
- **Output cost**: what you pay for tokens generated by the model.
- **Cache write cost**: what you pay when the provider stores a prompt prefix in cache (so it can be reused later).
- **Cache read cost**: what you pay when later requests reuse that cached prefix.

Simple mental model:

```
total cost = input + output + cache write + cache read
```
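
To make that formula concrete, here is a minimal sketch in Python. The prices are made-up illustrations, not any vendor's real rates; check a price tracker for current numbers.

```python
# Hypothetical per-million-token prices (illustrative only).
PRICES = {
    "input": 3.00,        # $ per 1M non-cached input tokens
    "output": 15.00,      # $ per 1M output tokens
    "cache_write": 3.75,  # $ per 1M tokens written to the prompt cache
    "cache_read": 0.30,   # $ per 1M cached tokens reused on later requests
}

def request_cost(tokens: dict) -> float:
    """Sum the four token categories, each priced per million tokens."""
    return sum(PRICES[k] * tokens.get(k, 0) / 1_000_000 for k in PRICES)

# A typical agent turn: large cached prefix, small fresh input, moderate output.
turn = {"input": 2_000, "output": 1_500, "cache_write": 500, "cache_read": 50_000}
print(f"${request_cost(turn):.4f}")
```

Note how the 50,000 cached tokens cost far less than they would as fresh input; that gap is why the prompt-layout advice below matters.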

If you're integrating directly with an LLM API, lowering cost per request/session mostly means reducing the most expensive token categories:

- Keep prompts stable at the top (system prompt, tool defs, long instructions) to maximize cache hits.
- Move dynamic parts (timestamps, random IDs, volatile context) lower in the prompt so they don't invalidate the cached prefix.
- Cap output length when possible (`max_tokens` / equivalent).
- Keep threads compact. Good cache hit rates help, but each turn still adds some uncached tail tokens, and cache entries can expire or be pruned over long sessions.
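
The layout idea above can be sketched as follows. The message structure and values are illustrative, not any particular API's schema: stable content goes first so the provider can cache it, and volatile values go last.

```python
import time
import uuid

# Static prefix: byte-identical across requests, so it can be served from cache.
SYSTEM_PROMPT = "You are a coding assistant. Follow the project style guide."
TOOL_DEFS = '[{"name": "read_file"}, {"name": "run_tests"}]'  # stable tool schema

def build_messages(user_task: str) -> list[dict]:
    """Stable content first (cacheable prefix), volatile content last."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Tools: {TOOL_DEFS}"},
        # Dynamic values go at the end so they don't invalidate the cached prefix.
        {"role": "user", "content": (
            f"{user_task}\n"
            f"(request id: {uuid.uuid4()}, sent at {time.time():.0f})"
        )},
    ]
```

Because the first two messages never change, every request after the first can hit the cache for that prefix; only the final user message is billed at the full input rate.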

If you're using an agent harness, many of these optimizations are handled internally (prompt layout, caching, compaction). Your main cost levers are usually model choice and keeping tasks/threads scoped.

### Subscriptions

Subscriptions are different from API billing. You pay a monthly fee for usage inside a product, usually with fair-use limits or soft/hard caps. These plans do not include raw API credits for your own apps.

For most people, this is the cheapest way to get heavy day-to-day usage. The effective subscription-vs-API ratio can swing a lot as vendors change limits, model mixes, and pricing.

Common subscription options: [ChatGPT](https://openai.com/chatgpt/pricing/), [Claude](https://claude.com/pricing), [Cursor](https://cursor.com/pricing), [Factory](https://factory.ai/pricing), [OpenCode Go](https://opencode.ai/docs/go/).