Commit 7f4e55b

merge model pricing into models page
1 parent 8480d14 commit 7f4e55b

3 files changed: +38 -41 lines changed

astro.config.ts

Lines changed: 0 additions & 1 deletion
@@ -97,7 +97,6 @@ export default defineConfig({
 { slug: "expanding-horizons/threads-context-and-caching" },
 { slug: "expanding-horizons/some-words-about-models" },
 { slug: "expanding-horizons/a-few-words-about-mcp" },
-{ slug: "expanding-horizons/model-pricing" },
 { slug: "expanding-horizons/what-to-read-next" },
 ],
 },

src/content/docs/expanding-horizons/model-pricing.md

Lines changed: 0 additions & 38 deletions
This file was deleted.

src/content/docs/expanding-horizons/some-words-about-models.md

Lines changed: 38 additions & 2 deletions
@@ -1,9 +1,9 @@
 ---
 title: Some words about models
-description: A concise comparison of popular coding models, including free-access options, strengths, weaknesses, and cost tradeoffs.
+description: "A concise guide to coding models: where to access them for free, how frontier options differ, and how API/subscription pricing affects spend."
 ---

-Choose models by reliability, cost, and workflow fit, then benchmark them on your own real tasks.
+Choose models by reliability, workflow fit, and pricing model, then benchmark them on your own real tasks.

 ## Where to access free models

@@ -33,3 +33,39 @@ Choose models by reliability, cost, and workflow fit, then benchmark them on you
 - MiniMax, Kimi, Qwen, etc. Very cheap. MiniMax M2.5 is very close to the frontier.
 - Smaller versions are often open-source.
 - Be careful to use US/EU inference providers only; check out [OpenCode Zen](https://opencode.ai/docs/zen/), as they host all models in the USA; avoid routing sensitive work through providers in other jurisdictions.
+
+## Model pricing
+
+There are two very different pricing worlds in AI tools: API pricing (pay per token) and app subscriptions (pay a flat monthly fee with usage limits).
+
+### API pricing
+
+On price trackers like [models.dev](https://models.dev/) and [llm-prices.com](https://www.llm-prices.com/), you'll usually see these fields:
+
+- **Input cost**: what you pay for non-cached input tokens sent to the model.
+- **Output cost**: what you pay for tokens generated by the model.
+- **Cache write cost**: what you pay when the provider stores a prompt prefix in cache (so it can be reused later).
+- **Cache read cost**: what you pay when later requests reuse that cached prefix.
+
+Simple mental model:
+
+```
+total cost = input + output + cache write + cache read
+```
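
As a rough illustration of the mental model above, the sum can be written as a small function. This is a minimal sketch, not any provider's billing logic: the `Pricing` and `Usage` shapes, the `estimateCostUsd` name, and every price in the example are hypothetical placeholders (USD per million tokens); real values should come from a price tracker.

```typescript
// Hypothetical shapes; prices are in USD per one million tokens.
interface Pricing {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

// Token counts for one request or session, split by billing category.
interface Usage {
  inputTokens: number; // non-cached input tokens
  outputTokens: number;
  cacheWriteTokens: number;
  cacheReadTokens: number;
}

const TOKENS_PER_MILLION = 1_000_000;

// total cost = input + output + cache write + cache read
function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  return (
    (usage.inputTokens * pricing.input +
      usage.outputTokens * pricing.output +
      usage.cacheWriteTokens * pricing.cacheWrite +
      usage.cacheReadTokens * pricing.cacheRead) /
    TOKENS_PER_MILLION
  );
}

// Placeholder prices, NOT real quotes for any model.
const placeholderPricing: Pricing = {
  input: 2,
  output: 8,
  cacheWrite: 2.5,
  cacheRead: 0.2,
};

// A session that mostly reuses an already cached prefix.
const sessionUsage: Usage = {
  inputTokens: 1_000,
  outputTokens: 500,
  cacheWriteTokens: 0,
  cacheReadTokens: 20_000,
};

console.log(estimateCostUsd(sessionUsage, placeholderPricing).toFixed(4));
```

With these placeholder numbers, 20,000 cached input tokens cost about as much as 500 output tokens, which suggests why cache hit rate and output length are usually the first things to look at.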
+
+If you're integrating directly with an LLM API, lowering cost per request/session mostly means reducing the most expensive token categories:
+
+- Keep prompts stable at the top (system prompt, tool defs, long instructions) to maximize cache hits.
+- Move dynamic parts (timestamps, random IDs, volatile context) lower in the prompt so they don't invalidate the cached prefix.
+- Cap output length when possible (`max_tokens` / equivalent).
+- Keep threads compact. Good cache hit rates help, but each turn still adds some uncached tail tokens, and cache entries can expire/prune over long sessions.
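
The first two bullets can be sketched as a prompt-assembly helper. This is an illustration under stated assumptions, not how any particular provider or harness works: `PromptParts`, both builder functions, and `sharedPrefixLength` are hypothetical names, the prompt is flattened to a single string for simplicity, and characters stand in for tokens.

```typescript
// Hypothetical prompt pieces; real APIs take structured messages, not one string.
interface PromptParts {
  systemPrompt: string;
  toolDefinitions: string;
  history: string[];
  volatileContext: string; // timestamps, request IDs, other per-request data
}

// Stable content first, volatile content last, so consecutive requests
// share the longest possible prefix.
function buildCacheFriendlyPrompt(parts: PromptParts): string {
  return [
    parts.systemPrompt,
    parts.toolDefinitions,
    ...parts.history,
    parts.volatileContext,
  ].join("\n\n");
}

// Counter-example: volatile content at the top changes the prefix on every request.
function buildCacheHostilePrompt(parts: PromptParts): string {
  return [
    parts.volatileContext,
    parts.systemPrompt,
    parts.toolDefinitions,
    ...parts.history,
  ].join("\n\n");
}

// Length of the shared prefix in characters (a crude stand-in for tokens).
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const stableParts = {
  systemPrompt: "You are a careful coding assistant.",
  toolDefinitions: "tools: read_file, run_tests",
  history: ["user: fix the failing test"],
};
const firstRequest: PromptParts = { ...stableParts, volatileContext: "request-id: a1" };
const secondRequest: PromptParts = { ...stableParts, volatileContext: "request-id: b2" };

// Reusable prefix when volatile data sits at the bottom of the prompt.
console.log(
  sharedPrefixLength(
    buildCacheFriendlyPrompt(firstRequest),
    buildCacheFriendlyPrompt(secondRequest)
  )
);
// Reusable prefix when volatile data sits at the top.
console.log(
  sharedPrefixLength(
    buildCacheHostilePrompt(firstRequest),
    buildCacheHostilePrompt(secondRequest)
  )
);
```

The second number is much smaller because the two prompts diverge inside the request ID, before any of the stable content is reached.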
+
+If you're using an agent harness, many of these optimizations are handled internally (prompt layout, caching, compaction). Your main cost levers are usually model choice and keeping tasks/threads scoped.
+
+### Subscriptions
+
+Subscriptions are different from API billing. You pay a monthly fee for usage inside a product, usually with fair-use limits or soft/hard caps. These plans do not include raw API credits for your own apps.
+
+For most people, this is the cheapest way to get heavy day-to-day usage. The effective subscription-vs-API ratio can swing a lot as vendors change limits, model mixes, and pricing.
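
One way to sanity-check the subscription-vs-API comparison for your own usage is a break-even estimate. This is a back-of-the-envelope sketch: `breakEvenSessions` is a hypothetical helper, the fee and per-session cost below are placeholder numbers rather than real prices, and the calculation ignores usage caps and differences in available models.

```typescript
// Smallest number of sessions per month at which API billing would cost
// at least as much as the flat monthly fee.
function breakEvenSessions(monthlyFeeUsd: number, apiCostPerSessionUsd: number): number {
  if (monthlyFeeUsd < 0 || apiCostPerSessionUsd <= 0) {
    throw new RangeError("fee must be >= 0 and per-session cost must be > 0");
  }
  return Math.ceil(monthlyFeeUsd / apiCostPerSessionUsd);
}

// Placeholder inputs: a 20 USD plan and an assumed 0.25 USD average API cost per session.
console.log(breakEvenSessions(20, 0.25));
```

If your typical month stays well below the printed session count, pay-per-token API access may be the cheaper option under these assumptions; well above it, the flat fee likely wins as long as you stay within the plan's limits.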
+
+Common subscription options: [ChatGPT](https://openai.com/chatgpt/pricing/), [Claude](https://claude.com/pricing), [Cursor](https://cursor.com/pricing), [Factory](https://factory.ai/pricing), [OpenCode Go](https://opencode.ai/docs/go/).
