---
title: Some words about models
description: "A concise guide to coding models: where to access them for free, how frontier options differ, and how API/subscription pricing affects spend."
---

Choose models by reliability, workflow fit, and pricing model, then benchmark them on your own real tasks.

## Where to access free models

- MiniMax, Kimi, Qwen, etc. Very cheap. MiniMax M2.5 is very close to the frontier.
- Smaller versions are often open-source.
- Be careful to use only US/EU inference providers; check out [OpenCode Zen](https://opencode.ai/docs/zen/), which hosts all models in the USA, and avoid routing sensitive work through providers in other jurisdictions.

## Model pricing

There are two very different pricing worlds in AI tools: API pricing (pay per token) and app subscriptions (pay a flat monthly fee with usage limits).

### API pricing

On price trackers like [models.dev](https://models.dev/) and [llm-prices.com](https://www.llm-prices.com/), you'll usually see these fields:

- **Input cost**: what you pay for non-cached input tokens sent to the model.
- **Output cost**: what you pay for tokens generated by the model.
- **Cache write cost**: what you pay when the provider stores a prompt prefix in cache (so it can be reused later).
- **Cache read cost**: what you pay when later requests reuse that cached prefix.

Simple mental model:

```
total cost = input + output + cache write + cache read
```
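
To make that formula concrete, here is a minimal sketch in Python. The prices are made-up illustrations, not any vendor's real rates; check a price tracker for current numbers.

```python
# Hypothetical per-million-token prices (illustrative only).
PRICES = {
    "input": 3.00,        # $ per 1M non-cached input tokens
    "output": 15.00,      # $ per 1M output tokens
    "cache_write": 3.75,  # $ per 1M tokens written to the prompt cache
    "cache_read": 0.30,   # $ per 1M cached tokens reused on later requests
}

def request_cost(tokens: dict) -> float:
    """Sum the four token categories, each priced per million tokens."""
    return sum(PRICES[k] * tokens.get(k, 0) / 1_000_000 for k in PRICES)

# A typical agent turn: large cached prefix, small fresh input, moderate output.
turn = {"input": 2_000, "output": 1_500, "cache_write": 500, "cache_read": 50_000}
print(f"${request_cost(turn):.4f}")
```

Note how the 50,000 cached tokens cost far less than they would as fresh input; that gap is why the prompt-layout advice below matters.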

If you're integrating directly with an LLM API, lowering cost per request/session mostly means reducing the most expensive token categories:

- Keep prompts stable at the top (system prompt, tool defs, long instructions) to maximize cache hits.
- Move dynamic parts (timestamps, random IDs, volatile context) lower in the prompt so they don't invalidate the cached prefix.
- Cap output length when possible (`max_tokens` / equivalent).
- Keep threads compact. Good cache hit rates help, but each turn still adds some uncached tail tokens, and cache entries can expire or be pruned over long sessions.
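
The layout idea above can be sketched as follows. The message structure and values are illustrative, not any particular API's schema: stable content goes first so the provider can cache it, and volatile values go last.

```python
import time
import uuid

# Static prefix: byte-identical across requests, so it can be served from cache.
SYSTEM_PROMPT = "You are a coding assistant. Follow the project style guide."
TOOL_DEFS = '[{"name": "read_file"}, {"name": "run_tests"}]'  # stable tool schema

def build_messages(user_task: str) -> list[dict]:
    """Stable content first (cacheable prefix), volatile content last."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Tools: {TOOL_DEFS}"},
        # Dynamic values go at the end so they don't invalidate the cached prefix.
        {"role": "user", "content": (
            f"{user_task}\n"
            f"(request id: {uuid.uuid4()}, sent at {time.time():.0f})"
        )},
    ]
```

Because the first two messages never change, every request after the first can hit the cache for that prefix; only the final user message is billed at the full input rate.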

If you're using an agent harness, many of these optimizations are handled internally (prompt layout, caching, compaction). Your main cost levers are usually model choice and keeping tasks/threads scoped.

### Subscriptions

Subscriptions are different from API billing. You pay a monthly fee for usage inside a product, usually with fair-use limits or soft/hard caps. These plans do not include raw API credits for your own apps.

For most people, this is the cheapest way to get heavy day-to-day usage. The effective subscription-vs-API ratio can swing a lot as vendors change limits, model mixes, and pricing.

Common subscription options: [ChatGPT](https://openai.com/chatgpt/pricing/), [Claude](https://claude.com/pricing), [Cursor](https://cursor.com/pricing), [Factory](https://factory.ai/pricing), [OpenCode Go](https://opencode.ai/docs/go/).