Commit 509cc2b

Delete old article, add new article

1 parent b4283fa commit 509cc2b

File tree

2 files changed

+184 -168 lines changed


articles/how_to_work_with_large_language_models.md

Lines changed: 0 additions & 168 deletions
This file was deleted.
Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@

---
title: "LLMs 101: A Practical Introduction"
description: "A hands-on, code-first introduction to large language models for Cookbook readers."
last_updated: "2025-08-24"
---

# LLMs 101: A Practical Introduction

> **Who this is for.** Developers who want a fast, working understanding of large language models and the knobs that matter in real apps.

## At a glance

```
Text prompt
    ↓ (tokenization)
Tokens → Embeddings → [Transformer layers × N] → Next‑token probabilities
    ↓ sampling (temperature/top_p)
Next token (appended; repeat until a stop condition)
    ↓ (detokenization)
Output text
```

- **LLMs** are neural networks (usually **transformers**) trained on lots of text to predict the next token.
- **Tokenization** splits text into subword units; **embeddings** map tokens to vectors; transformer layers build context‑aware representations (see the tokenizer sketch below).
- Generation repeats next‑token sampling until a stop condition (length or stop sequences) is met.
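
To make the tokenization bullet concrete, here is a quick sketch using the open-source `tiktoken` library (an assumption: it is installed, and `o200k_base` is a reasonable encoding for the model you target):

```python
import tiktoken  # pip install tiktoken

# o200k_base is one of tiktoken's published encodings; swap in the encoding
# that matches your actual model if it differs.
enc = tiktoken.get_encoding("o200k_base")

text = "In one paragraph, explain what a token is in an LLM."
token_ids = enc.encode(text)           # text -> list of integer token IDs
print(len(text), "characters ->", len(token_ids), "tokens")
print(token_ids[:8])                   # a few raw IDs
assert enc.decode(token_ids) == text   # detokenization round-trips
```
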
---

## Quick start: generate text

### Python

```python
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-4o",
    instructions="You are a concise technical explainer.",
    input="In one paragraph, explain what a token is in an LLM."
)
print(resp.output_text)
```

### JavaScript / TypeScript

```js
import OpenAI from "openai";
const client = new OpenAI();

const resp = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a concise technical explainer." },
    { role: "user", content: "In one paragraph, explain what a token is in an LLM." }
  ]
});
console.log(resp.choices[0].message.content);
```

> **Tip.** Model names evolve; check your Models list before shipping. Prefer streaming for chat‑like UIs (see below).

---

## What can LLMs do?

Despite the name, LLMs can be **multi‑modal** when models and inputs support it (text, code, sometimes images/audio). Core text tasks:

- **Generate**: draft, rewrite, continue, or brainstorm.
- **Transform**: translate, rephrase, format, classify, extract.
- **Analyze**: summarize, compare, tag, or answer questions.
- **Tool use / agents**: call functions or APIs as part of a loop to act.

These patterns compose into search, assistants, form‑fillers, data extraction, QA, and more.
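
To illustrate the transform/analyze patterns, here is a hedged sketch that classifies a support message with the same Responses API used in the quick start. The labels, prompt wording, and example message are placeholders, not a prescribed recipe.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical input and label set, for illustration only.
message = "My March invoice was charged twice; can you refund one of them?"

resp = client.responses.create(
    model="gpt-4o",  # check your Models list for the model you actually use
    instructions=(
        "Classify the user's message as one of: billing, bug, feature_request, other. "
        "Reply with the label only."
    ),
    input=message,
)
print(resp.output_text)  # e.g. "billing"
```
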
---

## How LLMs work (just enough to be dangerous)

1. **Tokenization.** Input text → tokens (IDs). Whitespace and punctuation matter—“token‑budget math” is a real constraint.
2. **Embeddings.** Each token ID becomes a vector; positions are encoded so order matters.
3. **Transformer layers.** Self‑attention mixes information across positions so each token’s representation becomes **contextual** (richer than the raw embedding).
4. **Decoding.** The model outputs a probability distribution over the next token.
5. **Sampling.** Choose how “adventurous” generation is (see knobs below), append the token, and repeat until done. A toy version of this loop appears after the list.
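
To ground steps 4–5, here is a toy decoding loop in plain Python. The “model” is just a made-up scoring function over a five-word vocabulary, not a transformer; it only shows how temperature reshapes the distribution before sampling and how generation repeats until a stop condition.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "on", "<stop>"]

def fake_logits(tokens):
    # Stand-in for a real model: return arbitrary scores for the next token.
    return [1.5, 1.2, 0.9, 0.6, 0.2 * len(tokens)]

def sample_next(logits, temperature=0.7):
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [score / max(temperature, 1e-6) for score in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]   # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(range(len(VOCAB)), weights=probs, k=1)[0]

tokens = ["the"]
for _ in range(10):                       # max-length stop condition
    nxt = VOCAB[sample_next(fake_logits(tokens))]
    if nxt == "<stop>":                   # stop-sequence condition
        break
    tokens.append(nxt)

print(" ".join(tokens))
```
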
---

## The knobs you’ll touch most

- **Temperature** *(0.0–2.0)* — Lower → more deterministic/boring; higher → more diverse/creative.
- **Top‑p (nucleus)** *(0–1)* — Sample only from the smallest set of tokens whose cumulative probability is ≥ *p*.
- **Max output tokens** — Hard limit on output length; controls latency and cost.
- **System / instructions** — Up‑front role, constraints, and style to steer behavior.
- **Stop sequences** — Cleanly cut off output at known boundaries.
- **Streaming** — Receive tokens as they’re generated; improves perceived latency.

**Practical defaults:** `temperature=0.2–0.7`, `top_p=1.0`, set a **max output** that fits your UI, and **stream** by default for chat UX.
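
A sketch of what these knobs look like on a request, using the same Responses API as the quick start; parameter support can vary by model, so treat the exact values as illustrative defaults rather than requirements.

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4o",                 # verify against your Models list
    instructions="You are a concise technical explainer.",
    input="Give three naming ideas for a CLI that lints YAML.",
    temperature=0.7,                # some variety for a brainstorming task
    top_p=1.0,                      # leave nucleus sampling wide open
    max_output_tokens=200,          # cap length, latency, and cost
)
print(resp.output_text)
```
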
---

## Make context do the heavy lifting

- **Context window.** Inputs + outputs share a finite token budget; plan prompts and retrieval to fit.
- **Ground with your data (RAG).** Retrieve relevant snippets and include them in the prompt to improve factuality (see the sketch after this list).
- **Structured outputs.** Ask for JSON (and validate) when you need machine‑readable results.
- **Few‑shot examples.** Provide 1–3 compact exemplars to stabilize format and tone.
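
A minimal sketch of the RAG bullet. The `retrieve_snippets` function is a placeholder for whatever search or vector-store lookup you actually run, and the prompt wording is an assumption.

```python
from openai import OpenAI

client = OpenAI()

def retrieve_snippets(question: str) -> list[str]:
    # Placeholder retrieval: swap in your search index or vector store here.
    return [
        "Refunds are issued to the original payment method within 5-7 business days.",
        "Duplicate charges can be disputed from the billing page under 'Payments'.",
    ]

question = "How long do refunds take?"
context = "\n".join(f"- {snippet}" for snippet in retrieve_snippets(question))

resp = client.responses.create(
    model="gpt-4o",
    instructions="Answer using only the provided context. If the answer is not there, say so.",
    input=f"Context:\n{context}\n\nQuestion: {question}",
)
print(resp.output_text)
```
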
---

## Minimal streaming example

### Python

```python
from openai import OpenAI
client = OpenAI()

with client.responses.stream(
    model="gpt-4o",
    input="Stream a two-sentence explanation of context windows."
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="")
```

### JavaScript

```js
import OpenAI from "openai";
const client = new OpenAI();

const stream = await client.responses.stream({
  model: "gpt-4o",
  input: "Stream a two-sentence explanation of context windows."
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```

---

## Limitations (design around these)

- **Hallucinations.** Models can generate plausible but false statements. Ground with citations/RAG; validate critical outputs.
- **Recency.** Models don’t inherently know the latest facts; retrieve or provide current data.
- **Ambiguity.** Vague prompts → vague answers; specify domain, audience, length, and format.
- **Determinism.** Even at `temperature=0`, responses may vary across runs/envs. Don’t promise bit‑for‑bit reproducibility.
- **Cost & latency.** Longer prompts and bigger models are slower and costlier; iterate toward the smallest model that meets quality.

---

## Common gotchas

- **Characters ≠ tokens.** Budget both input and output to avoid truncation.
- **Over‑prompting.** Prefer simple, testable instructions; add examples sparingly.
- **Leaky formats.** If you need JSON, enforce it (schema + validators) and add a repair step (see the sketch after this list).
- **One prompt for everything.** Separate prompts per task/endpoint; keep them versioned and testable.
- **Skipping evaluation.** Keep a tiny dataset of real tasks; score changes whenever you tweak prompts, models, or retrieval.
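
A minimal sketch of the “leaky formats” advice: parse the model’s JSON, and if parsing fails, make one repair attempt that feeds the error back. The prompt wording and single-retry policy are assumptions, not a fixed recipe.

```python
import json

from openai import OpenAI

client = OpenAI()

PROMPT = (
    'Extract {"name": ..., "email": ...} from the text below. Reply with JSON only.\n\n'
    "Text: Jane Doe <jane@example.com> asked about pricing."
)

def ask(prompt: str) -> str:
    resp = client.responses.create(model="gpt-4o", input=prompt, temperature=0)
    return resp.output_text

raw = ask(PROMPT)
try:
    data = json.loads(raw)          # validate: the reply must parse as JSON
except json.JSONDecodeError as err:
    # One repair attempt: show the model its own output plus the parse error.
    raw = ask(
        f"{PROMPT}\n\nYour previous reply was not valid JSON ({err}).\n"
        f"Previous reply: {raw}\nReturn corrected JSON only."
    )
    data = json.loads(raw)          # if this still fails, surface the error to the caller

print(data)
```
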
---

## Glossary

- **Token** — Small unit of text (≈ subword) used by models.
- **Embedding** — Vector representation of a token or text span.
- **Context window** — Max tokens the model can attend to at once (prompt + output).
- **Temperature / top‑p** — Randomness controls during sampling.
- **System / instructions** — Up‑front guidance that shapes responses.
- **RAG** — Retrieval‑Augmented Generation; retrieve data and include it in the prompt.

---

## Where to go next

- Prompt patterns for **structured outputs**
- **Retrieval‑augmented generation (RAG)** basics
- **Evaluating** LLM quality (offline + online)
- **Streaming UX** patterns and backpressure handling
- **Safety** and policy‑aware prompting

> Adapted from a shorter draft and expanded with code-first guidance.
