---
title: "LLMs 101: A Practical Introduction"
description: "A hands-on, code-first introduction to large language models for Cookbook readers."
last_updated: "2025-08-24"
---

# LLMs 101: A Practical Introduction

> **Who this is for.** Developers who want a fast, working understanding of large language models and the knobs that matter in real apps.
## At a glance

```
Text prompt
    ↓ (tokenization)
Tokens → Embeddings → [Transformer layers × N] → Next‑token probabilities
                                                     ↓
                                     Sampling (temperature/top_p)
                                                     ↓
                                     Detokenization → Output text
```

- **LLMs** are neural networks (usually **transformers**) trained on lots of text to predict the next token.
- **Tokenization** splits text into subword units; **embeddings** map tokens to vectors; transformer layers build context‑aware representations (see the tokenizer sketch below).
- Generation repeats next‑token sampling until a stop condition (length or stop sequences) is met.
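
To make tokenization concrete, here is a minimal sketch using the open-source `tiktoken` library (assumed installed via `pip install tiktoken`); the exact splits and IDs vary by encoding and model.

```python
# Minimal tokenization sketch; splits depend on the chosen encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")           # a common BPE encoding
ids = enc.encode("LLMs predict the next token.")     # text -> token IDs
pieces = [enc.decode([i]) for i in ids]              # token IDs -> subword strings

print(ids)      # a short list of integers
print(pieces)   # the subword pieces: words and word fragments
print(f"{len('LLMs predict the next token.')} characters, {len(ids)} tokens")
```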
---

## Quick start: generate text

### Python

```python
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-4o",
    instructions="You are a concise technical explainer.",
    input="In one paragraph, explain what a token is in an LLM."
)
print(resp.output_text)
```
### JavaScript / TypeScript

```js
import OpenAI from "openai";
const client = new OpenAI();

const resp = await client.responses.create({
  model: "gpt-4o",
  instructions: "You are a concise technical explainer.",
  input: "In one paragraph, explain what a token is in an LLM."
});
console.log(resp.output_text);
```
> **Tip.** Model names evolve; check your Models list before shipping. Prefer streaming for chat‑like UIs (see below).

---
## What can LLMs do?

Despite the name, LLMs can be **multi‑modal** when models and inputs support it (text, code, sometimes images/audio). Core text tasks:

- **Generate**: draft, rewrite, continue, or brainstorm.
- **Transform**: translate, rephrase, format, classify, extract.
- **Analyze**: summarize, compare, tag, or answer questions.
- **Tool use / agents**: call functions or APIs as part of a loop to act.

These patterns compose into search, assistants, form‑fillers, data extraction, QA, and more.
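
As one concrete taste of the **Transform** bucket, the sketch below reuses the quick-start client for a tiny classification task; the label set and prompt wording are illustrative, not a fixed recipe.

```python
# A small "transform" sketch: classify a message into one of a few labels.
# The labels and prompt are illustrative; adapt them to your own taxonomy.
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-4o",
    instructions="Classify the user's message as exactly one of: billing, bug, how-to, other.",
    input="I was charged twice for my subscription this month."
)
print(resp.output_text)  # expected: "billing"
```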
---

## How LLMs work (just enough to be dangerous)

1. **Tokenization.** Input text → tokens (IDs). Whitespace and punctuation matter; “token‑budget math” is a real constraint.
2. **Embeddings.** Each token ID becomes a vector; positions are encoded so order matters.
3. **Transformer layers.** Self‑attention mixes information across positions so each token’s representation becomes **contextual** (richer than the raw embedding).
4. **Decoding.** The model outputs a probability distribution over the next token.
5. **Sampling.** Choose how “adventurous” generation is (see knobs below), append the token, and repeat until done (the loop is sketched after this list).
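
Compressed into code, that loop looks something like the toy sketch below: `toy_model` is a stand-in for a real transformer and the four-entry vocabulary is invented, but the forward-pass, sample, append, stop structure is the part that carries over.

```python
# A toy version of the generation loop over a four-entry "vocabulary".
import random

VOCAB = ["Hello", " world", "!", "<stop>"]

def toy_model(tokens):
    # Placeholder for steps 1-4: return a probability for each vocab entry.
    weights = [random.random() + 0.1 for _ in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]

def sample(probs, temperature=0.7):
    # Step 5: temperature < 1 sharpens the distribution, > 1 flattens it.
    adjusted = [p ** (1.0 / max(temperature, 1e-6)) for p in probs]
    return random.choices(range(len(probs)), weights=adjusted)[0]

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):       # hard cap = max output tokens
        probs = toy_model(tokens)         # next-token distribution
        nxt = sample(probs)               # sampling knobs live here
        if VOCAB[nxt] == "<stop>":        # stop condition
            break
        tokens.append(nxt)
    return "".join(VOCAB[t] for t in tokens)   # "detokenization"

print(generate([0]))  # prints "Hello" plus a few randomly chosen toy tokens
```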
---

## The knobs you’ll touch most

- **Temperature** *(0.0–2.0)* — Lower → more deterministic/boring; higher → more diverse/creative.
- **Top‑p (nucleus)** *(0–1)* — Sample only from the smallest set of tokens whose cumulative probability is at least *p*.
- **Max output tokens** — Hard limit on output length; controls latency and cost.
- **System / instructions** — Up‑front role, constraints, and style to steer behavior.
- **Stop sequences** — Cleanly cut off output at known boundaries.
- **Streaming** — Receive tokens as they’re generated; improves perceived latency.

**Practical defaults:** `temperature=0.2–0.7`, `top_p=1.0`, set a **max output** that fits your UI, and **stream** by default for chat UX (see the request sketch below).
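
Put together, those knobs might look like the request below. This is a sketch: parameter support and sensible values vary by model, so treat the numbers as starting points rather than recommendations.

```python
# A sketch of the common sampling knobs on one request; check the API
# reference for the model you actually ship.
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
    model="gpt-4o",
    instructions="You are a concise technical explainer.",
    input="Give me three names for an internal search assistant.",
    temperature=0.7,        # mid-range: some variety without going off the rails
    top_p=1.0,              # leave nucleus sampling wide open; tune one knob at a time
    max_output_tokens=200,  # hard cap keeps latency and cost predictable
)
print(resp.output_text)
```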
---

## Make context do the heavy lifting

- **Context window.** Inputs + outputs share a finite token budget; plan prompts and retrieval to fit.
- **Ground with your data (RAG).** Retrieve relevant snippets and include them in the prompt to improve factuality (see the sketch after this list).
- **Structured outputs.** Ask for JSON (and validate) when you need machine‑readable results.
- **Few‑shot examples.** Provide 1–3 compact exemplars to stabilize format and tone.
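
A minimal sketch of the RAG idea: `DOCS` and `retrieve` below are toy stand-ins for a real corpus and search backend (keyword, vector, or hybrid), but the shape (retrieve, format, include in the prompt) is the pattern.

```python
# RAG in miniature: stuff retrieved snippets into the prompt and ask the model
# to answer only from them.
from openai import OpenAI

DOCS = [
    "Plan Pro includes 10 seats and priority support.",
    "Plan Basic includes 2 seats and community support.",
    "All plans are billed monthly; annual billing gets a 10% discount.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(DOCS, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:k]

question = "How many seats does Plan Pro include?"
context = "\n".join(f"- {snippet}" for snippet in retrieve(question))

client = OpenAI()
resp = client.responses.create(
    model="gpt-4o",
    instructions="Answer using only the provided context. If the answer is not there, say so.",
    input=f"Context:\n{context}\n\nQuestion: {question}",
)
print(resp.output_text)
```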
---

## Minimal streaming example

### Python

```python
from openai import OpenAI
client = OpenAI()

with client.responses.stream(
    model="gpt-4o",
    input="Stream a two-sentence explanation of context windows."
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="")
```
### JavaScript

```js
import OpenAI from "openai";
const client = new OpenAI();

const stream = await client.responses.stream({
  model: "gpt-4o",
  input: "Stream a two-sentence explanation of context windows."
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```
---

## Limitations (design around these)

- **Hallucinations.** Models can generate plausible but false statements. Ground with citations/RAG; validate critical outputs.
- **Recency.** Models don’t inherently know the latest facts; retrieve or provide current data.
- **Ambiguity.** Vague prompts → vague answers; specify domain, audience, length, and format.
- **Determinism.** Even at `temperature=0`, responses may vary across runs/envs. Don’t promise bit‑for‑bit reproducibility.
- **Cost & latency.** Longer prompts and bigger models are slower and costlier; iterate toward the smallest model that meets your quality bar.

---
## Common gotchas

- **Characters ≠ tokens.** Budget both input and output to avoid truncation.
- **Over‑prompting.** Prefer simple, testable instructions; add examples sparingly.
- **Leaky formats.** If you need JSON, enforce it (schema + validators) and add a repair step (see the sketch after this list).
- **One prompt for everything.** Separate prompts per task/endpoint; keep them versioned and testable.
- **Skipping evaluation.** Keep a tiny dataset of real tasks; score changes whenever you tweak prompts, models, or retrieval.
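
Here is one way the "leaky formats" advice can look in practice. This is a sketch: the key set, helper name, and repair prompt are illustrative, and a real setup might lean on structured-output features or a JSON Schema validator instead of the hand-rolled check.

```python
# Sketch of "ask for JSON, validate, repair once".
import json
from openai import OpenAI

client = OpenAI()
REQUIRED_KEYS = {"name", "email"}

def extract_contact(text: str) -> dict:
    prompt = f"Return only JSON with keys name and email extracted from:\n{text}"
    raw = client.responses.create(model="gpt-4o", input=prompt).output_text
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
            return data
    except json.JSONDecodeError:
        pass
    # One repair attempt: show the model its own invalid output and the rule it broke.
    repair = (
        "The following was supposed to be JSON with keys name and email. "
        f"Fix it and return only JSON:\n{raw}"
    )
    return json.loads(client.responses.create(model="gpt-4o", input=repair).output_text)

print(extract_contact("Reach Jane Doe at jane@example.com for scheduling."))
```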
---

## Glossary

- **Token** — Small unit of text (≈ subword) used by models.
- **Embedding** — Vector representation of a token or text span.
- **Context window** — Max tokens the model can attend to at once (prompt + output).
- **Temperature / top‑p** — Randomness controls during sampling.
- **System / instructions** — Up‑front guidance that shapes responses.
- **RAG** — Retrieval‑Augmented Generation; retrieve data and include it in the prompt.

---

## Where to go next

- Prompt patterns for **structured outputs**
- **Retrieval‑augmented generation (RAG)** basics
- **Evaluating** LLM quality (offline + online)
- **Streaming UX** patterns and backpressure handling
- **Safety** and policy‑aware prompting

> Adapted from a shorter draft and expanded with code-first guidance.
