Skip to content
Closed
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
26a67b2
update intro
paytonison Sep 15, 2025
bf278df
Add Payton Ison to authors.yaml
paytonison Sep 15, 2025
2161cbd
Add new article entry for LLMs 101
paytonison Sep 15, 2025
0238183
Update registry.yaml
paytonison Sep 16, 2025
1ac7f5c
Delete how_to_work_with_large_language_models.md
paytonison Sep 16, 2025
0464858
Merge branch 'main' into paytonison/introduction-rewrite
paytonison Sep 16, 2025
a6c2e27
Update registry.yaml
paytonison Sep 16, 2025
0699988
Update articles/openai-cookbook-llms-101.md
paytonison Sep 16, 2025
0aee202
Merge branch 'main' into paytonison/introduction-rewrite
paytonison Sep 17, 2025
a2be568
Merge branch 'openai:main' into paytonison/introduction-rewrite
paytonison Sep 17, 2025
0f0cfe8
Convert run-transformers.md guide to interactive notebook
paytonison Sep 17, 2025
da038a1
Make HF Transformers markdown into Jupyter notebook.
paytonison Sep 18, 2025
d06db98
Change file name.
paytonison Sep 18, 2025
91a95f3
Organize prompting-related articles into unified prompting directory …
Copilot Sep 18, 2025
2467c58
Update examples/agents_sdk/multi-agent-portfolio-collaboration/utils.py
paytonison Sep 18, 2025
463ff57
Revise README for prompting guides and examples
paytonison Sep 18, 2025
bfca2ea
1
paytonison Sep 19, 2025
fbef132
q
paytonison Sep 19, 2025
2752002
update intro
paytonison Sep 15, 2025
c477ac2
Add Payton Ison to authors.yaml
paytonison Sep 15, 2025
cf0acf5
Add new article entry for LLMs 101
paytonison Sep 15, 2025
2b9f65d
Update registry.yaml
paytonison Sep 16, 2025
b58f442
Delete how_to_work_with_large_language_models.md
paytonison Sep 16, 2025
b7749d3
Update registry.yaml
paytonison Sep 16, 2025
adde2ce
Update articles/openai-cookbook-llms-101.md
paytonison Sep 16, 2025
46d3c03
Organize prompting-related articles into unified prompting directory …
Copilot Sep 18, 2025
8c9c1cb
Update examples/agents_sdk/multi-agent-portfolio-collaboration/utils.py
paytonison Sep 18, 2025
1c1e541
Revise README for prompting guides and examples
paytonison Sep 18, 2025
a7cfb12
Merge branch 'openai:main' into transformers-edit
paytonison Sep 26, 2025
a69807b
Merge branch 'main' into paytonison/introduction-rewrite
paytonison Sep 26, 2025
6febee1
Merge branch 'paytonison/introduction-rewrite' of https://github.com/…
paytonison Sep 26, 2025
d7f570b
Revert "Merge branch 'paytonison/introduction-rewrite' of https://git…
paytonison Sep 26, 2025
bdacaa2
Merge branch 'transformers-edit' of https://github.com/paytonison/ope…
paytonison Sep 26, 2025
bf1895e
Merge branch 'prompt-bundle' of https://github.com/paytonison/openai-…
paytonison Sep 26, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
168 changes: 0 additions & 168 deletions articles/how_to_work_with_large_language_models.md

This file was deleted.

184 changes: 184 additions & 0 deletions articles/openai-cookbook-llms-101.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
---
title: "LLMs 101: A Practical Introduction"
description: "A hands-on, code-first introduction to large language models for Cookbook readers."
last_updated: "2025-08-24"
---

# LLMs 101: A Practical Introduction

> **Who this is for.** Developers who want a fast, working understanding of large language models and the knobs that matter in real apps.

## At a glance

```
Text prompt
↓ (tokenization)
Tokens → Embeddings → [Transformer layers × N] → Next‑token probabilities
↓ ↓
Detokenization Sampling (temperature/top_p) → Output text
```

- **LLMs** are neural networks (usually **transformers**) trained on lots of text to predict the next token.
- **Tokenization** splits text into subword units; **embeddings** map tokens to vectors; transformer layers build context‑aware representations.
- Generation repeats next‑token sampling until a stop condition (length or stop sequences) is met.

---

## Quick start: generate text

### Python

```python
from openai import OpenAI

client = OpenAI()
resp = client.responses.create(
model="gpt-4o",
instructions="You are a concise technical explainer.",
input="In one paragraph, explain what a token is in an LLM."
)
print(resp.output_text)
```

### JavaScript / TypeScript

```js
import OpenAI from "openai";
const client = new OpenAI();

const resp = await client.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: "You are a concise technical explainer." },
{ role: "user", content: "In one paragraph, explain what a token is in an LLM." }
]
});
console.log(resp.choices[0].message.content);
```

> **Tip.** Model names evolve; check your Models list before shipping. Prefer streaming for chat‑like UIs (see below).

---

## What can LLMs do?

Despite the name, LLMs can be **multi‑modal** when models and inputs support it (text, code, sometimes images/audio). Core text tasks:

- **Generate**: draft, rewrite, continue, or brainstorm.
- **Transform**: translate, rephrase, format, classify, extract.
- **Analyze**: summarize, compare, tag, or answer questions.
- **Tool use / agents**: call functions or APIs as part of a loop to act.

These patterns compose into search, assistants, form‑fillers, data extraction, QA, and more.

---

## How LLMs work (just enough to be dangerous)

1. **Tokenization.** Input text → tokens (IDs). Whitespace and punctuation matter—“token‑budget math” is a real constraint.
2. **Embeddings.** Each token ID becomes a vector; positions are encoded so order matters.
3. **Transformer layers.** Self‑attention mixes information across positions so each token’s representation becomes **contextual** (richer than the raw embedding).
4. **Decoding.** The model outputs a probability distribution over the next token.
5. **Sampling.** Choose how “adventurous” generation is (see knobs below), append the token, and repeat until done.

---

## The knobs you’ll touch most

- **Temperature** *(0.0–2.0)* — Lower → more deterministic/boring; higher → more diverse/creative.
- **Top‑p (nucleus)** *(0–1)* — Sample only from the smallest set of tokens whose cumulative probability ≤ *p*.
- **Max output tokens** — Hard limit on output length; controls latency and cost.
- **System / instructions** — Up‑front role, constraints, and style to steer behavior.
- **Stop sequences** — Cleanly cut off output at known boundaries.
- **Streaming** — Receive tokens as they’re generated; improves perceived latency.

**Practical defaults:** `temperature=0.2–0.7`, `top_p=1.0`, set a **max output** that fits your UI, and **stream** by default for chat UX.

---

## Make context do the heavy lifting

- **Context window.** Inputs + outputs share a finite token budget; plan prompts and retrieval to fit.
- **Ground with your data (RAG).** Retrieve relevant snippets and include them in the prompt to improve factuality.
- **Structured outputs.** Ask for JSON (and validate) when you need machine‑readable results.
- **Few‑shot examples.** Provide 1–3 compact exemplars to stabilize format and tone.

---

## Minimal streaming example

### Python

```python
from openai import OpenAI
client = OpenAI()

with client.responses.stream(
model="gpt-4o",
input="Stream a two-sentence explanation of context windows."
) as stream:
for event in stream:
if event.type == "response.output_text.delta":
print(event.delta, end="")
```

### JavaScript

```js
import OpenAI from "openai";
const client = new OpenAI();

const stream = await client.responses.stream({
model: "gpt-4o",
input: "Stream a two-sentence explanation of context windows."
});

for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
}
```

---

## Limitations (design around these)

- **Hallucinations.** Models can generate plausible but false statements. Ground with citations/RAG; validate critical outputs.
- **Recency.** Models don’t inherently know the latest facts; retrieve or provide current data.
- **Ambiguity.** Vague prompts → vague answers; specify domain, audience, length, and format.
- **Determinism.** Even at `temperature=0`, responses may vary across runs/envs. Don’t promise bit‑for‑bit reproducibility.
- **Cost & latency.** Longer prompts and bigger models are slower and costlier; iterate toward the smallest model that meets quality.

---

## Common gotchas

- **Characters ≠ tokens.** Budget both input and output to avoid truncation.
- **Over‑prompting.** Prefer simple, testable instructions; add examples sparingly.
- **Leaky formats.** If you need JSON, enforce it (schema + validators) and add a repair step.
- **One prompt for everything.** Separate prompts per task/endpoint; keep them versioned and testable.
- **Skipping evaluation.** Keep a tiny dataset of real tasks; score changes whenever you tweak prompts, models, or retrieval.

---

## Glossary

- **Token** — Small unit of text (≈ subword) used by models.
- **Embedding** — Vector representation of a token or text span.
- **Context window** — Max tokens the model can attend to at once (prompt + output).
- **Temperature / top‑p** — Randomness controls during sampling.
- **System / instructions** — Up‑front guidance that shapes responses.
- **RAG** — Retrieval‑Augmented Generation; retrieve data and include it in the prompt.

---

## Where to go next

- Prompt patterns for **structured outputs**
- **Retrieval‑augmented generation (RAG)** basics
- **Evaluating** LLM quality (offline + online)
- **Streaming UX** patterns and backpressure handling
- **Safety** and policy‑aware prompting

> Adapted from a shorter draft and expanded with code-first guidance.
5 changes: 5 additions & 0 deletions authors.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -477,3 +477,8 @@ heejingithub:
name: "Heejin Cho"
website: "https://www.linkedin.com/in/heejc/"
avatar: "https://avatars.githubusercontent.com/u/169293861"

paytonison:
name: "Payton Ison"
website: "https://linkedin.com/in/paytonison"
avatar: "https://avatars.githubusercontent.com/u/148833579"
9 changes: 9 additions & 0 deletions registry.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2509,3 +2509,12 @@
- katiagg
tags:
- images

- title: LLMs 101: A Practical Introduction
path: articles/openai-cookbook-llms-101.md
date: 2025-09-15
authors:
- paytonison
tags:
- llms
- beginners