diff --git a/articles/how_to_work_with_large_language_models.md b/articles/how_to_work_with_large_language_models.md deleted file mode 100644 index cf6b48e1be..0000000000 --- a/articles/how_to_work_with_large_language_models.md +++ /dev/null @@ -1,168 +0,0 @@ -# How to work with large language models - -## How large language models work - -[Large language models][Large language models Blog Post] are functions that map text to text. Given an input string of text, a large language model predicts the text that should come next. - -The magic of large language models is that by being trained to minimize this prediction error over vast quantities of text, the models end up learning concepts useful for these predictions. For example, they learn: - -- how to spell -- how grammar works -- how to paraphrase -- how to answer questions -- how to hold a conversation -- how to write in many languages -- how to code -- etc. - -They do this by “reading” a large amount of existing text and learning how words tend to appear in context with other words, and uses what it has learned to predict the next most likely word that might appear in response to a user request, and each subsequent word after that. - -GPT-3 and GPT-4 power [many software products][OpenAI Customer Stories], including productivity apps, education apps, games, and more. - -## How to control a large language model - -Of all the inputs to a large language model, by far the most influential is the text prompt. - -Large language models can be prompted to produce output in a few ways: - -- **Instruction**: Tell the model what you want -- **Completion**: Induce the model to complete the beginning of what you want -- **Scenario**: Give the model a situation to play out -- **Demonstration**: Show the model what you want, with either: - - A few examples in the prompt - - Many hundreds or thousands of examples in a fine-tuning training dataset - -An example of each is shown below. - -### Instruction prompts - -Write your instruction at the top of the prompt (or at the bottom, or both), and the model will do its best to follow the instruction and then stop. Instructions can be detailed, so don't be afraid to write a paragraph explicitly detailing the output you want, just stay aware of how many [tokens](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) the model can process. - -Example instruction prompt: - -```text -Extract the name of the author from the quotation below. - -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation -``` - -Output: - -```text -Ted Chiang -``` - -### Completion prompt example - -Completion-style prompts take advantage of how large language models try to write text they think is most likely to come next. To steer the model, try beginning a pattern or sentence that will be completed by the output you want to see. Relative to direct instructions, this mode of steering large language models can take more care and experimentation. In addition, the models won't necessarily know where to stop, so you will often need stop sequences or post-processing to cut off text generated beyond the desired output. - -Example completion prompt: - -```text -“Some humans theorize that intelligent species go extinct before they can expand into outer space. 
If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation - -The author of this quote is -``` - -Output: - -```text - Ted Chiang -``` - -### Scenario prompt example - -Giving the model a scenario to follow or role to play out can be helpful for complex queries or when seeking imaginative responses. When using a hypothetical prompt, you set up a situation, problem, or story, and then ask the model to respond as if it were a character in that scenario or an expert on the topic. - -Example scenario prompt: - -```text -Your role is to extract the name of the author from any given text - -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation -``` - -Output: - -```text - Ted Chiang -``` - -### Demonstration prompt example (few-shot learning) - -Similar to completion-style prompts, demonstrations can show the model what you want it to do. This approach is sometimes called few-shot learning, as the model learns from a few examples provided in the prompt. - -Example demonstration prompt: - -```text -Quote: -“When the reasoning mind is forced to confront the impossible again and again, it has no choice but to adapt.” -― N.K. Jemisin, The Fifth Season -Author: N.K. Jemisin - -Quote: -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation -Author: -``` - -Output: - -```text - Ted Chiang -``` - -### Fine-tuned prompt example - -With enough training examples, you can [fine-tune][Fine Tuning Docs] a custom model. In this case, instructions become unnecessary, as the model can learn the task from the training data provided. However, it can be helpful to include separator sequences (e.g., `->` or `###` or any string that doesn't commonly appear in your inputs) to tell the model when the prompt has ended and the output should begin. Without separator sequences, there is a risk that the model continues elaborating on the input text rather than starting on the answer you want to see. - -Example fine-tuned prompt (for a model that has been custom trained on similar prompt-completion pairs): - -```text -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation - -### - - -``` - -Output: - -```text - Ted Chiang -``` - -## Code Capabilities - -Large language models aren't only great at text - they can be great at code too. OpenAI's [GPT-4][GPT-4 and GPT-4 Turbo] model is a prime example. - -GPT-4 powers [numerous innovative products][OpenAI Customer Stories], including: - -- [GitHub Copilot] (autocompletes code in Visual Studio and other IDEs) -- [Replit](https://replit.com/) (can complete, explain, edit and generate code) -- [Cursor](https://cursor.sh/) (build software faster in an editor designed for pair-programming with AI) - -GPT-4 is more advanced than previous models like `gpt-3.5-turbo-instruct`. But, to get the best out of GPT-4 for coding tasks, it's still important to give clear and specific instructions. As a result, designing good prompts can take more care. - -### More prompt advice - -For more prompt examples, visit [OpenAI Examples][OpenAI Examples]. 
- -In general, the input prompt is the best lever for improving model outputs. You can try tricks like: - -- **Be more specific** E.g., if you want the output to be a comma separated list, ask it to return a comma separated list. If you want it to say "I don't know" when it doesn't know the answer, tell it 'Say "I don't know" if you do not know the answer.' The more specific your instructions, the better the model can respond. -- **Provide Context**: Help the model understand the bigger picture of your request. This could be background information, examples/demonstrations of what you want or explaining the purpose of your task. -- **Ask the model to answer as if it was an expert.** Explicitly asking the model to produce high quality output or output as if it was written by an expert can induce the model to give higher quality answers that it thinks an expert would write. Phrases like "Explain in detail" or "Describe step-by-step" can be effective. -- **Prompt the model to write down the series of steps explaining its reasoning.** If understanding the 'why' behind an answer is important, prompt the model to include its reasoning. This can be done by simply adding a line like "[Let's think step by step](https://arxiv.org/abs/2205.11916)" before each answer. - -[Fine Tuning Docs]: https://platform.openai.com/docs/guides/fine-tuning -[OpenAI Customer Stories]: https://openai.com/customer-stories -[Large language models Blog Post]: https://openai.com/research/better-language-models -[GitHub Copilot]: https://github.com/features/copilot/ -[GPT-4 and GPT-4 Turbo]: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo -[GPT3 Apps Blog Post]: https://openai.com/blog/gpt-3-apps/ -[OpenAI Examples]: https://platform.openai.com/examples diff --git a/articles/openai-cookbook-llms-101.md b/articles/openai-cookbook-llms-101.md new file mode 100644 index 0000000000..f306effc2f --- /dev/null +++ b/articles/openai-cookbook-llms-101.md @@ -0,0 +1,184 @@ +--- +title: "LLMs 101: A Practical Introduction" +description: "A hands-on, code-first introduction to large language models for Cookbook readers." +last_updated: "2025-08-24" +--- + +# LLMs 101: A Practical Introduction + +> **Who this is for.** Developers who want a fast, working understanding of large language models and the knobs that matter in real apps. + +## At a glance + +``` +Text prompt + ↓ (tokenization) +Tokens → Embeddings → [Transformer layers × N] → Next‑token probabilities + ↓ ↓ +Detokenization Sampling (temperature/top_p) → Output text +``` + +- **LLMs** are neural networks (usually **transformers**) trained on lots of text to predict the next token. +- **Tokenization** splits text into subword units; **embeddings** map tokens to vectors; transformer layers build context‑aware representations. +- Generation repeats next‑token sampling until a stop condition (length or stop sequences) is met. + +--- + +## Quick start: generate text + +### Python + +```python +from openai import OpenAI + +client = OpenAI() +resp = client.responses.create( + model="gpt-4o", + instructions="You are a concise technical explainer.", + input="In one paragraph, explain what a token is in an LLM." +) +print(resp.output_text) +``` + +### JavaScript / TypeScript + +```js +import OpenAI from "openai"; +const client = new OpenAI(); + +const resp = await client.chat.completions.create({ + model: "gpt-4o", + messages: [ + { role: "system", content: "You are a concise technical explainer." 
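+      // the "system" message sets persistent behavior and tone; the "user" message below carries the actual request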
}, + { role: "user", content: "In one paragraph, explain what a token is in an LLM." } + ] +}); +console.log(resp.choices[0].message.content); +``` + +> **Tip.** Model names evolve; check your Models list before shipping. Prefer streaming for chat‑like UIs (see below). + +--- + +## What can LLMs do? + +Despite the name, LLMs can be **multi‑modal** when models and inputs support it (text, code, sometimes images/audio). Core text tasks: + +- **Generate**: draft, rewrite, continue, or brainstorm. +- **Transform**: translate, rephrase, format, classify, extract. +- **Analyze**: summarize, compare, tag, or answer questions. +- **Tool use / agents**: call functions or APIs as part of a loop to act. + +These patterns compose into search, assistants, form‑fillers, data extraction, QA, and more. + +--- + +## How LLMs work (just enough to be dangerous) + +1. **Tokenization.** Input text → tokens (IDs). Whitespace and punctuation matter—“token‑budget math” is a real constraint. +2. **Embeddings.** Each token ID becomes a vector; positions are encoded so order matters. +3. **Transformer layers.** Self‑attention mixes information across positions so each token’s representation becomes **contextual** (richer than the raw embedding). +4. **Decoding.** The model outputs a probability distribution over the next token. +5. **Sampling.** Choose how “adventurous” generation is (see knobs below), append the token, and repeat until done. + +--- + +## The knobs you’ll touch most + +- **Temperature** *(0.0–2.0)* — Lower → more deterministic/boring; higher → more diverse/creative. +- **Top‑p (nucleus)** *(0–1)* — Sample only from the smallest set of tokens whose cumulative probability ≤ *p*. +- **Max output tokens** — Hard limit on output length; controls latency and cost. +- **System / instructions** — Up‑front role, constraints, and style to steer behavior. +- **Stop sequences** — Cleanly cut off output at known boundaries. +- **Streaming** — Receive tokens as they’re generated; improves perceived latency. + +**Practical defaults:** `temperature=0.2–0.7`, `top_p=1.0`, set a **max output** that fits your UI, and **stream** by default for chat UX. + +--- + +## Make context do the heavy lifting + +- **Context window.** Inputs + outputs share a finite token budget; plan prompts and retrieval to fit. +- **Ground with your data (RAG).** Retrieve relevant snippets and include them in the prompt to improve factuality. +- **Structured outputs.** Ask for JSON (and validate) when you need machine‑readable results. +- **Few‑shot examples.** Provide 1–3 compact exemplars to stabilize format and tone. + +--- + +## Minimal streaming example + +### Python + +```python +from openai import OpenAI +client = OpenAI() + +with client.responses.stream( + model="gpt-4o", + input="Stream a two-sentence explanation of context windows." +) as stream: + for event in stream: + if event.type == "response.output_text.delta": + print(event.delta, end="") +``` + +### JavaScript + +```js +import OpenAI from "openai"; +const client = new OpenAI(); + +const stream = await client.responses.stream({ + model: "gpt-4o", + input: "Stream a two-sentence explanation of context windows." +}); + +for await (const event of stream) { + if (event.type === "response.output_text.delta") { + process.stdout.write(event.delta); + } +} +``` + +--- + +## Limitations (design around these) + +- **Hallucinations.** Models can generate plausible but false statements. Ground with citations/RAG; validate critical outputs. 
+- **Recency.** Models don’t inherently know the latest facts; retrieve or provide current data. +- **Ambiguity.** Vague prompts → vague answers; specify domain, audience, length, and format. +- **Determinism.** Even at `temperature=0`, responses may vary across runs/envs. Don’t promise bit‑for‑bit reproducibility. +- **Cost & latency.** Longer prompts and bigger models are slower and costlier; iterate toward the smallest model that meets quality. + +--- + +## Common gotchas + +- **Characters ≠ tokens.** Budget both input and output to avoid truncation. +- **Over‑prompting.** Prefer simple, testable instructions; add examples sparingly. +- **Leaky formats.** If you need JSON, enforce it (schema + validators) and add a repair step. +- **One prompt for everything.** Separate prompts per task/endpoint; keep them versioned and testable. +- **Skipping evaluation.** Keep a tiny dataset of real tasks; score changes whenever you tweak prompts, models, or retrieval. + +--- + +## Glossary + +- **Token** — Small unit of text (≈ subword) used by models. +- **Embedding** — Vector representation of a token or text span. +- **Context window** — Max tokens the model can attend to at once (prompt + output). +- **Temperature / top‑p** — Randomness controls during sampling. +- **System / instructions** — Up‑front guidance that shapes responses. +- **RAG** — Retrieval‑Augmented Generation; retrieve data and include it in the prompt. + +--- + +## Where to go next + +- Prompt patterns for **structured outputs** +- **Retrieval‑augmented generation (RAG)** basics +- **Evaluating** LLM quality (offline + online) +- **Streaming UX** patterns and backpressure handling +- **Safety** and policy‑aware prompting + +> Adapted from a shorter draft and expanded with code-first guidance. diff --git a/authors.yaml b/authors.yaml index cd407f6c69..ac52be18a2 100644 --- a/authors.yaml +++ b/authors.yaml @@ -499,4 +499,8 @@ himadri: website: "https://www.linkedin.com/in/himadri-acharya-086ba261/" avatar: "https://avatars.githubusercontent.com/u/14100684?v=4" - \ No newline at end of file + +paytonison: + name: "Payton Ison" + website: "https://linkedin.com/in/paytonison" + avatar: "https://avatars.githubusercontent.com/u/148833579?v=4" diff --git a/registry.yaml b/registry.yaml index 3b38ccbe10..7e637d8ba0 100644 --- a/registry.yaml +++ b/registry.yaml @@ -2541,5 +2541,12 @@ - codex +- title: LLMs 101 - A Practical Introduction + path: articles/openai-cookbook-llms-101.md + date: 2025-09-26 + authors: + - paytonison + tags: + - introduction + - beginners -