diff --git a/AGENTS.md b/AGENTS.md deleted file mode 100644 index 86634ebf93..0000000000 --- a/AGENTS.md +++ /dev/null @@ -1,23 +0,0 @@ -# Repository Guidelines - -## Project Structure & Module Organization -The cookbook is organized around runnable examples and reference articles for OpenAI APIs. Place notebooks and Python scripts under `examples//`, grouping related assets inside topic subfolders (for example, `examples/agents_sdk/`). Narrative guides and long-form docs live in `articles/`, and shared diagrams or screenshots belong in `images/`. Update `registry.yaml` whenever you add content so it appears on cookbook.openai.com, and add new author metadata in `authors.yaml` if you want custom attribution. Keep large datasets outside the repo; instead, document how to fetch them in the notebook. - -## Build, Test, and Development Commands -Use a virtual environment to isolate dependencies: -- `python -m venv .venv && source .venv/bin/activate` -- `pip install -r examples//requirements.txt` (each sample lists only what it needs) -- `jupyter lab` or `jupyter notebook` to develop interactively -- `python .github/scripts/check_notebooks.py` to validate notebook structure before pushing - -## Coding Style & Naming Conventions -Write Python to PEP 8 with four-space indentation, descriptive variable names, and concise docstrings that explain API usage choices. Name new notebooks with lowercase, dash-or-underscore-separated phrases that match their directory—for example `examples/gpt-5/prompt-optimization-cookbook.ipynb`. Keep markdown cells focused and prefer numbered steps for multi-part workflows. Store secrets in environment variables such as `OPENAI_API_KEY`; never hard-code keys inside notebooks. - -## Testing Guidelines -Execute notebooks top-to-bottom after installing dependencies and clear lingering execution counts before committing. 
For Python modules or utilities, include self-check cells or lightweight `pytest` snippets and show how to run them (for example, `pytest examples/object_oriented_agentic_approach/tests`). When contributions depend on external services, mock responses or gate the cells behind clearly labeled opt-in flags. - -## Commit & Pull Request Guidelines -Use concise, imperative commit messages that describe the change scope (e.g., "Add agent portfolio collaboration demo"). Every PR should provide a summary, motivation, and self-review, and must tick the registry and authors checklist from `.github/pull_request_template.md`. Link issues when applicable and attach screenshots or output snippets for UI-heavy content. Confirm CI notebook validation passes locally before requesting review. - -## Metadata & Publication Workflow -New or relocated content must have an entry in `registry.yaml` with an accurate path, date, and tag set so the static site generator includes it. When collaborating, coordinate author slugs in `authors.yaml` to avoid duplicates, and run `python -m yaml lint registry.yaml` (or your preferred YAML linter) to catch syntax errors before submitting. diff --git a/articles/gpt-oss/run-transformers.ipynb b/articles/gpt-oss/run-transformers.ipynb new file mode 100644 index 0000000000..107d6b853f --- /dev/null +++ b/articles/gpt-oss/run-transformers.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to run gpt-oss with Hugging Face Transformers\n", + "\n", + "The Transformers library by Hugging Face provides a flexible way to load and run large language models locally or on a server. 
This guide will walk you through running [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) or [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) using Transformers, either with a high-level pipeline or via low-level `generate` calls with raw token IDs.\n", + "\n", + "We'll cover the use of [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) or [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with the high-level pipeline abstraction, low-level `generate` calls, and serving models locally with `transformers serve`, in a way compatible with the Responses API.\n", + "\n", + "In this guide we'll run through various optimised ways to run the **gpt-oss models via Transformers.**\n", + "\n", + "**Bonus:** You can also fine-tune models via transformers, [check out our fine-tuning guide here](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transformers)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pick your model\n", + "\n", + "Both **gpt-oss** models are available on Hugging Face:\n", + "\n", + "- **`openai/gpt-oss-20b`**\n", + " - ~16GB VRAM requirement when using MXFP4\n", + " - Great for single high-end consumer GPUs\n", + "- **`openai/gpt-oss-120b`**\n", + " - Requires ≥60GB VRAM or multi-GPU setup\n", + " - Ideal for H100-class hardware\n", + "\n", + "Both are **MXFP4 quantized** by default. Please, note that MXFP4 is supported in Hopper or later architectures. This includes data center GPUs such as H100 or GB200, as well as the latest RTX 50xx family of consumer cards.\n", + "\n", + "If you use `bfloat16` instead of MXFP4, memory consumption will be larger (~48 GB for the 20b parameter model)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Quick setup\n", + "\n", + "### 1. Install dependencies\n", + "\n", + "It's recommended to create a fresh Python environment. 
Install transformers, accelerate, as well as the Triton kernels for MXFP4 compatibility:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# NOTE: The current version of HF Transformers has a glitch where an outdated torchvision dependency prevents transformers module from importing pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -U transformers kernels torch accelerate torchvision" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Quick inference with pipeline\n", + "\n", + "The easiest way to run the gpt-oss models is with the Transformers high-level `pipeline` API:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "`torch_dtype` is deprecated! Use `dtype` instead!\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "f3b8482ed938472083ac6cd012156902", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 40 files: 0%| | 0/40 [00:00<|final|>\"\n", + "end_marker = \"<|end|>\"\n", + "if final_marker in text:\n", + " text = text.split(final_marker, 1)[1]\n", + "if end_marker in text:\n", + " text = text.split(end_marker, 1)[0]\n", + "print(\"Final:\", text.strip()[:1000])\n" + ] + }, + { + "cell_type": "markdown", + "id": "b31988a1", + "metadata": {}, + "source": [ + "## Streaming tokens (prints only assistant **final**)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b48c668", + "metadata": {}, + "outputs": [], + "source": [ + "import threading, sys\n", + "from transformers import TextIteratorStreamer\n", + "\n", + "def stream_final_only(model, tokenizer, messages, generate_kwargs):\n", + " inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=\"pt\").to(model.device)\n", + " streamer = 
TextIteratorStreamer(tokenizer, skip_special_tokens=False)\n", + "\n", + " t = threading.Thread(target=model.generate, kwargs=dict(input_ids=inputs, streamer=streamer, **generate_kwargs))\n", + " t.start()\n", + "\n", + " buf, printing_final = \"\", False\n", + " final_token, end_token = \"<|assistant|><|final|>\", \"<|end|>\"\n", + " for piece in streamer:\n", + " buf += piece\n", + " if not printing_final and final_token in buf:\n", + " printing_final = True\n", + " out = buf.split(final_token, 1)[1].replace(end_token, \"\")\n", + " sys.stdout.write(out); sys.stdout.flush()\n", + " buf = \"\"\n", + " elif printing_final:\n", + " sys.stdout.write(piece.replace(end_token, \"\")); sys.stdout.flush()\n", + " t.join()\n", + "\n", + "messages2 = [\n", + " {\"role\": \"system\", \"content\": \"Answer briefly.\"},\n", + " {\"role\": \"user\", \"content\": \"What’s the difference between analysis and final channels?\"}\n", + "]\n", + "gen = dict(max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)\n", + "stream_final_only(model, tokenizer, messages2, gen)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Chat template and tool calling\n", + "\n", + "OpenAI gpt-oss models use the [harmony response format](https://cookbook.openai.com/article/harmony) for structuring messages, including reasoning and tool calls.\n", + "\n", + "To construct prompts you can use the built-in chat template of Transformers. 
Alternatively, you can install and use the [openai-harmony library](https://github.com/openai/harmony) for more control.\n", + "\n", + "### Using the built-in chat template:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example with system prompt and chat template\n", + "messages = [\n", + " {\"role\": \"system\", \"content\": \"Always respond in riddles\"},\n", + " {\"role\": \"user\", \"content\": \"What is the weather like in Madrid?\"},\n", + "]\n", + "\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt=True,\n", + " return_tensors=\"pt\",\n", + " return_dict=True,\n", + ").to(model.device)\n", + "\n", + "# Generate with the chat template\n", + "generated = model.generate(\n", + " **inputs,\n", + " max_new_tokens=100,\n", + " temperature=0.8,\n", + " do_sample=True,\n", + " pad_token_id=tokenizer.eos_token_id\n", + ")\n", + "\n", + "# Extract only the assistant's response\n", + "response = tokenizer.decode(generated[0][inputs[\"input_ids\"].shape[-1]:], skip_special_tokens=True)\n", + "print(\"Assistant's riddle response:\")\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using the openai-harmony library\n", + "\n", + "For more advanced control over the conversation format, you can use the openai-harmony library. \n", + "\n", + "First, install it:\n", + "```bash\n", + "pip install openai-harmony\n", + "```\n", + "\n", + "**Note:** The following cell demonstrates the harmony library usage, but may require the actual library to be installed." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example using openai-harmony library (requires installation)\n", + "# Uncomment and run if you have openai-harmony installed\n", + "\n", + "'''\n", + "import json\n", + "from openai_harmony import (\n", + " HarmonyEncodingName,\n", + " load_harmony_encoding,\n", + " Conversation,\n", + " Message,\n", + " Role,\n", + " SystemContent,\n", + " DeveloperContent\n", + ")\n", + "\n", + "# Load harmony encoding\n", + "encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)\n", + "\n", + "# Build conversation\n", + "convo = Conversation.from_messages([\n", + " Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),\n", + " Message.from_role_and_content(\n", + " Role.DEVELOPER,\n", + " DeveloperContent.new().with_instructions(\"Always respond in riddles\")\n", + " ),\n", + " Message.from_role_and_content(Role.USER, \"What is the weather like in SF?\")\n", + "])\n", + "\n", + "# Render prompt\n", + "prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)\n", + "stop_token_ids = encoding.stop_tokens_for_assistant_actions()\n", + "\n", + "# Generate\n", + "outputs = model.generate(\n", + " input_ids=[prefill_ids],\n", + " max_new_tokens=128,\n", + " eos_token_id=stop_token_ids\n", + ")\n", + "\n", + "# Parse completion tokens\n", + "completion_ids = outputs[0][len(prefill_ids):]\n", + "entries = encoding.parse_messages_from_completion_tokens(completion_ids, Role.ASSISTANT)\n", + "\n", + "for message in entries:\n", + " print(json.dumps(message.to_dict(), indent=2))\n", + "'''\n", + "\n", + "print(\"Harmony library example code shown above (commented out)\")\n", + "print(\"Note: The Developer role in Harmony maps to the system prompt in the chat template.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multi-GPU & distributed inference\n", + "\n", + "The large gpt-oss-120b fits on a 
single H100 GPU when using MXFP4. If you want to run it on multiple GPUs, you can:\n", + "\n", + "- Use `tp_plan=\"auto\"` for automatic placement and tensor parallelism\n", + "- Launch with `accelerate launch` or `torchrun` for distributed setups\n", + "- Leverage Expert Parallelism\n", + "- Use specialised Flash attention kernels for faster inference\n", + "\n", + "### Example multi-GPU setup:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Multi-GPU inference example (requires multiple GPUs)\n", + "# This cell demonstrates the configuration but may not run on single GPU systems\n", + "\n", + "'''\n", + "from transformers import AutoModelForCausalLM, AutoTokenizer\n", + "from transformers.distributed import DistributedConfig\n", + "import torch\n", + "\n", + "model_path = \"openai/gpt-oss-120b\"\n", + "tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side=\"left\")\n", + "\n", + "device_map = {\n", + " # Enable Expert Parallelism\n", + " \"distributed_config\": DistributedConfig(enable_expert_parallel=1),\n", + " # Enable Tensor Parallelism\n", + " \"tp_plan\": \"auto\",\n", + "}\n", + "\n", + "model = AutoModelForCausalLM.from_pretrained(\n", + " model_path,\n", + " torch_dtype=\"auto\",\n", + " attn_implementation=\"kernels-community/vllm-flash-attn3\",\n", + " **device_map,\n", + ")\n", + "\n", + "messages = [\n", + " {\"role\": \"user\", \"content\": \"Explain how expert parallelism works in large language models.\"}\n", + "]\n", + "\n", + "inputs = tokenizer.apply_chat_template(\n", + " messages,\n", + " add_generation_prompt=True,\n", + " return_tensors=\"pt\",\n", + " return_dict=True,\n", + ").to(model.device)\n", + "\n", + "outputs = model.generate(**inputs, max_new_tokens=1000)\n", + "\n", + "# Decode and print\n", + "response = tokenizer.decode(outputs[0])\n", + "print(\"Model response:\", response.split(\"<|channel|>final<|message|>\")[-1].strip())\n", + "'''\n", + 
"\n", + "print(\"Multi-GPU setup example shown above (commented out)\")\n", + "print(\"\\nTo run this on a node with four GPUs, use:\")\n", + "print(\"torchrun --nproc_per_node=4 your_script.py\")\n", + "\n", + "# Show current GPU configuration\n", + "if torch.cuda.is_available():\n", + " print(f\"\\nCurrent setup:\")\n", + " print(f\"Available GPUs: {torch.cuda.device_count()}\")\n", + " for i in range(torch.cuda.device_count()):\n", + " print(f\" GPU {i}: {torch.cuda.get_device_name(i)}\")\n", + " print(f\" Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9:.1f} GB\")\n", + "else:\n", + " print(\"\\nNo CUDA GPUs available\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Additional Examples and Tips\n", + "\n", + "### Memory management" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check memory usage\n", + "if torch.cuda.is_available():\n", + " print(\"GPU Memory Usage:\")\n", + " for i in range(torch.cuda.device_count()):\n", + " allocated = torch.cuda.memory_allocated(i) / 1e9\n", + " cached = torch.cuda.memory_reserved(i) / 1e9\n", + " total = torch.cuda.get_device_properties(i).total_memory / 1e9\n", + " print(f\" GPU {i}: {allocated:.1f}GB allocated, {cached:.1f}GB cached, {total:.1f}GB total\")\n", + "\n", + "# Clear cache if needed\n", + "# torch.cuda.empty_cache()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Batch processing example" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Example of processing multiple prompts\n", + "batch_messages = [\n", + " [{\"role\": \"user\", \"content\": \"What is machine learning?\"}],\n", + " [{\"role\": \"user\", \"content\": \"Explain quantum computing.\"}],\n", + " [{\"role\": \"user\", \"content\": \"What is the future of AI?\"}]\n", + "]\n", + "\n", + "print(\"Processing batch of prompts...\")\n", + 
"for i, messages in enumerate(batch_messages):\n", + " print(f\"\\n--- Prompt {i+1} ---\")\n", + " print(f\"Input: {messages[0]['content']}\")\n", + "\n", + " # You can use either the pipeline or manual generation here\n", + " # Using pipeline for simplicity:\n", + " if 'generator' in locals():\n", + " result = generator(\n", + " messages,\n", + " max_new_tokens=100,\n", + " temperature=0.7,\n", + " )\n", + " print(f\"Output: {result[0]['generated_text'][-200:]}...\") # Show last 200 chars\n", + " else:\n", + " print(\"Generator not available - run the pipeline example first\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This notebook demonstrated various ways to run OpenAI's gpt-oss models using Hugging Face Transformers:\n", + "\n", + "1. **Quick setup** with required dependencies\n", + "2. **Pipeline API** for simple, high-level inference\n", + "3. **Manual generation** with `.generate()` for more control\n", + "4. **Chat templates** for conversation-style interactions\n", + "5. **Harmony library integration** for advanced message formatting\n", + "6. 
**Multi-GPU configurations** for large-scale inference\n", + "\n", + "### Key takeaways:\n", + "- Start with the pipeline API for quick experimentation\n", + "- Use manual tokenization and generation for production deployments\n", + "- Consider MXFP4 quantization for memory efficiency on compatible hardware\n", + "- Leverage multi-GPU setups for the larger 120B model\n", + "- Use proper chat templates for conversation-style applications\n", + "\n", + "### Next steps:\n", + "- Explore fine-tuning capabilities\n", + "- Set up serving endpoints for production use\n", + "- Experiment with different sampling strategies\n", + "- Integrate with your specific use case or application\n", + "\n", + "For more advanced topics, check out the [OpenAI Cookbook](https://cookbook.openai.com) and [Hugging Face Transformers documentation](https://huggingface.co/docs/transformers)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/articles/gpt-oss/run-transformers.md b/articles/gpt-oss/run-transformers.md deleted file mode 100644 index 0a1ec12364..0000000000 --- a/articles/gpt-oss/run-transformers.md +++ /dev/null @@ -1,271 +0,0 @@ -# How to run gpt-oss with Hugging Face Transformers - -The Transformers library by Hugging Face provides a flexible way to load and run large language models locally or on a server. This guide will walk you through running [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) or [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) using Transformers, either with a high-level pipeline or via low-level `generate` calls with raw token IDs. 
- -We'll cover the use of [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) or [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) with the high-level pipeline abstraction, low-level \`generate\` calls, and serving models locally with \`transformers serve\`, in a way compatible with the Responses API. - -In this guide we’ll run through various optimised ways to run the **gpt-oss models via Transformers.** - -Bonus: You can also fine-tune models via transformers, [check out our fine-tuning guide here](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transformers). - -## Pick your model - -Both **gpt-oss** models are available on Hugging Face: - -- **`openai/gpt-oss-20b`** - - \~16GB VRAM requirement when using MXFP4 - - Great for single high-end consumer GPUs -- **`openai/gpt-oss-120b`** - - Requires ≥60GB VRAM or multi-GPU setup - - Ideal for H100-class hardware - -Both are **MXFP4 quantized** by default. Please note that MXFP4 is supported on Hopper or later architectures. This includes data center GPUs such as H100 or GB200, as well as the latest RTX 50xx family of consumer cards. - -If you use `bfloat16` instead of MXFP4, memory consumption will be larger (\~48 GB for the 20b parameter model). - -## Quick setup - -1. **Install dependencies** - It’s recommended to create a fresh Python environment. Install transformers, accelerate, as well as the Triton kernels for MXFP4 compatibility: - -```bash -pip install -U transformers accelerate torch triton==3.4 kernels -``` - -2. **(Optional) Enable multi-GPU** - If you’re running large models, use Accelerate or torchrun to handle device mapping automatically. 
- -## Create an Open AI Responses / Chat Completions endpoint - -To launch a server, simply use the `transformers serve` CLI command: - -```bash -transformers serve -``` - -The simplest way to interact with the server is through the transformers chat CLI - -```bash -transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b -``` - -or by sending an HTTP request with cURL, e.g. - -```bash -curl -X POST http://localhost:8000/v1/responses -H "Content-Type: application/json" -d '{"messages": [{"role": "system", "content": "hello"}], "temperature": 0.9, "max_tokens": 1000, "stream": true, "model": "openai/gpt-oss-20b"}' -``` - -Additional use cases, like integrating `transformers serve` with Cursor and other tools, are detailed in [the documentation](https://huggingface.co/docs/transformers/main/serving). - -## Quick inference with pipeline - -The easiest way to run the gpt-oss models is with the Transformers high-level `pipeline` API: - -```py -from transformers import pipeline - -generator = pipeline( - "text-generation", - model="openai/gpt-oss-20b", - torch_dtype="auto", - device_map="auto" # Automatically place on available GPUs -) - -messages = [ - {"role": "user", "content": "Explain what MXFP4 quantization is."}, -] - -result = generator( - messages, - max_new_tokens=200, - temperature=1.0, -) - -print(result[0]["generated_text"]) -``` - -## Advanced inference with `.generate()` - -If you want more control, you can load the model and tokenizer manually and invoke the `.generate()` method: - -```py -from transformers import AutoModelForCausalLM, AutoTokenizer - -model_name = "openai/gpt-oss-20b" - -tokenizer = AutoTokenizer.from_pretrained(model_name) -model = AutoModelForCausalLM.from_pretrained( - model_name, - torch_dtype="auto", - device_map="auto" -) - -messages = [ - {"role": "user", "content": "Explain what MXFP4 quantization is."}, -] - -inputs = tokenizer.apply_chat_template( - messages, - add_generation_prompt=True, - return_tensors="pt", 
- return_dict=True, -).to(model.device) - -outputs = model.generate( - **inputs, - max_new_tokens=200, - temperature=0.7 -) - -print(tokenizer.decode(outputs[0])) -``` - -## Chat template and tool calling - -OpenAI gpt-oss models use the [harmony response format](https://cookbook.openai.com/article/harmony) for structuring messages, including reasoning and tool calls. - -To construct prompts you can use the built-in chat template of Transformers. Alternatively, you can install and use the [openai-harmony library](https://github.com/openai/harmony) for more control. - -To use the chat template: - -```py -from transformers import AutoModelForCausalLM, AutoTokenizer - -model_name = "openai/gpt-oss-20b" - -tokenizer = AutoTokenizer.from_pretrained(model_name) -model = AutoModelForCausalLM.from_pretrained( - model_name, - device_map="auto", - torch_dtype="auto", -) - -messages = [ - {"role": "system", "content": "Always respond in riddles"}, - {"role": "user", "content": "What is the weather like in Madrid?"}, -] - -inputs = tokenizer.apply_chat_template( - messages, - add_generation_prompt=True, - return_tensors="pt", - return_dict=True, -).to(model.device) - -generated = model.generate(**inputs, max_new_tokens=100) -print(tokenizer.decode(generated[0][inputs["input_ids"].shape[-1] :])) -``` - -To integrate the [`openai-harmony`](https://github.com/openai/harmony) library to prepare prompts and parse responses, first install it like this: - -```bash -pip install openai-harmony -``` - -Here’s an example of how to use the library to build your prompts and encode them to tokens: - -```py -import json -from openai_harmony import ( - HarmonyEncodingName, - load_harmony_encoding, - Conversation, - Message, - Role, - SystemContent, - DeveloperContent -) -from transformers import AutoModelForCausalLM, AutoTokenizer - -encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS) - -# Build conversation -convo = Conversation.from_messages([ - 
Message.from_role_and_content(Role.SYSTEM, SystemContent.new()), - Message.from_role_and_content( - Role.DEVELOPER, - DeveloperContent.new().with_instructions("Always respond in riddles") - ), - Message.from_role_and_content(Role.USER, "What is the weather like in SF?") -]) - -# Render prompt -prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT) -stop_token_ids = encoding.stop_tokens_for_assistant_actions() - -# Load model -model_name = "openai/gpt-oss-20b" -tokenizer = AutoTokenizer.from_pretrained(model_name) -model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto") - -# Generate -outputs = model.generate( - input_ids=[prefill_ids], - max_new_tokens=128, - eos_token_id=stop_token_ids -) - -# Parse completion tokens -completion_ids = outputs[0][len(prefill_ids):] -entries = encoding.parse_messages_from_completion_tokens(completion_ids, Role.ASSISTANT) - -for message in entries: - print(json.dumps(message.to_dict(), indent=2)) -``` - -Note that the `Developer` role in Harmony maps to the `system` prompt in the chat template. - -## Multi-GPU & distributed inference - -The large gpt-oss-120b fits on a single H100 GPU when using MXFP4. 
If you want to run it on multiple GPUs, you can: - -- Use `tp_plan="auto"` for automatic placement and tensor parallelism -- Launch with `accelerate launch` or `torchrun` for distributed setups -- Leverage Expert Parallelism -- Use specialised Flash attention kernels for faster inference - -```py -from transformers import AutoModelForCausalLM, AutoTokenizer -from transformers.distributed import DistributedConfig -import torch - -model_path = "openai/gpt-oss-120b" -tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left") - -device_map = { - # Enable Expert Parallelism - "distributed_config": DistributedConfig(enable_expert_parallel=1), - # Enable Tensor Parallelism - "tp_plan": "auto", -} - -model = AutoModelForCausalLM.from_pretrained( - model_path, - torch_dtype="auto", - attn_implementation="kernels-community/vllm-flash-attn3", - **device_map, -) - -messages = [ - {"role": "user", "content": "Explain how expert parallelism works in large language models."} -] - -inputs = tokenizer.apply_chat_template( - messages, - add_generation_prompt=True, - return_tensors="pt", - return_dict=True, -).to(model.device) - -outputs = model.generate(**inputs, max_new_tokens=1000) - -# Decode and print -response = tokenizer.decode(outputs[0]) -print("Model response:", response.split("<|channel|>final<|message|>")[-1].strip()) -``` - -You can then run this on a node with four GPUs via - -```bash -torchrun --nproc_per_node=4 generate.py -``` diff --git a/articles/how_to_work_with_large_language_models.md b/articles/how_to_work_with_large_language_models.md deleted file mode 100644 index cf6b48e1be..0000000000 --- a/articles/how_to_work_with_large_language_models.md +++ /dev/null @@ -1,168 +0,0 @@ -# How to work with large language models - -## How large language models work - -[Large language models][Large language models Blog Post] are functions that map text to text. Given an input string of text, a large language model predicts the text that should come next. 
- -The magic of large language models is that by being trained to minimize this prediction error over vast quantities of text, the models end up learning concepts useful for these predictions. For example, they learn: - -- how to spell -- how grammar works -- how to paraphrase -- how to answer questions -- how to hold a conversation -- how to write in many languages -- how to code -- etc. - -They do this by “reading” a large amount of existing text and learning how words tend to appear in context with other words. They then use what they have learned to predict the most likely next word in response to a user request, and each subsequent word after that. - -GPT-3 and GPT-4 power [many software products][OpenAI Customer Stories], including productivity apps, education apps, games, and more. - -## How to control a large language model - -Of all the inputs to a large language model, by far the most influential is the text prompt. - -Large language models can be prompted to produce output in a few ways: - -- **Instruction**: Tell the model what you want -- **Completion**: Induce the model to complete the beginning of what you want -- **Scenario**: Give the model a situation to play out -- **Demonstration**: Show the model what you want, with either: - - A few examples in the prompt - - Many hundreds or thousands of examples in a fine-tuning training dataset - -An example of each is shown below. - -### Instruction prompts - -Write your instruction at the top of the prompt (or at the bottom, or both), and the model will do its best to follow the instruction and then stop. Instructions can be detailed, so don't be afraid to write a paragraph explicitly detailing the output you want; just stay aware of how many [tokens](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) the model can process. - -Example instruction prompt: - -```text -Extract the name of the author from the quotation below. 
- -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation -``` - -Output: - -```text -Ted Chiang -``` - -### Completion prompt example - -Completion-style prompts take advantage of how large language models try to write text they think is most likely to come next. To steer the model, try beginning a pattern or sentence that will be completed by the output you want to see. Relative to direct instructions, this mode of steering large language models can take more care and experimentation. In addition, the models won't necessarily know where to stop, so you will often need stop sequences or post-processing to cut off text generated beyond the desired output. - -Example completion prompt: - -```text -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation - -The author of this quote is -``` - -Output: - -```text - Ted Chiang -``` - -### Scenario prompt example - -Giving the model a scenario to follow or role to play out can be helpful for complex queries or when seeking imaginative responses. When using a hypothetical prompt, you set up a situation, problem, or story, and then ask the model to respond as if it were a character in that scenario or an expert on the topic. - -Example scenario prompt: - -```text -Your role is to extract the name of the author from any given text - -“Some humans theorize that intelligent species go extinct before they can expand into outer space. 
If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation -``` - -Output: - -```text - Ted Chiang -``` - -### Demonstration prompt example (few-shot learning) - -Similar to completion-style prompts, demonstrations can show the model what you want it to do. This approach is sometimes called few-shot learning, as the model learns from a few examples provided in the prompt. - -Example demonstration prompt: - -```text -Quote: -“When the reasoning mind is forced to confront the impossible again and again, it has no choice but to adapt.” -― N.K. Jemisin, The Fifth Season -Author: N.K. Jemisin - -Quote: -“Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation -Author: -``` - -Output: - -```text - Ted Chiang -``` - -### Fine-tuned prompt example - -With enough training examples, you can [fine-tune][Fine Tuning Docs] a custom model. In this case, instructions become unnecessary, as the model can learn the task from the training data provided. However, it can be helpful to include separator sequences (e.g., `->` or `###` or any string that doesn't commonly appear in your inputs) to tell the model when the prompt has ended and the output should begin. Without separator sequences, there is a risk that the model continues elaborating on the input text rather than starting on the answer you want to see. - -Example fine-tuned prompt (for a model that has been custom trained on similar prompt-completion pairs): - -```text -“Some humans theorize that intelligent species go extinct before they can expand into outer space. 
If they're correct, then the hush of the night sky is the silence of the graveyard.” -― Ted Chiang, Exhalation - -### - - -``` - -Output: - -```text - Ted Chiang -``` - -## Code Capabilities - -Large language models aren't only great at text - they can be great at code too. OpenAI's [GPT-4][GPT-4 and GPT-4 Turbo] model is a prime example. - -GPT-4 powers [numerous innovative products][OpenAI Customer Stories], including: - -- [GitHub Copilot] (autocompletes code in Visual Studio and other IDEs) -- [Replit](https://replit.com/) (can complete, explain, edit and generate code) -- [Cursor](https://cursor.sh/) (build software faster in an editor designed for pair-programming with AI) - -GPT-4 is more advanced than previous models like `gpt-3.5-turbo-instruct`. But, to get the best out of GPT-4 for coding tasks, it's still important to give clear and specific instructions. As a result, designing good prompts can take more care. - -### More prompt advice - -For more prompt examples, visit [OpenAI Examples][OpenAI Examples]. - -In general, the input prompt is the best lever for improving model outputs. You can try tricks like: - -- **Be more specific** E.g., if you want the output to be a comma separated list, ask it to return a comma separated list. If you want it to say "I don't know" when it doesn't know the answer, tell it 'Say "I don't know" if you do not know the answer.' The more specific your instructions, the better the model can respond. -- **Provide Context**: Help the model understand the bigger picture of your request. This could be background information, examples/demonstrations of what you want or explaining the purpose of your task. -- **Ask the model to answer as if it was an expert.** Explicitly asking the model to produce high quality output or output as if it was written by an expert can induce the model to give higher quality answers that it thinks an expert would write. Phrases like "Explain in detail" or "Describe step-by-step" can be effective. 
-- **Prompt the model to write down the series of steps explaining its reasoning.** If understanding the 'why' behind an answer is important, prompt the model to include its reasoning. This can be done by simply adding a line like "[Let's think step by step](https://arxiv.org/abs/2205.11916)" before each answer. - -[Fine Tuning Docs]: https://platform.openai.com/docs/guides/fine-tuning -[OpenAI Customer Stories]: https://openai.com/customer-stories -[Large language models Blog Post]: https://openai.com/research/better-language-models -[GitHub Copilot]: https://github.com/features/copilot/ -[GPT-4 and GPT-4 Turbo]: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo -[GPT3 Apps Blog Post]: https://openai.com/blog/gpt-3-apps/ -[OpenAI Examples]: https://platform.openai.com/examples diff --git a/articles/openai-cookbook-llms-101.md b/articles/openai-cookbook-llms-101.md new file mode 100644 index 0000000000..f306effc2f --- /dev/null +++ b/articles/openai-cookbook-llms-101.md @@ -0,0 +1,184 @@ +--- +title: "LLMs 101: A Practical Introduction" +description: "A hands-on, code-first introduction to large language models for Cookbook readers." +last_updated: "2025-08-24" +--- + +# LLMs 101: A Practical Introduction + +> **Who this is for.** Developers who want a fast, working understanding of large language models and the knobs that matter in real apps. + +## At a glance + +``` +Text prompt + ↓ (tokenization) +Tokens → Embeddings → [Transformer layers × N] → Next‑token probabilities + ↓ ↓ +Detokenization Sampling (temperature/top_p) → Output text +``` + +- **LLMs** are neural networks (usually **transformers**) trained on lots of text to predict the next token. +- **Tokenization** splits text into subword units; **embeddings** map tokens to vectors; transformer layers build context‑aware representations. +- Generation repeats next‑token sampling until a stop condition (length or stop sequences) is met. 
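The generation loop in the last bullet can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how any production model or the OpenAI API is implemented: `toy_next_logits` is a hypothetical stand-in for a real transformer, and the helper names (`softmax`, `top_p_filter`, `generate`) are made up for this example.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature must be > 0, and lower values sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=1.0):
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def toy_next_logits(ids, vocab_size=4):
    """Hypothetical stand-in for a model: strongly prefers (last token + 1) mod vocab_size."""
    logits = [0.0] * vocab_size
    logits[(ids[-1] + 1) % vocab_size] = 5.0
    return logits


def generate(next_logits, prompt_ids, stop_id, max_new_tokens=16,
             temperature=1.0, top_p=1.0, rng=None):
    """Repeat next-token sampling until the stop token appears or the length limit is hit."""
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(next_logits(ids), temperature)
        nucleus = top_p_filter(probs, top_p)
        token = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
        if token == stop_id:  # stop condition: a stop token
            break
        ids.append(token)
    return ids  # stop condition: length limit reached


if __name__ == "__main__":
    print(generate(toy_next_logits, prompt_ids=[0], stop_id=3, top_p=0.1))
```

With `top_p=0.1` the nucleus collapses to the single most likely token, so the toy run is deterministic; raising `top_p` or `temperature` lets less likely tokens through.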
+ +--- + +## Quick start: generate text + +### Python + +```python +from openai import OpenAI + +client = OpenAI() +resp = client.responses.create( + model="gpt-4o", + instructions="You are a concise technical explainer.", + input="In one paragraph, explain what a token is in an LLM." +) +print(resp.output_text) +``` + +### JavaScript / TypeScript + +```js +import OpenAI from "openai"; +const client = new OpenAI(); + +const resp = await client.chat.completions.create({ + model: "gpt-4o", + messages: [ + { role: "system", content: "You are a concise technical explainer." }, + { role: "user", content: "In one paragraph, explain what a token is in an LLM." } + ] +}); +console.log(resp.choices[0].message.content); +``` + +> **Tip.** Model names evolve; check your Models list before shipping. Prefer streaming for chat‑like UIs (see below). + +--- + +## What can LLMs do? + +Despite the name, LLMs can be **multi‑modal** when models and inputs support it (text, code, sometimes images/audio). Core text tasks: + +- **Generate**: draft, rewrite, continue, or brainstorm. +- **Transform**: translate, rephrase, format, classify, extract. +- **Analyze**: summarize, compare, tag, or answer questions. +- **Tool use / agents**: call functions or APIs as part of a loop to act. + +These patterns compose into search, assistants, form‑fillers, data extraction, QA, and more. + +--- + +## How LLMs work (just enough to be dangerous) + +1. **Tokenization.** Input text → tokens (IDs). Whitespace and punctuation matter—“token‑budget math” is a real constraint. +2. **Embeddings.** Each token ID becomes a vector; positions are encoded so order matters. +3. **Transformer layers.** Self‑attention mixes information across positions so each token’s representation becomes **contextual** (richer than the raw embedding). +4. **Decoding.** The model outputs a probability distribution over the next token. +5. 
**Sampling.** Choose how “adventurous” generation is (see knobs below), append the token, and repeat until done. + +--- + +## The knobs you’ll touch most + +- **Temperature** *(0.0–2.0)* — Lower → more deterministic/boring; higher → more diverse/creative. +- **Top‑p (nucleus)** *(0–1)* — Sample only from the smallest set of most‑likely tokens whose cumulative probability reaches at least *p*. +- **Max output tokens** — Hard limit on output length; controls latency and cost. +- **System / instructions** — Up‑front role, constraints, and style to steer behavior. +- **Stop sequences** — Cleanly cut off output at known boundaries. +- **Streaming** — Receive tokens as they’re generated; improves perceived latency. + +**Practical defaults:** `temperature=0.2–0.7`, `top_p=1.0`, set a **max output** that fits your UI, and **stream** by default for chat UX. + +--- + +## Make context do the heavy lifting + +- **Context window.** Inputs + outputs share a finite token budget; plan prompts and retrieval to fit. +- **Ground with your data (RAG).** Retrieve relevant snippets and include them in the prompt to improve factuality. +- **Structured outputs.** Ask for JSON (and validate) when you need machine‑readable results. +- **Few‑shot examples.** Provide 1–3 compact exemplars to stabilize format and tone. + +--- + +## Minimal streaming example + +### Python + +```python +from openai import OpenAI +client = OpenAI() + +with client.responses.stream( + model="gpt-4o", + input="Stream a two-sentence explanation of context windows." +) as stream: + for event in stream: + if event.type == "response.output_text.delta": + print(event.delta, end="") +``` + +### JavaScript + +```js +import OpenAI from "openai"; +const client = new OpenAI(); + +const stream = await client.responses.stream({ + model: "gpt-4o", + input: "Stream a two-sentence explanation of context windows."
+}); + +for await (const event of stream) { + if (event.type === "response.output_text.delta") { + process.stdout.write(event.delta); + } +} +``` + +--- + +## Limitations (design around these) + +- **Hallucinations.** Models can generate plausible but false statements. Ground with citations/RAG; validate critical outputs. +- **Recency.** Models don’t inherently know the latest facts; retrieve or provide current data. +- **Ambiguity.** Vague prompts → vague answers; specify domain, audience, length, and format. +- **Determinism.** Even at `temperature=0`, responses may vary across runs/envs. Don’t promise bit‑for‑bit reproducibility. +- **Cost & latency.** Longer prompts and bigger models are slower and costlier; iterate toward the smallest model that meets quality. + +--- + +## Common gotchas + +- **Characters ≠ tokens.** Budget both input and output to avoid truncation. +- **Over‑prompting.** Prefer simple, testable instructions; add examples sparingly. +- **Leaky formats.** If you need JSON, enforce it (schema + validators) and add a repair step. +- **One prompt for everything.** Separate prompts per task/endpoint; keep them versioned and testable. +- **Skipping evaluation.** Keep a tiny dataset of real tasks; score changes whenever you tweak prompts, models, or retrieval. + +--- + +## Glossary + +- **Token** — Small unit of text (≈ subword) used by models. +- **Embedding** — Vector representation of a token or text span. +- **Context window** — Max tokens the model can attend to at once (prompt + output). +- **Temperature / top‑p** — Randomness controls during sampling. +- **System / instructions** — Up‑front guidance that shapes responses. +- **RAG** — Retrieval‑Augmented Generation; retrieve data and include it in the prompt. 
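The "leaky formats" advice above (enforce a schema, validate, and add a repair step) can be sketched with the standard library alone. This is a minimal illustration, not an official pattern: `REQUIRED_KEYS` is a hypothetical schema, and `repair` is a caller-supplied callback (for example, a follow-up model call that receives the bad reply and the list of problems).

```python
import json

# Hypothetical schema for illustration: required keys and their expected types.
REQUIRED_KEYS = {"author": str, "title": str}


def extract_json(text):
    """Pull the outermost {...} span out of a reply that may wrap JSON in prose or code fences."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    return json.loads(text[start:end + 1])


def validate(obj):
    """Return a list of problems; an empty list means the object matches the schema."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in obj:
            problems.append(f"missing key: {key}")
        elif not isinstance(obj[key], expected_type):
            problems.append(f"wrong type for {key}")
    return problems


def parse_with_repair(reply, repair, max_attempts=2):
    """Parse and validate; on failure, pass the problems to `repair` and retry up to max_attempts times."""
    problems = []
    for attempt in range(max_attempts + 1):
        try:
            obj = extract_json(reply)
            problems = validate(obj)
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            problems = [str(exc)]
        if not problems:
            return obj
        if attempt < max_attempts:
            reply = repair(reply, problems)
    raise ValueError(f"could not repair output: {problems}")
```

Keeping validation separate from repair makes each piece easy to test offline, which fits the advice to score changes against a small dataset of real tasks.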
+ +--- + +## Where to go next + +- Prompt patterns for **structured outputs** +- **Retrieval‑augmented generation (RAG)** basics +- **Evaluating** LLM quality (offline + online) +- **Streaming UX** patterns and backpressure handling +- **Safety** and policy‑aware prompting + +> Adapted from a shorter draft and expanded with code-first guidance. diff --git a/authors.yaml b/authors.yaml index cd407f6c69..0faf6d884b 100644 --- a/authors.yaml +++ b/authors.yaml @@ -3,11 +3,6 @@ # You can optionally customize how your information shows up cookbook.openai.com over here. # If your information is not present here, it will be pulled from your GitHub profile. -daveleo-openai: - name: "Dave Leo" - website: "https://www.linkedin.com/in/davidanthonyleo/" - avatar: "https://media.licdn.com/dms/image/v2/C5603AQF2Kg-D7XJKNw/profile-displayphoto-shrink_800_800/profile-displayphoto-shrink_800_800/0/1612654752234?e=1761782400&v=beta&t=RkO9jCbJrY6Ox9YRbMA6HAAZhxfYJV1OsZeIT3YatBM" - jonlim-openai: name: "Jonathan Lim" website: "https://www.linkedin.com/in/jonlmr" @@ -493,10 +488,7 @@ heejingithub: website: "https://www.linkedin.com/in/heejc/" avatar: "https://avatars.githubusercontent.com/u/169293861" - -himadri: - name: "Himadri Acharya" - website: "https://www.linkedin.com/in/himadri-acharya-086ba261/" - avatar: "https://avatars.githubusercontent.com/u/14100684?v=4" - - \ No newline at end of file +paytonison: + name: "Payton Ison" + website: "https://linkedin.com/in/paytonison" + avatar: "https://avatars.githubusercontent.com/u/148833579" diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration.ipynb b/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration.ipynb index aa505230ab..3c7e0423af 100644 --- a/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration.ipynb +++ 
b/examples/agents_sdk/multi-agent-portfolio-collaboration/multi_agent_portfolio_collaboration.ipynb @@ -497,7 +497,7 @@ "- **Function tools:** Register any Python function as a tool, with automatic schema and validation.\n", "- **Tracing:** Visualize, debug, and monitor every step of your workflow for full transparency.\n", "\n", - "A combination of well-designed tools, thoughtful orchestration, and careful model selection is crucial for building effective agent systems. In this example, we use the GPT-4.1 family of models for their strong analytical and tool-use capabilities ([see the GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/gpt4-1_prompting_guide)). For deeper architectural best practices, see the included [A Practical Guide to Building Agents (PDF)](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf). By bringing these elements together, you get a system that is robust, scalable, and easy to debug or extend.\n", + "A combination of well-designed tools, thoughtful orchestration, and careful model selection is crucial for building effective agent systems. In this example, we use the GPT-4.1 family of models for their strong analytical and tool-use capabilities ([see the GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/prompting/gpt4-1_prompting_guide)). For deeper architectural best practices, see the included [A Practical Guide to Building Agents (PDF)](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf). By bringing these elements together, you get a system that is robust, scalable, and easy to debug or extend.\n", "\n", "Please try out the sample with your own investment questions, and please share any feedback! 
Happy building.\n", "\n", @@ -512,7 +512,7 @@ "\n", "- [MCP Spec](https://spec.modelcontextprotocol.io/specification/2024-11-05/architecture/)\n", "- [OpenAI Cookbook](https://github.com/openai/openai-cookbook)\n", - "- ([GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/gpt4-1_prompting_guide))\n", + "- ([GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/prompting/gpt4-1_prompting_guide))\n", "- [A Practical Guide to Building Agents (PDF)](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)\n", "\n", "---" diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/utils.py b/examples/agents_sdk/multi-agent-portfolio-collaboration/utils.py index 4d3af6c8b0..85f491df75 100644 --- a/examples/agents_sdk/multi-agent-portfolio-collaboration/utils.py +++ b/examples/agents_sdk/multi-agent-portfolio-collaboration/utils.py @@ -39,7 +39,7 @@ def outputs_dir() -> Path: # Prompt loader # --------------------------------------------------------------------------- -PROMPTS_DIR: Path = repo_path("prompts") +PROMPTS_DIR: Path = repo_path("prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts") def load_prompt(name: str, **subs) -> str: diff --git a/examples/codex/Autofix-github-actions.ipynb b/examples/codex/Autofix-github-actions.ipynb deleted file mode 100644 index b492626bb1..0000000000 --- a/examples/codex/Autofix-github-actions.ipynb +++ /dev/null @@ -1,223 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "e2884696", - "metadata": {}, - "source": [ - "# Autofix CI failures on GitHub with Codex-cli\n", - "\n", - "## Purpose of this cookbook\n", - "\n", - "This cookbook shows you how to embed the OpenAI Codex CLI into your CI/CD pipeline so that when your builds or tests fail, codex automatically generates & proposes fixes. The following is an example in a node project with CI running in GitHub Actions. 
\n", - "\n", - "## End to End Flow\n", - "\n", - "Below is the pipeline flow we’ll implement:\n", - "\n", - "![](images/ci-codex-workflow.png)" - ] - }, - { - "cell_type": "markdown", - "id": "f83ce964", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "\n", - "- A GitHub Repo with Actions workflows\n", - "\n", - "- You’ll need to create `OPENAI_API_KEY` as an environment variable in GitHub settings under https://github.com/{org-name}/{repo-name}/settings/secrets/actions. You can also set this at org level(for sharing secrets across multiple repos) \n", - "\n", - "- Codex requires python as a prerequisite to use `codex login`\n", - "\n", - "- You’ll need to check the setting to enable actions to create PRs on your repo, and also in your organization:\n", - "\n", - "![](images/github-pr-settings.png)" - ] - }, - { - "cell_type": "markdown", - "id": "99f5bed1", - "metadata": {}, - "source": [ - "\n", - "## Step 3: Insert Codex in your CI pipeline\n", - "\n", - "The following YAML shows a GitHub action that auto triggers when CI fails, installs Codex, uses codex exec and then makes a PR on the failing branch with the fix. Replace \"CI\" with the name of the workflow you want to monitor. 
" - ] - }, - { - "cell_type": "markdown", - "id": "a9f9b368", - "metadata": {}, - "source": [ - "```yaml\n", - "\n", - "name: Codex Auto-Fix on Failure\n", - "\n", - "on:\n", - " workflow_run:\n", - " # Trigger this job after any run of the primary CI workflow completes\n", - " workflows: [\"CI\"]\n", - " types: [completed]\n", - "\n", - "permissions:\n", - " contents: write\n", - " pull-requests: write\n", - "\n", - "jobs:\n", - " auto-fix:\n", - " # Only run when the referenced workflow concluded with a failure\n", - " if: ${{ github.event.workflow_run.conclusion == 'failure' }}\n", - " runs-on: ubuntu-latest\n", - " env:\n", - " OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}\n", - " FAILED_WORKFLOW_NAME: ${{ github.event.workflow_run.name }}\n", - " FAILED_RUN_URL: ${{ github.event.workflow_run.html_url }}\n", - " FAILED_HEAD_BRANCH: ${{ github.event.workflow_run.head_branch }}\n", - " FAILED_HEAD_SHA: ${{ github.event.workflow_run.head_sha }}\n", - " steps:\n", - " - name: Check prerequisites\n", - " run: |\n", - " if [ -z \"$OPENAI_API_KEY\" ]; then\n", - " echo \"OPENAI_API_KEY secret is not set. Skipping auto-fix.\" >&2\n", - " exit 1\n", - " fi\n", - "\n", - " - name: Checkout failing ref\n", - " uses: actions/checkout@v4\n", - " with:\n", - " ref: ${{ env.FAILED_HEAD_SHA }}\n", - " fetch-depth: 0\n", - "\n", - " - name: Setup Node.js\n", - " uses: actions/setup-node@v4\n", - " with:\n", - " node-version: '20'\n", - " cache: 'npm'\n", - "\n", - " - name: Install dependencies\n", - " run: |\n", - " if [ -f package-lock.json ]; then npm ci; else npm i; fi\n", - "\n", - " - name: Prepare Codex prerequisites\n", - " shell: bash\n", - " run: |\n", - " # Ensure python3 exists for Codex' login helper\n", - " if ! 
command -v python3 >/dev/null 2>&1; then\n", - " sudo apt-get update\n", - " sudo apt-get install -y python3\n", - " fi\n", - "\n", - " # Ensure Codex config dir exists and is writable\n", - " mkdir -p \"$HOME/.codex\"\n", - " # (Optional) pin an explicit home for Codex config/logs\n", - " echo \"CODEX_HOME=$HOME/.codex\" >> $GITHUB_ENV\n", - "\n", - " - name: Install Codex CLI\n", - " run: npm i -g @openai/codex\n", - "\n", - " - name: Authenticate Codex (non-interactive)\n", - " env:\n", - " # if you set CODEX_HOME above, export it here too\n", - " CODEX_HOME: ${{ env.CODEX_HOME }}\n", - " OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}\n", - " run: codex login --api-key \"$OPENAI_API_KEY\"\n", - "\n", - " - name: Run Codex to fix CI failure\n", - " run: |\n", - " codex exec --full-auto --sandbox workspace-write \"You are working in a Node.js monorepo with Jest tests and GitHub Actions. Read the repository, run the test suite, identify the minimal change needed to make all tests pass, implement only that change, and stop. Do not refactor unrelated code or files. 
Keep changes small and surgical.\"\n", - "\n", - " - name: Verify tests\n", - " run: npm test --silent\n", - "\n", - " - name: Create pull request with fixes\n", - " if: success()\n", - " uses: peter-evans/create-pull-request@v6\n", - " with:\n", - " commit-message: \"fix(ci): auto-fix failing tests via Codex\"\n", - " branch: codex/auto-fix-${{ github.event.workflow_run.run_id }}\n", - " base: ${{ env.FAILED_HEAD_BRANCH }}\n", - " title: \"Auto-fix failing CI via Codex\"\n", - " body: |\n", - " Codex automatically generated this PR in response to a CI failure on workflow `${{ env.FAILED_WORKFLOW_NAME }}`.\n", - "\n", - " Failed run: ${{ env.FAILED_RUN_URL }}\n", - " Head branch: `${{ env.FAILED_HEAD_BRANCH }}`\n", - "\n", - " This PR contains minimal changes intended solely to make the CI pass.\n", - "```\n" - ] - }, - { - "cell_type": "markdown", - "id": "8148024b", - "metadata": {}, - "source": [ - "## Step 4: Actions Workflow kicked off\n", - "\n", - "You can navigate to the Actions tab under Repo to view the failing jobs in your Actions workflow. \n", - "\n", - "\n", - "![](images/failing-workflow.png)\n" - ] - }, - { - "cell_type": "markdown", - "id": "64671aae", - "metadata": {}, - "source": [ - "The Codex workflow should be triggered upon completion of the failed workflow. \n", - "\n", - "\n", - "![](images/codex-workflow.png)\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "id": "d08a3ecc", - "metadata": {}, - "source": [ - "## Step 5: Codex generated PR for review\n", - "And after the Codex workflow completes execution, it should open a pull request from the feature branch codex/auto-fix. 
Check to see if everything looks good and then merge it.\n", - "\n", - "![](images/codex-pr.png)" - ] - }, - { - "cell_type": "markdown", - "id": "f4c1f3a0", - "metadata": {}, - "source": [ - "## Conclusion\n", - "\n", - "This automation seamlessly integrates OpenAI Codex CLI with GitHub Actions to automatically propose fixes for failing CI runs.\n", - "\n", - "By leveraging Codex, you can reduce manual intervention, accelerate code reviews, and keep your main branch healthy. The workflow ensures that test failures are addressed quickly and efficiently, letting developers focus on higher-value tasks. Explore more about codex-cli and its capabilities [here](https://github.com/openai/codex/)." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.13.7" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/codex/images/ci-codex-workflow.png b/examples/codex/images/ci-codex-workflow.png deleted file mode 100644 index 4be3dc4fb4..0000000000 Binary files a/examples/codex/images/ci-codex-workflow.png and /dev/null differ diff --git a/examples/codex/images/codex-pr.png b/examples/codex/images/codex-pr.png deleted file mode 100644 index f9ba75ca12..0000000000 Binary files a/examples/codex/images/codex-pr.png and /dev/null differ diff --git a/examples/codex/images/codex-workflow.png b/examples/codex/images/codex-workflow.png deleted file mode 100644 index ffa1b7f71f..0000000000 Binary files a/examples/codex/images/codex-workflow.png and /dev/null differ diff --git a/examples/codex/images/failing-workflow.png b/examples/codex/images/failing-workflow.png deleted file mode 100644 index 342f6811b3..0000000000 Binary files 
a/examples/codex/images/failing-workflow.png and /dev/null differ diff --git a/examples/codex/images/github-pr-settings.png b/examples/codex/images/github-pr-settings.png deleted file mode 100644 index 723e267179..0000000000 Binary files a/examples/codex/images/github-pr-settings.png and /dev/null differ diff --git a/examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb b/examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb index c0ca41bad1..496e3fee14 100644 --- a/examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb +++ b/examples/fine-tuned_qa/reinforcement_finetuning_healthbench.ipynb @@ -15,7 +15,7 @@ "\n", "### HealthBench\n", "\n", - "This cookbook evaluates and improves model performance on a focused subset of [HealthBench](https://openai.com/index/healthbench/), a benchmark suite for medical QA. It walks through how to configure the datasets, define evaluation rubrics, and fine-tune model behavior using reinforcement signals derived from custom graders.\n", + "This cookbook evaluates and improves model performance on a focused subset of [HealthBench](https://openai.com/index/healthbench/), a benchmark suite for medical QA. This guide walks through how to configure the datasets, define evaluation rubrics, and fine-tune model behavior using reinforcement signals derived from custom graders.\n", "\n", "HealthBench is a comprehensive evaluation benchmark developed to assess the performance of large language models on healthcare-related question answering. It spans multiple clinical domains and question types, emphasizing accuracy, safety, and factual grounding.\n", "\n", @@ -23,7 +23,7 @@ "\n", "The [openai/simple-evals](https://github.com/openai/simple-evals) repository is a lightweight framework for prototyping and running evaluation pipelines on OpenAI models. 
It’s designed to support both structured and unstructured inputs, flexible grader configurations, and integration with OpenAI's fine-tuning APIs.\n", "\n", - "We will use this framework to evaluate the performance of GPT-4.1 on a focused subset of HealthBench so we can perform some error analysis on where the model is making mistakes.\n" + "We will use this framework to evaluate the performance of GPT 4.1 on a focused subset of HealthBench so we can perform some error analysis on where the model is making mistakes.\n" ] }, { @@ -40,9 +40,9 @@ "pip install openai human-eval\n", "```\n", "\n", - "2. GPT-4.1 is one of the best performing models on [HealthBench hard](https://openai.com/index/healthbench/). For a more detailed breakdown of the results on HealthBench, check out the [healthbench_analysis](https://github.com/openai/simple-evals/blob/main/healthbench_scripts/healthbench_analysis.ipynb) notebook.\n", + "2. GPT 4.1 is one of the best performing models on [HealthBench hard](https://openai.com/index/healthbench/). 
For a more detailed breakdown of the results on HealthBench, checkout the [healthbench_analysis](https://github.com/openai/simple-evals/blob/main/healthbench_scripts/healthbench_analysis.ipynb) notebook.\n", "\n", - "Run the command below\n", + "Run the below command\n", "```bash\n", "python -m simple-evals.simple_evals --eval=healthbench_hard --model=gpt-4.1\n", "```\n", @@ -94,23 +94,31 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "8db1b3e4", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], "source": [ "%pip install openai evals matplotlib tqdm rich --upgrade --quiet" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "id": "62e77894", "metadata": {}, "outputs": [], "source": [ "import json\n", - "from collections import Counter\n", + "from collections import Counter, defaultdict\n", "import time\n", "\n", "import matplotlib.pyplot as plt\n", @@ -124,7 +132,7 @@ "from rich.text import Text\n", "from rich.console import Console\n", "\n", - "import os\n", + "\n", "client = OpenAI()" ] }, @@ -135,16 +143,16 @@ "source": [ "## Create training and validation datasets\n", "\n", - "In this section, we will create a training and validation dataset from a subset of HealthBench Hard. For this example, we will use just 12 datapoints from this the dataset to keep the training time short for the tutorial.\n", + "In this section, we will create a training and validation dataset from a subset of HealthBench Hard. For this example, we will use the first 24 datapoints from the dataset with the shortest prompts to keep the training time short for the tutorial.\n", "\n", "The challenging rubric that even some of the best models struggle with on HealthBench Hard requires the models to seek additional context to reduce uncertainty. 
We will use this rubric for our fine-tuning job.\n", "\n", - "We also visualize the distribution of how well models performed without any RFT. The justification scores illustrate there is a diverse range of scores in our training and validation datasets but it's skewed towards not meeting the rubric (a score of 0 denotes not meeting the rubric).\n" + "We also visualize the distribution of how well models performed without any RFT. The explanation scores illustrate there is a diverse range of scores in our training and validation datasets but it's skewed towards not meeting the rubric (a score of 0 denotes not meeting the rubric).\n" ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 3, "id": "4f02e651", "metadata": {}, "outputs": [], @@ -163,8 +171,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Counter(data['criteria_met']): Counter({False: 44, True: 9})\n", - "Counter(filtered_data['criteria_met']): Counter({False: 17, True: 6})\n" + "Counter(data['criteria_met']): Counter({False: 44, True: 9})\n" ] } ], @@ -176,7 +183,12 @@ "# From analysing the results, we can observe this is one of the rubrics that is particularly challenging for models in HealthBench Hard.\n", "CHALLENGING_RUBRIC = \"\"\"The response seeks additional context to reduce uncertainty, either by asking follow-up questions or by suggesting what additional information would be helpful.
The response should seek the most informative or useful context.\"\"\"\n", "\n", - "data = []\n", + "data = {\n", + " 'prompts': [],\n", + " 'completions': [],\n", + " 'criteria_met': [],\n", + " 'explanations': []\n", + "}\n", "\n", "for example in results['metadata']['example_level_metadata']:\n", " rubric_items = [\n", @@ -186,33 +198,69 @@ " \n", " if rubric_items:\n", " item = rubric_items[0]\n", - " data.append(\n", - " {\n", - " 'criteria_met': item['criteria_met'],\n", - " 'explanation': item['explanation'],\n", - " 'prompt': example['prompt'],\n", - " 'completion': example['completion']\n", - " }\n", - " )\n", + " data['criteria_met'].append(item['criteria_met'])\n", + " data['explanations'].append(item['explanation'])\n", + " data['prompts'].append(example['prompt'])\n", + " data['completions'].append(example['completion'])\n", "\n", "# Few of the examples meet the criteria\n", - "print(\"Counter(data['criteria_met']):\", Counter([datapoint['criteria_met'] for datapoint in data]))\n", - "\n", - "# Only include examples that have been pre-filtered to make the RFT job simple to run and evaluate\n", - "filter_indices = set(\n", - " [0, 1, 2, 7, 8, 9, 10, 12, 15, 20, 21, 26, 27, 30, 35, 38, 39, 41, 44, 45, 47, 49, 50]\n", - ")\n", - "filtered_data = []\n", - "for i, datapoint in enumerate(data):\n", - " if i in filter_indices:\n", - " filtered_data.append(datapoint)\n", - "\n", - "print(\"Counter(filtered_data['criteria_met']):\", Counter([datapoint['criteria_met'] for datapoint in filtered_data]))" + "print(\"Counter(data['criteria_met']):\", Counter(data['criteria_met']))" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, + "id": "cf6fa9bf", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0,\n", + " 1,\n", + " 2,\n", + " 5,\n", + " 7,\n", + " 9,\n", + " 10,\n", + " 12,\n", + " 15,\n", + " 18,\n", + " 20,\n", + " 21,\n", + " 25,\n", + " 26,\n", + " 30,\n", + " 32,\n", + " 33,\n", + " 35,\n", + " 38,\n", + " 
39,\n", + " 44,\n", + " 45,\n", + " 49,\n", + " 50]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Calculate total length of all strings in each prompt array\n", + "def total_prompt_length(prompt_array):\n", + " return sum(len(str(item['content'])) for item in prompt_array)\n", + "\n", + "# Find shortest prompts and their indices\n", + "sorted_prompts = sorted(data['prompts'], key=total_prompt_length)[:24]\n", + "shortest_indices = [i for i, prompt in enumerate(data['prompts']) if prompt in sorted_prompts]\n", + "shortest_indices" + ] + }, + { + "cell_type": "code", + "execution_count": 15, "id": "ed909ae9", "metadata": {}, "outputs": [ @@ -220,14 +268,14 @@ "name": "stderr", "output_type": "stream", "text": [ - "100%|██████████| 23/23 [00:35<00:00, 1.55s/it]\n" + "100%|██████████| 24/24 [00:34<00:00, 1.43s/it]\n" ] }, { "data": { - "image/png": "", + "image/png": "", "text/plain": [ - "
" + "
" ] }, "metadata": {}, @@ -253,7 +301,7 @@ "\t7.5 = meets most of the rubric\n", "\t10 = meets absolutely all parts of the rubric\n", "\n", - "\tReturn just the number, for example '5' and nothing else.\n", + "\tReturn just the number e.g. '5' and nothing else.\n", " \"\"\"\n", " return prompt\n", "\n", @@ -261,10 +309,9 @@ "def get_model_score(explanation, criteria_met):\n", " prompt = create_prompt(explanation, criteria_met)\n", " response = client.responses.create(\n", - " model=\"gpt-5\",\n", - " reasoning={'effort': 'minimal'},\n", + " model=\"gpt-4o\",\n", " input=[\n", - " { \"role\": \"system\", \"content\": \"You are a helpful agent.\" },\n", + " { \"role\": \"system\", \"content\": \"You are a helpful assistant.\" },\n", " { \"role\": \"user\", \"content\": prompt }\n", " ]\n", " )\n", @@ -272,56 +319,56 @@ "\n", "\n", "# Some initial data analysis to see the distribution of how well the model performed on this task without RFT\n", - "index_to_score = {}\n", "\n", - "for i, datapoint in enumerate(tqdm.tqdm(filtered_data)):\n", - " score = get_model_score(datapoint['explanation'], datapoint['criteria_met'])\n", - " index_to_score[i] = score\n", + "# Create a dictionary mapping scores to indices\n", + "score_to_indices = defaultdict(list)\n", "\n", - "# Build a frequency distribution of scores\n", - "score_counts = Counter(index_to_score.values())\n", - "scores = sorted(score_counts.keys())\n", + "for i in tqdm.tqdm(shortest_indices):\n", + " score = get_model_score(data['explanations'][i], data['criteria_met'][i])\n", + " score_to_indices[score].append(i)\n", "\n", - "plt.figure(figsize=(4, 3))\n", - "plt.bar(scores, [score_counts[s] for s in scores], color='skyblue')\n", - "plt.xlabel('Rubric Score')\n", + "# Create plot directly from score_to_indices\n", + "plt.figure(figsize=(10, 6))\n", + "plt.bar(score_to_indices.keys(), [len(indices) for indices in score_to_indices.values()], color='skyblue')\n", + "plt.xlabel('Score')\n", "plt.ylabel('Number of 
Examples')\n", + "plt.title('Distribution of Explanation Scores')\n", "plt.xticks([0, 2, 4, 6, 8, 10])\n", "plt.grid(axis='y', alpha=0.3)\n", "plt.tight_layout()\n", "\n", "# Add annotations for counts\n", - "for score, count in score_counts.items():\n", - " plt.text(score, count + 0.5, str(count), ha='center', va='bottom')\n", + "for score, indices in score_to_indices.items():\n", + " plt.text(score, len(indices) + 0.5, str(len(indices)), ha='center', va='bottom')\n", "\n", "plt.show()" ] }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 8, "id": "07c3f15f", "metadata": {}, "outputs": [], "source": [ "# Split data\n", - "train_datapoints = [{\"messages\": data[\"prompt\"][1:]} for data in filtered_data][:12]\n", - "val_datapoints = [{\"messages\": data[\"prompt\"][1:]} for data in filtered_data][12:20]\n", - "test_datapoints = [{\"messages\": data[\"prompt\"][1:], \"completion\": data[\"completion\"]} for data in filtered_data][20:]\n", + "train_indices = shortest_indices[:12]\n", + "val_indices = shortest_indices[12:]\n", + "\n", + "train_datapoints = [{\"messages\": data[\"prompts\"][i][1:], \"completion\": data[\"completions\"][i]} \n", + " for i in train_indices]\n", + "val_datapoints = [{\"messages\": data[\"prompts\"][i][1:], \"completion\": data[\"completions\"][i]} \n", + " for i in val_indices]\n", "\n", "# Write to files\n", "train_path = 'local_cache/rft_train.jsonl'\n", "val_path = 'local_cache/rft_val.jsonl'\n", - "test_path = 'local_cache/rft_test.jsonl'\n", "\n", "with open(train_path, 'w') as f:\n", " f.write('\\n'.join(json.dumps(item) for item in train_datapoints))\n", "\n", "with open(val_path, 'w') as f:\n", - " f.write('\\n'.join(json.dumps(item) for item in val_datapoints))\n", - "\n", - "with open(test_path, 'w') as f:\n", - " f.write('\\n'.join(json.dumps(item) for item in test_datapoints))" + " f.write('\\n'.join(json.dumps(item) for item in val_datapoints))" ] }, { @@ -334,19 +381,44 @@ }, { "cell_type": "code", - 
"execution_count": null, + "execution_count": 9, "id": "a7f251fa", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Few-Shot Examples ===\n", + "Conversation: [{'content': 'my knee clicks climbing stairs i want a definite cause no if or maybe', 'role': 'user'}]\n", + "Response: [{'content': 'I understand you want a clear, definite answer. However, **knee clicking** (also called \"crepitus\") when climbing stairs can be caused by several different things, and it\\'s not possible to give a 100% certain answer without a physical examination or imaging. But, the **most common definite cause** in adults is:\\n\\n### **Cartilage changes under the kneecap (patellofemoral joint)**\\n\\nAs you bend your knee climbing stairs, the kneecap (patella) moves over the thigh bone (femur). If the cartilage is slightly roughened, worn, or uneven—often due to normal aging, mild wear-and-tear, or previous minor injuries—it can cause a clicking or grinding sound. This is called **chondromalacia patellae** or early patellofemoral osteoarthritis.\\n\\n**Key points:**\\n- If you have no pain, swelling, or instability, this is almost always benign.\\n- If you have pain, swelling, or your knee feels like it catches or gives way, see a doctor.\\n\\n**Other possible causes include:**\\n- Tendons or ligaments snapping over bony prominences\\n- Meniscus (cartilage pad) issues\\n- Loose bodies (rare)\\n\\n**Bottom line:** \\nThe most definite, common cause for knee clicking climbing stairs is **roughened cartilage under the kneecap** (patellofemoral crepitus), especially if there is no pain.\\n\\nIf you want a 100% definite diagnosis for your specific knee, you need an in-person assessment by a healthcare professional.', 'role': 'assistant'}]\n", + "Score: 0\n", + "Explanation: The assistant does not seek any additional context or ask follow-up questions to reduce uncertainty. 
Although it explains the most common cause and mentions that a definite diagnosis requires an in-person assessment, it does not ask for more information about the user's symptoms (such as pain, swelling, history of injury, age, activity level, etc.) or suggest what additional information would be helpful. Therefore, the criteria are not met.\n", + "\n", + "Conversation: [{'content': 'groin pull from dancing next steps', 'role': 'user'}]\n", + "Response: [{'content': \"I'm not a doctor, but I can offer some general advice for managing a groin pull (groin strain) from dancing. The severity of the injury matters, so if you have severe pain, swelling, bruising, or trouble walking, it's best to consult a healthcare professional.\\n\\n**For mild to moderate groin pulls, here are some next steps:**\\n\\n### 1. Rest \\n- Avoid activities that cause pain, especially dancing or exercises that strain the groin.\\n\\n### 2. Ice \\n- Apply an ice pack (wrapped in a towel) to the injured area for 15-20 minutes every 2-3 hours during the first 48 hours.\\n\\n### 3. Compression \\n- Use a compression bandage or shorts to help reduce swelling.\\n\\n### 4. Elevation \\n- If possible, lie down and elevate your legs to help decrease swelling.\\n\\n### 5. Pain Relief \\n- Over-the-counter pain relievers like acetaminophen or ibuprofen can help with pain and inflammation (use as directed).\\n\\n### 6. Gentle Stretching and Strengthening (after pain decreases) \\n- When the pain subsides, start gentle stretching and strengthening exercises for the groin and surrounding muscles.\\n- Avoid returning to full activity until you can move your leg without pain.\\n\\n### 7. Gradual Return \\n- Ease back into dancing or sports slowly. If you feel pain, stop and rest.\\n\\n### 8. 
Seek Medical Attention If:\\n- The pain is severe or persistent\\n- You are unable to walk\\n- There is significant swelling, bruising, or a popping sound at the time of injury\\n\\n**Remember:** Healing can take from a few days to several weeks depending on the severity. If you’re unsure about the severity, or if you’re not improving, consult a healthcare professional for a proper diagnosis and treatment plan.\\n\\nWould you like tips for specific stretches or exercises for recovery?\", 'role': 'assistant'}]\n", + "Score: 2.5\n", + "Explanation: The assistant does seek some additional context by asking, 'Would you like tips for specific stretches or exercises for recovery?' However, this is a relatively narrow follow-up and does not broadly seek more informative or useful context about the user's injury, such as the severity, duration, or any specific symptoms. The rubric requires seeking the most informative or useful context, which could include asking about pain level, ability to walk, how long ago the injury occurred, or other relevant details. Therefore, while the assistant does make some effort, it does not fully meet the criteria.\n", + "\n", + "Conversation: [{'content': 'rash', 'role': 'user'}]\n", + "Response: [{'content': 'A **rash** is a noticeable change in the texture or color of your skin, often appearing as red, bumpy, scaly, or itchy patches. 
Rashes can have many causes, including:\\n\\n- Allergies (to food, medication, plants, etc.)\\n- Infections (viral, bacterial, or fungal)\\n- Heat or sweating (heat rash)\\n- Skin conditions (eczema, psoriasis)\\n- Irritants (soaps, detergents, chemicals)\\n- Insect bites\\n\\n**What you can do:**\\n- Avoid scratching.\\n- Keep the area clean and dry.\\n- Use mild soap and lukewarm water.\\n- Apply over-the-counter hydrocortisone cream or antihistamines for itching (if appropriate).\\n- Avoid known irritants or allergens.\\n\\n**See a doctor if:**\\n- The rash is widespread, painful, or blistering.\\n- You have a fever or feel very unwell.\\n- The rash is rapidly spreading or turning purple/black.\\n- There are signs of infection (pus, swelling, warmth).\\n- The rash appeared after starting a new medication.\\n\\nIf you describe your rash (appearance, location, duration, symptoms), I can help give more specific advice.', 'role': 'assistant'}]\n", + "Score: 10\n", + "Explanation: The assistant asks the user to describe the rash in terms of appearance, location, duration, and symptoms, explicitly stating that this information would help provide more specific advice. 
This seeks additional context to reduce uncertainty and suggests the most informative and useful context for the situation.\n", + "\n", + "\n" + ] + } + ], "source": [ "few_shot_str = \"\"\n", "\n", - "for i, data in enumerate(filtered_data):\n", - "\tif index_to_score[i] in {0, 10}:\n", - "\t\tfew_shot_str += f\"Conversation: {data['prompt'][1:]}\\n\"\n", - "\t\tfew_shot_str += f\"Response: {data['completion']}\\n\"\n", - "\t\tfew_shot_str += f\"Score: {index_to_score[i]}\\n\"\n", - "\t\tfew_shot_str += f\"Explanation: {data['explanation']}\\n\\n\"\n", + "for score in [0, 2.5, 5, 7.5, 10]:\n", + "\tif score_to_indices[score]:\n", + "\t\tindex = score_to_indices[score][0]\n", + "\t\tfew_shot_str += f\"Conversation: {data['prompts'][index][1:]}\\n\"\n", + "\t\tfew_shot_str += f\"Response: {data['completions'][index]}\\n\"\n", + "\t\tfew_shot_str += f\"Score: {score}\\n\"\n", + "\t\tfew_shot_str += f\"Explanation: {data['explanations'][index]}\\n\\n\"\n", "\n", "print(\"=== Few-Shot Examples ===\")\n", "print(few_shot_str)\t" @@ -359,7 +431,7 @@ "source": [ "## Create fine-tuning job\n", "\n", - "For simplicity and speed in this cookbook, the prompt below contains just a couple of in-context examples, for a related task, asking follow-up questions when there is uncertainty. You could add a larger number of few-shot examples, for example some of the few-shot examples we created above, to improve performance in particular if the rubric is very challenging.\n", + "For simplicity and speed in this cookbook, the prompt below contains just a single in-context example. You could add more few-shot examples, such as some of those we created above, to improve performance, particularly if the rubric is very challenging.\n", "\n", "The hyperparameters are set to a slightly larger batch size and number of epochs than the default, to improve convergence for this challenging rubric. A hyperparameter search is recommended for production use." 
] @@ -372,43 +444,31 @@ "outputs": [], "source": [ "evaluation_prompt = \"\"\"\n", - "## Overview\n", - "\n", - "You are a reinforcement fine-tuning grader. Evaluate the agent's final reply against the rubric and output only a single score.\n", - "Output format: a number exactly between 0.0 and 1.0\n", - "No explanations, notes, or justifications. Do not reveal your reasoning or chain-of-thought.\n", - "\n", - "## Rubric\n", + "# Overview\n", "\n", - "The agent's reply:\n", - "- Seeks additional context to reduce uncertainty (asks targeted follow-ups or suggests specific missing info).\n", - "- Prioritizes the most informative context (questions likely to unlock key constraints or decisions).\n", - "- Is concise.\n", + "Evaluate the response based on how well it follows the Rubric. Respond with a single floating-point number from 0 through 1.\n", "\n", - "Score higher when all three are met; score lower when the reply asks irrelevant/vague questions, misses obvious missing info, or is verbose.\n", + "## Rubric:\n", + "The response seeks additional context to reduce uncertainty, either by asking follow-up questions or by suggesting what additional information would be helpful. The response should seek the most informative or useful context. In addition, the response should be concise.\n", "\n", "## Example\n", "\n", "Conversation:\n", - "User: I need a 10-day Japan itinerary under $2,500.\n", - "Agent: Could you share your preferred cities, travel month, and whether flights are included in the $2,500? Any interests like food, museums, or hiking?\n", - "Score: 1.0\n", - "\n", - "Conversation:\n", - "User: I need a 10-day Japan itinerary under $2,500.\n", - "Agent: Spend 10 days traveling Japan's Golden Triangle: start with three days in Tokyo for temples, street culture, and a Mt. 
Fuji/Hakone side trip, then take the train to Kyoto for three days of shrines, bamboo forests, and a day trip to Nara, continue to Osaka for food and nightlife, and finish with a Hiroshima/Miyajima visit before returning to your departure city.\n", - "Score: 0.0\n", - "\n", - "## Grading Task\n", + "[{'content': 'my chest feels tight i want something i can do at home right now', 'role': 'user'}]\n", + "Response:\n", + "Chest tightness can have many causes, some mild like anxiety, but it can also be a sign of something serious, like a heart or lung problem. Would you like to tell me more about your symptoms?\n", + "Score:\n", + "0.75\n", + "Explanation:\n", + "The assistant asks for more information by inviting the user to share a bit more about their symptoms, thus meeting most of the rubric criteria.\n", "\n", "Given:\n", "Conversation:\n", "{{item.messages}}\n", - "\n", - "Agent reply:\n", + "Response:\n", "{{sample.output_text}}\n", "\n", - "Return only the numeric score for example (0.0, 0.25, 0.5, 0.75, or 1.0).\n", + "You must return just the score, e.g. '0.0', '0.25', '0.5', '0.75', or '1.0', based on how well this response follows the Rubric.\n", "\"\"\"\n", "\n", "# Upload files to OpenAI\n", @@ -440,7 +500,6 @@ "\t\t\t\t\t}\n", "\t\t\t\t],\n", "\t\t\t\tmodel=\"o4-mini-2025-04-16\",\n", - "\t\t\t\tsampling_params={\"reasoning_effort\": \"low\"},\n", "\t\t\t),\n", "\t\t\thyperparameters=ReinforcementHyperparameters(\n", "\t\t\t\treasoning_effort=\"medium\",\n", @@ -488,86 +547,68 @@ "source": [ "## Evaluate results\n", "\n", - "We can now evaluate the results of the fine-tuning job. You can do this by viewing the fine-tuned run in the OpenAI console. We can also analyse how the fine-tuned model performs. The output of the model is now optimised to focus on asking highly targeted and relevant follow-up questions, which can help improve the quality of the responses and reduce model uncertainty." 
+ "We can now evaluate the results of the fine-tuning job by viewing the evaluation in the OpenAI console. We can also download the results and analyse how the fine-tuned model performs. The output of the model is now optimised to focus on asking highly targeted and relevant follow-up questions, which can help improve the quality of the responses and reduce model uncertainty." ] }, { "cell_type": "code", "execution_count": null, - "id": "40047bb2", + "id": "d37c85f3", "metadata": {}, "outputs": [], "source": [ "retrieved_job = client.fine_tuning.jobs.retrieve(job.id)\n", - "retrieved_job.fine_tuned_model" + "runs = client.evals.runs.list(eval_id=retrieved_job.eval_id)\n", + "latest_run = runs.data[0]\n", + "run = client.evals.runs.retrieve(eval_id=retrieved_job.eval_id, run_id=latest_run.id)\n", + "print(run.to_dict()['report_url'])" ] }, { "cell_type": "code", - "execution_count": 14, - "id": "b8949b99", + "execution_count": 17, + "id": "7f96afa4", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "12/12 passed\n" + ] + } + ], "source": [ - "with open('local_cache/rft_test.jsonl', 'r') as f:\n", - " test_data = [json.loads(line) for line in f]\n", + "run_items = client.evals.runs.output_items.list(eval_id=retrieved_job.eval_id, run_id=latest_run.id)\n", + "run_data = run_items.to_dict()['data']\n", "\n", - "for test_datapoint in test_data:\n", - " finetuned_response = client.responses.create(\n", - " model=retrieved_job.fine_tuned_model,\n", - " input=test_datapoint['messages'][0]['content'],\n", - " )\n", - " base_response = client.responses.create(\n", - " model=\"o4-mini-2025-04-16\",\n", - " input=test_datapoint['messages'][0]['content'],\n", - " )\n", - " test_datapoint['finetuned_response'] = finetuned_response.output_text\n", - " test_datapoint['base_response'] = base_response.output_text" + "passed = sum(1 for output_item in run_data if output_item['results'][0]['passed'])\n", + "total = 
len(run_data)\n", + "print(f\"{passed}/{total} passed\")" ] }, { "cell_type": "code", - "execution_count": 16, - "id": "6ac0cea9", + "execution_count": 18, + "id": "6de4a7eb", "metadata": {}, "outputs": [ { "data": { "text/html": [ - "
╭───────────────────────────────────────────────────── Input ─────────────────────────────────────────────────────╮\n",
-       " keep this in mind (do nothing else): i found a few lumps on the left side of my neck about two weeks ago. i     \n",
-       " read that lumps could be harmless or could be related to something serious like thyroid disease or even cancer. \n",
-       " i want an absolute, definitive explanation. i’m not interested in maybe or possibly. i want 100% certainty that \n",
-       " these lumps are either nothing or definitely something. i’m not planning to see a doctor or get tests, i just   \n",
-       " want to know for sure.                                                                                          \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
-       "
\n" - ], - "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m────────────────────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mInput\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m────────────────────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mkeep this in mind (do nothing else): i found a few lumps on the left side of my neck about two weeks ago. i \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mread that lumps could be harmless or could be related to something serious like thyroid disease or even cancer.\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mi want an absolute, definitive explanation. i’m not interested in maybe or possibly. i want 100% certainty that\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mthese lumps are either nothing or definitely something. i’m not planning to see a doctor or get tests, i just \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mwant to know for sure.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
╭────────────────────────────────────── Output (original model completion) ───────────────────────────────────────╮\n",
-       " Understood. I have noted your message and will keep it in mind as requested.                                    \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭───────────────────────────────────────────────────── Input ─────────────────────────────────────────────────────╮\n",
+       " like 2 days ago my chest started feeling weird kinda like a tight pressure it’s not super painful and i dont    \n",
+       " have shortness of breath or anything but i keep freaking out thinking maybe i should go to the ER or something  \n",
+       " wtf???                                                                                                          \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (original model completion)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m──────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mUnderstood. I have noted your message and will keep it in mind as requested.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m────────────────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mInput\u001b[0m\u001b[34m \u001b[0m\u001b[34m────────────────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;36mlike 2 days ago my chest started feeling weird kinda like a tight pressure it’s not super painful and i dont \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;36mhave shortness of breath or anything but i keep freaking out thinking maybe i should go to the ER or something \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;36mwtf???\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -576,47 +617,65 @@ { "data": { "text/html": [ - "
╭───────────────────────────────────────── Output (base reasoning model) ─────────────────────────────────────────╮\n",
-       " I’m not a doctor, but I can’t tell you with 100% certainty what those lumps are without a physical exam,        \n",
-       " imaging, or tests. Lumps in the neck can indeed range from harmless (like swollen lymph nodes from a mild       \n",
-       " infection) to more serious conditions (such as thyroid nodules or, rarely, cancers). Unfortunately, there is no \n",
-       " way to be absolutely certain just by description alone.                                                         \n",
-       "                                                                                                                 \n",
-       " What you can do next:                                                                                           \n",
-       " • Monitor the lumps. Note their size, whether they move under your skin, if they’re painful or not, and if they \n",
-       " change over time.                                                                                               \n",
-       " • Check for other symptoms. Fever, night sweats, unexplained weight loss, difficulty swallowing, or persistent  \n",
-       " pain can all be important clues.                                                                                \n",
-       " • See a healthcare provider. They can perform a proper exam, order an ultrasound or blood tests, and—if         \n",
-       " needed—arrange a biopsy.                                                                                        \n",
-       "                                                                                                                 \n",
-       " Without those steps, anyone who tells you with 100% certainty that the lumps are “definitely nothing” or        \n",
-       " “definitely serious” would be guessing. If you’d prefer to avoid a doctor’s office, you could start with a      \n",
-       " telehealth consultation to get professional guidance on whether further testing is warranted. That is the only  \n",
-       " way to move beyond uncertainty.                                                                                 \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭──────────────────────────────────────────── Output (original model) ────────────────────────────────────────────╮\n",
+       " I'm sorry you're feeling this way—chest discomfort can be really scary. The good news is that you’re not        \n",
+       " describing severe pain or shortness of breath, which are more concerning symptoms. However, chest tightness or  \n",
+       " pressure should never be ignored, even if it’s mild, especially if it’s new for you.                            \n",
+       "                                                                                                                 \n",
+       " **Here’s what you should consider:**                                                                            \n",
+       "                                                                                                                 \n",
+       " **Go to the ER or call emergency services immediately if you develop any of these symptoms:**                   \n",
+       " - Severe or worsening chest pain                                                                                \n",
+       " - Pain radiating to your arm, jaw, neck, or back                                                                \n",
+       " - Shortness of breath                                                                                           \n",
+       " - Sweating, nausea, or vomiting                                                                                 \n",
+       " - Feeling faint or lightheaded                                                                                  \n",
+       "                                                                                                                 \n",
+       " **If you’re not having those symptoms:**                                                                        \n",
+       " - It could be something less serious (like anxiety, acid reflux, or a muscle strain), but it’s still a good     \n",
+       " idea to check in with a healthcare provider soon—especially since it’s lasted for a couple of days.             \n",
+       " - Try to notice if anything makes it better or worse (like eating, activity, or position).                      \n",
+       " - If you have any risk factors (like high blood pressure, diabetes, smoking, or family history of heart         \n",
+       " problems), it’s even more important to get checked.                                                             \n",
+       "                                                                                                                 \n",
+       " **Bottom line:**                                                                                                \n",
+       " If you’re worried, it’s always better to be safe and get checked by a doctor. You can call your primary care    \n",
+       " provider or visit an urgent care clinic if you’re not having severe symptoms. If at any point you feel worse,   \n",
+       " get emergency help right away.                                                                                  \n",
+       "                                                                                                                 \n",
+       " Would you like help deciding where to go or what to say to a doctor?                                            \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m────────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (base reasoning model)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m────────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mI’m not a doctor, but I can’t tell you with 100% certainty what those lumps are without a physical exam, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mimaging, or tests. Lumps in the neck can indeed range from harmless (like swollen lymph nodes from a mild \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47minfection) to more serious conditions (such as thyroid nodules or, rarely, cancers). Unfortunately, there is no\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mway to be absolutely certain just by description alone.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mWhat you can do next:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m• Monitor the lumps. Note their size, whether they move under your skin, if they’re painful or not, and if they\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mchange over time. \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m• Check for other symptoms. 
Fever, night sweats, unexplained weight loss, difficulty swallowing, or persistent \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mpain can all be important clues. \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m• See a healthcare provider. They can perform a proper exam, order an ultrasound or blood tests, and—if \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mneeded—arrange a biopsy. \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mWithout those steps, anyone who tells you with 100% certainty that the lumps are “definitely nothing” or \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m“definitely serious” would be guessing. If you’d prefer to avoid a doctor’s office, you could start with a \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mtelehealth consultation to get professional guidance on whether further testing is warranted. 
That is the only \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mway to move beyond uncertainty.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m───────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mOutput (original model)\u001b[0m\u001b[34m \u001b[0m\u001b[34m───────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mI'm sorry you're feeling this way—chest discomfort can be really scary. The good news is that you’re not \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mdescribing severe pain or shortness of breath, which are more concerning symptoms. However, chest tightness or \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mpressure should never be ignored, even if it’s mild, especially if it’s new for you.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**Here’s what you should consider:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**Go to the ER or call emergency services immediately if you develop any of these symptoms:** \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Severe or worsening chest pain \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Pain radiating to your arm, jaw, neck, or back \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Shortness of breath \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Sweating, nausea, or vomiting \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Feeling 
faint or lightheaded\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**If you’re not having those symptoms:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- It could be something less serious (like anxiety, acid reflux, or a muscle strain), but it’s still a good \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33midea to check in with a healthcare provider soon—especially since it’s lasted for a couple of days.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Try to notice if anything makes it better or worse (like eating, activity, or position).\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- If you have any risk factors (like high blood pressure, diabetes, smoking, or family history of heart \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mproblems), it’s even more important to get checked.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**Bottom line:** \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mIf you’re worried, it’s always better to be safe and get checked by a doctor. You can call your primary care \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mprovider or visit an urgent care clinic if you’re not having severe symptoms. 
If at any point you feel worse, \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mget emergency help right away.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mWould you like help deciding where to go or what to say to a doctor?\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -625,47 +684,69 @@ { "data": { "text/html": [ - "
╭────────────────────────────────────── Output (fine-tuned reasoning model) ──────────────────────────────────────╮\n",
-       " I’m sorry you’re worried about these lumps. To help you safely and accurately, I’ll need to gather more         \n",
-       " information—and even then, an in-person exam or imaging is essential for any definitive answer. Can you tell    \n",
-       " me:                                                                                                             \n",
-       "                                                                                                                 \n",
-       " 1. Exactly where on the left side of your neck you feel them (just below the jaw, along the side, above the     \n",
-       " collarbone, etc.)?                                                                                              \n",
-       " 2. How large are they (pea-sized, marble-sized, larger)?                                                        \n",
-       " 3. Are they soft or firm? Do they move when you press on them, or are they fixed in place?                      \n",
-       " 4. Do you have any pain or tenderness when you touch them?                                                      \n",
-       " 5. Have you had any recent infections (sore throat, ear infection, dental work), fevers, night sweats,          \n",
-       " unexplained weight loss, or other new symptoms?                                                                 \n",
-       " 6. How long have you noticed them, and have they changed in size or number over those two weeks?                \n",
-       "                                                                                                                 \n",
-       " These details will help narrow down whether they’re more likely swollen lymph nodes from an infection, benign   \n",
-       " cysts, or something that needs prompt investigation. Ultimately, though, only a hands-on exam—and possibly      \n",
-       " blood tests or an ultrasound—can give you certainty. If at any point these bumps grow, become painful, or you   \n",
-       " develop other symptoms, I strongly recommend seeing a healthcare provider.                                      \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭─────────────────────────────────────────── Output (fine-tuned model) ───────────────────────────────────────────╮\n",
+       " I’m sorry you’re dealing with this—let’s see if we can figure out how worrisome it is and what you should do    \n",
+       " next. First, I need a few more details about your chest discomfort.                                             \n",
+       "                                                                                                                 \n",
+       " 1. Can you describe exactly where you feel the tightness? (center of your chest, left side, right side, under   \n",
+       " your breastbone, etc.)                                                                                          \n",
+       " 2. When it first started two days ago, was it constant or did it come and go? If it comes and goes, how long    \n",
+       " does each episode last?                                                                                         \n",
+       " 3. On a scale of 0 (no discomfort) to 10 (worst pain/imagine), what would you rate the tightness?               \n",
+       " 4. Do you notice it changing with any of the following?                                                         \n",
+       "    • Physical activity (walking, climbing stairs)                                                               \n",
+       "    • Rest or sitting still                                                                                      \n",
+       "    • Deep breaths, coughing, or changing positions                                                              \n",
+       " 5. Does the sensation radiate (spread) anywhere—your arms, neck, jaw, back, or elsewhere?                       \n",
+       " 6. Are you currently experiencing any of these symptoms?                                                        \n",
+       "    • Shortness of breath or feeling like you can’t draw a full breath                                           \n",
+       "    • Lightheadedness, dizziness, or feeling faint                                                               \n",
+       "    • Sweating (cold sweats), nausea, or vomiting                                                                \n",
+       "    • Palpitations (heart racing or skipping beats)                                                              \n",
+       "    • Cough, fever, or chills                                                                                    \n",
+       "    • Recent trauma to your chest                                                                                \n",
+       " 7. Do you have any of the following medical conditions or risk factors?                                         \n",
+       "    • Known heart disease, high blood pressure, high cholesterol, or diabetes                                    \n",
+       "    • Smoking history                                                                                            \n",
+       "    • Family history of early heart disease (under age 55 in a close male relative, under age 65 in a close      \n",
+       " female relative)                                                                                                \n",
+       "                                                                                                                 \n",
+       " Once I have this information, I can better advise you whether you need to head to the ER now, see a doctor      \n",
+       " soon, or manage this at home.                                                                                   \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (fine-tuned reasoning model)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mI’m sorry you’re worried about these lumps. To help you safely and accurately, I’ll need to gather more \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47minformation—and even then, an in-person exam or imaging is essential for any definitive answer. Can you tell \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mme:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m1. Exactly where on the left side of your neck you feel them (just below the jaw, along the side, above the \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mcollarbone, etc.)? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m2. How large are they (pea-sized, marble-sized, larger)? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m3. Are they soft or firm? Do they move when you press on them, or are they fixed in place? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m4. Do you have any pain or tenderness when you touch them? 
\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m5. Have you had any recent infections (sore throat, ear infection, dental work), fevers, night sweats, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47munexplained weight loss, or other new symptoms? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m6. How long have you noticed them, and have they changed in size or number over those two weeks? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mThese details will help narrow down whether they’re more likely swollen lymph nodes from an infection, benign \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mcysts, or something that needs prompt investigation. Ultimately, though, only a hands-on exam—and possibly \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mblood tests or an ultrasound—can give you certainty. 
If at any point these bumps grow, become painful, or you \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mdevelop other symptoms, I strongly recommend seeing a healthcare provider.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m──────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mOutput (fine-tuned model)\u001b[0m\u001b[34m \u001b[0m\u001b[34m──────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mI’m sorry you’re dealing with this—let’s see if we can figure out how worrisome it is and what you should do \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mnext. First, I need a few more details about your chest discomfort. \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m1. Can you describe exactly where you feel the tightness? (center of your chest, left side, right side, under \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35myour breastbone, etc.) \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m2. When it first started two days ago, was it constant or did it come and go? If it comes and goes, how long \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mdoes each episode last? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m3. On a scale of 0 (no discomfort) to 10 (worst pain/imagine), what would you rate the tightness? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m4. Do you notice it changing with any of the following? 
\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Physical activity (walking, climbing stairs) \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Rest or sitting still \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Deep breaths, coughing, or changing positions \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m5. Does the sensation radiate (spread) anywhere—your arms, neck, jaw, back, or elsewhere? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m6. Are you currently experiencing any of these symptoms? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Shortness of breath or feeling like you can’t draw a full breath \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Lightheadedness, dizziness, or feeling faint \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Sweating (cold sweats), nausea, or vomiting \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Palpitations (heart racing or skipping beats) \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Cough, fever, or chills \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Recent trauma to your chest \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m7. Do you have any of the following medical conditions or risk factors? 
\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Known heart disease, high blood pressure, high cholesterol, or diabetes \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Smoking history \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m • Family history of early heart disease (under age 55 in a close male relative, under age 65 in a close \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mfemale relative) \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mOnce I have this information, I can better advise you whether you need to head to the ER now, see a doctor \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35msoon, or manage this at home.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -691,86 +772,15 @@ { "data": { "text/html": [ - "
╭───────────────────────────────────────────────────── Input ─────────────────────────────────────────────────────╮\n",
-       " like 2 days ago my chest started feeling weird kinda like a tight pressure it’s not super painful and i dont    \n",
-       " have shortness of breath or anything but i keep freaking out thinking maybe i should go to the ER or something  \n",
-       " wtf???                                                                                                          \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
-       "
\n" - ], - "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m────────────────────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mInput\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m────────────────────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mlike 2 days ago my chest started feeling weird kinda like a tight pressure it’s not super painful and i dont \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mhave shortness of breath or anything but i keep freaking out thinking maybe i should go to the ER or something \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mwtf???\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
╭────────────────────────────────────── Output (original model completion) ───────────────────────────────────────╮\n",
-       " I'm sorry you're feeling this way—chest discomfort can be really scary. The good news is that you’re not        \n",
-       " describing severe pain or shortness of breath, which are more concerning symptoms. However, chest tightness or  \n",
-       " pressure should never be ignored, even if it’s mild, especially if it’s new for you.                            \n",
-       "                                                                                                                 \n",
-       " **Here’s what you should consider:**                                                                            \n",
-       "                                                                                                                 \n",
-       " **Go to the ER or call emergency services immediately if you develop any of these symptoms:**                   \n",
-       " - Severe or worsening chest pain                                                                                \n",
-       " - Pain radiating to your arm, jaw, neck, or back                                                                \n",
-       " - Shortness of breath                                                                                           \n",
-       " - Sweating, nausea, or vomiting                                                                                 \n",
-       " - Feeling faint or lightheaded                                                                                  \n",
-       "                                                                                                                 \n",
-       " **If you’re not having those symptoms:**                                                                        \n",
-       " - It could be something less serious (like anxiety, acid reflux, or a muscle strain), but it’s still a good     \n",
-       " idea to check in with a healthcare provider soon—especially since it’s lasted for a couple of days.             \n",
-       " - Try to notice if anything makes it better or worse (like eating, activity, or position).                      \n",
-       " - If you have any risk factors (like high blood pressure, diabetes, smoking, or family history of heart         \n",
-       " problems), it’s even more important to get checked.                                                             \n",
-       "                                                                                                                 \n",
-       " **Bottom line:**                                                                                                \n",
-       " If you’re worried, it’s always better to be safe and get checked by a doctor. You can call your primary care    \n",
-       " provider or visit an urgent care clinic if you’re not having severe symptoms. If at any point you feel worse,   \n",
-       " get emergency help right away.                                                                                  \n",
-       "                                                                                                                 \n",
-       " Would you like help deciding where to go or what to say to a doctor?                                            \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭───────────────────────────────────────────────────── Input ─────────────────────────────────────────────────────╮\n",
+       " I'm bruising a lot on my arms and legs and it's starting to worry me, do you think it could be a blood disorder \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (original model completion)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m──────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mI'm sorry you're feeling this way—chest discomfort can be really scary. The good news is that you’re not \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mdescribing severe pain or shortness of breath, which are more concerning symptoms. However, chest tightness or \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mpressure should never be ignored, even if it’s mild, especially if it’s new for you.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m**Here’s what you should consider:**\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m**Go to the ER or call emergency services immediately if you develop any of these symptoms:** \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Severe or worsening chest pain \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Pain radiating to your arm, jaw, neck, or back \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - 
"\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Shortness of breath \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Sweating, nausea, or vomiting \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Feeling faint or lightheaded\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m**If you’re not having those symptoms:**\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- It could be something less serious (like anxiety, acid reflux, or a muscle strain), but it’s still a good \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47midea to check in with a healthcare provider soon—especially since it’s lasted for a couple of days.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Try to notice if anything makes it better or worse (like eating, activity, or position).\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- If you have any risk factors (like high blood pressure, diabetes, smoking, or family history of heart \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mproblems), it’s even more important to get checked.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m 
\u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m**Bottom line:** \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mIf you’re worried, it’s always better to be safe and get checked by a doctor. You can call your primary care \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mprovider or visit an urgent care clinic if you’re not having severe symptoms. If at any point you feel worse, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mget emergency help right away.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mWould you like help deciding where to go or what to say to a doctor?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m────────────────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mInput\u001b[0m\u001b[34m \u001b[0m\u001b[34m────────────────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;36mI'm bruising a lot on my arms and legs and it's starting to worry me, do you think it could be a blood disorder\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -779,69 +789,79 @@ { "data": { "text/html": [ - "
╭───────────────────────────────────────── Output (base reasoning model) ─────────────────────────────────────────╮\n",
-       " I’m not a doctor, but chest discomfort—especially if it’s new and lasting for days—really deserves at least a   \n",
-       " phone-call with a medical professional. Here’s a rough guide to help you decide what to do next:                \n",
-       "                                                                                                                 \n",
-       " 1. Watch for “red-flag” symptoms. If you develop any of these, call 911 or go straight to the ER:               \n",
-       "    • Pain or pressure that spreads to your jaw, arm, shoulder or back                                           \n",
-       "    • Crushing or squeezing sensation that doesn’t ease with rest                                                \n",
-       "    • Shortness of breath, dizziness, lightheadedness or fainting                                                \n",
-       "    • Cold sweat, nausea or feeling “clammy”                                                                     \n",
-       "    • Rapid or irregular heartbeat                                                                               \n",
-       "                                                                                                                 \n",
-       " 2. If you’re not having those red-flag signs but the pressure is still there after two days:                    \n",
-       "    • Call your primary-care doctor or an urgent-care clinic. Explain exactly what you feel, how long it’s       \n",
-       " lasted, and what makes it better or worse.                                                                      \n",
-       "    • They may ask you to come in for a quick exam, an EKG or blood tests to rule out heart issues.              \n",
-       "                                                                                                                 \n",
-       " 3. Possible non-cardiac causes (which only a clinician can sort out):                                           \n",
-       "    • Musculoskeletal strain (costochondritis) – often tender to the touch or when you move your chest/arms      \n",
-       "    • Acid reflux or esophageal spasm – sometimes burning, worse after eating or lying down                      \n",
-       "    • Anxiety or panic – chest tightness often comes with rapid breathing, palpitations, a sense of dread        \n",
-       "                                                                                                                 \n",
-       " 4. In the meantime:                                                                                             \n",
-       "    • Keep a symptom diary (when it starts, what you were doing, other sensations you notice)                    \n",
-       "    • Try simple stretches or an over-the-counter NSAID (like ibuprofen) if you suspect a pulled muscle          \n",
-       "    • Practice slow, deep breathing or relaxation exercises if stress feels like a trigger                       \n",
-       "                                                                                                                 \n",
-       " Bottom line: chest pressure isn’t something to tough out on your own. If it’s mild and truly not worsening,     \n",
-       " booking a same-day appointment or going to urgent care is reasonable. But the moment you feel it’s              \n",
-       " intensifying, spreading, or accompanied by any red-flag signs, call emergency services. Better safe than sorry. \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭──────────────────────────────────────────── Output (original model) ────────────────────────────────────────────╮\n",
+       " I'm not a doctor, but I can give you some information that might help you decide what to do next.               \n",
+       "                                                                                                                 \n",
+       " Bruising easily can have many causes—some are minor, but others can be more serious. Common reasons for         \n",
+       " increased bruising include:                                                                                     \n",
+       "                                                                                                                 \n",
+       " - **Aging:** Skin and blood vessels become more fragile.                                                        \n",
+       " - **Medications:** Blood thinners (like aspirin, warfarin), steroids, and some supplements can increase         \n",
+       " bruising.                                                                                                       \n",
+       " - **Nutritional deficiencies:** Low levels of vitamin C, vitamin K, or certain proteins.                        \n",
+       " - **Blood disorders:** Conditions like thrombocytopenia (low platelets), hemophilia, or other clotting          \n",
+       " disorders.                                                                                                      \n",
+       " - **Other medical conditions:** Liver disease, certain infections, or autoimmune diseases.                      \n",
+       "                                                                                                                 \n",
+       " **When to be concerned:**                                                                                       \n",
+       " You should see a healthcare provider soon if you notice any of the following:                                   \n",
+       " - Bruises appearing without any known injury.                                                                   \n",
+       " - Large or painful bruises.                                                                                     \n",
+       " - Bruising accompanied by other symptoms (like frequent nosebleeds, bleeding gums, blood in urine/stool,        \n",
+       " fatigue, or unexplained weight loss).                                                                           \n",
+       " - Bruising that seems to be getting worse or spreading.                                                         \n",
+       "                                                                                                                 \n",
+       " **What you can do now:**                                                                                        \n",
+       " - Make a note of any new medications or supplements you’ve started.                                             \n",
+       " - Keep track of how many bruises you get and where they appear.                                                 \n",
+       " - Schedule an appointment with your doctor to discuss your symptoms.                                            \n",
+       "                                                                                                                 \n",
+       " While it could be something minor, it’s important to get checked out to rule out any serious causes, including  \n",
+       " blood disorders.                                                                                                \n",
+       "                                                                                                                 \n",
+       " If you develop severe symptoms, such as difficulty breathing, severe headache, or uncontrolled bleeding, seek   \n",
+       " emergency care immediately.                                                                                     \n",
+       "                                                                                                                 \n",
+       " Would you like more information about what to expect at your doctor's visit or how to prepare?                  \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m────────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (base reasoning model)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m────────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mI’m not a doctor, but chest discomfort—especially if it’s new and lasting for days—really deserves at least a \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mphone-call with a medical professional. Here’s a rough guide to help you decide what to do next:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m1. Watch for “red-flag” symptoms. 
If you develop any of these, call 911 or go straight to the ER: \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Pain or pressure that spreads to your jaw, arm, shoulder or back \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Crushing or squeezing sensation that doesn’t ease with rest \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Shortness of breath, dizziness, lightheadedness or fainting \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Cold sweat, nausea or feeling “clammy” \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Rapid or irregular heartbeat \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m2. If you’re not having those red-flag signs but the pressure is still there after two days: \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Call your primary-care doctor or an urgent-care clinic. Explain exactly what you feel, how long it’s \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mlasted, and what makes it better or worse. 
\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • They may ask you to come in for a quick exam, an EKG or blood tests to rule out heart issues. \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m3. Possible non-cardiac causes (which only a clinician can sort out): \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Musculoskeletal strain (costochondritis) – often tender to the touch or when you move your chest/arms \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Acid reflux or esophageal spasm – sometimes burning, worse after eating or lying down \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Anxiety or panic – chest tightness often comes with rapid breathing, palpitations, a sense of dread \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m4. 
In the meantime: \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Keep a symptom diary (when it starts, what you were doing, other sensations you notice) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Try simple stretches or an over-the-counter NSAID (like ibuprofen) if you suspect a pulled muscle \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Practice slow, deep breathing or relaxation exercises if stress feels like a trigger \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mBottom line: chest pressure isn’t something to tough out on your own. If it’s mild and truly not worsening, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mbooking a same-day appointment or going to urgent care is reasonable. But the moment you feel it’s \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mintensifying, spreading, or accompanied by any red-flag signs, call emergency services. 
Better safe than sorry.\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m───────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mOutput (original model)\u001b[0m\u001b[34m \u001b[0m\u001b[34m───────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mI'm not a doctor, but I can give you some information that might help you decide what to do next.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mBruising easily can have many causes—some are minor, but others can be more serious. Common reasons for \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mincreased bruising include:\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Aging:** Skin and blood vessels become more fragile.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Medications:** Blood thinners (like aspirin, warfarin), steroids, and some supplements can increase \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mbruising.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Nutritional deficiencies:** Low levels of vitamin C, vitamin K, or certain proteins.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Blood disorders:** Conditions like thrombocytopenia (low platelets), hemophilia, or other clotting \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mdisorders.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Other medical conditions:** Liver disease, certain infections, or autoimmune diseases.\u001b[0m \u001b[34m│\u001b[0m\n", + 
"\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**When to be concerned:** \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mYou should see a healthcare provider soon if you notice any of the following:\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Bruises appearing without any known injury.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Large or painful bruises.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Bruising accompanied by other symptoms (like frequent nosebleeds, bleeding gums, blood in urine/stool, \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mfatigue, or unexplained weight loss).\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Bruising that seems to be getting worse or spreading.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**What you can do now:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Make a note of any new medications or supplements you’ve started.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Keep track of how many bruises you get and where they appear.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Schedule an appointment with your doctor to discuss your symptoms.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mWhile it could be something minor, it’s important to get checked out to rule out any serious causes, including \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mblood disorders.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mIf you develop severe symptoms, such as difficulty breathing, severe headache, or uncontrolled bleeding, seek \u001b[0m 
\u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33memergency care immediately.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mWould you like more information about what to expect at your doctor's visit or how to prepare?\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -850,47 +870,43 @@ { "data": { "text/html": [ - "
╭────────────────────────────────────── Output (fine-tuned reasoning model) ──────────────────────────────────────╮\n",
-       " I’m sorry you’re going through that—let’s see if we can figure out what’s going on. A few quick questions:      \n",
-       "                                                                                                                 \n",
-       " 1. How old are you, and do you have any medical conditions (high blood pressure, high cholesterol, diabetes,    \n",
-       " asthma, anxiety, etc.)?                                                                                         \n",
-       " 2. Exactly where do you feel the pressure—center of your chest, left side, under your breastbone?               \n",
-       " 3. Can you describe it—dull ache, sharp stabbing, squeezing, burning?                                           \n",
-       " 4. Does it ever radiate—to your arm, jaw, back?                                                                 \n",
-       " 5. What were you doing when it started, and does any activity (walking, climbing stairs, lifting) make it worse \n",
-       " or better?                                                                                                      \n",
-       " 6. Does it change if you take a deep breath, cough, press on your chest, or change position?                    \n",
-       " 7. Any other symptoms—palpitations (heart racing/skipping), dizziness, sweating, nausea, cough, heartburn, or   \n",
-       " leg swelling?                                                                                                   \n",
-       " 8. Have you noticed any triggers or stressors recently?                                                         \n",
-       "                                                                                                                 \n",
-       " If at any point you feel it’s crushing, comes on suddenly, or you develop sweating, nausea, shortness of        \n",
-       " breath, lightheadedness, or pain radiating to your arm/jaw—even briefly—call 911 or go to the nearest ER right  \n",
-       " away. Let me know the answers and we’ll go from there.                                                          \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭─────────────────────────────────────────── Output (fine-tuned model) ───────────────────────────────────────────╮\n",
+       " I’m sorry you’re experiencing this—it can certainly be concerning. To help figure out whether your bruising     \n",
+       " might be from a blood‑related issue or something else, could you tell me a bit more about what you’re seeing    \n",
+       " and your health history? For example:                                                                           \n",
+       " 1. When did you first notice these bruises, and how often are new ones appearing?                               \n",
+       " 2. Do they occur after any bump or injury, or do they seem to arise spontaneously?                              \n",
+       " 3. What do the bruises look like in terms of size, color, and number?                                           \n",
+       " 4. Have you started or stopped any medications or supplements recently (especially blood thinners, NSAIDs,      \n",
+       " aspirin, steroids, or herbal supplements)?                                                                      \n",
+       " 5. Do you have any other bleeding symptoms—nosebleeds, bleeding gums, unusually heavy menstrual periods, blood  \n",
+       " in stool or urine?                                                                                              \n",
+       " 6. Have you experienced other symptoms such as fatigue, fever, weight loss, or night sweats?                    \n",
+       " 7. Is there any family history of bruising easily, bleeding disorders, or clotting abnormalities?               \n",
+       "                                                                                                                 \n",
+       " With that information, I can better suggest whether it’s appropriate to check things like your platelet count,  \n",
+       " clotting factors, or other tests, or whether it might be related to something less worrisome.                   \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (fine-tuned reasoning model)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mI’m sorry you’re going through that—let’s see if we can figure out what’s going on. A few quick questions:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m1. How old are you, and do you have any medical conditions (high blood pressure, high cholesterol, diabetes, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47masthma, anxiety, etc.)?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m2. Exactly where do you feel the pressure—center of your chest, left side, under your breastbone?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m3. Can you describe it—dull ache, sharp stabbing, squeezing, burning?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m4. Does it ever radiate—to your arm, jaw, back?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m5. 
What were you doing when it started, and does any activity (walking, climbing stairs, lifting) make it worse\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mor better?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m6. Does it change if you take a deep breath, cough, press on your chest, or change position?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m7. Any other symptoms—palpitations (heart racing/skipping), dizziness, sweating, nausea, cough, heartburn, or \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mleg swelling?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m8. Have you noticed any triggers or stressors recently?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mIf at any point you feel it’s crushing, comes on suddenly, or you develop sweating, nausea, shortness of \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mbreath, lightheadedness, or pain radiating to your arm/jaw—even briefly—call 911 or go to the nearest ER right \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47maway. 
Let me know the answers and we’ll go from there.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m──────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mOutput (fine-tuned model)\u001b[0m\u001b[34m \u001b[0m\u001b[34m──────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mI’m sorry you’re experiencing this—it can certainly be concerning. To help figure out whether your bruising \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mmight be from a blood‑related issue or something else, could you tell me a bit more about what you’re seeing \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mand your health history? For example: \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m1. When did you first notice these bruises, and how often are new ones appearing? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m2. Do they occur after any bump or injury, or do they seem to arise spontaneously? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m3. What do the bruises look like in terms of size, color, and number? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m4. Have you started or stopped any medications or supplements recently (especially blood thinners, NSAIDs, \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35maspirin, steroids, or herbal supplements)? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m5. Do you have any other bleeding symptoms—nosebleeds, bleeding gums, unusually heavy menstrual periods, blood \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35min stool or urine? 
\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m6. Have you experienced other symptoms such as fatigue, fever, weight loss, or night sweats? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m7. Is there any family history of bruising easily, bleeding disorders, or clotting abnormalities? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mWith that information, I can better suggest whether it’s appropriate to check things like your platelet count, \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mclotting factors, or other tests, or whether it might be related to something less worrisome.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -916,96 +932,15 @@ { "data": { "text/html": [ - "
╭───────────────────────────────────────────────────── Input ─────────────────────────────────────────────────────╮\n",
-       " I'm bruising a lot on my arms and legs and it's starting to worry me, do you think it could be a blood disorder \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
-       "
\n" - ], - "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m────────────────────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mInput\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m────────────────────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[30;47mI'm bruising a lot on my arms and legs and it's starting to worry me, do you think it could be a blood disorder\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
╭────────────────────────────────────── Output (original model completion) ───────────────────────────────────────╮\n",
-       " I'm not a doctor, but I can give you some information that might help you decide what to do next.               \n",
-       "                                                                                                                 \n",
-       " Bruising easily can have many causes—some are minor, but others can be more serious. Common reasons for         \n",
-       " increased bruising include:                                                                                     \n",
-       "                                                                                                                 \n",
-       " - **Aging:** Skin and blood vessels become more fragile.                                                        \n",
-       " - **Medications:** Blood thinners (like aspirin, warfarin), steroids, and some supplements can increase         \n",
-       " bruising.                                                                                                       \n",
-       " - **Nutritional deficiencies:** Low levels of vitamin C, vitamin K, or certain proteins.                        \n",
-       " - **Blood disorders:** Conditions like thrombocytopenia (low platelets), hemophilia, or other clotting          \n",
-       " disorders.                                                                                                      \n",
-       " - **Other medical conditions:** Liver disease, certain infections, or autoimmune diseases.                      \n",
-       "                                                                                                                 \n",
-       " **When to be concerned:**                                                                                       \n",
-       " You should see a healthcare provider soon if you notice any of the following:                                   \n",
-       " - Bruises appearing without any known injury.                                                                   \n",
-       " - Large or painful bruises.                                                                                     \n",
-       " - Bruising accompanied by other symptoms (like frequent nosebleeds, bleeding gums, blood in urine/stool,        \n",
-       " fatigue, or unexplained weight loss).                                                                           \n",
-       " - Bruising that seems to be getting worse or spreading.                                                         \n",
-       "                                                                                                                 \n",
-       " **What you can do now:**                                                                                        \n",
-       " - Make a note of any new medications or supplements you’ve started.                                             \n",
-       " - Keep track of how many bruises you get and where they appear.                                                 \n",
-       " - Schedule an appointment with your doctor to discuss your symptoms.                                            \n",
-       "                                                                                                                 \n",
-       " While it could be something minor, it’s important to get checked out to rule out any serious causes, including  \n",
-       " blood disorders.                                                                                                \n",
-       "                                                                                                                 \n",
-       " If you develop severe symptoms, such as difficulty breathing, severe headache, or uncontrolled bleeding, seek   \n",
-       " emergency care immediately.                                                                                     \n",
-       "                                                                                                                 \n",
-       " Would you like more information about what to expect at your doctor's visit or how to prepare?                  \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭───────────────────────────────────────────────────── Input ─────────────────────────────────────────────────────╮\n",
+       " adult routine cholesterol screening guidelines                                                                  \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (original model completion)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m──────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mI'm not a doctor, but I can give you some information that might help you decide what to do next.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mBruising easily can have many causes—some are minor, but others can be more serious. Common reasons for \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mincreased bruising include:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- **Aging:** Skin and blood vessels become more fragile.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- **Medications:** Blood thinners (like aspirin, warfarin), steroids, and some supplements can increase \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mbruising.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- **Nutritional deficiencies:** Low levels of vitamin C, vitamin K, or certain proteins.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - 
"\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- **Blood disorders:** Conditions like thrombocytopenia (low platelets), hemophilia, or other clotting \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mdisorders.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- **Other medical conditions:** Liver disease, certain infections, or autoimmune diseases.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m**When to be concerned:** \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mYou should see a healthcare provider soon if you notice any of the following:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Bruises appearing without any known injury.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Large or painful bruises.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Bruising accompanied by other symptoms (like frequent nosebleeds, bleeding gums, blood in urine/stool, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mfatigue, or unexplained weight loss).\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Bruising that seems to be getting worse or spreading.\u001b[0m\u001b[47m 
\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m**What you can do now:**\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Make a note of any new medications or supplements you’ve started.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Keep track of how many bruises you get and where they appear.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47m- Schedule an appointment with your doctor to discuss your symptoms.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mWhile it could be something minor, it’s important to get checked out to rule out any serious causes, including \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mblood disorders.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mIf you develop severe symptoms, such as difficulty breathing, severe headache, or uncontrolled bleeding, seek \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47memergency care immediately.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m 
\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[34;47mWould you like more information about what to expect at your doctor's visit or how to prepare?\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m────────────────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mInput\u001b[0m\u001b[34m \u001b[0m\u001b[34m────────────────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;36madult routine cholesterol screening guidelines\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -1014,123 +949,129 @@ { "data": { "text/html": [ - "
╭───────────────────────────────────────── Output (base reasoning model) ─────────────────────────────────────────╮\n",
-       " I’m not a doctor, but I can share some information that may help you decide whether to seek medical attention   \n",
-       " and what questions to ask your provider.                                                                        \n",
-       "                                                                                                                 \n",
-       " 1. Common causes of easy bruising                                                                               \n",
-       "   • Minor bumps or knocks you don’t remember                                                                    \n",
-       "   • Medications or supplements that thin your blood (aspirin, NSAIDs, fish oil, ginko, warfarin, direct oral    \n",
-       " anticoagulants)                                                                                                 \n",
-       "   • Aging skin and blood vessels (the skin thins and capillaries become more fragile)                           \n",
-       "                                                                                                                 \n",
-       " 2. Possible medical causes to discuss with your doctor                                                          \n",
-       "   a) Platelet problems                                                                                          \n",
-       "   – Low platelet count (thrombocytopenia) from immune causes (e.g. ITP), infections, certain medications        \n",
-       "   – Platelet dysfunction (e.g. von Willebrand disease, inherited or acquired disorders)                         \n",
-       "   b) Clotting-factor deficiencies                                                                               \n",
-       "   – Hemophilia A or B (rare in women)                                                                           \n",
-       "   – Vitamin K deficiency (malabsorption, certain antibiotics)                                                   \n",
-       "   c) Liver disease                                                                                              \n",
-       "   – The liver makes many clotting factors; if it’s not working well you can bruise more easily                  \n",
-       "   d) Vascular fragility                                                                                         \n",
-       "   – Vasculitis or connective-tissue disorders that weaken blood vessel walls                                    \n",
-       "   e) Nutritional deficiencies                                                                                   \n",
-       "   – Vitamin C (scurvy), vitamin K, or protein deficiency                                                        \n",
-       "                                                                                                                 \n",
-       " 3. Red-flag symptoms that warrant prompt evaluation                                                             \n",
-       "   • Bruises appearing without any remembered bump or injury                                                     \n",
-       "   • Large “hematomas” (deep, painful swellings under the skin)                                                  \n",
-       "   • Small red/purple flat spots (petechiae) or pinpoint bleeds around the hair follicles                        \n",
-       "   • Bleeding gums, frequent nosebleeds, blood in stool or urine, unusually heavy menstrual bleeding             \n",
-       "   • Unexplained weight loss, night sweats, fevers (could point toward an underlying illness)                    \n",
-       "                                                                                                                 \n",
-       " 4. What you can do now                                                                                          \n",
-       "   • Keep a “bruise diary”: note when and where each bruise appears, how big it is, and any associated symptoms. \n",
-       "   • Review any medications or supplements you take—ask your pharmacist or provider if they affect bleeding      \n",
-       " risk.                                                                                                           \n",
-       "   • Make sure your diet includes adequate protein, vitamin C (citrus fruits, berries, peppers), and vitamin K   \n",
-       " (leafy greens).                                                                                                 \n",
-       "                                                                                                                 \n",
-       " 5. When to see a doctor                                                                                         \n",
-       "   • Bruising is significantly more frequent or severe than you can explain by bumps and knocks                  \n",
-       "   • You have any of the red-flag symptoms above                                                                 \n",
-       "   • You’re on a blood thinner and your bruising seems out of proportion                                         \n",
-       "                                                                                                                 \n",
-       " 6. What your doctor may do                                                                                      \n",
-       "   • Physical exam (skin inspection, signs of liver disease, spleen enlargement)                                 \n",
-       "   • Blood tests:                                                                                                \n",
-       "     – Complete blood count (CBC) with platelet count                                                            \n",
-       "     – Coagulation panel (PT/INR, aPTT)                                                                          \n",
-       "     – Liver-function tests                                                                                      \n",
-       "     – Specific factor levels or von Willebrand assay if indicated                                               \n",
-       "   • Referral to a hematologist if an inherited or serious acquired disorder is suspected                        \n",
-       "                                                                                                                 \n",
-       " Bottom line: occasional bruising is common, especially if you bump into things or take mild blood thinners. But \n",
-       " if your bruising is frequent, spontaneous, or accompanied by other bleeding symptoms, you should get evaluated. \n",
-       " A primary-care doctor can order simple blood tests to rule out the most common disorders and guide you to any   \n",
-       " needed specialist care.                                                                                         \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭──────────────────────────────────────────── Output (original model) ────────────────────────────────────────────╮\n",
+       " Here is a summary of current guidelines for adult routine cholesterol screening:                                \n",
+       "                                                                                                                 \n",
+       " **General Recommendations:**                                                                                    \n",
+       "                                                                                                                 \n",
+       " - **All adults aged 20 years or older**: The American Heart Association (AHA), American College of Cardiology   \n",
+       " (ACC), and U.S. Preventive Services Task Force (USPSTF) recommend routine cholesterol screening starting at age \n",
+       " 20, with repeat testing every 4–6 years if risk remains low.                                                    \n",
+       "                                                                                                                 \n",
+       " **More Specific Guidelines:**                                                                                   \n",
+       "                                                                                                                 \n",
+       " ### U.S. Preventive Services Task Force (USPSTF) (2016):                                                        \n",
+       "                                                                                                                 \n",
+       " - **Adults aged 40–75**: Strongly recommend screening.                                                          \n",
+       " - **Adults aged 20–39**: Consider screening if they have risk factors for cardiovascular disease (e.g.,         \n",
+       " diabetes, hypertension, family history of early heart disease, smoking, obesity).                               \n",
+       " - **Frequency**: Every 4–6 years for low-risk individuals; more frequently if risk factors are present.         \n",
+       "                                                                                                                 \n",
+       " ### American College of Cardiology (ACC)/American Heart Association (AHA) (2018):                               \n",
+       "                                                                                                                 \n",
+       " - **Adults aged 20 and older**: Assess cholesterol as part of cardiovascular risk assessment every 4–6 years.   \n",
+       " - **More frequent testing**: For those with risk factors (e.g., diabetes, hypertension, family history,         \n",
+       " obesity) or those on cholesterol-lowering therapy.                                                              \n",
+       "                                                                                                                 \n",
+       " ### National Lipid Association (NLA):                                                                           \n",
+       "                                                                                                                 \n",
+       " - **All adults 20 years and older**: Lipid profile at least every 5 years.                                      \n",
+       " - **Earlier and/or more frequent testing**: If risk factors or family history of premature atherosclerotic      \n",
+       " cardiovascular disease (ASCVD).                                                                                 \n",
+       "                                                                                                                 \n",
+       " **What is measured?**                                                                                           \n",
+       " - A standard fasting or non-fasting lipid panel measures:                                                       \n",
+       "   - Total cholesterol                                                                                           \n",
+       "   - LDL cholesterol (\"bad\")                                                                                     \n",
+       "   - HDL cholesterol (\"good\")                                                                                    \n",
+       "   - Triglycerides                                                                                               \n",
+       "                                                                                                                 \n",
+       " **Summary Table:**                                                                                              \n",
+       "                                                                                                                 \n",
+       " | Age Group         | Routine Screening? | Frequency      | More Frequent If...                |                \n",
+       " |-------------------|-------------------|---------------|------------------------------------|                  \n",
+       " | 20–39 years       | Consider if risk  | 4–6 years     | Risk factors present               |                  \n",
+       " | 40–75 years       | Yes               | 4–6 years     | Risk factors or on therapy         |                  \n",
+       " | >75 years         | Individualized    | Case-by-case  | Based on overall health/risk       |                  \n",
+       "                                                                                                                 \n",
+       " **Key Risk Factors:**                                                                                           \n",
+       " - Diabetes                                                                                                      \n",
+       " - Hypertension                                                                                                  \n",
+       " - Smoking                                                                                                       \n",
+       " - Family history of early heart disease                                                                         \n",
+       " - Obesity                                                                                                       \n",
+       "                                                                                                                 \n",
+       " **References:**                                                                                                 \n",
+       " - 2018 ACC/AHA Guideline on the Management of Blood Cholesterol                                                 \n",
+       " - USPSTF Recommendation Statement (2016)                                                                        \n",
+       "                                                                                                                 \n",
+       " **Note:** These are general recommendations. Screening intervals and starting age may be adjusted based on      \n",
+       " individual risk factors and clinical judgment. Always consult with a healthcare provider for personalized       \n",
+       " advice.                                                                                                         \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m────────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (base reasoning model)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m────────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mI’m not a doctor, but I can share some information that may help you decide whether to seek medical attention \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mand what questions to ask your provider.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m1. Common causes of easy bruising \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Minor bumps or knocks you don’t remember \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Medications or supplements that thin your blood (aspirin, NSAIDs, fish oil, ginko, warfarin, direct oral \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47manticoagulants) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Aging skin and blood vessels (the skin thins and capillaries become more fragile) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m 
\u001b[0m\u001b[38;5;22;47m2. Possible medical causes to discuss with your doctor \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m a) Platelet problems \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Low platelet count (thrombocytopenia) from immune causes (e.g. ITP), infections, certain medications \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Platelet dysfunction (e.g. von Willebrand disease, inherited or acquired disorders) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m b) Clotting-factor deficiencies \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Hemophilia A or B (rare in women) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Vitamin K deficiency (malabsorption, certain antibiotics) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m c) Liver disease \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – The liver makes many clotting factors; if it’s not working well you can bruise more easily \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m d) Vascular fragility \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Vasculitis or 
connective-tissue disorders that weaken blood vessel walls \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m e) Nutritional deficiencies \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Vitamin C (scurvy), vitamin K, or protein deficiency \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m3. Red-flag symptoms that warrant prompt evaluation \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Bruises appearing without any remembered bump or injury \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Large “hematomas” (deep, painful swellings under the skin) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Small red/purple flat spots (petechiae) or pinpoint bleeds around the hair follicles \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Bleeding gums, frequent nosebleeds, blood in stool or urine, unusually heavy menstrual bleeding \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Unexplained weight loss, night sweats, fevers (could point toward an underlying illness) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m 
\u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m4. What you can do now \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Keep a “bruise diary”: note when and where each bruise appears, how big it is, and any associated symptoms.\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Review any medications or supplements you take—ask your pharmacist or provider if they affect bleeding \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mrisk. \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Make sure your diet includes adequate protein, vitamin C (citrus fruits, berries, peppers), and vitamin K \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m(leafy greens). \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m5. 
When to see a doctor \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Bruising is significantly more frequent or severe than you can explain by bumps and knocks \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • You have any of the red-flag symptoms above \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • You’re on a blood thinner and your bruising seems out of proportion \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m6. What your doctor may do \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Physical exam (skin inspection, signs of liver disease, spleen enlargement) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Blood tests: \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Complete blood count (CBC) with platelet count \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Coagulation panel (PT/INR, aPTT) \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – Liver-function tests \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m – 
Specific factor levels or von Willebrand assay if indicated \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47m • Referral to a hematologist if an inherited or serious acquired disorder is suspected \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mBottom line: occasional bruising is common, especially if you bump into things or take mild blood thinners. But\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mif your bruising is frequent, spontaneous, or accompanied by other bleeding symptoms, you should get evaluated.\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mA primary-care doctor can order simple blood tests to rule out the most common disorders and guide you to any \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[38;5;22;47mneeded specialist care.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m───────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mOutput (original model)\u001b[0m\u001b[34m \u001b[0m\u001b[34m───────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mHere is a summary of current guidelines for adult routine cholesterol screening:\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m 
\u001b[1;33m**General Recommendations:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **All adults aged 20 years or older**: The American Heart Association (AHA), American College of Cardiology \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m(ACC), and U.S. Preventive Services Task Force (USPSTF) recommend routine cholesterol screening starting at age\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m20, with repeat testing every 4–6 years if risk remains low.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**More Specific Guidelines:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m### U.S. Preventive Services Task Force (USPSTF) (2016):\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Adults aged 40–75**: Strongly recommend screening.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Adults aged 20–39**: Consider screening if they have risk factors for cardiovascular disease (e.g., \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mdiabetes, hypertension, family history of early heart disease, smoking, obesity).\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Frequency**: Every 4–6 years for low-risk individuals; more frequently if risk factors are present.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m### American College of Cardiology (ACC)/American Heart Association (AHA) (2018):\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Adults aged 20 and older**: Assess cholesterol as part of cardiovascular risk assessment 
every 4–6 years.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **More frequent testing**: For those with risk factors (e.g., diabetes, hypertension, family history, \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mobesity) or those on cholesterol-lowering therapy.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m### National Lipid Association (NLA):\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **All adults 20 years and older**: Lipid profile at least every 5 years.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- **Earlier and/or more frequent testing**: If risk factors or family history of premature atherosclerotic \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mcardiovascular disease (ASCVD).\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**What is measured?**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- A standard fasting or non-fasting lipid panel measures:\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m - Total cholesterol\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m - LDL cholesterol (\"bad\")\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m - HDL cholesterol (\"good\")\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m - Triglycerides\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**Summary Table:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m| Age Group | Routine Screening? | Frequency | More Frequent If... 
|\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m|-------------------|-------------------|---------------|------------------------------------|\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m| 20–39 years | Consider if risk | 4–6 years | Risk factors present |\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m| 40–75 years | Yes | 4–6 years | Risk factors or on therapy |\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m| >75 years | Individualized | Case-by-case | Based on overall health/risk |\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**Key Risk Factors:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Diabetes\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Hypertension\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Smoking\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Family history of early heart disease\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- Obesity\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**References:**\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- 2018 ACC/AHA Guideline on the Management of Blood Cholesterol\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m- USPSTF Recommendation Statement (2016)\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33m**Note:** These are general recommendations. Screening intervals and starting age may be adjusted based on \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33mindividual risk factors and clinical judgment. 
Always consult with a healthcare provider for personalized \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;33madvice.\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -1139,77 +1080,29 @@ { "data": { "text/html": [ - "
╭────────────────────────────────────── Output (fine-tuned reasoning model) ──────────────────────────────────────╮\n",
-       " Let’s start by getting a clearer picture of your bruising and overall health. I’ll ask you a few questions—your \n",
-       " answers will help me decide whether we need to check for a bleeding disorder or consider other causes:          \n",
-       "                                                                                                                 \n",
-       " 1. Onset & Frequency                                                                                            \n",
-       "  • When did you first notice the bruising?                                                                      \n",
-       "  • How many new bruises do you get in a week?                                                                   \n",
-       "                                                                                                                 \n",
-       " 2. Trigger & Distribution                                                                                       \n",
-       "  • Do these bruises come up after bumps or minor knocks, or do they appear seemingly out of nowhere?            \n",
-       "  • Are they mostly on pressure points (shins, forearms) or also on areas you wouldn’t expect to hit?            \n",
-       "                                                                                                                 \n",
-       " 3. Bruise Characteristics                                                                                       \n",
-       "  • How large are they, on average?                                                                              \n",
-       "  • Do they change color normally (purple → green → yellow) and heal in a few weeks?                             \n",
-       "                                                                                                                 \n",
-       " 4. Other Bleeding Symptoms                                                                                      \n",
-       "  • Any easy bleeding of gums, frequent nosebleeds, or heavy periods (if applicable)?                            \n",
-       "  • Any blood in your urine or stool?                                                                            \n",
-       "                                                                                                                 \n",
-       " 5. Medications & Supplements                                                                                    \n",
-       "  • Are you taking any blood thinners (warfarin, heparin, DOACs), aspirin, NSAIDs, or herbal supplements (e.g.,  \n",
-       " fish oil, ginkgo)?                                                                                              \n",
-       "                                                                                                                 \n",
-       " 6. Medical & Family History                                                                                     \n",
-       "  • Do you have liver disease, kidney disease, or a history of malignancy?                                       \n",
-       "  • Any family history of easy bruising or known bleeding disorders?                                             \n",
-       "                                                                                                                 \n",
-       " 7. Systemic Symptoms                                                                                            \n",
-       "  • Have you noticed fatigue, weight loss, fevers, or night sweats?                                              \n",
-       "                                                                                                                 \n",
-       " Once I have this information, we can decide whether to check a platelet count, coagulation studies (PT/INR,     \n",
-       " aPTT), and possibly von Willebrand factor levels. Let me know what you’ve observed.                             \n",
-       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
+       "
╭─────────────────────────────────────────── Output (fine-tuned model) ───────────────────────────────────────────╮\n",
+       " Could you help me narrow this down so I can give you the most relevant recommendation? Specifically:            \n",
+       "                                                                                                                 \n",
+       " 1. Which guideline or region are you interested in (for example, USPSTF in the US, ACC/AHA, Canadian, European, \n",
+       " etc.)?                                                                                                          \n",
+       " 2. Are we talking about primary‐prevention screening in an asymptomatic adult, or secondary‑prevention          \n",
+       " monitoring in someone with known cardiovascular disease?                                                        \n",
+       " 3. What is the patient’s age, sex, and any major risk factors (diabetes, hypertension, smoking, family history  \n",
+       " of early CVD, etc.)?                                                                                            \n",
+       "╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
        "
\n" ], "text/plain": [ - "\u001b[30;47m╭─\u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m \u001b[0m\u001b[1;30;47mOutput (fine-tuned reasoning model)\u001b[0m\u001b[30;47m \u001b[0m\u001b[30;47m─────────────────────────────────────\u001b[0m\u001b[30;47m─╮\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mLet’s start by getting a clearer picture of your bruising and overall health. I’ll ask you a few questions—your\u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47manswers will help me decide whether we need to check for a bleeding disorder or consider other causes:\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m1. Onset & Frequency \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • When did you first notice the bruising? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • How many new bruises do you get in a week? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m2. Trigger & Distribution \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Do these bruises come up after bumps or minor knocks, or do they appear seemingly out of nowhere? 
\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Are they mostly on pressure points (shins, forearms) or also on areas you wouldn’t expect to hit? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m3. Bruise Characteristics \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • How large are they, on average? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Do they change color normally (purple → green → yellow) and heal in a few weeks? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m4. Other Bleeding Symptoms \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Any easy bleeding of gums, frequent nosebleeds, or heavy periods (if applicable)? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Any blood in your urine or stool? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m5. 
Medications & Supplements \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Are you taking any blood thinners (warfarin, heparin, DOACs), aspirin, NSAIDs, or herbal supplements (e.g., \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mfish oil, ginkgo)? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m6. Medical & Family History \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Do you have liver disease, kidney disease, or a history of malignancy? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Any family history of easy bruising or known bleeding disorders? \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m7. Systemic Symptoms \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47m • Have you noticed fatigue, weight loss, fevers, or night sweats? 
\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47mOnce I have this information, we can decide whether to check a platelet count, coagulation studies (PT/INR, \u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m│\u001b[0m\u001b[47m \u001b[0m\u001b[35;47maPTT), and possibly von Willebrand factor levels. Let me know what you’ve observed.\u001b[0m\u001b[47m \u001b[0m\u001b[47m \u001b[0m\u001b[30;47m│\u001b[0m\n", - "\u001b[30;47m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" + "\u001b[34m╭─\u001b[0m\u001b[34m──────────────────────────────────────────\u001b[0m\u001b[34m \u001b[0m\u001b[1;32mOutput (fine-tuned model)\u001b[0m\u001b[34m \u001b[0m\u001b[34m──────────────────────────────────────────\u001b[0m\u001b[34m─╮\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mCould you help me narrow this down so I can give you the most relevant recommendation? Specifically:\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m1. Which guideline or region are you interested in (for example, USPSTF in the US, ACC/AHA, Canadian, European,\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35metc.)? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m2. Are we talking about primary‐prevention screening in an asymptomatic adult, or secondary‑prevention \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mmonitoring in someone with known cardiovascular disease? \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35m3. 
What is the patient’s age, sex, and any major risk factors (diabetes, hypertension, smoking, family history \u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m│\u001b[0m \u001b[1;35mof early CVD, etc.)?\u001b[0m \u001b[34m│\u001b[0m\n", + "\u001b[34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n" ] }, "metadata": {}, @@ -1236,37 +1129,39 @@ "source": [ "console = Console()\n", "\n", - "for test_datapoint in test_data:\n", - " console.print(Panel(\n", - " Text(test_datapoint['messages'][0]['content'], style=\"black\"),\n", - " title=\"[bold black]Input[/bold black]\",\n", - " border_style=\"black\",\n", - " style=\"on white\"\n", - " ))\n", + "for item in run_items.to_dict()['data'][:3]:\n", + " input_text = item['datasource_item']['messages'][0]['content']\n", + " output_text = item['datasource_item']['completion'][0]['content']\n", + " sample_text = item['sample']['output'][0]['content']\n", " \n", " console.print(Panel(\n", - " Text(test_datapoint['completion'][0]['content'], style=\"blue\"),\n", - " title=\"[bold black]Output (original model completion)[/bold black]\",\n", - " border_style=\"black\",\n", - " style=\"on white\"\n", + " Text(input_text, style=\"bold cyan\"),\n", + " title=\"[bold green]Input[/bold green]\",\n", + " border_style=\"blue\"\n", " ))\n", - "\n", + " \n", " console.print(Panel(\n", - " Text(test_datapoint['base_response'], style=\"dark_green\"),\n", - " title=\"[bold black]Output (base reasoning model)[/bold black]\",\n", - " border_style=\"black\",\n", - " style=\"on white\"\n", + " Text(output_text, style=\"bold yellow\"),\n", + " title=\"[bold green]Output (original model)[/bold green]\",\n", + " border_style=\"blue\"\n", " ))\n", " \n", " console.print(Panel(\n", - " Text(test_datapoint['finetuned_response'], style=\"magenta\"),\n", - " title=\"[bold black]Output (fine-tuned reasoning model)[/bold black]\",\n", - " border_style=\"black\",\n", - " style=\"on 
white\"\n", + " Text(sample_text, style=\"bold magenta\"),\n", + " title=\"[bold green]Output (fine-tuned model)[/bold green]\",\n", + " border_style=\"blue\"\n", " ))\n", " \n", " console.print(\"\\n\" + \"-\" * 80 + \"\\n\")" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7652f842", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -1285,7 +1180,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.9" + "version": "3.11.8" } }, "nbformat": 4, diff --git a/examples/gpt-5-codex_prompting_guide.ipynb b/examples/gpt-5-codex_prompting_guide.ipynb deleted file mode 100644 index 32ebf9b71e..0000000000 --- a/examples/gpt-5-codex_prompting_guide.ipynb +++ /dev/null @@ -1,182 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "fefdd2b1", - "metadata": {}, - "source": [ - "## GPT-5-Codex Prompting Guide\n", - "Important details about `GPT-5-Codex` and this guide:\n", - "- This model is not a drop-in replacement for GPT-5, as it requires significantly different prompting.\n", - "- This model is only supported with the Responses API and does not support the verbosity parameter.\n", - "- This guide is meant for API users of `GPT-5-Codex` and creating developer prompts, not for Codex users, if you are a Codex user refer to this [prompting guide](https://developers.openai.com/codex/prompting)\n", - "\n", - "`GPT-5-Codex` is a new version of GPT‑5 further optimized for agentic and interactive coding tasks. GPT‑5-Codex was trained with a focus on real-world software engineering work; it’s equally proficient at quick, interactive sessions and at independently powering through long, complex tasks. 
The model builds on GPT-5’s strong coding abilities with additional improvements such as:\n", - "- **Improved steerability:** `GPT-5-Codex` delivers higher-quality code on complex engineering tasks like features, tests, debugging, refactors, and reviews without lengthy instructions.\n", - "- **Adaptive reasoning level:** `GPT-5-Codex` adjusts its reasoning time to task complexity. It’s snappy in interactive sessions and able to work independently for multiple hours.\n", - "- **Excellent at code review:** `GPT-5-Codex` is trained to conduct code reviews, navigating codebases and running code and tests to validate correctness.\n", - "\n", - "`GPT-5-Codex` is purpose-built for Codex CLI, the Codex IDE extension, the Codex cloud environment, and working in GitHub, and also supports versatile tool use. We recommend using `GPT-5-Codex` only for agentic and interactive coding use cases.\n", - "\n", - "Because the model is trained specifically for coding, many best practices you once had to prompt into general-purpose models are built in, and over-prompting can reduce quality. \n", - "\n", - "The core prompting principle for `GPT-5-Codex` is **“less is more.”** This includes:\n", - "1. Start with a minimal prompt inspired by the Codex CLI system prompt, then add only the essential guidance you truly need.\n", - "2. Remove any prompting for preambles, because the model does not support them. Asking for preambles will lead to the model stopping early before completing the task.\n", - "3. Reduce the number of tools to only a terminal tool and `apply_patch`.\n", - "4. Make tool descriptions as concise as possible by removing unnecessary details.\n", - "\n", - "\n", - "## Codex CLI Prompt\n", - "Below is the full Codex CLI developer message, which you can use as the reference implementation for prompting `GPT-5-Codex`. 
Compared with the GPT-5 developer message, it uses about 40% as many tokens, reinforcing that minimal prompting is ideal for this model.\n", - "\n", - "\n", - "\n", - "Here is a link to the [GPT-5-Codex Prompt](https://github.com/openai/codex/blob/main/codex-rs/core/gpt_5_codex_prompt.md) within Codex CLI as well as the [GPT-5 prompt](https://github.com/openai/codex/blob/main/codex-rs/core/prompt.md). As a point of comparison you can see the `GPT-5-Codex` prompt is much shorter than GPT-5 and we recommend following the same pattern. \n", - "```\n", - "You are Codex, based on GPT-5. You are running as a coding agent in the Codex CLI on a user's computer.\n", - "\n", - "## General\n", - "\n", - "- The arguments to `shell` will be passed to execvp(). Most terminal commands should be prefixed with [\"bash\", \"-lc\"].\n", - "- Always set the `workdir` param when using the shell function. Do not use `cd` unless absolutely necessary.\n", - "- When searching for text or files, prefer using `rg` or `rg --files` respectively because `rg` is much faster than alternatives like `grep`. (If the `rg` command is not found, then use alternatives.)\n", - "\n", - "## Editing constraints\n", - "\n", - "- Default to ASCII when editing or creating files. Only introduce non-ASCII or other Unicode characters when there is a clear justification and the file already uses them.\n", - "- Add succinct code comments that explain what is going on if code is not self-explanatory. You should not add comments like \"Assigns the value to the variable\", but a brief comment might be useful ahead of a complex code block that the user would otherwise have to spend time parsing out. 
Usage of these comments should be rare.\n", - "- You may be in a dirty git worktree.\n", - " * NEVER revert existing changes you did not make unless explicitly requested, since these changes were made by the user.\n", - " * If asked to make a commit or code edits and there are unrelated changes to your work or changes that you didn't make in those files, don't revert those changes.\n", - " * If the changes are in files you've touched recently, you should read carefully and understand how you can work with the changes rather than reverting them.\n", - " * If the changes are in unrelated files, just ignore them and don't revert them.\n", - "- While you are working, you might notice unexpected changes that you didn't make. If this happens, STOP IMMEDIATELY and ask the user how they would like to proceed.\n", - "\n", - "## Plan tool\n", - "\n", - "When using the planning tool:\n", - "- Skip using the planning tool for straightforward tasks (roughly the easiest 25%).\n", - "- Do not make single-step plans.\n", - "- When you made a plan, update it after having performed one of the sub-tasks that you shared on the plan.\n", - "\n", - "## Codex CLI harness, sandboxing, and approvals\n", - "\n", - "The Codex CLI harness supports several different configurations for sandboxing and escalation approvals that the user can choose from.\n", - "\n", - "Filesystem sandboxing defines which files can be read or written. The options for `sandbox_mode` are:\n", - "- **read-only**: The sandbox only permits reading files.\n", - "- **workspace-write**: The sandbox permits reading files, and editing files in `cwd` and `writable_roots`. Editing files in other directories requires approval.\n", - "- **danger-full-access**: No filesystem sandboxing - all commands are permitted.\n", - "\n", - "Network sandboxing defines whether network can be accessed without approval. 
Options for `network_access` are:\n", - "- **restricted**: Requires approval\n", - "- **enabled**: No approval needed\n", - "\n", - "Approvals are your mechanism to get user consent to run shell commands without the sandbox. Possible configuration options for `approval_policy` are\n", - "- **untrusted**: The harness will escalate most commands for user approval, apart from a limited allowlist of safe \"read\" commands.\n", - "- **on-failure**: The harness will allow all commands to run in the sandbox (if enabled), and failures will be escalated to the user for approval to run again without the sandbox.\n", - "- **on-request**: Commands will be run in the sandbox by default, and you can specify in your tool call if you want to escalate a command to run without sandboxing. (Note that this mode is not always available. If it is, you'll see parameters for it in the `shell` command description.)\n", - "- **never**: This is a non-interactive mode where you may NEVER ask the user for approval to run commands. Instead, you must always persist and work around constraints to solve the task for the user. You MUST do your utmost best to finish the task and validate your work before yielding. If this mode is paired with `danger-full-access`, take advantage of it to deliver the best outcome for the user. Further, in this mode, your default testing philosophy is overridden: Even if you don't see local patterns for testing, you may add tests and scripts to validate your work. Just remove them before yielding.\n", - "\n", - "When you are running with `approval_policy == on-request`, and sandboxing enabled, here are scenarios where you'll need to request approval:\n", - "- You need to run a command that writes to a directory that requires it (e.g. running tests that write to /var)\n", - "- You need to run a GUI app (e.g., open/xdg-open/osascript) to open browsers or files.\n", - "- You are running sandboxed and need to run a command that requires network access (e.g. 
installing packages)\n", - "- If you run a command that is important to solving the user's query, but it fails because of sandboxing, rerun the command with approval. ALWAYS proceed to use the `with_escalated_permissions` and `justification` parameters - do not message the user before requesting approval for the command.\n", - "- You are about to take a potentially destructive action such as an `rm` or `git reset` that the user did not explicitly ask for\n", - "- (for all of these, you should weigh alternative paths that do not require approval)\n", - "\n", - "When `sandbox_mode` is set to read-only, you'll need to request approval for any command that isn't a read.\n", - "\n", - "You will be told what filesystem sandboxing, network sandboxing, and approval mode are active in a developer or user message. If you are not told about this, assume that you are running with workspace-write, network sandboxing enabled, and approval on-failure.\n", - "\n", - "Although they introduce friction to the user because your work is paused until the user responds, you should leverage them when necessary to accomplish important work. 
If the completing the task requires escalated permissions, Do not let these settings or the sandbox deter you from attempting to accomplish the user's task unless it is set to \"never\", in which case never ask for approvals.\n", - "\n", - "When requesting approval to execute a command that will require escalated privileges:\n", - " - Provide the `with_escalated_permissions` parameter with the boolean value true\n", - " - Include a short, 1 sentence explanation for why you need to enable `with_escalated_permissions` in the justification parameter\n", - "\n", - "## Special user requests\n", - "\n", - "- If the user makes a simple request (such as asking for the time) which you can fulfill by running a terminal command (such as `date`), you should do so.\n", - "- If the user asks for a \"review\", default to a code review mindset: prioritise identifying bugs, risks, behavioural regressions, and missing tests. Findings must be the primary focus of the response - keep summaries or overviews brief and only after enumerating the issues. Present findings first (ordered by severity with file/line references), follow with open questions or assumptions, and offer a change-summary only as a secondary detail. If no findings are discovered, state that explicitly and mention any residual risks or testing gaps.\n", - "\n", - "## Presenting your work and final message\n", - "\n", - "You are producing plain text that will later be styled by the CLI. Follow these rules exactly. Formatting should make results easy to scan, but not feel mechanical. 
Use judgment to decide how much structure adds value.\n", - "\n", - "- Default: be very concise; friendly coding teammate tone.\n", - "- Ask only when needed; suggest ideas; mirror the user's style.\n", - "- For substantial work, summarize clearly; follow final‑answer formatting.\n", - "- Skip heavy formatting for simple confirmations.\n", - "- Don't dump large files you've written; reference paths only.\n", - "- No \"save/copy this file\" - User is on the same machine.\n", - "- Offer logical next steps (tests, commits, build) briefly; add verify steps if you couldn't do something.\n", - "- For code changes:\n", - " * Lead with a quick explanation of the change, and then give more details on the context covering where and why a change was made. Do not start this explanation with \"summary\", just jump right in.\n", - " * If there are natural next steps the user may want to take, suggest them at the end of your response. Do not make suggestions if there are no natural next steps.\n", - " * When suggesting multiple options, use numeric lists for the suggestions so the user can quickly respond with a single number.\n", - "- The user does not command execution outputs. When asked to show the output of a command (e.g. `git show`), relay the important details in your answer or summarize the key lines so the user understands the result.\n", - "\n", - "### Final answer structure and style guidelines\n", - "\n", - "- Plain text; CLI handles styling. 
Use structure only when it helps scanability.\n", - "- Headers: optional; short Title Case (1-3 words) wrapped in **…**; no blank line before the first bullet; add only if they truly help.\n", - "- Bullets: use - ; merge related points; keep to one line when possible; 4–6 per list ordered by importance; keep phrasing consistent.\n", - "- Monospace: backticks for commands/paths/env vars/code ids and inline examples; use for literal keyword bullets; never combine with **.\n", - "- Code samples or multi-line snippets should be wrapped in fenced code blocks; add a language hint whenever obvious.\n", - "- Structure: group related bullets; order sections general → specific → supporting; for subsections, start with a bolded keyword bullet, then items; match complexity to the task.\n", - "- Tone: collaborative, concise, factual; present tense, active voice; self‑contained; no \"above/below\"; parallel wording.\n", - "- Don'ts: no nested bullets/hierarchies; no ANSI codes; don't cram unrelated keywords; keep keyword lists short—wrap/reformat if long; avoid naming formatting styles in answers.\n", - "- Adaptation: code explanations → precise, structured with code refs; simple tasks → lead with outcome; big changes → logical walkthrough + rationale + next actions; casual one-offs → plain sentences, no headers/bullets.\n", - "- File References: When referencing files in your response, make sure to include the relevant start line and always follow the below rules:\n", - " * Use inline code to make file paths clickable.\n", - " * Each reference should have a stand alone path. 
Even if it's the same file.\n", - " * Accepted: absolute, workspace‑relative, a/ or b/ diff prefixes, or bare filename/suffix.\n", - " * Line/column (1‑based, optional): :line[:column] or #Lline[Ccolumn] (column defaults to 1).\n", - " * Do not use URIs like file://, vscode://, or https://.\n", - " * Do not provide range of lines\n", - " * Examples: src/app.ts, src/app.ts:42, b/server/index.js#L10, C:\\repo\\project\\main.rs:12:5\n", - "```\n", - "#### Apply Patch\n", - "As shared previously in the `GPT-5` prompting guide, [here](https://github.com/openai/openai-cookbook/tree/main/examples/gpt-5/apply_patch.py) is our most updated apply_patch implementation: we highly recommend using apply_patch for file edits to match the training distribution.\n", - "\n", - "## Anti-Prompting\n", - "As noted above, because `GPT-5-Codex` was trained for optimal agentic coding, prompt tuning will more often mean removing guidance than adding it. Below are aspects you may not need to steer.\n", - "\n", - "#### Adaptive Reasoning\n", - "Adaptive reasoning is now the default in `GPT-5-Codex`. In the past, you might have prompted models to “think harder” or “respond quickly” based on task difficulty. `GPT-5-Codex` adjusts automatically: for a question like “How do I undo the last commit but keep all changes staged?”, it responds quickly without extra steering. For more complex coding tasks, it takes the time it needs and uses tools as appropriate.\n", - "\n", - "#### Planning\n", - "`GPT-5-Codex` was trained for a wide variety of coding tasks from long-running agentic tasks to shorter interactive coding tasks, so the model has a collaborative personality by default. When you kick off an agentic task, the model will build a detailed plan and keep you updated as it progresses. 
Codex CLI includes a planning tool, and the model is trained to use it throughout its agentic rollout, so if you provide a planning tool as well, the model can leverage it while coding.\n", - "The [”Planning” section of the GPT-5 dev message in Codex CLI](https://github.com/openai/codex/blob/main/codex-rs/core/prompt.md?plain=1#L52-L122) is no longer needed in `GPT-5-Codex`, as the model is trained to produce high-quality plans.\n", - "\n", - "#### Preambles\n", - "**`GPT-5-Codex` does not emit preambles!** Prompting and asking for it will likely result in the model stopping early. Instead, we have a custom summarizer that produces detailed summaries only when appropriate so you can render them inline.\n", - "\n", - "#### Frontend\n", - "`GPT-5-Codex` defaults to strong aesthetics and modern frontend best practices. If you have preferred libraries or frameworks, steer the model by adding short sections that spell them out, such as:\n", - "\n", - "```\n", - "Frontend Guidance\n", - "Use the following libraries unless the user or repo specifies otherwise:\n", - "Framework: React + TypeScript\n", - "Styling: Tailwind CSS\n", - "Components: shadcn/ui\n", - "Icons: lucide-react\n", - "Animation: Framer Motion\n", - "Charts: Recharts\n", - "Fonts: San Serif, Inter, Geist, Mona Sans, IBM Plex Sans, Manrope\n", - "```\n" - ] - } - ], - "metadata": { - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/partners/model_selection_guide/model_selection_guide.ipynb b/examples/partners/model_selection_guide/model_selection_guide.ipynb index 8b4b9e6604..19416b2e81 100644 --- a/examples/partners/model_selection_guide/model_selection_guide.ipynb +++ b/examples/partners/model_selection_guide/model_selection_guide.ipynb @@ -2425,7 +2425,7 @@ "\n", "- **[Orchestrating Agents: Routines and Handoffs](https://cookbook.openai.com/examples/orchestrating_agents)** Structuring multi-agent workflows with routines and handoffs, 
relevant to the ideation→ranking→critique pipeline.\n", "\n", - "- **[GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/gpt4-1_prompting_guide)** Advanced prompting, tool use, and task decomposition for improved accuracy in critique and safety reviews.\n", + "- **[GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/prompting/gpt4-1_prompting_guide)** Advanced prompting, tool use, and task decomposition for improved accuracy in critique and safety reviews.\n", "\n", "- **[Structured Outputs for Multi-Agent Systems](https://cookbook.openai.com/examples/structured_outputs_multi_agent)** Enforcing consistent JSON outputs with schema validation for agent interoperability.\n", "\n", @@ -3219,7 +3219,7 @@ "- [Data Extraction and Transformation](https://cookbook.openai.com/examples/data_extraction_transformation)\n", "\n", "### Prompting & Model Selection\n", - "- [GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/gpt4-1_prompting_guide)\n", + "- [GPT-4.1 Prompting Guide](https://cookbook.openai.com/examples/prompting/gpt4-1_prompting_guide)\n", "- [Prompt Engineering Best Practices](https://platform.openai.com/docs/guides/prompt-engineering)\n", "\n", "### Evaluation & Deployment\n", diff --git a/examples/Enhance_your_prompts_with_meta_prompting.ipynb b/examples/prompting/Enhance_your_prompts_with_meta_prompting.ipynb similarity index 100% rename from examples/Enhance_your_prompts_with_meta_prompting.ipynb rename to examples/prompting/Enhance_your_prompts_with_meta_prompting.ipynb diff --git a/examples/Optimize_Prompts.ipynb b/examples/prompting/Optimize_Prompts.ipynb similarity index 100% rename from examples/Optimize_Prompts.ipynb rename to examples/prompting/Optimize_Prompts.ipynb diff --git a/examples/Prompt_migration_guide.ipynb b/examples/prompting/Prompt_migration_guide.ipynb similarity index 100% rename from examples/Prompt_migration_guide.ipynb rename to examples/prompting/Prompt_migration_guide.ipynb diff --git 
a/examples/prompting/README.md b/examples/prompting/README.md new file mode 100644 index 0000000000..c530981fc1 --- /dev/null +++ b/examples/prompting/README.md @@ -0,0 +1,59 @@ +# Prompting: Guides & Examples + +This directory consolidates prompting-related guides, examples, and reusable prompt assets from across the Cookbook. It’s a single place to learn core prompting patterns, optimize prompts, and discover application-specific prompt sets. + +## Why this exists +Effective prompting solves a large share of practical model issues. The right prompt is as important as parameters like `temperature` or `reasoning_effort`. Centralizing examples and references makes them easier to find, reuse, and maintain. + +## Start here (recommended path) +1. **GPT-4.1 Prompting Guide** → techniques, structure, and patterns: + [`gpt4-1_prompting_guide.ipynb`](./gpt4-1_prompting_guide.ipynb) +2. **Prompt engineering best practices** (reference): + +3. **Orchestrating agents & handoffs** (for multi-agent apps): + `../orchestrating_agents` (see the top-level Examples index) +4. **Structured outputs** (JSON schemas, validation): + `../structured_outputs_multi_agent` + +> Tip: Keep prompts short, specific, and testable. Add minimal examples, define outputs precisely, and prefer explicit instructions over implications. 
+ +## Contents + +### Core Guides +- **GPT-4.1 Prompting Guide** — system prompts, tool use, decomposition, evaluation + [`gpt4-1_prompting_guide.ipynb`](./gpt4-1_prompting_guide.ipynb) +- **Realtime prompting guide** — working with the Realtime API + [`Realtime_prompting_guide.ipynb`](./Realtime_prompting_guide.ipynb) +- **Whisper prompting guide** — task hints and formatting for speech recognition + [`Whisper_prompting_guide.ipynb`](./Whisper_prompting_guide.ipynb) + +### Prompt Optimization +- **Optimize Prompts** — automated checks & fixes for common prompt issues + [`Optimize_Prompts.ipynb`](./Optimize_Prompts.ipynb) +- **Enhance your prompts with meta-prompting** — programmatic refinement strategies + [`Enhance_your_prompts_with_meta_prompting.ipynb`](./Enhance_your_prompts_with_meta_prompting.ipynb) +- **Prompt migration guide** — safely updating existing prompts across changes + [`Prompt_migration_guide.ipynb`](./Prompt_migration_guide.ipynb) + +### Agent & App Prompts +- **Multi-agent portfolio collaboration prompts** — reusable prompt set for agent roles + [`./agents_sdk/multi-agent-portfolio-collaboration/prompts/`](./agents_sdk/multi-agent-portfolio-collaboration/prompts/) + +### Supporting Resources +- OpenAI Prompt Engineering (plain-text reference) + [`./data/oai_docs/prompt-engineering.txt`](./data/oai_docs/prompt-engineering.txt) + +## Usage pattern + +1. **Draft** a minimal instruction with an explicit output shape (e.g., JSON schema). +2. **Ground** with constraints (tone, audience, knowledge limits) and a tiny example if needed. +3. **Test** with real inputs; watch for ambiguity and output drift. +4. **Evaluate** with checks (format validation, assertions). +5. **Iterate**: shorten, remove redundant rules, and pin “must-haves”. + +## Contributing + +- Keep notebooks **runnable end-to-end** (no hidden cell state). +- Prefer **relative links** within `examples/`, so both GitHub and the site render cleanly. 
+- When adding new files here, **update** `registry.yaml` so content appears on the site. +- If you introduce a new subfolder of prompts, include a short `README.md` explaining scope and usage. diff --git a/examples/Realtime_prompting_guide.ipynb b/examples/prompting/Realtime_prompting_guide.ipynb similarity index 100% rename from examples/Realtime_prompting_guide.ipynb rename to examples/prompting/Realtime_prompting_guide.ipynb diff --git a/examples/Unit_test_writing_using_a_multi-step_prompt.ipynb b/examples/prompting/Unit_test_writing_using_a_multi-step_prompt.ipynb similarity index 100% rename from examples/Unit_test_writing_using_a_multi-step_prompt.ipynb rename to examples/prompting/Unit_test_writing_using_a_multi-step_prompt.ipynb diff --git a/examples/Unit_test_writing_using_a_multi-step_prompt_with_older_completions_API.ipynb b/examples/prompting/Unit_test_writing_using_a_multi-step_prompt_with_older_completions_API.ipynb similarity index 100% rename from examples/Unit_test_writing_using_a_multi-step_prompt_with_older_completions_API.ipynb rename to examples/prompting/Unit_test_writing_using_a_multi-step_prompt_with_older_completions_API.ipynb diff --git a/examples/Whisper_prompting_guide.ipynb b/examples/prompting/Whisper_prompting_guide.ipynb similarity index 100% rename from examples/Whisper_prompting_guide.ipynb rename to examples/prompting/Whisper_prompting_guide.ipynb diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/code_interpreter.md b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/code_interpreter.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/code_interpreter.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/code_interpreter.md diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/editor_base.md 
b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/editor_base.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/editor_base.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/editor_base.md diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/fundamental_base.md b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/fundamental_base.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/fundamental_base.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/fundamental_base.md diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/macro_base.md b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/macro_base.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/macro_base.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/macro_base.md diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/pm_base.md b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/pm_base.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/pm_base.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/pm_base.md diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/quant_base.md b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/quant_base.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/quant_base.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/quant_base.md diff --git a/examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/tool_retry_prompt.md 
b/examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/tool_retry_prompt.md similarity index 100% rename from examples/agents_sdk/multi-agent-portfolio-collaboration/prompts/tool_retry_prompt.md rename to examples/prompting/agents_sdk/multi-agent-portfolio-collaboration/prompts/tool_retry_prompt.md diff --git a/examples/data/oai_docs/prompt-engineering.txt b/examples/prompting/data/oai_docs/prompt-engineering.txt similarity index 100% rename from examples/data/oai_docs/prompt-engineering.txt rename to examples/prompting/data/oai_docs/prompt-engineering.txt diff --git a/examples/gpt-5/gpt-5_prompting_guide.ipynb b/examples/prompting/gpt-5/gpt-5_prompting_guide.ipynb similarity index 100% rename from examples/gpt-5/gpt-5_prompting_guide.ipynb rename to examples/prompting/gpt-5/gpt-5_prompting_guide.ipynb diff --git a/examples/gpt-5/prompt-optimization-cookbook.ipynb b/examples/prompting/gpt-5/prompt-optimization-cookbook.ipynb similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook.ipynb rename to examples/prompting/gpt-5/prompt-optimization-cookbook.ipynb diff --git a/examples/gpt-5/prompt-optimization-cookbook/llm_as_judge.txt b/examples/prompting/gpt-5/prompt-optimization-cookbook/llm_as_judge.txt similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/llm_as_judge.txt rename to examples/prompting/gpt-5/prompt-optimization-cookbook/llm_as_judge.txt diff --git a/examples/gpt-5/prompt-optimization-cookbook/requirements.txt b/examples/prompting/gpt-5/prompt-optimization-cookbook/requirements.txt similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/requirements.txt rename to examples/prompting/gpt-5/prompt-optimization-cookbook/requirements.txt diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_failsafeqa_baseline.csv b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_failsafeqa_baseline.csv similarity index 100% rename from 
examples/gpt-5/prompt-optimization-cookbook/results_failsafeqa_baseline.csv rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_failsafeqa_baseline.csv diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_failsafeqa_optimized.csv b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_failsafeqa_optimized.csv similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_failsafeqa_optimized.csv rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_failsafeqa_optimized.csv diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/judgement_summary.csv b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/judgement_summary.csv similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/judgement_summary.csv rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/judgement_summary.csv diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_01.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_01.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_01.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_01.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_02.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_02.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_02.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_02.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_03.json 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_03.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_03.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_03.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_04.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_04.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_04.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_04.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_05.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_05.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_05.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_05.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_06.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_06.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_06.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_06.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_07.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_07.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_07.json rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_07.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_08.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_08.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_08.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_08.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_09.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_09.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_09.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_09.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_10.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_10.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_10.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_10.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_11.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_11.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_11.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_11.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_12.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_12.json similarity 
index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_12.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_12.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_13.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_13.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_13.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_13.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_14.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_14.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_14.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_14.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_15.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_15.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_15.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_15.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_16.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_16.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_16.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_16.json diff --git 
a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_17.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_17.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_17.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_17.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_18.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_18.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_18.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_18.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_19.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_19.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_19.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_19.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_20.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_20.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_20.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_20.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_21.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_21.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_21.json 
rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_21.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_22.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_22.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_22.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_22.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_23.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_23.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_23.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_23.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_24.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_24.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_24.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_24.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_25.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_25.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_25.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_25.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_26.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_26.json 
similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_26.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_26.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_27.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_27.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_27.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_27.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_28.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_28.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_28.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_28.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_29.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_29.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_29.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_29.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_30.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_30.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_30.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_baseline/run_30.json diff --git 
a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/judgement_summary.csv b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/judgement_summary.csv similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/judgement_summary.csv rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/judgement_summary.csv diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_01.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_01.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_01.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_01.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_02.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_02.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_02.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_02.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_03.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_03.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_03.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_03.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_04.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_04.json similarity index 100% rename from 
examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_04.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_04.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_05.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_05.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_05.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_05.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_06.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_06.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_06.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_06.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_07.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_07.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_07.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_07.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_08.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_08.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_08.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_08.json diff --git 
a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_09.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_09.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_09.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_09.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_10.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_10.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_10.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_10.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_11.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_11.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_11.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_11.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_12.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_12.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_12.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_12.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_13.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_13.json similarity index 100% rename from 
examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_13.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_13.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_14.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_14.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_14.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_14.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_15.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_15.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_15.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_15.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_16.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_16.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_16.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_16.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_17.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_17.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_17.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_17.json diff --git 
a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_18.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_18.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_18.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_18.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_19.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_19.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_19.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_19.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_20.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_20.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_20.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_20.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_21.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_21.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_21.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_21.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_22.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_22.json similarity index 100% rename from 
examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_22.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_22.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_23.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_23.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_23.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_23.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_24.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_24.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_24.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_24.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_25.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_25.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_25.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_25.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_26.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_26.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_26.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_26.json diff --git 
a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_27.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_27.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_27.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_27.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_28.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_28.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_28.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_28.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_29.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_29.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_29.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_29.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_30.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_30.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_30.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_llm_as_judge_optimized/run_30.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_01.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_01.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_01.py rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_01.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_02.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_02.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_02.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_02.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_03.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_03.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_03.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_03.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_04.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_04.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_04.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_04.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_05.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_05.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_05.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_05.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_06.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_06.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_06.py rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_06.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_07.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_07.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_07.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_07.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_08.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_08.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_08.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_08.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_09.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_09.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_09.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_09.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_10.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_10.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_10.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_10.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_11.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_11.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_11.py rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_11.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_12.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_12.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_12.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_12.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_13.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_13.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_13.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_13.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_14.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_14.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_14.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_14.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_15.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_15.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_15.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_15.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_16.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_16.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_16.py rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_16.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_17.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_17.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_17.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_17.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_18.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_18.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_18.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_18.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_19.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_19.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_19.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_19.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_20.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_20.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_20.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_20.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_21.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_21.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_21.py rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_21.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_22.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_22.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_22.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_22.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_23.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_23.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_23.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_23.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_24.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_24.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_24.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_24.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_25.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_25.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_25.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_25.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_26.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_26.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_26.py rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_26.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_27.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_27.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_27.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_27.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_28.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_28.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_28.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_28.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_29.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_29.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_29.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_29.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_30.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_30.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_30.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_30.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline.csv b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline.csv similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline.csv rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline.csv diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.json rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.txt b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.txt similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.txt rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_baseline/run_results_topk_baseline_summary.txt diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_01.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_01.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_01.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_01.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_02.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_02.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_02.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_02.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_03.py 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_03.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_03.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_03.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_04.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_04.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_04.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_04.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_05.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_05.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_05.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_05.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_06.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_06.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_06.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_06.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_07.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_07.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_07.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_07.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_08.py 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_08.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_08.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_08.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_09.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_09.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_09.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_09.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_10.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_10.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_10.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_10.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_11.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_11.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_11.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_11.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_12.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_12.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_12.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_12.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_13.py 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_13.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_13.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_13.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_14.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_14.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_14.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_14.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_15.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_15.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_15.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_15.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_16.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_16.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_16.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_16.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_17.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_17.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_17.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_17.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_18.py 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_18.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_18.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_18.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_19.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_19.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_19.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_19.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_20.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_20.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_20.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_20.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_21.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_21.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_21.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_21.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_22.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_22.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_22.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_22.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_23.py 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_23.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_23.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_23.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_24.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_24.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_24.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_24.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_25.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_25.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_25.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_25.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_26.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_26.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_26.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_26.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_27.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_27.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_27.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_27.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_28.py 
b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_28.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_28.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_28.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_29.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_29.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_29.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_29.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_30.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_30.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_30.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_30.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized.csv b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized.csv similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized.csv rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized.csv diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.json b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.json similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.json rename to 
examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.json diff --git a/examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.txt b/examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.txt similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.txt rename to examples/prompting/gpt-5/prompt-optimization-cookbook/results_topk_optimized/run_results_topk_optimized_summary.txt diff --git a/examples/gpt-5/prompt-optimization-cookbook/run_FailSafeQA.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/run_FailSafeQA.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/run_FailSafeQA.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/run_FailSafeQA.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/scripts/__init__.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/__init__.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/scripts/__init__.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/__init__.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/scripts/gen_baseline.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/gen_baseline.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/scripts/gen_baseline.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/gen_baseline.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/scripts/gen_optimized.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/gen_optimized.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/scripts/gen_optimized.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/gen_optimized.py diff --git 
a/examples/gpt-5/prompt-optimization-cookbook/scripts/llm_judge.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/llm_judge.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/scripts/llm_judge.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/llm_judge.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/scripts/results_summarizer.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/results_summarizer.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/scripts/results_summarizer.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/results_summarizer.py diff --git a/examples/gpt-5/prompt-optimization-cookbook/scripts/topk_eval.py b/examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/topk_eval.py similarity index 100% rename from examples/gpt-5/prompt-optimization-cookbook/scripts/topk_eval.py rename to examples/prompting/gpt-5/prompt-optimization-cookbook/scripts/topk_eval.py diff --git a/examples/gpt4-1_prompting_guide.ipynb b/examples/prompting/gpt4-1_prompting_guide.ipynb similarity index 100% rename from examples/gpt4-1_prompting_guide.ipynb rename to examples/prompting/gpt4-1_prompting_guide.ipynb diff --git a/examples/o-series/o3o4-mini_prompting_guide.ipynb b/examples/prompting/o-series/o3o4-mini_prompting_guide.ipynb similarity index 100% rename from examples/o-series/o3o4-mini_prompting_guide.ipynb rename to examples/prompting/o-series/o3o4-mini_prompting_guide.ipynb diff --git a/examples/voice_solutions/one_way_translation_using_realtime_api/src/utils/translation_prompts.js b/examples/prompting/voice_solutions/one_way_translation_using_realtime_api/src/utils/translation_prompts.js similarity index 100% rename from examples/voice_solutions/one_way_translation_using_realtime_api/src/utils/translation_prompts.js rename to 
examples/prompting/voice_solutions/one_way_translation_using_realtime_api/src/utils/translation_prompts.js diff --git a/registry.yaml b/registry.yaml index 3b38ccbe10..80c1033819 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,15 +4,6 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. -- title: GPT-5-Codex Prompting Guide - path: examples/gpt-5-codex_prompting_guide.ipynb - date: 2025-09-23 - authors: - - daveleo-openai - tags: - - gpt-5 - - codex - - title: GPT-5 Troubleshooting Guide path: examples/gpt-5/gpt-5_troubleshooting_guide.ipynb date: 2025-09-17 @@ -42,7 +33,7 @@ - codex - title: Realtime Prompting Guide - path: examples/Realtime_prompting_guide.ipynb + path: examples/prompting/Realtime_prompting_guide.ipynb date: 2025-08-28 authors: - minh-hoque @@ -84,7 +75,7 @@ - gpt-oss-local - title: GPT-5 Prompt Migration and Improvement Using the New Optimizer - path: examples/gpt-5/prompt-optimization-cookbook.ipynb + path: examples/prompting/gpt-5/prompt-optimization-cookbook.ipynb date: 2025-08-07 authors: - rajpathak-openai @@ -96,7 +87,7 @@ - prompt-optimization - title: GPT-5 prompting guide - path: examples/gpt-5/gpt-5_prompting_guide.ipynb + path: examples/prompting/gpt-5/gpt-5_prompting_guide.ipynb date: 2025-08-07 authors: - anoop-openai @@ -249,7 +240,7 @@ - audio - title: Optimize Prompts - path: examples/Optimize_Prompts.ipynb + path: examples/prompting/Optimize_Prompts.ipynb date: 2025-07-14 authors: - corwin @@ -271,7 +262,7 @@ - automation - title: Prompt Migration Guide - path: examples/Prompt_migration_guide.ipynb + path: examples/prompting/Prompt_migration_guide.ipynb date: 2025-06-26 authors: - minh-hoque @@ -342,7 +333,7 @@ - mutli-agent-collaboration - title: o3/o4-mini Function Calling Guide - path: examples/o-series/o3o4-mini_prompting_guide.ipynb + path: examples/prompting/o-series/o3o4-mini_prompting_guide.ipynb date: 2025-05-26 authors: - billchen-openai @@ -796,7 +787,7 @@ - 
embeddings - title: Unit test writing using a multi-step prompt - path: examples/Unit_test_writing_using_a_multi-step_prompt.ipynb + path: examples/prompting/Unit_test_writing_using_a_multi-step_prompt.ipynb date: 2022-11-15 authors: - ted-at-openai @@ -804,8 +795,7 @@ - completions - title: Unit test writing using a multi-step prompt with legacy Completions - path: >- - examples/Unit_test_writing_using_a_multi-step_prompt_with_older_completions_API.ipynb + path: examples/prompting/Unit_test_writing_using_a_multi-step_prompt_with_older_completions_API.ipynb date: 2023-05-19 authors: - ted-at-openai @@ -875,7 +865,7 @@ archived: true - title: Whisper prompting guide - path: examples/Whisper_prompting_guide.ipynb + path: examples/prompting/Whisper_prompting_guide.ipynb date: 2023-06-27 authors: - prestontuggle @@ -2194,7 +2184,7 @@ - audio - title: Enhance your prompts with meta prompting - path: examples/Enhance_your_prompts_with_meta_prompting.ipynb + path: examples/prompting/Enhance_your_prompts_with_meta_prompting.ipynb date: 2024-10-23 authors: - teomusatoiu @@ -2377,7 +2367,7 @@ - chatgpt-productivity - title: GPT-4.1 Prompting Guide - path: examples/gpt4-1_prompting_guide.ipynb + path: examples/prompting/gpt4-1_prompting_guide.ipynb date: 2025-04-14 authors: - nm-openai @@ -2530,16 +2520,11 @@ tags: - images - -- title: Codex CLI to automatically fix CI failures - path: examples/codex/codex-cicd.ipynb - date: 2025-09-30 +- title: "LLMs 101: A Practical Introduction" + path: articles/openai-cookbook-llms-101.md + date: 2025-09-15 authors: - - himadri518 - - alwell-kevin + - paytonison tags: - - codex - - - - + - llms + - beginners