
Commit 6b5f373

Merge branch 'emre/agents-context-management' of https://github.com/openai/openai-cookbook into emre/agents-context-management

2 parents f750cfa + 4612c8e · commit 6b5f373

File tree

272 files changed: +28803 −71 lines


articles/gpt-oss/fine-tune-korean.ipynb

Lines changed: 1196 additions & 0 deletions
Large diffs are not rendered by default.

articles/gpt-oss/fine-tune-transfomers.ipynb

Lines changed: 673 additions & 0 deletions
Large diffs are not rendered by default.

articles/gpt-oss/handle-raw-cot.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
# How to handle the raw chain of thought in gpt-oss

The [gpt-oss models](https://openai.com/open-models) provide access to a raw chain of thought (CoT) meant for analysis and safety research by model implementors. It is also crucial for tool-calling performance, since tool calls can be performed as part of the CoT. At the same time, the raw CoT might contain potentially harmful content or reveal information that the implementer did not intend to expose to users (such as rules specified in the instructions given to the model). You therefore should not show the raw CoT to end users.

## Harmony / chat template handling

The model encodes its raw CoT as part of our [harmony response format](https://cookbook.openai.com/articles/openai-harmony). If you are authoring your own chat templates or are handling tokens directly, make sure to [check out the harmony guide first](https://cookbook.openai.com/articles/openai-harmony).

To summarize the key points:

1. CoT will be issued to the `analysis` channel.
2. After a message to the `final` channel, all `analysis` messages should be dropped in subsequent sampling turns. Function calls to the `commentary` channel can remain.
3. If the last message by the assistant was a tool call of any type, the `analysis` messages issued after the previous `final` message should be preserved in subsequent sampling turns until a new `final` message is issued. A sketch of this filtering logic is shown after the list.
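As an illustration, here is a minimal sketch of that filtering logic in Python. It assumes a simple message representation where each entry is a dict with `channel` and `type` keys; these field names are assumptions made for the example, not part of the harmony spec.

```python
def prepare_next_turn(messages: list[dict]) -> list[dict]:
    """Filter harmony-style messages before the next sampling turn (illustrative sketch).

    Assumes each message looks like
    {"role": "assistant", "channel": "analysis", "type": "message", ...}.
    """
    # Index of the most recent `final` message, or -1 if there is none yet.
    last_final = max(
        (i for i, m in enumerate(messages) if m.get("channel") == "final"),
        default=-1,
    )

    last = messages[-1] if messages else None
    if last is not None and last.get("type") == "tool_call":
        # Rule 3: the turn ended in a tool call, so keep the `analysis`
        # messages issued after the previous `final` message.
        keep_analysis_from = last_final + 1
    else:
        # Rule 2: a `final` message closed the turn, so drop all `analysis`
        # messages; function calls on the `commentary` channel remain untouched.
        keep_analysis_from = len(messages)

    return [
        m
        for i, m in enumerate(messages)
        if m.get("channel") != "analysis" or i >= keep_analysis_from
    ]
```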
## Chat Completions API
If you are implementing a Chat Completions API, there is no official spec for handling chain of thought in the published OpenAI specs, as our hosted models will not offer this feature for the time being. We ask you to follow [the convention from OpenRouter](https://openrouter.ai/docs/use-cases/reasoning-tokens) instead, including:

1. Raw CoT will be returned as part of the response unless `reasoning: { exclude: true }` is specified as part of the request. [See details here](https://openrouter.ai/docs/use-cases/reasoning-tokens#legacy-parameters).
2. The raw CoT is exposed as a `reasoning` property on the message in the output.
3. For delta events, the delta has a `reasoning` property.
4. On subsequent turns you should be able to receive the previous reasoning (as `reasoning`) and handle it in accordance with the behavior specified in the chat template section above.

When in doubt, please follow the convention and behavior of the OpenRouter implementation. A sketch of consuming the `reasoning` field follows below.
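For example, here is a minimal sketch using the OpenAI Python client pointed at a local gpt-oss server that follows this convention. The base URL, API key, and model name are placeholders, and how the extra `reasoning` field is surfaced may vary by client.

```python
from openai import OpenAI

# Placeholder values: any OpenAI-compatible server serving gpt-oss and
# following the OpenRouter-style `reasoning` convention.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)

message = response.choices[0].message
# If the server returns the raw CoT, it is exposed on `reasoning`;
# keep it for logging/analysis, but only show `content` to end users.
raw_cot = getattr(message, "reasoning", None)
print(message.content)
```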
## Responses API
For the Responses API, we augmented the Responses API spec to cover this case. Below are the changes to the spec as type definitions. At a high level we are:

1. Introducing a new `content` property on `reasoning`. This allows a reasoning `summary` that could be displayed to the end user to be returned at the same time as the raw CoT (which should not be shown to the end user, but which might be helpful for interpretability research).
2. Introducing a new content type called `reasoning_text`.
3. Introducing two new events: `response.reasoning_text.delta` to stream the deltas of the raw CoT, and `response.reasoning_text.done` to indicate that a turn of CoT is complete.
4. On subsequent turns you should be able to receive the previous reasoning and handle it in accordance with the behavior specified in the chat template section above.
**Item type changes**
```typescript
type ReasoningItem = {
  id: string;
  type: "reasoning";
  summary: SummaryContent[];
  // new
  content: ReasoningTextContent[];
};

type ReasoningTextContent = {
  type: "reasoning_text";
  text: string;
};

type ReasoningTextDeltaEvent = {
  type: "response.reasoning_text.delta";
  sequence_number: number;
  item_id: string;
  output_index: number;
  content_index: number;
  delta: string;
};

type ReasoningTextDoneEvent = {
  type: "response.reasoning_text.done";
  sequence_number: number;
  item_id: string;
  output_index: number;
  content_index: number;
  text: string;
};
```
**Event changes**
```typescript
...
{
  type: "response.content_part.added"
  ...
}
{
  type: "response.reasoning_text.delta",
  sequence_number: 14,
  item_id: "rs_67f47a642e788191aec9b5c1a35ab3c3016f2c95937d6e91",
  output_index: 0,
  content_index: 0,
  delta: "The "
}
...
{
  type: "response.reasoning_text.done",
  sequence_number: 18,
  item_id: "rs_67f47a642e788191aec9b5c1a35ab3c3016f2c95937d6e91",
  output_index: 0,
  content_index: 0,
  text: "The user asked me to think"
}
```
**Example responses output**
```typescript
"output": [
  {
    "type": "reasoning",
    "id": "rs_67f47a642e788191aec9b5c1a35ab3c3016f2c95937d6e91",
    "summary": [
      {
        "type": "summary_text",
        "text": "**Calculating volume of gold for Pluto layer**\n\nStarting with the approximation..."
      }
    ],
    "content": [
      {
        "type": "reasoning_text",
        "text": "The user asked me to think..."
      }
    ]
  }
]
```
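As a rough illustration, a client consuming these streaming events from a server that implements this extension might look like the following sketch. The endpoint URL, model name, and the server-sent-events parsing details are assumptions made for the example.

```python
import json
import requests

# Hypothetical local server implementing the augmented Responses API spec above.
with requests.post(
    "http://localhost:8000/v1/responses",
    json={"model": "openai/gpt-oss-20b", "input": "How much gold would cover Pluto?", "stream": True},
    stream=True,
) as resp:
    reasoning_parts, output_parts = [], []
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "response.reasoning_text.delta":
            reasoning_parts.append(event["delta"])  # raw CoT: log/analyze, never display
        elif event.get("type") == "response.output_text.delta":
            output_parts.append(event["delta"])     # final answer: safe to display

print("".join(output_parts))
```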
## Displaying raw CoT to end-users
If you are providing a chat interface to users, you should not show the raw CoT, because it might contain potentially harmful content or other information that you might not intend to show to users (for example, instructions in the developer message). Instead, we recommend showing a summarized CoT, similar to our production implementations in the API and ChatGPT, where a summarizer model reviews the CoT and blocks harmful content from being shown.
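For instance, a UI layer rendering a Responses API `reasoning` item might do something like the following sketch (the item shape matches the type definitions above; the helper function itself is hypothetical):

```python
def renderable_reasoning(item: dict) -> str:
    """Return only the user-safe summary text of a reasoning item.

    The raw CoT in `content` is intentionally ignored here; keep it for
    logging or interpretability work, but do not render it to end users.
    """
    summaries = [
        part.get("text", "")
        for part in item.get("summary", [])
        if part.get("type") == "summary_text"
    ]
    return "\n\n".join(summaries)
```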

articles/gpt-oss/run-colab.ipynb

Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wGfI8meEHXfM"
   },
   "source": [
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openai/openai-cookbook/blob/main/articles/gpt-oss/run-colab.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gj6KvThm8Jjn"
   },
   "source": [
    "# Run OpenAI gpt-oss 20B in a FREE Google Colab\n",
    "\n",
    "OpenAI released `gpt-oss` [120B](https://hf.co/openai/gpt-oss-120b) and [20B](https://hf.co/openai/gpt-oss-20b). Both models are Apache 2.0 licensed.\n",
    "\n",
    "Specifically, `gpt-oss-20b` was made for lower latency and local or specialized use cases (21B parameters with 3.6B active parameters).\n",
    "\n",
    "Since the models were trained with native MXFP4 quantization, it is easy to run the 20B model even in resource-constrained environments like Google Colab.\n",
    "\n",
    "Authored by: [Pedro](https://huggingface.co/pcuenq) and [VB](https://huggingface.co/reach-vb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Kv2foJJa9Xkc"
   },
   "source": [
    "## Setup environment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zMRXDOpY1Q3Q"
   },
   "source": [
    "Since support for mxfp4 in transformers is bleeding edge, we need a recent version of PyTorch and CUDA in order to be able to install the `mxfp4` triton kernels.\n",
    "\n",
    "We also need to install transformers from source, and we uninstall `torchvision` and `torchaudio` to remove dependency conflicts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "4gUEKrLEvJmf"
   },
   "outputs": [],
   "source": [
    "!pip install -q --upgrade torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3N00UT7gtpkp"
   },
   "outputs": [],
   "source": [
    "!pip install -q transformers triton==3.4 kernels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7GW0knW2w3ND"
   },
   "outputs": [],
   "source": [
    "!pip uninstall -q torchvision torchaudio -y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pxU0WKwtH19m"
   },
   "source": [
    "Please restart your Colab runtime session after installing the packages above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "D3xCxY159frD"
   },
   "source": [
    "## Load the model from Hugging Face in Google Colab\n",
    "\n",
    "We load the model from here: [openai/gpt-oss-20b](https://hf.co/openai/gpt-oss-20b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "k2HFwdkXu2R1"
   },
   "outputs": [],
   "source": [
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "\n",
    "model_id = \"openai/gpt-oss-20b\"\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    model_id,\n",
    "    torch_dtype=\"auto\",\n",
    "    device_map=\"cuda\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Jbeq6kN79ql0"
   },
   "source": [
    "## Set up messages / chat\n",
    "\n",
    "You can provide an optional system prompt, or pass the user input directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "P5dJV3xsu_89"
   },
   "outputs": [],
   "source": [
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": \"Always respond in riddles\"},\n",
    "    {\"role\": \"user\", \"content\": \"What is the weather like in Madrid?\"},\n",
    "]\n",
    "\n",
    "inputs = tokenizer.apply_chat_template(\n",
    "    messages,\n",
    "    add_generation_prompt=True,\n",
    "    return_tensors=\"pt\",\n",
    "    return_dict=True,\n",
    ").to(model.device)\n",
    "\n",
    "generated = model.generate(**inputs, max_new_tokens=500)\n",
    "print(tokenizer.decode(generated[0][inputs[\"input_ids\"].shape[-1]:]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ksxo7bjR_-th"
   },
   "source": [
    "## Specify Reasoning Effort"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fcv6QdcQKLr0"
   },
   "source": [
    "Simply pass it as an additional argument to `apply_chat_template()`. Supported values are `\"low\"`, `\"medium\"` (default), or `\"high\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CmnkAle608Hl"
   },
   "outputs": [],
   "source": [
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": \"Always respond in riddles\"},\n",
    "    {\"role\": \"user\", \"content\": \"Explain why the meaning of life is 42\"},\n",
    "]\n",
    "\n",
    "inputs = tokenizer.apply_chat_template(\n",
    "    messages,\n",
    "    add_generation_prompt=True,\n",
    "    return_tensors=\"pt\",\n",
    "    return_dict=True,\n",
    "    reasoning_effort=\"high\",\n",
    ").to(model.device)\n",
    "\n",
    "generated = model.generate(**inputs, max_new_tokens=500)\n",
    "print(tokenizer.decode(generated[0][inputs[\"input_ids\"].shape[-1]:]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Tf2-ocGqEC_r"
   },
   "source": [
    "## Try out other prompts and ideas!\n",
    "\n",
    "Check out our blogpost for other ideas: [https://hf.co/blog/welcome-openai-gpt-oss](https://hf.co/blog/welcome-openai-gpt-oss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "2QrnTpcCKd_n"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}

0 commit comments
