From e368d1dc3a3195b1aec4b92fcc8fbe0f7ed7ffac Mon Sep 17 00:00:00 2001
From: palenciaVik
Date: Wed, 6 Aug 2025 23:17:59 -0700
Subject: [PATCH] Update README.md

Addresses multiple small typos and grammatical errors in the main README.md
and makes some phrasing improvements for clarity.
---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 7d4f2791..dacf936a 100644
--- a/README.md
+++ b/README.md
@@ -27,13 +27,13 @@ Both models were trained using our [harmony response format][harmony] and should
 - **Full chain-of-thought:** Provides complete access to the model's reasoning process, facilitating easier debugging and greater trust in outputs. This information is not intended to be shown to end users.
 - **Fine-tunable:** Fully customize models to your specific use case through parameter fine-tuning.
 - **Agentic capabilities:** Use the models' native capabilities for function calling, [web browsing](#browser), [Python code execution](#python), and Structured Outputs.
-- **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, allowing `gpt-oss-120b` to run on a single H100 GPU and `gpt-oss-20b` to run within 16GB of memory..
+- **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, allowing `gpt-oss-120b` to run on a single H100 GPU and `gpt-oss-20b` to run within 16GB of memory.
 
 ### Inference examples
 
 #### Transformers
 
-You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use Transformers's chat template it will automatically apply the [harmony response format][harmony]. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [`openai-harmony`][harmony] package.
+You can use `gpt-oss-120b` and `gpt-oss-20b` with the Transformers library. If you use Transformers' chat template, it will automatically apply the [harmony response format][harmony]. If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use our [`openai-harmony`][harmony] package.
 
 ```python
 from transformers import pipeline
@@ -171,7 +171,7 @@ huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir g
 
 We include an inefficient reference PyTorch implementation in [gpt_oss/torch/model.py](gpt_oss/torch/model.py). This code uses basic PyTorch operators to show the exact model architecture, with a small addition of supporting tensor parallelism in MoE so that the larger model can run with this code (e.g., on 4xH100 or 2xH200). In this implementation, we upcast all weights to BF16 and run the model in BF16.
 
-To run the reference implementation. Install dependencies:
+To run the reference implementation, install the dependencies:
 
 ```shell
 pip install -e .[torch]
@@ -227,7 +227,7 @@ To perform inference you'll need to first convert the SafeTensor weights from Hu
 python gpt_oss/metal/scripts/create-local-model.py -s -d
 ```
 
-Or downloaded the pre-converted weight:
+Or download the pre-converted weights:
 
 ```shell
 huggingface-cli download openai/gpt-oss-120b --include "metal/*" --local-dir gpt-oss-120b/metal/
@@ -279,7 +279,7 @@ options:
 ```
 
 > [!NOTE]
-> The torch and triton implementation requires original checkpoint under `gpt-oss-120b/original/` and `gpt-oss-20b/original/` respectively. While vLLM uses the Hugging Face converted checkpoint under `gpt-oss-120b/` and `gpt-oss-20b/` root directory respectively. 
+> The torch and triton implementations require the original checkpoints under `gpt-oss-120b/original/` and `gpt-oss-20b/original/`, respectively, while vLLM uses the Hugging Face-converted checkpoints under the `gpt-oss-120b/` and `gpt-oss-20b/` root directories, respectively.
 
 ### Responses API
 
@@ -398,7 +398,7 @@ if last_message.recipient.startswith("browser"):
 
 #### Details
 
-To control the context window size this tool use a scrollable window of text that the model can interact with. So it might fetch the first 50 lines of a page and then scroll to the next 20 lines after that. The model has also been trained to then use citations from this tool in its answers.
+To control the context window size, this tool uses a scrollable window of text that the model can interact with. So it might fetch the first 50 lines of a page and then scroll to the next 20 lines after that. The model has also been trained to then use citations from this tool in its answers.
 
 To improve performance the tool caches requests so that the model can revisit a different part of a page without having to reload the page. For that reason you should create a new browser instance for every request.
 
@@ -468,10 +468,10 @@ if last_message.recipient == "python":
 
 We released the models with native quantization support. Specifically, we use [MXFP4](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf) for the linear projection weights in the MoE layer. We store the MoE tensor in two parts:
 
-- `tensor.blocks` stores the actual fp4 values. We pack every two value in one `uint8` value.
+- `tensor.blocks` stores the actual fp4 values. We pack every two values in one `uint8` value.
 - `tensor.scales` stores the block scale. The block scaling is done among the last dimension for all MXFP4 tensors.
 
-All other tensors will be in BF16. We also recommend use BF16 as the activation precision for the model.
+All other tensors will be in BF16. We also recommend using BF16 as the activation precision for the model.
 
 ### Recommended Sampling Parameters
 