
Commit 220a058

Fix typos and improve grammar in README (#61)
Corrected several typographical errors and improved grammar throughout the README for better clarity and professionalism. Changes include fixing word forms, possessives, and minor phrasing issues. Co-authored-by: Romain Huet <[email protected]>
1 parent 7ba69ff commit 220a058

File tree: 1 file changed (+8 / -8 lines)


README.md

Lines changed: 8 additions & 8 deletions
@@ -63,7 +63,7 @@ print(outputs[0]["generated_text"][-1])

 #### vLLM

-vLLM recommends using [`uv`](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible webserver. The following command will automatically download the model and start the server.
+vLLM recommends using [`uv`](https://docs.astral.sh/uv/) for Python dependency management. You can use vLLM to spin up an OpenAI-compatible web server. The following command will automatically download the model and start the server.

 ```bash
 uv pip install --pre vllm==0.10.1+gptoss \

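Once that command finishes, vLLM serves the model behind the standard OpenAI-compatible API, so any OpenAI client can talk to it. Below is a minimal sketch using the `openai` Python package; the endpoint `http://localhost:8000/v1` (vLLM's default port) and the `openai/gpt-oss-20b` model id are assumptions to adjust for your own setup.

```python
# Minimal sketch: query the OpenAI-compatible server started by vLLM.
# Assumes the default vLLM endpoint http://localhost:8000/v1 and the
# openai/gpt-oss-20b model id; adjust both to match your server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain MXFP4 quantization in one sentence."}],
)
print(response.choices[0].message.content)
```
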
@@ -130,7 +130,7 @@ This repository provides a collection of reference implementations:

 ### Requirements

-- python 3.12
+- Python 3.12
 - On macOS: Install the Xcode CLI tools --> `xcode-select --install`
 - On Linux: These reference implementations require CUDA
 - On Windows: These reference implementations have not been tested on Windows. Try using solutions like Ollama if you are trying to run the model locally.

@@ -171,7 +171,7 @@ hf download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/

 We include an inefficient reference PyTorch implementation in [gpt_oss/torch/model.py](gpt_oss/torch/model.py). This code uses basic PyTorch operators to show the exact model architecture, with a small addition of supporting tensor parallelism in MoE so that the larger model can run with this code (e.g., on 4xH100 or 2xH200). In this implementation, we upcast all weights to BF16 and run the model in BF16.

-To run the reference implementation, install these dependencies:
+To run the reference implementation, install the dependencies:

 ```shell
 pip install -e ".[torch]"

@@ -227,7 +227,7 @@ To perform inference you'll need to first convert the SafeTensor weights from Hu
 python gpt_oss/metal/scripts/create-local-model.py -s <model_dir> -d <output_file>
 ```

-Or downloaded the pre-converted weight:
+Or download the pre-converted weight:

 ```shell
 hf download openai/gpt-oss-120b --include "metal/*" --local-dir gpt-oss-120b/metal/

@@ -250,7 +250,7 @@ We also include two system tools for the model: browsing and python container. C

 ### Terminal Chat

-The terminal chat application is a basic example on how to use the harmony format together with the PyTorch, Triton, and vLLM implementations. It also exposes both the python and browser tool as optional tools that can be used.
+The terminal chat application is a basic example of how to use the harmony format together with the PyTorch, Triton, and vLLM implementations. It also exposes both the python and browser tool as optional tools that can be used.

 ```bash
 usage: python -m gpt_oss.chat [-h] [-r REASONING_EFFORT] [-a] [-b] [--show-browser-results] [-p] [--developer-message DEVELOPER_MESSAGE] [-c CONTEXT] [--raw] [--backend {triton,torch,vllm}] FILE

@@ -289,7 +289,7 @@ You can start this server with the following inference backends:

 - `triton` — uses the triton implementation
 - `metal` — uses the metal implementation on Apple Silicon only
-- `ollama` — uses the Ollama /api/generate API as a inference solution
+- `ollama` — uses the Ollama /api/generate API as an inference solution
 - `vllm` — uses your installed vllm version to perform inference
 - `transformers` — uses your installed transformers version to perform local inference

@@ -468,10 +468,10 @@ if last_message.recipient == "python":

 We released the models with native quantization support. Specifically, we use [MXFP4](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf) for the linear projection weights in the MoE layer. We store the MoE tensor in two parts:

-- `tensor.blocks` stores the actual fp4 values. We pack every two value in one `uint8` value.
+- `tensor.blocks` stores the actual fp4 values. We pack every two values in one `uint8` value.
 - `tensor.scales` stores the block scale. The block scaling is done among the last dimension for all MXFP4 tensors.

-All other tensors will be in BF16. We also recommend use BF16 as the activation precision for the model.
+All other tensors will be in BF16. We also recommend using BF16 as the activation precision for the model.

 ### Recommended Sampling Parameters
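To make the packing described in this hunk concrete, here is a rough dequantization sketch. The encoding details come from the linked MX spec rather than this README, so treat them as assumptions: 4-bit E2M1 values with the first value of each pair in the low nibble, one scale per 32 values along the last dimension, and `tensor.scales` holding E8M0 power-of-two exponents with a bias of 127.

```python
# Rough sketch of unpacking an MXFP4 tensor stored as `blocks` + `scales`.
# Assumptions (taken from the MX spec, not confirmed by the README): E2M1
# value encoding, low nibble holds the first value of each packed pair,
# 32-value blocks along the last dim, and E8M0 scales with a bias of 127.
import torch

# The 16 possible E2M1 (fp4) values, indexed by the 4-bit code.
FP4_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def dequantize_mxfp4(blocks: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """blocks: uint8 [..., n//2], scales: uint8 [..., n//32] -> bf16 [..., n]."""
    lo = blocks & 0x0F                      # first value of each packed pair (assumed order)
    hi = blocks >> 4                        # second value of each packed pair
    codes = torch.stack((lo, hi), dim=-1).flatten(start_dim=-2).long()
    values = FP4_VALUES[codes]
    # One shared power-of-two scale per 32-value block along the last dimension.
    scale = torch.pow(2.0, scales.float() - 127.0)
    values = values.unflatten(-1, (scales.shape[-1], 32)) * scale.unsqueeze(-1)
    return values.flatten(start_dim=-2).to(torch.bfloat16)
```

In practice the packed `blocks` and `scales` tensors come straight from the released checkpoints; the function above only illustrates the layout.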
