# How to run gpt-oss locally with LM Studio

[LM Studio](https://lmstudio.ai) is a performant and friendly desktop application for running large language models (LLMs) on local hardware. This guide will walk you through how to set up and run **gpt-oss-20b** or **gpt-oss-120b** models using LM Studio, including how to chat with them, use MCP servers, or interact with the models through LM Studio's local development API.

Note that this guide is meant for consumer hardware, like running gpt-oss on a PC or Mac. For server applications with dedicated GPUs like NVIDIA's H100s, [check out our vLLM guide](https://cookbook.openai.com/articles/gpt-oss/run-vllm).

## Pick your model

LM Studio supports both model sizes of gpt-oss:

- [**`openai/gpt-oss-20b`**](https://lmstudio.ai/models/openai/gpt-oss-20b)
  - The smaller model
  - Requires at least **16GB of VRAM**
  - Perfect for higher-end consumer GPUs or Apple Silicon Macs
- [**`openai/gpt-oss-120b`**](https://lmstudio.ai/models/openai/gpt-oss-120b)
  - Our larger full-sized model
  - Best with **≥60GB of VRAM**
  - Ideal for multi-GPU or beefy workstation setups

LM Studio ships both a [llama.cpp](https://github.com/ggml-org/llama.cpp) inference engine (for running GGUF-formatted models) and an [Apple MLX](https://github.com/ml-explore/mlx) engine for Apple Silicon Macs.

## Quick setup

1. **Install LM Studio**
   LM Studio is available for Windows, macOS, and Linux. [Get it here](https://lmstudio.ai/download).

2. **Download the gpt-oss model** →

```shell
# For 20B
lms get openai/gpt-oss-20b
# or for 120B
lms get openai/gpt-oss-120b
```

3. **Load the model in LM Studio**
   → Open LM Studio and use the model loading interface to load the gpt-oss model you downloaded. Alternatively, you can use the command line:

```shell
# For 20B
lms load openai/gpt-oss-20b
# or for 120B
lms load openai/gpt-oss-120b
```

4. **Use the model** → Once loaded, you can interact with the model directly in LM Studio's chat interface or through the API.

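Optionally, you can sanity-check the setup from the terminal. This assumes the `lms` command-line tool that ships with LM Studio is available on your PATH:

```shell
# List the models you have downloaded
lms ls
# List the models currently loaded into memory
lms ps
```
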
## Chat with gpt-oss

Use LM Studio's chat interface to start a conversation with gpt-oss, or use the `chat` command in the terminal:

```shell
lms chat openai/gpt-oss-20b
```

A note about prompt formatting: LM Studio uses OpenAI's [Harmony](https://cookbook.openai.com/articles/openai-harmony) library to construct the input to gpt-oss models, both when running via llama.cpp and when running via MLX.

## Use gpt-oss with a local /v1/chat/completions endpoint

LM Studio exposes a **Chat Completions-compatible API** so you can use the OpenAI SDK without changing much. Here's a Python example:

```py
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"  # LM Studio does not require an API key
)

result = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what MXFP4 quantization is."}
    ]
)

print(result.choices[0].message.content)
```

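The example above assumes LM Studio's local server is running on its default port, 1234. If it isn't, you can turn the server on from within the LM Studio app or, if the `lms` CLI is installed, start it from the terminal:

```shell
lms server start
```
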
If you've used the OpenAI SDK before, this will feel instantly familiar, and your existing code should work after changing only the base URL.

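Because the endpoint is OpenAI-compatible, streaming should work the same way it does against the OpenAI API. Here's a minimal sketch that prints tokens as they arrive, assuming the same local server and model as above:

```py
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user", "content": "Give me three tips for running LLMs locally."}
    ],
    stream=True,
)

# Print each chunk of the response as soon as it is generated
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
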
## How to use MCPs in the chat UI

LM Studio is an [MCP client](https://lmstudio.ai/docs/app/plugins/mcp), which means you can connect MCP servers to it. This allows you to provide external tools to gpt-oss models.

LM Studio's `mcp.json` file is located at:

```shell
~/.lmstudio/mcp.json
```

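As a rough illustration, entries in this file follow the `mcpServers` layout used by other MCP clients. The server name and package below are only examples (the reference filesystem server from the MCP project, pointed at a hypothetical folder); check LM Studio's MCP documentation for the exact schema your version supports:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/folder"]
    }
  }
}
```
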
## Local tool use with gpt-oss in Python or TypeScript

LM Studio's SDK is available in both [Python](https://github.com/lmstudio-ai/lmstudio-python) and [TypeScript](https://github.com/lmstudio-ai/lmstudio-js). You can leverage the SDK to implement tool calling and local function execution with gpt-oss.

The way to achieve this is via the `.act()` call, which allows you to provide tools to gpt-oss and have it alternate between calling tools and reasoning until it completes your task.

The example below shows how to provide the model with a single tool that can create files on your local filesystem. You can use this example as a starting point and extend it with more tools. See the docs on tool definitions for [Python](https://lmstudio.ai/docs/python/agent/tools) and [TypeScript](https://lmstudio.ai/docs/typescript/agent/tools).

```shell
uv pip install lmstudio
```

```python
import readline  # Enables input line editing
from pathlib import Path

import lmstudio as lms

# Define a function that can be called by the model and provide it as a tool.
# Tools are just regular Python functions. They can be anything at all.
def create_file(name: str, content: str):
    """Create a file with the given name and content."""
    dest_path = Path(name)
    if dest_path.exists():
        return "Error: File already exists."
    try:
        dest_path.write_text(content, encoding="utf-8")
    except Exception as exc:
        return f"Error: {exc!r}"
    return "File created."

def print_fragment(fragment, round_index=0):
    # .act() supplies the round index as the second parameter.
    # Setting a default value means the callback is also
    # compatible with .complete() and .respond().
    print(fragment.content, end="", flush=True)

model = lms.llm("openai/gpt-oss-20b")
chat = lms.Chat("You are a helpful assistant running on the user's computer.")

while True:
    try:
        user_input = input("User (leave blank to exit): ")
    except EOFError:
        print()
        break
    if not user_input:
        break
    chat.add_user_message(user_input)
    print("Assistant: ", end="", flush=True)
    model.act(
        chat,
        [create_file],
        on_message=chat.append,
        on_prediction_fragment=print_fragment,
    )
    print()
```

For TypeScript developers who want to use gpt-oss locally, here's a similar example using `lmstudio-js`:

```shell
npm install @lmstudio/sdk
```

```typescript
import { Chat, LMStudioClient, tool } from "@lmstudio/sdk";
import { existsSync } from "fs";
import { writeFile } from "fs/promises";
import { createInterface } from "readline/promises";
import { z } from "zod";

const rl = createInterface({ input: process.stdin, output: process.stdout });
const client = new LMStudioClient();
const model = await client.llm.model("openai/gpt-oss-20b");
const chat = Chat.empty();

const createFileTool = tool({
  name: "createFile",
  description: "Create a file with the given name and content.",
  parameters: { name: z.string(), content: z.string() },
  implementation: async ({ name, content }) => {
    if (existsSync(name)) {
      return "Error: File already exists.";
    }
    await writeFile(name, content, "utf-8");
    return "File created.";
  },
});

while (true) {
  const input = await rl.question("User: ");
  // Append the user input to the chat
  chat.append("user", input);

  process.stdout.write("Assistant: ");
  await model.act(chat, [createFileTool], {
    // When the model finishes the entire message, push it to the chat
    onMessage: (message) => chat.append(message),
    onPredictionFragment: ({ content }) => {
      process.stdout.write(content);
    },
  });
  process.stdout.write("\n");
}
```
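
Since the script uses top-level `await`, it needs to run as an ES module. One simple way to try it (assuming a recent Node.js version, and using a hypothetical filename) is with the `tsx` runner:

```shell
npx tsx tool-use.ts
```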