Commit 62d5367
feat: add comprehensive OpenAI GPT-OSS model support
Implements full support for OpenAI's GPT-OSS-120B and GPT-OSS-20B models with all variants:

- Base models (gpt_oss_120b, gpt_oss_20b)
- LoRA fine-tuning (gpt_oss_120b_lora, gpt_oss_20b_lora)
- INT8 quantization (gpt_oss_120b_int8, gpt_oss_20b_int8)
- LoRA + INT8 (gpt_oss_120b_lora_int8, gpt_oss_20b_lora_int8)
- LoRA + 4-bit (gpt_oss_120b_lora_kbit, gpt_oss_20b_lora_kbit)

Key features:

- OpenAI harmony response format support with custom chat templates
- Memory-optimized configurations (120B fits in 80GB, 20B fits in 16GB)
- Reasoning-tuned generation settings (512 tokens, temp=0.1)
- Production-ready fine-tuning hyperparameters
- Comprehensive test suite with real model validation

Files changed:

- README.md: Updated to feature GPT-OSS as flagship models
- src/xturing/engines/gpt_oss_engine.py: New engine implementations
- src/xturing/models/gpt_oss.py: New model classes
- src/xturing/config/*.yaml: Optimized configurations
- Updated model and engine registries
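The variant list above follows a regular naming scheme: each base model name, optionally suffixed with `_lora`, `_int8`, `_lora_int8`, or `_lora_kbit`. As a quick illustration (a stdlib-only sketch; `variant_names` is a hypothetical helper, not part of xTuring), the ten registry keys can be enumerated from the two base names:

```python
# Hypothetical sketch of the variant grid this commit registers.
# The names mirror the commit message; this helper is illustrative only.
BASES = ["gpt_oss_120b", "gpt_oss_20b"]

def variant_names(base: str) -> list:
    """Enumerate the five config names derived from one base model."""
    suffixes = ["", "_lora", "_int8", "_lora_int8", "_lora_kbit"]
    return [base + s for s in suffixes]

all_variants = [name for base in BASES for name in variant_names(base)]
print(len(all_variants))  # 10
```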
1 parent 3037ab0 commit 62d5367

File tree

8 files changed: +689 −9 lines changed

README.md (21 additions, 9 deletions)

@@ -20,7 +20,7 @@
 
 ___
 
-`xTuring` provides fast, efficient and simple fine-tuning of open-source LLMs, such as Mistral, LLaMA, GPT-J, and more.
+`xTuring` provides fast, efficient and simple fine-tuning of open-source LLMs, such as OpenAI's GPT-OSS, Mistral, LLaMA, GPT-J, and more.
 By providing an easy-to-use interface for fine-tuning LLMs to your own data and application, xTuring makes it
 simple to build, modify, and control LLMs. The entire process can be done inside your computer or in your
 private cloud, ensuring data privacy and security.
@@ -50,14 +50,14 @@ from xturing.models import BaseModel
 # Load the dataset
 instruction_dataset = InstructionDataset("./examples/models/llama/alpaca_data")
 
-# Initialize the model
-model = BaseModel.create("llama_lora")
+# Initialize the GPT-OSS 20B model with LoRA
+model = BaseModel.create("gpt_oss_20b_lora")
 
 # Finetune the model
 model.finetune(dataset=instruction_dataset)
 
-# Perform inference
-output = model.generate(texts=["Why LLM models are becoming so important?"])
+# Perform inference with reasoning capabilities
+output = model.generate(texts=["Explain quantum computing and its potential applications in cryptography"])
 
 print("Generated output by the model: {}".format(output))
 ```
@@ -68,7 +68,19 @@ You can find the data folder [here](examples/models/llama/alpaca_data).
 
 ## 🌟 What's new?
 We are excited to announce the latest enhancements to our `xTuring` library:
-1. __`LLaMA 2` integration__ - You can use and fine-tune the _`LLaMA 2`_ model in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_ using the `GenericModel` wrapper and/or you can use the `Llama2` class from `xturing.models` to test and finetune the model.
+1. __`OpenAI GPT-OSS` integration__ - You can now use and fine-tune OpenAI's latest open-source models _`GPT-OSS-120B`_ and _`GPT-OSS-20B`_ in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_. These models feature advanced reasoning capabilities with configurable reasoning levels (low/medium/high) and support OpenAI's harmony response format.
+```python
+from xturing.models import BaseModel
+
+# Use the production-ready 120B model
+model = BaseModel.create('gpt_oss_120b_lora')
+
+# Or use the efficient 20B model for faster inference
+model = BaseModel.create('gpt_oss_20b_lora')
+
+# Both models support reasoning levels via system prompts
+```
+2. __`LLaMA 2` integration__ - You can use and fine-tune the _`LLaMA 2`_ model in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_ using the `GenericModel` wrapper and/or you can use the `Llama2` class from `xturing.models` to test and finetune the model.
 ```python
 from xturing.models import Llama2
 model = Llama2()
@@ -78,7 +90,7 @@ from xturing.models import BaseModel
 model = BaseModel.create('llama2')
 
 ```
-2. __`Evaluation`__ - Now you can evaluate any `Causal Language Model` on any dataset. The metrics currently supported is [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
+3. __`Evaluation`__ - Now you can evaluate any `Causal Language Model` on any dataset. The metrics currently supported is [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
@@ -87,8 +99,8 @@ from xturing.models import BaseModel
 # Load the desired dataset
 dataset = InstructionDataset('../llama/alpaca_data')
 
-# Load the desired model
-model = BaseModel.create('gpt2')
+# Load the desired model (try GPT-OSS for advanced reasoning)
+model = BaseModel.create('gpt_oss_20b')
 
 # Run the Evaluation of the model on the dataset
 result = model.evaluate(dataset)
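The evaluation example in this diff reports perplexity. As a refresher on the metric (a stdlib-only sketch, independent of xTuring), perplexity is the exponential of the mean per-token negative log-likelihood:

```python
import math

def perplexity(token_nlls: list) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/4 has per-token NLL ln(4),
# so its perplexity is 4: it is "as confused as" a uniform 4-way choice.
nlls = [math.log(4.0)] * 8
print(perplexity(nlls))  # ≈ 4.0
```

Lower is better; a model that predicted every token with probability 1 would score a perplexity of exactly 1.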

src/xturing/config/finetuning_config.yaml (92 additions)

@@ -326,3 +326,95 @@ opt_int8:
   num_train_epochs: 3
   batch_size: 8
   max_length: 256
+
+# GPT-OSS 120B model fine-tuning configurations
+gpt_oss_120b:
+  learning_rate: 1e-5
+  weight_decay: 0.01
+  num_train_epochs: 1
+  batch_size: 1
+  gradient_accumulation_steps: 8
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_120b_lora:
+  learning_rate: 2e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 2
+  gradient_accumulation_steps: 4
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_120b_int8:
+  learning_rate: 1e-4
+  weight_decay: 0.01
+  num_train_epochs: 2
+  batch_size: 2
+  gradient_accumulation_steps: 4
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_120b_lora_int8:
+  learning_rate: 2e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 4
+  gradient_accumulation_steps: 2
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_120b_lora_kbit:
+  learning_rate: 2e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 8
+  gradient_accumulation_steps: 1
+  max_length: 2048
+  warmup_steps: 100
+
+# GPT-OSS 20B model fine-tuning configurations
+gpt_oss_20b:
+  learning_rate: 5e-5
+  weight_decay: 0.01
+  num_train_epochs: 2
+  batch_size: 2
+  gradient_accumulation_steps: 4
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_20b_lora:
+  learning_rate: 3e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 4
+  gradient_accumulation_steps: 2
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_20b_int8:
+  learning_rate: 2e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 4
+  gradient_accumulation_steps: 2
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_20b_lora_int8:
+  learning_rate: 3e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 8
+  gradient_accumulation_steps: 1
+  max_length: 2048
+  warmup_steps: 100
+
+gpt_oss_20b_lora_kbit:
+  learning_rate: 3e-4
+  weight_decay: 0.01
+  num_train_epochs: 3
+  batch_size: 16
+  gradient_accumulation_steps: 1
+  max_length: 2048
+  warmup_steps: 100
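Notice how these configs trade per-device `batch_size` against `gradient_accumulation_steps`: their product is the effective batch size the optimizer sees per update. A stdlib-only sketch (values copied from the YAML above, a few representative variants only):

```python
# Effective batch size = batch_size * gradient_accumulation_steps.
# Values copied from the fine-tuning YAML above.
configs = {
    "gpt_oss_120b":          {"batch_size": 1,  "gradient_accumulation_steps": 8},
    "gpt_oss_120b_lora":     {"batch_size": 2,  "gradient_accumulation_steps": 4},
    "gpt_oss_120b_lora_kbit": {"batch_size": 8,  "gradient_accumulation_steps": 1},
    "gpt_oss_20b_lora_kbit": {"batch_size": 16, "gradient_accumulation_steps": 1},
}

def effective_batch_size(cfg: dict) -> int:
    return cfg["batch_size"] * cfg["gradient_accumulation_steps"]

for name, cfg in configs.items():
    print(name, effective_batch_size(cfg))
# Every 120B variant lands on an effective batch of 8, regardless of how
# much fits on the device; the 20B 4-bit variant doubles it to 16.
```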

src/xturing/config/generation_config.yaml (60 additions)

@@ -194,6 +194,66 @@ gpt2_int8:
   top_p: 0.92
   max_new_tokens: 256
 
+# Contrastive search for GPT-OSS models (high reasoning capability)
+gpt_oss_120b:
+  penalty_alpha: 0.6
+  top_k: 4
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_120b_lora:
+  penalty_alpha: 0.6
+  top_k: 4
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_120b_int8:
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_120b_lora_int8:
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_120b_lora_kbit:
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+# Contrastive search for GPT-OSS 20B models
+gpt_oss_20b:
+  penalty_alpha: 0.6
+  top_k: 4
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_20b_lora:
+  penalty_alpha: 0.6
+  top_k: 4
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_20b_int8:
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_20b_lora_int8:
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
+gpt_oss_20b_lora_kbit:
+  max_new_tokens: 512
+  do_sample: false
+  temperature: 0.1
+
 # Contrastive search
 llama:
   penalty_alpha: 0.6
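Setting `penalty_alpha` together with `top_k` selects contrastive search decoding. As an illustration of the scoring rule behind those two knobs (a toy stdlib sketch with made-up probabilities and hidden-state vectors, not the library's implementation), each of the `top_k` candidate tokens is scored by model confidence minus a degeneration penalty:

```python
# Contrastive search scores candidate v as:
#   (1 - alpha) * p(v) - alpha * max cosine-similarity(v, prior context states)
# so confident-but-repetitive tokens lose to slightly less likely, novel ones.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def contrastive_score(prob, cand_vec, context_vecs, alpha=0.6):
    degeneration_penalty = max(cosine(cand_vec, h) for h in context_vecs)
    return (1 - alpha) * prob - alpha * degeneration_penalty

context = [[1.0, 0.0], [0.8, 0.6]]  # made-up prior hidden states
repetitive = contrastive_score(0.9, [1.0, 0.0], context)  # likely, but a repeat
novel = contrastive_score(0.5, [0.0, 1.0], context)       # less likely, fresh
print(repetitive < novel)  # True
```

With `alpha=0.6` (as in the YAML) the penalty dominates, which is why these configs pair it with a small `top_k` of 4.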

src/xturing/engines/__init__.py (26 additions)

@@ -38,6 +38,18 @@
     GPT2LoraEngine,
     GPT2LoraInt8Engine,
 )
+from xturing.engines.gpt_oss_engine import (
+    GPTOSS20BEngine,
+    GPTOSS20BInt8Engine,
+    GPTOSS20BLoraEngine,
+    GPTOSS20BLoraInt8Engine,
+    GPTOSS20BLoraKbitEngine,
+    GPTOSS120BEngine,
+    GPTOSS120BInt8Engine,
+    GPTOSS120BLoraEngine,
+    GPTOSS120BLoraInt8Engine,
+    GPTOSS120BLoraKbitEngine,
+)
 from xturing.engines.gptj_engine import (
     GPTJEngine,
     GPTJInt8Engine,
@@ -98,6 +110,20 @@
 BaseEngine.add_to_registry(GPT2Int8Engine.config_name, GPT2Int8Engine)
 BaseEngine.add_to_registry(GPT2LoraEngine.config_name, GPT2LoraEngine)
 BaseEngine.add_to_registry(GPT2LoraInt8Engine.config_name, GPT2LoraInt8Engine)
+BaseEngine.add_to_registry(GPTOSS120BEngine.config_name, GPTOSS120BEngine)
+BaseEngine.add_to_registry(GPTOSS120BInt8Engine.config_name, GPTOSS120BInt8Engine)
+BaseEngine.add_to_registry(GPTOSS120BLoraEngine.config_name, GPTOSS120BLoraEngine)
+BaseEngine.add_to_registry(
+    GPTOSS120BLoraInt8Engine.config_name, GPTOSS120BLoraInt8Engine
+)
+BaseEngine.add_to_registry(
+    GPTOSS120BLoraKbitEngine.config_name, GPTOSS120BLoraKbitEngine
+)
+BaseEngine.add_to_registry(GPTOSS20BEngine.config_name, GPTOSS20BEngine)
+BaseEngine.add_to_registry(GPTOSS20BInt8Engine.config_name, GPTOSS20BInt8Engine)
+BaseEngine.add_to_registry(GPTOSS20BLoraEngine.config_name, GPTOSS20BLoraEngine)
+BaseEngine.add_to_registry(GPTOSS20BLoraInt8Engine.config_name, GPTOSS20BLoraInt8Engine)
+BaseEngine.add_to_registry(GPTOSS20BLoraKbitEngine.config_name, GPTOSS20BLoraKbitEngine)
 BaseEngine.add_to_registry(LLamaEngine.config_name, LLamaEngine)
 BaseEngine.add_to_registry(LLamaInt8Engine.config_name, LLamaInt8Engine)
 BaseEngine.add_to_registry(LlamaLoraEngine.config_name, LlamaLoraEngine)
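The `add_to_registry` calls above follow a plain string-keyed registry pattern: each engine class carries a `config_name`, and a factory later looks the class up by that key. A hypothetical minimal version (class names borrowed from the diff; the real `BaseEngine` has more machinery):

```python
# Minimal sketch of the config_name -> class registry pattern in the diff.
class BaseEngine:
    registry = {}  # shared mapping from config_name to engine class

    @classmethod
    def add_to_registry(cls, name, engine_cls):
        cls.registry[name] = engine_cls

    @classmethod
    def create(cls, name):
        """Instantiate the engine registered under `name`."""
        return cls.registry[name]()

class GPTOSS20BLoraEngine(BaseEngine):
    config_name = "gpt_oss_20b_lora"

BaseEngine.add_to_registry(GPTOSS20BLoraEngine.config_name, GPTOSS20BLoraEngine)
engine = BaseEngine.create("gpt_oss_20b_lora")
print(type(engine).__name__)  # GPTOSS20BLoraEngine
```

Registering under `config_name` rather than the class name is what lets the same key tie together an engine, its fine-tuning config, and its generation config across the YAML files in this commit.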
