<h3 align="center">Fine-tune, evaluate, and run private, personalized LLMs</h3>

<p align="center">
  <a href="https://pypi.org/project/xturing/">
___
`xTuring` makes it simple, fast, and cost-efficient to fine-tune open-source LLMs (e.g., GPT-OSS, LLaMA/LLaMA 2, Falcon, GPT-J, GPT-2, OPT, Bloom, Cerebras, Galactica) on your own data, either locally or in your private cloud, so your data stays private and secure.
Why xTuring:

- Simple API for data prep, training, and inference
- Private by default: run locally or in your own VPC
- Efficient: LoRA and low-precision (INT8/INT4) fine-tuning cut hardware costs by up to 90%
- Scales easily from a single CPU/laptop to multiple GPUs
- Built-in evaluation on well-defined metrics (e.g., perplexity)
```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load the instruction dataset (see the data folder linked below)
dataset = InstructionDataset("./examples/models/llama/alpaca_data")

# 120B or 20B variants; LoRA/INT8/INT4 configs are also supported
model = BaseModel.create("gpt_oss_20b_lora")

# Fine-tune the model, then generate with it
model.finetune(dataset=dataset)
output = model.generate(texts=["Explain quantum computing and its potential applications in cryptography"])
print("Generated output by the model: {}".format(output))
```
You can find the data folder [here](examples/models/llama/alpaca_data).
<br>
## 🌟 What's new?
Highlights from recent updates:

1. __GPT-OSS integration__ – Use and fine-tune OpenAI's open-source `gpt_oss_120b` and `gpt_oss_20b` models off-the-shelf or with INT8, LoRA, LoRA + INT8, and LoRA + INT4 configurations. Both models support configurable reasoning levels (low/medium/high) and OpenAI's harmony response format.
```python
from xturing.models import BaseModel

# e.g. 'gpt_oss_20b', 'gpt_oss_120b', or their LoRA/INT8/INT4 configurations
model = BaseModel.create('gpt_oss_20b_lora')

# Both models support reasoning levels via system prompts
```
2. __LLaMA 2 integration__ – Use and fine-tune LLaMA 2 off-the-shelf or with INT8, LoRA, LoRA + INT8, and LoRA + INT4, via the `GenericModel` wrapper or the dedicated `Llama2` class from `xturing.models`.
```python
from xturing.models import Llama2
model = Llama2()

# or, via the generic wrapper
from xturing.models import BaseModel
model = BaseModel.create('llama2')
```
3. __Evaluation__ – Evaluate any causal language model on any dataset. The only metric currently supported is [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load a dataset and a model (example path and key)
dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = BaseModel.create("gpt2")

# Run the evaluation of the model on the dataset
result = model.evaluate(dataset)
print(f"Perplexity of the evaluation: {result}")
```
4. __INT4 precision__ – Fine-tune many LLMs in INT4 precision using `GenericLoraKbitModel`.
```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Load a dataset (example path) and a model in INT4 + LoRA mode
dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = GenericLoraKbitModel('tiiuae/falcon-7b')

# Run the fine-tuning
model.finetune(dataset)
```
5. __CPU inference__ – Run inference on CPUs, including laptop CPUs, via the [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers), which conserves memory through [weight-only quantization](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerates inference with highly optimized kernels on Intel platforms.
```python
# Make the necessary imports
from xturing.models import BaseModel

# Example key: a weight-quantized model runs comfortably on CPU
model = BaseModel.create("llama2_int8")

output = model.generate(texts=["Why LLM models are becoming so important?"])
print(output)
```
6. __Batching__ – Set `batch_size` in `.generate()` and `.evaluate()` to speed up processing; a `batch_size` greater than 1 usually improves throughput.
```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel

# Example dataset and model; batch_size > 1 also works with .evaluate()
dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = GenericLoraKbitModel('tiiuae/falcon-7b')
outputs = model.generate(dataset=dataset, batch_size=10)
```
<br>
## 📎 Fine-tuned model checkpoints
We have already fine-tuned some models that you can use as your base or start playing with.
Here is how you would load them:
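For example, a minimal sketch that loads a published checkpoint via `BaseModel.load` (the checkpoint name below is illustrative; substitute the checkpoint you want):

```python
from xturing.models import BaseModel

# Illustrative checkpoint name of a LoRA-finetuned DistilGPT-2 on Alpaca data
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")

output = model.generate(texts=["Why LLM models are becoming so important?"])
```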
Below is a list of all the supported models via the `BaseModel` class of `xTuring` and their corresponding keys.

| Model | Key |
| -- | -- |
|DistilGPT-2 | distilgpt2|
|Falcon-7B | falcon|
|Galactica | galactica|
|GPT-OSS (20B/120B) | gpt_oss_20b, gpt_oss_120b|
|GPT-J | gptj|
|GPT-2 | gpt2|
|LLaMA | llama|
|LLaMA 2 | llama2|
|OPT-1.3B | opt|
The above are the base variants of the LLMs. Use the templates below to get their `LoRA`, `INT8`, and `INT8 + LoRA` versions (a concrete example follows the table):
| Version | Template |
| -- | -- |
| LoRA| <model_key>_lora|
| INT8| <model_key>_int8|
| INT8 + LoRA| <model_key>_lora_int8|
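For example, combining a model key from the table above with one of these templates (a quick sketch; any supported key works the same way):

```python
from xturing.models import BaseModel

# LLaMA 2 with LoRA fine-tuning in INT8 precision
model = BaseModel.create("llama2_lora_int8")
```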
To load a model's __INT4 + LoRA__ version, use the `GenericLoraKbitModel` class from `xturing.models`:
```python
model = GenericLoraKbitModel('<model_path>')
```
Replace `<model_path>` with a local directory or a Hugging Face model such as `facebook/opt-1.3b`.
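For instance, a minimal sketch using the Hugging Face model mentioned above:

```python
from xturing.models import GenericLoraKbitModel

# INT4 + LoRA wrapper around a Hugging Face model id
model = GenericLoraKbitModel("facebook/opt-1.3b")
output = model.generate(texts=["What do you think about LLMs?"])
```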
## 📈 Roadmap
- [x] Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models