
Commit 88a44eb

docs(readme): clarify value prop and CPU quickstart

- README.md: tighten messaging (privacy, efficiency, evaluation)
- README.md: CPU-friendly Quickstart (distilgpt2 LoRA)
- README.md: condense What's New; fix numbering
- README.md: add GPT-OSS keys; correct LLaMA casing
- README.md: copy and grammar polish

1 parent 62d5367 commit 88a44eb


README.md

Lines changed: 39 additions & 33 deletions
@@ -2,7 +2,7 @@
 <img src=".github/stochastic_logo_light.svg#gh-light-mode-only" width="250" alt="Stochastic.ai"/>
 <img src=".github/stochastic_logo_dark.svg#gh-dark-mode-only" width="250" alt="Stochastic.ai"/>
 </p>
-<h3 align="center">Build, modify, and control your own personalized LLMs</h3>
+<h3 align="center">Fine‑tune, evaluate, and run private, personalized LLMs</h3>
 
 <p align="center">
 <a href="https://pypi.org/project/xturing/">
@@ -20,17 +20,14 @@
 
 ___
 
-`xTuring` provides fast, efficient and simple fine-tuning of open-source LLMs, such as OpenAI's GPT-OSS, Mistral, LLaMA, GPT-J, and more.
-By providing an easy-to-use interface for fine-tuning LLMs to your own data and application, xTuring makes it
-simple to build, modify, and control LLMs. The entire process can be done inside your computer or in your
-private cloud, ensuring data privacy and security.
+`xTuring` makes it simple, fast, and cost‑efficient to fine‑tune open‑source LLMs (e.g., GPT‑OSS, LLaMA/LLaMA 2, Falcon, GPT‑J, GPT‑2, OPT, Bloom, Cerebras, Galactica) on your own data — locally or in your private cloud.
 
-With `xTuring` you can,
-- Ingest data from different sources and preprocess them to a format LLMs can understand
-- Scale from single to multiple GPUs for faster fine-tuning
-- Leverage memory-efficient methods (i.e. INT4, LoRA fine-tuning) to reduce hardware costs by up to 90%
-- Explore different fine-tuning methods and benchmark them to find the best performing model
-- Evaluate fine-tuned models on well-defined metrics for in-depth analysis
+Why xTuring:
+- Simple API for data prep, training, and inference
+- Private by default: run locally or in your VPC
+- Efficient: LoRA and low‑precision (INT8/INT4) to cut costs
+- Scales from CPU/laptop to multi‑GPU easily
+- Evaluate models with built‑in metrics (e.g., perplexity)
 
 <br>
 
@@ -43,32 +40,40 @@ pip install xturing
 
 ## 🚀 Quickstart
 
+Run a small, CPU‑friendly example first:
+
 ```python
 from xturing.datasets import InstructionDataset
 from xturing.models import BaseModel
 
-# Load the dataset
-instruction_dataset = InstructionDataset("./examples/models/llama/alpaca_data")
+# Load a toy instruction dataset (Alpaca format)
+dataset = InstructionDataset("./examples/models/llama/alpaca_data")
 
-# Initialize the GPT-OSS 20B model with LoRA
-model = BaseModel.create("gpt_oss_20b_lora")
+# Start small for quick iterations (works on CPU)
+model = BaseModel.create("distilgpt2_lora")
+
+# Fine‑tune and then generate
+model.finetune(dataset=dataset)
+output = model.generate(texts=["Explain quantum computing for beginners."])
+print(f"Model output: {output}")
+```
 
-# Finetune the model
-model.finetune(dataset=instruction_dataset)
+Want bigger models and reasoning controls? Try GPT‑OSS variants (requires significant resources):
 
-# Perform inference with reasoning capabilities
-output = model.generate(texts=["Explain quantum computing and its potential applications in cryptography"])
+```python
+from xturing.models import BaseModel
 
-print("Generated output by the model: {}".format(output))
+# 120B or 20B variants; also support LoRA/INT8/INT4 configs
+model = BaseModel.create("gpt_oss_20b_lora")
 ```
 
 You can find the data folder [here](examples/models/llama/alpaca_data).
 
 <br>
 
 ## 🌟 What's new?
-We are excited to announce the latest enhancements to our `xTuring` library:
-1. __`OpenAI GPT-OSS` integration__ - You can now use and fine-tune OpenAI's latest open-source models _`GPT-OSS-120B`_ and _`GPT-OSS-20B`_ in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_. These models feature advanced reasoning capabilities with configurable reasoning levels (low/medium/high) and support OpenAI's harmony response format.
+Highlights from recent updates:
+1. __GPT‑OSS integration__ – Use and finetune `gpt_oss_120b` and `gpt_oss_20b` with off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 options. Includes configurable reasoning levels and harmony response format support.
 ```python
 from xturing.models import BaseModel
 
@@ -80,7 +85,7 @@ model = BaseModel.create('gpt_oss_20b_lora')
 
 # Both models support reasoning levels via system prompts
 ```
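Since the GPT‑OSS snippet is split across the two hunks above, here is a hedged consolidation of what item 1 describes; the alternative keys assume the `<model_key>` templates shown later in the diff, and the system‑prompt format for reasoning levels is not part of this change, so it is only noted in a comment.

```python
from xturing.models import BaseModel

# Off-the-shelf key; swap in gpt_oss_20b_int8, gpt_oss_20b_lora, or
# gpt_oss_20b_lora_int8 for the INT8/LoRA configurations (template keys assumed)
model = BaseModel.create("gpt_oss_20b")

# Reasoning level (low/medium/high) is selected via the system prompt;
# the exact prompt format is not shown in this diff
output = model.generate(texts=["Explain quantum computing for beginners."])
print(output)
```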
-2. __`LLaMA 2` integration__ - You can use and fine-tune the _`LLaMA 2`_ model in different configurations: _off-the-shelf_, _off-the-shelf with INT8 precision_, _LoRA fine-tuning_, _LoRA fine-tuning with INT8 precision_ and _LoRA fine-tuning with INT4 precision_ using the `GenericModel` wrapper and/or you can use the `Llama2` class from `xturing.models` to test and finetune the model.
+2. __LLaMA 2 integration__ – Off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 via `GenericModel` or `Llama2`.
 ```python
 from xturing.models import Llama2
 model = Llama2()
@@ -90,7 +95,7 @@ from xturing.models import BaseModel
 model = BaseModel.create('llama2')
 
 ```
-3. __`Evaluation`__ - Now you can evaluate any `Causal Language Model` on any dataset. The metrics currently supported is [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
+3. __Evaluation__ – Evaluate any causal LM on any dataset. Currently supports [`perplexity`](https://en.wikipedia.org/wiki/Perplexity).
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
@@ -109,7 +114,7 @@ result = model.evaluate(dataset)
 print(f"Perplexity of the evalution: {result}")
 
 ```
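The evaluation snippet is cut off by the hunk boundaries; a minimal end‑to‑end sketch of the flow item 3 describes, reusing the Alpaca data folder from the Quickstart and the `distilgpt2` key from the supported‑models table:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Same toy Alpaca-format dataset used in the Quickstart
dataset = InstructionDataset("./examples/models/llama/alpaca_data")

# Any supported causal LM key works; distilgpt2 keeps the run light
model = BaseModel.create("distilgpt2")

# Perplexity is the only metric currently supported
result = model.evaluate(dataset)
print(f"Perplexity: {result}")
```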
-3. __`INT4` Precision__ - You can now use and fine-tune any LLM with `INT4 Precision` using `GenericLoraKbitModel`.
+4. __INT4 precision__ – Fine‑tune many LLMs with INT4 using `GenericLoraKbitModel`.
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
@@ -125,7 +130,7 @@ model = GenericLoraKbitModel('tiiuae/falcon-7b')
 model.finetune(dataset)
 ```
 
-4. __CPU inference__ - The CPU, including laptop CPUs, is now fully equipped to handle LLM inference. We integrated [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers) to conserve memory by compressing the model with [weight-only quantization algorithms](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md) and accelerate the inference by leveraging its highly optimized kernel on Intel platforms.
+5. __CPU inference__ – Run inference on CPUs (including laptops) via [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers), using weight-only quantization and optimized kernels on Intel platforms.
 
 ```python
 # Make the necessary imports
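The CPU‑inference hunk ends before the model is instantiated, so the key used in the README's snippet is not visible here; the sketch below assumes an INT8 key built from the `<model_key>_int8` template and may differ from the actual example.

```python
from xturing.models import BaseModel

# Assumed INT8 key (template: <model_key>_int8); the real snippet's key is not in this diff
model = BaseModel.create("llama2_int8")

# Inference runs on CPU via Intel Extension for Transformers' weight-only quantization
output = model.generate(texts=["Why LLM models are becoming so important?"])
print(output)
```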
@@ -140,7 +145,7 @@ output = model.generate(texts=["Why LLM models are becoming so important?"])
 print(output)
 ```
 
-5. __Batch integration__ - By tweaking the 'batch_size' in the .generate() and .evaluate() functions, you can expedite results. Using a 'batch_size' greater than 1 typically enhances processing efficiency.
+6. __Batching__ – Set `batch_size` in `.generate()` and `.evaluate()` to speed up processing.
 ```python
 # Make the necessary imports
 from xturing.datasets import InstructionDataset
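The batching code is likewise truncated; a short sketch of item 6, assuming `batch_size` is passed as a keyword to `.generate()` and `.evaluate()` as the item states (the model key and batch size here are illustrative):

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset("./examples/models/llama/alpaca_data")
model = BaseModel.create("distilgpt2_lora")

# batch_size > 1 processes several inputs per step for better throughput
outputs = model.generate(texts=["What is LoRA?", "What is INT4 quantization?"], batch_size=2)
result = model.evaluate(dataset, batch_size=2)
```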
@@ -232,7 +237,7 @@ Contribute to this by submitting your performance results on other GPUs by creat
 
 <br>
 
-## 📎 Fine-tuned model checkpoints
+## 📎 Finetuned model checkpoints
 We have already fine-tuned some models that you can use as your base or start playing with.
 Here is how you would load them:
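The loading snippet itself falls outside this hunk; a sketch of what "load them" refers to, assuming the `BaseModel.load` entry point (the identifier below is a placeholder, not taken from the diff):

```python
from xturing.models import BaseModel

# Placeholder checkpoint name; substitute a published xTuring fine-tuned checkpoint
model = BaseModel.load("<publisher>/<finetuned_checkpoint>")

output = model.generate(texts=["Explain quantum computing for beginners."])
print(output)
```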

@@ -258,25 +263,26 @@ Below is a list of all the supported models via `BaseModel` class of `xTuring` a
 |DistilGPT-2 | distilgpt2|
 |Falcon-7B | falcon|
 |Galactica | galactica|
+|GPT-OSS (20B/120B) | gpt_oss_20b, gpt_oss_120b|
 |GPT-J | gptj|
 |GPT-2 | gpt2|
-|LlaMA | llama|
-|LlaMA2 | llama2|
+|LLaMA | llama|
+|LLaMA2 | llama2|
 |OPT-1.3B | opt|
 
-The above mentioned are the base variants of the LLMs. Below are the templates to get their `LoRA`, `INT8`, `INT8 + LoRA` and `INT4 + LoRA` versions.
+The above are the base variants. Use these templates for `LoRA`, `INT8`, and `INT8 + LoRA` versions:
 
 | Version | Template |
 | -- | -- |
 | LoRA| <model_key>_lora|
 | INT8| <model_key>_int8|
 | INT8 + LoRA| <model_key>_lora_int8|
 
-** In order to load any model's __`INT4+LoRA`__ version, you will need to make use of `GenericLoraKbitModel` class from `xturing.models`. Below is how to use it:
+To load a model’s __INT4 + LoRA__ version, use the `GenericLoraKbitModel` class:
 ```python
 model = GenericLoraKbitModel('<model_path>')
 ```
-The `model_path` can be replaced with you local directory or any HuggingFace library model like `facebook/opt-1.3b`.
+Replace `<model_path>` with a local directory or a Hugging Face model like `facebook/opt-1.3b`.
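To make the template keys concrete, a brief illustration combining base keys from the table with the version templates (the specific keys chosen are examples only):

```python
from xturing.models import BaseModel, GenericLoraKbitModel

# Base key + template suffix from the table above
lora_model = BaseModel.create("gpt2_lora")              # LoRA
int8_model = BaseModel.create("falcon_int8")            # INT8
lora_int8_model = BaseModel.create("llama2_lora_int8")  # INT8 + LoRA

# INT4 + LoRA goes through GenericLoraKbitModel instead of a key template
kbit_model = GenericLoraKbitModel("facebook/opt-1.3b")
```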
 
 ## 📈 Roadmap
 - [x] Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models

0 commit comments
