If you would like to contribute to this project, we recommend following the ["fork-and-pull" Git workflow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow).
1. **Fork** the repo on GitHub
2. **Clone** the project to your own machine
3. **Commit** changes to your own branch
4. **Push** your work back up to your fork
5. Submit a **Pull request** so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!
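The steps above, including the upstream merge, can be exercised end-to-end on your own machine. The snippet below is a self-contained sketch in which two throwaway local repositories stand in for GitHub's copies (all paths and the example file are made up for illustration); in real use, `upstream` would point at this project's GitHub URL.

```shell
# Demo of steps 2-4 plus the upstream merge, using local stand-in repos.
set -e
work=$(mktemp -d)

# Stand-in for the original GitHub repo ("upstream")
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email you@example.com && git config user.name you
echo "v1" > README.md && git add README.md && git commit -qm "initial"

# Step 2: clone your fork (here, a clone of the stand-in repo)
git clone -q "$work/upstream" "$work/fork"
cd "$work/fork"
git config user.email you@example.com && git config user.name you
git remote add upstream "$work/upstream"

# Upstream moves ahead while you work...
(cd "$work/upstream" && echo "v2" > README.md && git commit -qam "update")

# ...so fetch and merge the latest from upstream before opening the PR
git fetch -q upstream
git merge -q "upstream/$(git symbolic-ref --short HEAD)"
cat README.md   # now matches upstream's latest
```

The merge is a fast-forward here; in a real feature branch you may need to resolve conflicts before pushing.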
<details>
<summary>Install with Docker [Recommended]</summary>

```shell
docker build -t llm-toolkit .
```

```shell
# CPU
docker run -it llm-toolkit

# GPU
docker run -it --gpus all llm-toolkit
```

</details>

<details>
<summary>Poetry (recommended)</summary>

See the Poetry documentation for [installation instructions](https://python-poetry.org/docs/#installation).

```shell
poetry install
```

</details>

<details>
<summary>pip</summary>

We recommend using a virtual environment such as `venv` or `conda` for installation.

```shell
pip install -e .
```

</details>
</details>
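For the pip route, one possible environment setup looks like this. It is a sketch assuming `python3` with the standard `venv` module is available; the editable install itself is left as a comment because it must be run from the cloned repo root.

```shell
set -e
# Create and activate an isolated environment (the location is arbitrary)
venv_dir=$(mktemp -d)/llm-toolkit-env
python3 -m venv "$venv_dir"
. "$venv_dir/bin/activate"
# Confirm the interpreter now comes from the new environment
python -c "import sys; print(sys.prefix)"
# From the repo root you would then run:
#   pip install -e .
```

`conda create -n llm-toolkit python` followed by `conda activate llm-toolkit` is an equivalent conda route.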

### Checklist Before Pull Request (Optional)

1. Use `ruff check --fix` to check and fix lint errors
2. Use `ruff format` to apply formatting

NOTE: Ruff linting and formatting checks run automatically via GitHub Actions when a PR is raised, so it is good practice to fix lint errors and apply formatting locally before raising the PR.

### Releasing

To manually release a PyPI package, run:

```shell
make build-release
```

Note: Make sure you have a PyPI token for the [llm-toolkit PyPI repo](https://pypi.org/project/llm-toolkit/).

---

**README.md** (15 additions, 90 deletions)

## Overview

LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM fine-tuning experiments on your data and gathering their results. From one single `yaml` config file, control all elements of a typical experimentation pipeline - **prompts**, **open-source LLMs**, **optimization strategy** and **LLM testing**.

[pipx](https://pipx.pypa.io/stable/) installs the package and dependencies in a separate virtual environment:

```shell
pipx install llm-toolkit
```

This guide contains 3 stages that will enable you to get the most out of this toolkit!

### Basic

```shell
llmtune generate config
llmtune run ./config.yml
```

The first command generates a helpful starter `config.yml` file and saves it in the current working directory. This is provided to users to quickly get started and as a base for further modification.
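For orientation, a fragment of what such a config can look like is sketched below. Treat every key name here as hypothetical illustration rather than the toolkit's actual schema (only `lora` and `qa` appear elsewhere in this guide); the generated `config.yml` is the authoritative reference.

```yaml
# Illustrative only - regenerate the real schema with `llmtune generate config`
model:
  hf_repo_id: "NousResearch/Llama-2-7b-hf"   # hypothetical key: base LLM
lora:
  r: 32                                      # see the Advanced section
qa:
  # quality-assurance tests comparing predictions against ground truth
  tests: ["vector_similarity"]               # hypothetical key
```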

#### Artifact Outputs

This config will run fine-tuning and save the results under the directory `./experiment/[unique_hash]`. Each unique configuration generates a unique hash, so that the tool can automatically pick up where it left off. For example, if you need to exit in the middle of training, relaunching the script will automatically load the existing dataset generated under that directory instead of regenerating it.

After the script finishes running you will see these distinct artifacts:

```shell
/dataset   # generated pkl file in hf datasets format
/model     # peft model weights in hf format
/results   # csv of prompt, ground truth, and predicted values
/qa        # csv of test results: e.g. vector similarity between ground truth and prediction
```

Once all the changes have been incorporated in the YAML file, you can simply use it to run a custom fine-tuning experiment!

```shell
python toolkit.py --config-path <path to custom YAML file>
```
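The pick-up-where-it-left-off behaviour can be pictured with a short sketch. This illustrates the idea only; the function name and hashing details below are hypothetical, not the toolkit's actual code.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def experiment_dir(config: dict, root: str) -> Path:
    """Map a config to a stable ./experiment/[unique_hash]-style directory."""
    # Serialize deterministically so identical configs hash identically
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    unique_hash = hashlib.sha256(blob).hexdigest()[:12]
    path = Path(root) / unique_hash
    # Reuse the directory if it already exists, enabling resumption
    path.mkdir(parents=True, exist_ok=True)
    return path

root = tempfile.mkdtemp()
cfg = {"model": "llama-2-7b", "lora": {"r": 32}}
# Same config always maps to the same directory, even across key orderings
assert experiment_dir(cfg, root) == experiment_dir({"lora": {"r": 32}, "model": "llama-2-7b"}, root)
```

Because the directory name is a pure function of the config, a relaunch with an unchanged config lands in the same directory and can reuse whatever artifacts are already there.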
### Advanced

## Extending

The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, fine-tuning, inference, and quality assurance testing, is designed to be easily extendable.
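As a sketch of what such an extension might look like, the snippet below adds a custom quality-assurance test. The class and method names are hypothetical, chosen for illustration rather than taken from the toolkit's source.

```python
from abc import ABC, abstractmethod

class QATest(ABC):
    """Hypothetical extension point for a custom quality-assurance test."""

    @abstractmethod
    def score(self, ground_truth: str, prediction: str) -> float:
        """Return a quality score in [0, 1]."""

class ExactMatch(QATest):
    """A trivial custom test: full score only on a whitespace-insensitive match."""

    def score(self, ground_truth: str, prediction: str) -> float:
        return 1.0 if ground_truth.strip() == prediction.strip() else 0.0

print(ExactMatch().score("Paris", " Paris "))  # 1.0
```

The real extension points live in the toolkit's source; the pattern shown here (subclass a component's base class and override its core method) is the general shape such customizations take.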

## Contributing

Open-source contributions to this toolkit are welcome and encouraged.
If you would like to contribute, please see [CONTRIBUTING.md](CONTRIBUTING.md).