
Commit 14d35d5

Merge branch 'main' of https://github.com/georgian-io/LLM-Finetuning-Hub into unit-tests

2 parents: ecbf57b + 79a6889

8 files changed: +127 −150 lines

CONTRIBUTING.md

New file: 79 additions, 0 deletions
## Contributing

If you would like to contribute to this project, we recommend following the ["fork-and-pull" Git workflow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow).

1. **Fork** the repo on GitHub
2. **Clone** the project to your own machine
3. **Commit** changes to your own branch
4. **Push** your work back up to your fork
5. Submit a **Pull request** so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!
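
For reference, a rough sketch of those steps as git commands, including the upstream merge from the note (the fork URL, branch name, and commit message are placeholders — substitute your own):

```shell
# Clone your fork and register the original repo as "upstream"
git clone https://github.com/<your-username>/LLM-Finetuning-Toolkit.git
cd LLM-Finetuning-Toolkit/
git remote add upstream https://github.com/georgian-io/LLM-Finetuning-Toolkit.git

# Commit changes on your own branch
git checkout -b my-feature
git commit -am "Describe your change"

# Merge the latest from upstream before opening the PR
git fetch upstream
git merge upstream/main

# Push your work back up to your fork, then open a Pull Request on GitHub
git push origin my-feature
```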
### Set Up Dev Environment

<details>
<summary>1. Clone Repo</summary>

```shell
git clone https://github.com/georgian-io/LLM-Finetuning-Toolkit.git
cd LLM-Finetuning-Toolkit/
```

</details>
<details>
<summary>2. Install Dependencies</summary>
<details>
<summary>Install with Docker [Recommended]</summary>

```shell
docker build -t llm-toolkit .
```

```shell
# CPU
docker run -it llm-toolkit
# GPU
docker run -it --gpus all llm-toolkit
```

</details>
<details>
<summary>Poetry (recommended)</summary>

See the poetry documentation for [installation instructions](https://python-poetry.org/docs/#installation).

```shell
poetry install
```

</details>
<details>
<summary>pip</summary>
We recommend using a virtual environment like `venv` or `conda` for installation.

```shell
pip install -e .
```

</details>
</details>
### Checklist Before Pull Request (Optional)

1. Use `ruff check --fix` to check and fix lint errors
2. Use `ruff format` to apply formatting

NOTE: Ruff linting and formatting checks are run via GitHub Actions when a PR is raised. Before raising a PR, it is good practice to check and fix lint errors and apply formatting locally, as shown below.
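
For example, to run both checks over the repo root (`.` is ruff's default target, shown here for clarity):

```shell
# Lint with auto-fixes, then format — the same checks CI applies to the PR
ruff check --fix .
ruff format .
```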
### Releasing

To manually release a PyPI package, please run:

```shell
make build-release
```

Note: Make sure you have a PyPI token for this [PyPI repo](https://pypi.org/project/llm-toolkit/).
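
A hypothetical token setup, assuming `build-release` publishes through poetry (the project's package manager; the Makefile recipe isn't shown in this diff):

```shell
# Create a token at https://pypi.org/manage/account/token/ first
poetry config pypi-token.pypi <your-pypi-token>   # placeholder token
make build-release
```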

README.md

15 additions, 90 deletions
````diff
@@ -6,17 +6,17 @@

 ## Overview

-LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM finetuning experiments on your data and gathering their results. From one single `yaml` config file, control all elements of a typical experimentation pipeline - **prompts**, **open-source LLMs**, **optimization strategy** and **LLM testing**.
+LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM fine-tuning experiments on your data and gathering their results. From one single `yaml` config file, control all elements of a typical experimentation pipeline - **prompts**, **open-source LLMs**, **optimization strategy** and **LLM testing**.

 <p align="center">
   <img src="https://github.com/georgian-io/LLM-Finetuning-Toolkit/blob/main/assets/overview_diagram.png?raw=true" width="900" />
 </p>

 ## Installation

-### pipx (recommended)
+### [pipx](https://pipx.pypa.io/stable/) (recommended)

-pipx installs the package and depdencies in a seperate virtual environment
+[pipx](https://pipx.pypa.io/stable/) installs the package and dependencies in a separate virtual environment

 ```shell
 pipx install llm-toolkit
````
````diff
@@ -39,8 +39,8 @@ This guide contains 3 stages that will enable you to get the most out of this to
 ### Basic

 ```shell
-llmtune generate config
-llmtune run --config_path ./config.yml
+llmtune generate config
+llmtune run ./config.yml
 ```

 The first command generates a helpful starter `config.yml` file and saves in the current working directory. This is provided to users to quickly get started and as a base for further modification.
````
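
A sketch of that basic flow end to end (the edit step and file listing are illustrative):

```shell
llmtune generate config    # writes a starter config.yml into the current directory
ls
# config.yml
$EDITOR config.yml         # adjust prompts, model, and training settings as needed
llmtune run ./config.yml   # launch the experiments described by the config
```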
````diff
@@ -166,21 +166,21 @@ qa:

 #### Artifact Outputs

-This config will run finetuning and save the results under directory `./experiment/[unique_hash]`. Each unique configuration will generate a unique hash, so that our tool can automatically pick up where it left off. For example, if you need to exit in the middle of the training, by relaunching the script, the program will automatically load the existing dataset that has been generated under the directory, instead of doing it all over again.
+This config will run fine-tuning and save the results under directory `./experiment/[unique_hash]`. Each unique configuration will generate a unique hash, so that our tool can automatically pick up where it left off. For example, if you need to exit in the middle of the training, by relaunching the script, the program will automatically load the existing dataset that has been generated under the directory, instead of doing it all over again.

 After the script finishes running you will see these distinct artifacts:

 ```shell
-/dataset # generated pkl file in hf datasets format
-/model # peft model weights in hf format
-/results # csv of prompt, ground truth, and predicted values
-/qa # csv of test results: e.g. vector similarity between ground truth and prediction
+/dataset # generated pkl file in hf datasets format
+/model # peft model weights in hf format
+/results # csv of prompt, ground truth, and predicted values
+/qa # csv of test results: e.g. vector similarity between ground truth and prediction
 ```

 Once all the changes have been incorporated in the YAML file, you can simply use it to run a custom fine-tuning experiment!

-```python
-python toolkit.py --config-path <path to custom YAML file>
+```shell
+python toolkit.py --config-path <path to custom YAML file>
 ```

 ### Advanced
````
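
A sketch of the resume behavior described in that hunk (the hash placeholder is illustrative; the directory names come from the artifact listing):

```shell
llmtune run ./config.yml   # creates ./experiment/<unique_hash>/ and starts the pipeline
# ...interrupt mid-training (e.g. Ctrl-C), then relaunch with the SAME config...
llmtune run ./config.yml   # identical config -> identical hash, so the dataset
                           # already generated under that directory is reloaded
ls ./experiment/<unique_hash>/
# dataset/  model/  results/  qa/
```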
````diff
@@ -236,84 +236,9 @@ lora:

 ## Extending

-The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, finetuning, inference, and quality assurance testing, is designed to be easily extendable.
+The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, fine-tuning, inference, and quality assurance testing, is designed to be easily extendable.

 ## Contributing
````

The rest of the hunk deletes the contributing guide formerly inlined here — the fork-and-pull workflow, dev-environment setup, pre-PR checklist, and release instructions, identical in substance to the new CONTRIBUTING.md above — and replaces it with a pointer:

````diff
+Open-source contributions to this toolkit are welcome and encouraged.
+If you would like to contribute, please see [CONTRIBUTING.md](CONTRIBUTING.md).
````

llmtune/cli/toolkit.py

12 additions, 12 deletions
````diff
@@ -3,10 +3,10 @@
 from pathlib import Path

 import torch
+import transformers
 import typer
 import yaml
 from pydantic import ValidationError
-from transformers import utils as hf_utils
 from typing_extensions import Annotated

 import llmtune
````
````diff
@@ -15,13 +15,15 @@
 from llmtune.finetune.lora import LoRAFinetune
 from llmtune.inference.lora import LoRAInference
 from llmtune.pydantic_models.config_model import Config
+from llmtune.qa.generics import LLMTestSuite, QaTestRegistry
 from llmtune.ui.rich_ui import RichUI
 from llmtune.utils.ablation_utils import generate_permutations
 from llmtune.utils.save_utils import DirectoryHelper


-hf_utils.logging.set_verbosity_error()
+transformers.logging.set_verbosity(transformers.logging.CRITICAL)
 torch._logging.set_logs(all=logging.CRITICAL)
+logging.captureWarnings(True)


 app = typer.Typer()
````
````diff
@@ -83,15 +85,13 @@ def run_one_experiment(config: Config, config_path: Path) -> None:
     else:
         RichUI.results_found(results_path)

-    # QA -------------------------------
-    # RichUI.before_qa()
-    # qa_path = dir_helper.save_paths.qa
-    # if not exists(qa_path) or not listdir(qa_path):
-    #     # TODO: Instantiate unit test classes
-    #     # TODO: Load results.csv
-    #     # TODO: Run Unit Tests
-    #     # TODO: Save Unit Test Results
-    #     pass
+    RichUI.before_qa()
+    qa_file_path = dir_helper.save_paths.qa_file
+    if not qa_file_path.exists():
+        llm_tests = config.qa.llm_tests
+        tests = QaTestRegistry.create_tests_from_list(llm_tests)
+        test_suite = LLMTestSuite.from_csv(results_file_path, tests)
+        test_suite.save_test_results(qa_file_path)


 @app.command("run")
````
````diff
@@ -125,7 +125,7 @@ def generate_config():
     """
     Generate an example `config.yml` file in current directory
     """
-    module_path = Path(llmtune.__file__).parent
+    module_path = Path(llmtune.__file__)
     example_config_path = module_path.parent / EXAMPLE_CONFIG_FNAME
     destination = Path.cwd()
     shutil.copy(example_config_path, destination)
````
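
This change pairs with the config.yml → llmtune/config.yml rename below: `module_path.parent` now points at the llmtune package directory itself rather than one level above it, so the bundled example config resolves inside the installed package. A quick check, assuming `EXAMPLE_CONFIG_FNAME` is `config.yml` as the rename suggests:

```shell
# Print where the packaged example config should now live (output path illustrative)
python -c "import llmtune, pathlib; print(pathlib.Path(llmtune.__file__).parent / 'config.yml')"
# e.g. .../site-packages/llmtune/config.yml
```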

config.yml renamed to llmtune/config.yml

8 additions, 7 deletions
````diff
@@ -17,15 +17,15 @@ data:
   prompt_stub:
     >- # Stub to add for training at the end of prompt, for test set or inference, this is omitted; make sure only one variable is present
     {output}
-  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
-  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
+  test_size: 25 # Proportion of test as % of total; if integer then # of samples
+  train_size: 500 # Proportion of train as % of total; if integer then # of samples
   train_test_split_seed: 42

 # Model Definition -------------------
 model:
-  hf_model_ckpt: "NousResearch/Llama-2-7b-hf"
+  hf_model_ckpt: "mistralai/Mistral-7B-Instruct-v0.2"
   torch_dtype: "bfloat16"
-  attn_implementation: "flash_attention_2"
+  #attn_implementation: "flash_attention_2"
   quantize: true
   bitsandbytes:
     load_in_4bit: true
````
````diff
@@ -36,6 +36,7 @@ model:
 lora:
   task_type: "CAUSAL_LM"
   r: 32
+  lora_alpha: 64
   lora_dropout: 0.1
   target_modules:
     - q_proj
````
````diff
@@ -49,12 +50,12 @@ lora:
 # Training -------------------
 training:
   training_args:
-    num_train_epochs: 5
+    num_train_epochs: 1
     per_device_train_batch_size: 4
     gradient_accumulation_steps: 4
     gradient_checkpointing: True
     optim: "paged_adamw_32bit"
-    logging_steps: 100
+    logging_steps: 1
     learning_rate: 2.0e-4
     bf16: true # Set to true for mixed precision training on Newer GPUs
     tf32: true
````
````diff
@@ -67,7 +68,7 @@ training:
   # neftune_noise_alpha: None

 inference:
-  max_new_tokens: 1024
+  max_new_tokens: 256
   use_cache: True
   do_sample: True
   top_p: 0.9
````
