Commit 311b422

add guide on finetuning any text dataset (#344)
Parent: 713a0b1

5 files changed: 26 additions, 0 deletions

howto/finetune_adapter.md

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ The steps here only need to be done once:
or [prepare your own dataset](#tune-on-your-dataset).

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/finetune_adapter_v2.md

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ The steps here only need to be done once:
or [prepare your own dataset](#tune-on-your-dataset).

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/finetune_full.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ The steps here only need to be done once:
or [prepare your own dataset](#tune-on-your-own-dataset).

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/finetune_lora.md

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,8 @@ The steps here only need to be done once:
python scripts/prepare_alpaca.py
```

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/unstructured_dataset.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Finetuning on an unstructured dataset

While most of the scripts here were written for finetuning on instruction datasets, it is possible to finetune on any text dataset. This is useful for experimentation, and it is not as expensive as training a full model from scratch.

This guide covers only the data preparation; both the LoRA and Adapter-v1 methods support this dataset type.

## Preparation

1. Gather your text into an input file named `input.txt`.
2. Divide the data into training and validation sets using the following script (a minimal sketch of such a split follows this list):

   ```bash
   python scripts/prepare_any_text.py
   ```

3. In the relevant scripts for your finetuning method under `finetune/` and `evaluate/`, set the `instruction_tuning` variable to `False`.
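For intuition, here is a minimal sketch of what step 2's train/validation split could look like. This is not the actual `scripts/prepare_any_text.py` (which may tokenize and serialize its output differently); the 90/10 ratio and output file names are assumptions for illustration:

```python
# Hypothetical sketch of splitting raw text into train/validation sets.
# The real scripts/prepare_any_text.py may tokenize and serialize its
# output differently; the ratio and file names here are assumptions.
from pathlib import Path

def split_input(path: str = "input.txt", val_fraction: float = 0.1) -> None:
    text = Path(path).read_text(encoding="utf-8")
    n_val = int(len(text) * val_fraction)
    split = len(text) - n_val
    # Hold out the final fraction of characters for validation.
    Path("train.txt").write_text(text[:split], encoding="utf-8")
    Path("val.txt").write_text(text[split:], encoding="utf-8")

if __name__ == "__main__":
    split_input()
```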
And then you're set! Proceed to run the [LoRA guide](./finetune_lora.md) or the [Adapter v1 guide](./finetune_adapter.md).
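As for step 3, the `instruction_tuning` flag presumably controls whether each sample is wrapped in an instruction-style prompt template before tokenization. A hedged sketch of that pattern, with illustrative names rather than the repository's actual code:

```python
# Illustrative sketch of what an instruction_tuning toggle typically
# gates; the template and names are assumptions, not the repo's code.
instruction_tuning = False  # False when finetuning on unstructured text

def build_prompt(sample: dict) -> str:
    if instruction_tuning:
        # Instruction datasets: wrap each sample in a prompt template.
        return (
            "Below is an instruction that describes a task.\n\n"
            f"### Instruction:\n{sample['instruction']}\n\n"
            "### Response:\n"
        )
    # Unstructured text: pass the raw sample straight through.
    return sample["text"]
```

With the flag set to `False`, the finetuning script simply sees raw text, which is what the preparation step above produces.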
