Commit 311b422

add guide on finetuning any text dataset (#344)
Parent: 713a0b1

5 files changed: 26 additions, 0 deletions

howto/finetune_adapter.md

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ The steps here only need to be done once:
or [prepare your own dataset](#tune-on-your-dataset).

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/finetune_adapter_v2.md

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ The steps here only need to be done once:
or [prepare your own dataset](#tune-on-your-dataset).

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/finetune_full.md

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ The steps here only need to be done once:
or [prepare your own dataset](#tune-on-your-own-dataset).

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/finetune_lora.md

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,8 @@ The steps here only need to be done once:
python scripts/prepare_alpaca.py
```

+See also: [Finetuning on an unstructured dataset](unstructured_dataset.md)
+
## Running the finetuning

```bash

howto/unstructured_dataset.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Finetuning on an unstructured dataset

While most of the scripts here were written for finetuning on instruction datasets, it is possible to finetune on any text dataset. This is useful for experimentation, and it is not as expensive as training a full model from scratch.

This guide covers only the data preparation; both the LoRA and Adapter-v1 methods support this dataset type.

## Preparation

1. Gather your text into an input file named `input.txt`.
2. Divide the data into training and validation sets using the following script (a minimal sketch of such a split follows this list):

   ```bash
   python scripts/prepare_any_text.py
   ```

3. In the relevant scripts for your finetuning method under `finetune/` and `evaluate/`, set the `instruction_tuning` variable to `False`.
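For intuition, here is a minimal sketch of what step 2's train/validation split could look like. This is not the actual `scripts/prepare_any_text.py` (which may tokenize and serialize its output differently); the 90/10 ratio and output file names are assumptions for illustration:

```python
# Hypothetical sketch of splitting raw text into train/validation sets.
# The real scripts/prepare_any_text.py may tokenize and serialize its
# output differently; the ratio and file names here are assumptions.
from pathlib import Path

def split_input(path: str = "input.txt", val_fraction: float = 0.1) -> None:
    text = Path(path).read_text(encoding="utf-8")
    n_val = int(len(text) * val_fraction)
    split = len(text) - n_val
    # Hold out the final fraction of characters for validation.
    Path("train.txt").write_text(text[:split], encoding="utf-8")
    Path("val.txt").write_text(text[split:], encoding="utf-8")

if __name__ == "__main__":
    split_input()
```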
And then you're set! Proceed to run the [LoRA guide](./finetune_lora.md) or the [Adapter v1 guide](./finetune_adapter.md).
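As for step 3, the `instruction_tuning` flag presumably controls whether each sample is wrapped in an instruction-style prompt template before tokenization. A hedged sketch of that pattern, with illustrative names rather than the repository's actual code:

```python
# Illustrative sketch of what an instruction_tuning toggle typically
# gates; the template and names are assumptions, not the repo's code.
instruction_tuning = False  # False when finetuning on unstructured text

def build_prompt(sample: dict) -> str:
    if instruction_tuning:
        # Instruction datasets: wrap each sample in a prompt template.
        return (
            "Below is an instruction that describes a task.\n\n"
            f"### Instruction:\n{sample['instruction']}\n\n"
            "### Response:\n"
        )
    # Unstructured text: pass the raw sample straight through.
    return sample["text"]
```

With the flag set to `False`, the finetuning script simply sees raw text, which is what the preparation step above produces.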
