
Commit 14d35d5

Merge branch 'main' of https://github.com/georgian-io/LLM-Finetuning-Hub into unit-tests

2 parents: ecbf57b + 79a6889

8 files changed: +127 −150 lines

CONTRIBUTING.md

New file: 79 additions, 0 deletions
## Contributing

If you would like to contribute to this project, we recommend following the ["fork-and-pull" Git workflow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow).

1. **Fork** the repo on GitHub
2. **Clone** the project to your own machine
3. **Commit** changes to your own branch
4. **Push** your work back up to your fork
5. Submit a **Pull request** so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!
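
For reference, a rough sketch of those steps as git commands, including the upstream merge from the note (the fork URL, branch name, and commit message are placeholders — substitute your own):

```shell
# Clone your fork and register the original repo as "upstream"
git clone https://github.com/<your-username>/LLM-Finetuning-Toolkit.git
cd LLM-Finetuning-Toolkit/
git remote add upstream https://github.com/georgian-io/LLM-Finetuning-Toolkit.git

# Commit changes on your own branch
git checkout -b my-feature
git commit -am "Describe your change"

# Merge the latest from upstream before opening the PR
git fetch upstream
git merge upstream/main

# Push your work back up to your fork, then open a Pull Request on GitHub
git push origin my-feature
```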
### Set Up Dev Environment

<details>
<summary>1. Clone Repo</summary>

```shell
git clone https://github.com/georgian-io/LLM-Finetuning-Toolkit.git
cd LLM-Finetuning-Toolkit/
```

</details>
<details>
<summary>2. Install Dependencies</summary>
<details>
<summary>Install with Docker [Recommended]</summary>

```shell
docker build -t llm-toolkit .
```

```shell
# CPU
docker run -it llm-toolkit
# GPU
docker run -it --gpus all llm-toolkit
```

</details>
<details>
<summary>Poetry (recommended)</summary>

See the poetry documentation for [installation instructions](https://python-poetry.org/docs/#installation).

```shell
poetry install
```

</details>
<details>
<summary>pip</summary>
We recommend using a virtual environment like `venv` or `conda` for installation.

```shell
pip install -e .
```

</details>
</details>
### Checklist Before Pull Request (Optional)

1. Use `ruff check --fix` to check and fix lint errors
2. Use `ruff format` to apply formatting

NOTE: Ruff linting and formatting checks are run via GitHub Actions when a PR is raised. Before raising a PR, it is good practice to check and fix lint errors and apply formatting locally, as shown below.
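
For example, to run both checks over the repo root (`.` is ruff's default target, shown here for clarity):

```shell
# Lint with auto-fixes, then format — the same checks CI applies to the PR
ruff check --fix .
ruff format .
```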
### Releasing

To manually release a PyPI package, please run:

```shell
make build-release
```

Note: Make sure you have a PyPI token for this [PyPI repo](https://pypi.org/project/llm-toolkit/).
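
A hypothetical token setup, assuming `build-release` publishes through poetry (the project's package manager; the Makefile recipe isn't shown in this diff):

```shell
# Create a token at https://pypi.org/manage/account/token/ first
poetry config pypi-token.pypi <your-pypi-token>   # placeholder token
make build-release
```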

README.md

15 additions, 90 deletions
````diff
@@ -6,17 +6,17 @@

 ## Overview

-LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM finetuning experiments on your data and gathering their results. From one single `yaml` config file, control all elements of a typical experimentation pipeline - **prompts**, **open-source LLMs**, **optimization strategy** and **LLM testing**.
+LLM Finetuning toolkit is a config-based CLI tool for launching a series of LLM fine-tuning experiments on your data and gathering their results. From one single `yaml` config file, control all elements of a typical experimentation pipeline - **prompts**, **open-source LLMs**, **optimization strategy** and **LLM testing**.

 <p align="center">
   <img src="https://github.com/georgian-io/LLM-Finetuning-Toolkit/blob/main/assets/overview_diagram.png?raw=true" width="900" />
 </p>

 ## Installation

-### pipx (recommended)
+### [pipx](https://pipx.pypa.io/stable/) (recommended)

-pipx installs the package and depdencies in a seperate virtual environment
+[pipx](https://pipx.pypa.io/stable/) installs the package and dependencies in a separate virtual environment

 ```shell
 pipx install llm-toolkit
````
````diff
@@ -39,8 +39,8 @@ This guide contains 3 stages that will enable you to get the most out of this to
 ### Basic

 ```shell
-llmtune generate config
-llmtune run --config_path ./config.yml
+llmtune generate config
+llmtune run ./config.yml
 ```

 The first command generates a helpful starter `config.yml` file and saves in the current working directory. This is provided to users to quickly get started and as a base for further modification.
````
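
A sketch of that basic flow end to end (the edit step and file listing are illustrative):

```shell
llmtune generate config    # writes a starter config.yml into the current directory
ls
# config.yml
$EDITOR config.yml         # adjust prompts, model, and training settings as needed
llmtune run ./config.yml   # launch the experiments described by the config
```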
````diff
@@ -166,21 +166,21 @@ qa:

 #### Artifact Outputs

-This config will run finetuning and save the results under directory `./experiment/[unique_hash]`. Each unique configuration will generate a unique hash, so that our tool can automatically pick up where it left off. For example, if you need to exit in the middle of the training, by relaunching the script, the program will automatically load the existing dataset that has been generated under the directory, instead of doing it all over again.
+This config will run fine-tuning and save the results under directory `./experiment/[unique_hash]`. Each unique configuration will generate a unique hash, so that our tool can automatically pick up where it left off. For example, if you need to exit in the middle of the training, by relaunching the script, the program will automatically load the existing dataset that has been generated under the directory, instead of doing it all over again.

 After the script finishes running you will see these distinct artifacts:

 ```shell
-/dataset # generated pkl file in hf datasets format
-/model # peft model weights in hf format
-/results # csv of prompt, ground truth, and predicted values
-/qa # csv of test results: e.g. vector similarity between ground truth and prediction
+/dataset # generated pkl file in hf datasets format
+/model # peft model weights in hf format
+/results # csv of prompt, ground truth, and predicted values
+/qa # csv of test results: e.g. vector similarity between ground truth and prediction
 ```

 Once all the changes have been incorporated in the YAML file, you can simply use it to run a custom fine-tuning experiment!

-```python
-python toolkit.py --config-path <path to custom YAML file>
+```shell
+python toolkit.py --config-path <path to custom YAML file>
 ```

 ### Advanced
````
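
A sketch of the resume behavior described in that hunk (the hash placeholder is illustrative; the directory names come from the artifact listing):

```shell
llmtune run ./config.yml   # creates ./experiment/<unique_hash>/ and starts the pipeline
# ...interrupt mid-training (e.g. Ctrl-C), then relaunch with the SAME config...
llmtune run ./config.yml   # identical config -> identical hash, so the dataset
                           # already generated under that directory is reloaded
ls ./experiment/<unique_hash>/
# dataset/  model/  results/  qa/
```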
````diff
@@ -236,84 +236,9 @@ lora:

 ## Extending

-The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, finetuning, inference, and quality assurance testing, is designed to be easily extendable.
+The toolkit provides a modular and extensible architecture that allows developers to customize and enhance its functionality to suit their specific needs. Each component of the toolkit, such as data ingestion, fine-tuning, inference, and quality assurance testing, is designed to be easily extendable.

 ## Contributing
````

The rest of the hunk deletes the contributing guide formerly inlined here — the fork-and-pull workflow, dev-environment setup, pre-PR checklist, and release instructions, identical in substance to the new CONTRIBUTING.md above — and replaces it with a pointer:

````diff
+Open-source contributions to this toolkit are welcome and encouraged.
+If you would like to contribute, please see [CONTRIBUTING.md](CONTRIBUTING.md).
````

llmtune/cli/toolkit.py

12 additions, 12 deletions
````diff
@@ -3,10 +3,10 @@
 from pathlib import Path

 import torch
+import transformers
 import typer
 import yaml
 from pydantic import ValidationError
-from transformers import utils as hf_utils
 from typing_extensions import Annotated

 import llmtune
````
````diff
@@ -15,13 +15,15 @@
 from llmtune.finetune.lora import LoRAFinetune
 from llmtune.inference.lora import LoRAInference
 from llmtune.pydantic_models.config_model import Config
+from llmtune.qa.generics import LLMTestSuite, QaTestRegistry
 from llmtune.ui.rich_ui import RichUI
 from llmtune.utils.ablation_utils import generate_permutations
 from llmtune.utils.save_utils import DirectoryHelper


-hf_utils.logging.set_verbosity_error()
+transformers.logging.set_verbosity(transformers.logging.CRITICAL)
 torch._logging.set_logs(all=logging.CRITICAL)
+logging.captureWarnings(True)


 app = typer.Typer()
````
````diff
@@ -83,15 +85,13 @@ def run_one_experiment(config: Config, config_path: Path) -> None:
     else:
         RichUI.results_found(results_path)

-    # QA -------------------------------
-    # RichUI.before_qa()
-    # qa_path = dir_helper.save_paths.qa
-    # if not exists(qa_path) or not listdir(qa_path):
-    #     # TODO: Instantiate unit test classes
-    #     # TODO: Load results.csv
-    #     # TODO: Run Unit Tests
-    #     # TODO: Save Unit Test Results
-    #     pass
+    RichUI.before_qa()
+    qa_file_path = dir_helper.save_paths.qa_file
+    if not qa_file_path.exists():
+        llm_tests = config.qa.llm_tests
+        tests = QaTestRegistry.create_tests_from_list(llm_tests)
+        test_suite = LLMTestSuite.from_csv(results_file_path, tests)
+        test_suite.save_test_results(qa_file_path)


 @app.command("run")
````
````diff
@@ -125,7 +125,7 @@ def generate_config():
     """
     Generate an example `config.yml` file in current directory
     """
-    module_path = Path(llmtune.__file__).parent
+    module_path = Path(llmtune.__file__)
     example_config_path = module_path.parent / EXAMPLE_CONFIG_FNAME
     destination = Path.cwd()
     shutil.copy(example_config_path, destination)
````
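
This change pairs with the config.yml → llmtune/config.yml rename below: `module_path.parent` now points at the llmtune package directory itself rather than one level above it, so the bundled example config resolves inside the installed package. A quick check, assuming `EXAMPLE_CONFIG_FNAME` is `config.yml` as the rename suggests:

```shell
# Print where the packaged example config should now live (output path illustrative)
python -c "import llmtune, pathlib; print(pathlib.Path(llmtune.__file__).parent / 'config.yml')"
# e.g. .../site-packages/llmtune/config.yml
```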

config.yml renamed to llmtune/config.yml

8 additions, 7 deletions
````diff
@@ -17,15 +17,15 @@ data:
   prompt_stub:
     >- # Stub to add for training at the end of prompt, for test set or inference, this is omitted; make sure only one variable is present
     {output}
-  test_size: 0.1 # Proportion of test as % of total; if integer then # of samples
-  train_size: 0.9 # Proportion of train as % of total; if integer then # of samples
+  test_size: 25 # Proportion of test as % of total; if integer then # of samples
+  train_size: 500 # Proportion of train as % of total; if integer then # of samples
   train_test_split_seed: 42

 # Model Definition -------------------
 model:
-  hf_model_ckpt: "NousResearch/Llama-2-7b-hf"
+  hf_model_ckpt: "mistralai/Mistral-7B-Instruct-v0.2"
   torch_dtype: "bfloat16"
-  attn_implementation: "flash_attention_2"
+  #attn_implementation: "flash_attention_2"
   quantize: true
   bitsandbytes:
     load_in_4bit: true
````
````diff
@@ -36,6 +36,7 @@ model:
 lora:
   task_type: "CAUSAL_LM"
   r: 32
+  lora_alpha: 64
   lora_dropout: 0.1
   target_modules:
     - q_proj
````
````diff
@@ -49,12 +50,12 @@ lora:
 # Training -------------------
 training:
   training_args:
-    num_train_epochs: 5
+    num_train_epochs: 1
     per_device_train_batch_size: 4
     gradient_accumulation_steps: 4
     gradient_checkpointing: True
     optim: "paged_adamw_32bit"
-    logging_steps: 100
+    logging_steps: 1
     learning_rate: 2.0e-4
     bf16: true # Set to true for mixed precision training on Newer GPUs
     tf32: true
````
````diff
@@ -67,7 +68,7 @@ training:
   # neftune_noise_alpha: None

 inference:
-  max_new_tokens: 1024
+  max_new_tokens: 256
   use_cache: True
   do_sample: True
   top_p: 0.9
````
