51 changes: 51 additions & 0 deletions gamesense/README.md
@@ -46,6 +46,16 @@ GameSense leverages Parameter-Efficient Fine-Tuning (PEFT) techniques to customi
- Python 3.8+
- GPU with at least 24GB VRAM (for full model training)
- ZenML installed and configured
- Neptune.ai account for experiment tracking

### Environment Setup

1. Set up your Neptune.ai credentials:
```bash
# Set your Neptune project name and API token as environment variables
export NEPTUNE_PROJECT="your-neptune-workspace/your-project-name"
export NEPTUNE_API_TOKEN="your-neptune-api-token"
```
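
If you want the pipeline to fail fast when these credentials are missing, a quick check like the following can help (a hypothetical snippet, not part of the project code):

```python
import os

# Verify the Neptune credentials are set before launching a pipeline run.
for var in ("NEPTUNE_PROJECT", "NEPTUNE_API_TOKEN"):
    if not os.environ.get(var):
        raise SystemExit(f"Missing required environment variable: {var}")
```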

### Quick Setup

@@ -206,3 +216,44 @@ For custom data sources, you'll need to prepare the splits in a Hugging Face dat
## 📚 Documentation

For learning more about how to use ZenML to build your own MLOps pipelines, refer to our comprehensive [ZenML documentation](https://docs.zenml.io/).

## Running in a CPU-only Environment

If you don't have access to a GPU, you can still run this project with the CPU-only configuration. We've made several optimizations to support CPU execution, including:

- Smaller batch sizes for reduced memory footprint
- Fewer training steps
- Disabled GPU-specific features (quantization, bf16, etc.)
- Using smaller test datasets for evaluation
- Special handling for Phi-3.5 model caching issues on CPU

To run the project on CPU:

```bash
python run.py --config phi3.5_finetune_cpu.yaml
```
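
With a CPU config selected, `run.py` prints a banner like the following before starting the pipeline (this text comes from the check added to `run.py` in this PR):

```
================================================================================
RUNNING IN CPU-ONLY MODE
This will use a CPU-optimized configuration with:
- Smaller batch sizes
- Fewer training steps
- Disabled GPU-specific features (quantization, bf16, etc)
Note: Training will be much slower but should require less memory
================================================================================
```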

Note that training on CPU will be significantly slower than training on a GPU. The CPU configuration (see the excerpt below) uses:

1. A smaller model (Phi-3.5-mini-instruct), which is more CPU-friendly
2. A per-device batch size of 1 with gradient accumulation to keep memory usage low
3. Fewer total training steps (25 instead of 300)
4. Half-precision (float16) where possible to reduce memory usage
5. Smaller dataset subsets (50 training samples, 10 validation samples, 5 test samples)
6. Special compatibility settings for Phi models running on CPU
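
These settings live in `configs/phi3.5_finetune_cpu.yaml`; the relevant excerpt:

```yaml
parameters:
  base_model_id: microsoft/Phi-3.5-mini-instruct
  cpu_only: True
  max_train_samples: 50
  max_val_samples: 10
  max_test_samples: 5

steps:
  finetune:
    parameters:
      max_steps: 25
      bf16: False
      per_device_train_batch_size: 1
      gradient_accumulation_steps: 2
      optimizer: "adamw_torch"
```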

For best results, we recommend:
- Using a machine with at least 16GB of RAM
- Being patient! LLM training on CPU is much slower than on GPU
- If you still encounter memory issues, reducing the `max_train_samples`, `max_val_samples`, and `max_test_samples` parameters even further in the config file, as sketched below
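
For example, to cut memory usage further, lower the sample caps in `configs/phi3.5_finetune_cpu.yaml` (the values below are illustrative, not recommendations from the project):

```yaml
parameters:
  max_train_samples: 25  # down from 50
  max_val_samples: 5
  max_test_samples: 5
```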

### Known Issues and Workarounds

Some large language models like Phi-3.5 have caching mechanisms that are optimized for GPU usage and may encounter issues when running on CPU. Our CPU configuration includes several workarounds:

1. Disabling KV caching for model generation
2. Using torch.float16 data type to reduce memory usage
3. Disabling flash attention, which isn't supported on CPU
4. Using standard AdamW optimizer instead of 8-bit optimizers that require GPU

These changes allow the model to run on CPU with less memory and avoid compatibility issues, although at the cost of some performance.
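
For reference, here is a minimal sketch of how these workarounds map onto the Hugging Face Transformers API. This is an illustrative example, not the project's actual loading code, and it assumes a recent `transformers` release that supports the `attn_implementation` argument:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,    # half precision where possible to reduce memory
    attn_implementation="eager",  # avoid flash attention, which is GPU-only
    device_map="cpu",
)
model.config.use_cache = False    # disable KV caching for generation on CPU

training_args = TrainingArguments(
    output_dir="phi35-cpu",
    optim="adamw_torch",          # standard AdamW instead of GPU-only 8-bit optimizers
    bf16=False,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    max_steps=25,
)
```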
85 changes: 85 additions & 0 deletions gamesense/configs/phi3.5_finetune_cpu.yaml
@@ -0,0 +1,85 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2024. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

model:
  name: llm-peft-phi-3.5-mini-instruct-cpu
  description: "Fine-tune Phi-3.5-mini-instruct on CPU."
  tags:
    - llm
    - peft
    - phi-3.5
    - cpu
  version: 25_steps

settings:
  docker:
    parent_image: pytorch/pytorch:2.2.2-runtime
    requirements: requirements.txt
    python_package_installer: uv
    python_package_installer_args:
      system: null
    apt_packages:
      - git
    environment:
      MKL_SERVICE_FORCE_INTEL: "1"
      # Explicitly disable MPS
      PYTORCH_ENABLE_MPS_FALLBACK: "0"
      PYTORCH_MPS_HIGH_WATERMARK_RATIO: "0.0"

parameters:
  # Uses a smaller model for CPU training
  base_model_id: microsoft/Phi-3.5-mini-instruct
  use_fast: False
  load_in_4bit: False
  load_in_8bit: False
  cpu_only: True  # Enable CPU-only mode
  # Extra conservative dataset size for CPU
  max_train_samples: 50
  max_val_samples: 10
  max_test_samples: 5
  system_prompt: |
    Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
    This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
    The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']


steps:
  prepare_data:
    parameters:
      dataset_name: gem/viggo
      # The sample-size caps (max_train_samples etc.) are defined at the
      # pipeline level above rather than per step.

  finetune:
    parameters:
      max_steps: 25  # Further reduced steps for CPU training
      eval_steps: 5  # More frequent evaluation
      bf16: False  # Disable bf16 for CPU compatibility
      per_device_train_batch_size: 1  # Smallest batch size for CPU
      gradient_accumulation_steps: 2  # Reduced for CPU
      optimizer: "adamw_torch"  # Use standard AdamW rather than 8-bit for CPU
      logging_steps: 2  # More frequent logging
      save_steps: 25  # Save only at the end of the short run
      save_total_limit: 1  # Keep only the best model
      evaluation_strategy: "steps"

  promote:
    parameters:
      metric: rouge2
      target_stage: staging
43 changes: 35 additions & 8 deletions gamesense/pipelines/train.py
@@ -33,6 +33,10 @@ def llm_peft_full_finetune(
    use_fast: bool = True,
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
    cpu_only: bool = False,
    max_train_samples: int = None,
    max_val_samples: int = None,
    max_test_samples: int = None,
):
    """Pipeline for finetuning an LLM with peft.

@@ -42,20 +46,39 @@
    - finetune: finetune the model
    - evaluate_model: evaluate the base and finetuned model
    - promote: promote the model to the target stage, if evaluation was successful

    Args:
        system_prompt: The system prompt to use.
        base_model_id: The base model id to use.
        use_fast: Whether to use the fast tokenizer.
        load_in_8bit: Whether to load in 8-bit precision (requires GPU).
        load_in_4bit: Whether to load in 4-bit precision (requires GPU).
        cpu_only: Whether to force using CPU only and disable quantization.
        max_train_samples: Maximum number of training samples to use (for CPU or testing).
        max_val_samples: Maximum number of validation samples to use (for CPU or testing).
        max_test_samples: Maximum number of test samples to use (for CPU or testing).
    """
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
        )
    if load_in_4bit and load_in_8bit:
        raise ValueError(
            "Only one of `load_in_8bit` and `load_in_4bit` can be True."
        )
    if not cpu_only:
        if not load_in_8bit and not load_in_4bit:
            raise ValueError(
                "At least one of `load_in_8bit` and `load_in_4bit` must be True when not in CPU-only mode."
            )
        if load_in_4bit and load_in_8bit:
            raise ValueError(
                "Only one of `load_in_8bit` and `load_in_4bit` can be True."
            )

    if cpu_only:
        load_in_8bit = False
        load_in_4bit = False

    datasets_dir = prepare_data(
        base_model_id=base_model_id,
        system_prompt=system_prompt,
        use_fast=use_fast,
        max_train_samples=max_train_samples,
        max_val_samples=max_val_samples,
        max_test_samples=max_test_samples,
    )

    evaluate_model(
@@ -66,6 +89,7 @@ def llm_peft_full_finetune(
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        cpu_only=cpu_only,
        id="evaluate_base",
    )
    log_metadata_from_step_artifact(
@@ -82,6 +106,8 @@
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        use_accelerate=False,
        cpu_only=cpu_only,
        bf16=not cpu_only,
    )

    evaluate_model(
@@ -92,6 +118,7 @@
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        cpu_only=cpu_only,
        id="evaluate_finetuned",
    )
    log_metadata_from_step_artifact(
16 changes: 16 additions & 0 deletions gamesense/pipelines/train_accelerated.py
@@ -34,6 +34,9 @@ def llm_peft_full_finetune(
    use_fast: bool = True,
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
    max_train_samples: int = None,
    max_val_samples: int = None,
    max_test_samples: int = None,
):
    """Pipeline for finetuning an LLM with peft.

@@ -43,6 +46,16 @@
    - finetune: finetune the model
    - evaluate_model: evaluate the base and finetuned model
    - promote: promote the model to the target stage, if evaluation was successful

    Args:
        system_prompt: The system prompt to use.
        base_model_id: The base model id to use.
        use_fast: Whether to use the fast tokenizer.
        load_in_8bit: Whether to load in 8-bit precision (requires GPU).
        load_in_4bit: Whether to load in 4-bit precision (requires GPU).
        max_train_samples: Maximum number of training samples to use (for CPU or testing).
        max_val_samples: Maximum number of validation samples to use (for CPU or testing).
        max_test_samples: Maximum number of test samples to use (for CPU or testing).
    """
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
@@ -57,6 +70,9 @@
        base_model_id=base_model_id,
        system_prompt=system_prompt,
        use_fast=use_fast,
        max_train_samples=max_train_samples,
        max_val_samples=max_val_samples,
        max_test_samples=max_test_samples,
    )

    evaluate_model(
14 changes: 13 additions & 1 deletion gamesense/run.py
@@ -76,7 +76,19 @@ def main(
    if not config:
        raise RuntimeError("Config file is required to run a pipeline.")

    pipeline_args["config_path"] = os.path.join(config_folder, config)
    config_path = os.path.join(config_folder, config)
    pipeline_args["config_path"] = config_path

    # Display a message if using CPU configuration
    if "cpu" in config:
        print("\n" + "=" * 80)
        print("RUNNING IN CPU-ONLY MODE")
        print("This will use a CPU-optimized configuration with:")
        print("- Smaller batch sizes")
        print("- Fewer training steps")
        print("- Disabled GPU-specific features (quantization, bf16, etc)")
        print("Note: Training will be much slower but should require less memory")
        print("=" * 80 + "\n")

    if accelerate:
        from pipelines.train_accelerated import llm_peft_full_finetune