
Commit 1aa3759

wjayesh and strickvl authored
Make the gamesense project run on CPU (#184)
* fix syntax * make the project run on CPU * fix logging logic * update readme and add config file * update readme to reflect what the project does * Update README.md * Apply suggestions from code review Co-authored-by: Alex Strick van Linschoten <[email protected]> * neptune optional readme --------- Co-authored-by: Alex Strick van Linschoten <[email protected]>
1 parent 04696a5 commit 1aa3759

File tree

11 files changed: +605 -71 lines changed


gamesense/README.md

Lines changed: 139 additions & 18 deletions
@@ -1,27 +1,78 @@
-# 🎮 GameSense: The LLM That Understands Gamers
+# 🎮 GameSense: An LLM That Transforms Gaming Conversations into Structured Data

-Elevate your gaming platform with an AI that translates player language into actionable data. A model that understands gaming terminology, extracts key attributes, and structures conversations for intelligent recommendations and support.
+GameSense is a specialized language model that converts unstructured gaming conversations into structured, actionable data. It listens to how gamers talk and extracts valuable information that can power recommendations, support systems, and analytics.

-## 🚀 Product Overview
+## 🎯 What GameSense Does

-GameSense is a specialized language model designed specifically for gaming platforms and communities. By fine-tuning powerful open-source LLMs on gaming conversations and terminology, GameSense can:
+**Input**: Gamers' natural language about games from forums, chats, reviews, etc.

-- **Understand Gaming Jargon**: Recognize specialized terms across different game genres and communities
-- **Extract Player Sentiment**: Identify frustrations, excitement, and other emotions in player communications
-- **Structure Unstructured Data**: Transform casual player conversations into structured, actionable data
-- **Generate Personalized Responses**: Create contextually appropriate replies that resonate with gamers
-- **Power Intelligent Recommendations**: Suggest games, content, or solutions based on player preferences and history
+**Output**: Structured data with categorized information about games, platforms, preferences, etc.

-Built on ZenML's enterprise-grade MLOps framework, GameSense delivers a production-ready solution that can be deployed, monitored, and continuously improved with minimal engineering overhead.
+Here's a concrete example from our training data:

-## 💡 How It Works
+### Input Example (Gaming Conversation)
+```
+"Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac."
+```
+
+### Output Example (Structured Information)
+```
+inform(
+    name[Dirt: Showdown],
+    release_year[2012],
+    esrb[E 10+ (for Everyone 10 and Older)],
+    genres[driving/racing, sport],
+    platforms[PlayStation, Xbox, PC],
+    available_on_steam[no],
+    has_linux_release[no],
+    has_mac_release[no]
+)
+```
+
+This structured output can be used to:
+- Answer specific questions about games ("Is Dirt: Showdown available on Mac?")
+- Track trends in gaming discussions
+- Power recommendation engines
+- Extract user opinions and sentiment
+- Build gaming knowledge graphs
+- Enhance customer support
+
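Purely as an illustration (this helper is not part of the repository), a minimal sketch of how such an `inform(...)` string could be parsed into a Python dict for the uses listed above:

```python
import re


def parse_meaning_representation(mr: str) -> dict:
    """Parse a viggo-style string like
    'inform(name[Dirt: Showdown], release_year[2012], ...)'
    into its function name and attribute/value pairs."""
    match = re.match(r"^\s*(\w+)\((.*)\)\s*$", mr, flags=re.DOTALL)
    if not match:
        raise ValueError(f"Not a valid meaning representation: {mr!r}")
    function_name, body = match.groups()
    # Each attribute looks like key[value]; collect them into a dict.
    attributes = dict(re.findall(r"(\w+)\[([^\]]*)\]", body))
    return {"function": function_name, "attributes": attributes}


example = "inform(name[Dirt: Showdown], release_year[2012], has_mac_release[no])"
parsed = parse_meaning_representation(example)
print(parsed["attributes"]["has_mac_release"])  # -> "no"
```

A few lines like this are enough to route the model's output into a support bot, a knowledge graph, or an analytics job.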
+## 🚀 How GameSense Transforms Gaming Conversations
+
+GameSense listens to gaming chats, forum posts, customer support tickets, social media, and other sources where gamers communicate. As gamers discuss different titles, features, opinions, and issues, GameSense:
+
+1. **Recognizes gaming jargon** across different genres and communities
+2. **Extracts key information** about games, platforms, features, and opinions
+3. **Structures this information** into a standardized format
+4. **Makes it available** for downstream applications
+
+## 💡 Real-World Applications

-GameSense leverages Parameter-Efficient Fine-Tuning (PEFT) techniques to customize powerful foundation models like Microsoft's Phi-2 or Llama 3.1 for gaming-specific applications. The system follows a streamlined pipeline:
+### Community Analysis
+Monitor conversations across Discord, Reddit, and other platforms to track what games are being discussed, what features players care about, and emerging trends.

-1. **Data Preparation**: Gaming conversations are processed and tokenized
-2. **Model Fine-Tuning**: The base model is efficiently customized using LoRA adapters
-3. **Evaluation**: The model is rigorously tested against gaming-specific benchmarks
-4. **Deployment**: High-performing models are automatically promoted to production
+### Intelligent Customer Support
+When a player says "I can't get Dirt: Showdown to run on my Mac," GameSense identifies:
+- The specific game (Dirt: Showdown)
+- The platform issue (Mac)
+- The fact that the game doesn't support Mac (from structured knowledge)
+It can then immediately inform the player about the platform incompatibility.
+
+### Smart Recommendations
+When a player has been discussing racing games for PlayStation with family-friendly ratings, GameSense can help power recommendations for similar titles they might enjoy.
+
+### Automated Content Moderation
+By understanding the context of gaming conversations, GameSense can better identify toxic behavior while recognizing harmless gaming slang.
+
+## 🧠 Technical Approach
+
+GameSense uses Parameter-Efficient Fine-Tuning (PEFT) to customize powerful foundation models for understanding gaming language:
+
+1. We start with a base model like Microsoft's Phi-2 or Llama 3.1
+2. Fine-tune on the gem/viggo dataset containing structured gaming conversations
+3. Use LoRA adapters for efficient training
+4. Evaluate on gaming-specific benchmarks
+5. Deploy to production environments
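As a rough sketch of what steps 1 and 3 above look like in code (the rank, alpha, and target modules below are illustrative assumptions, not the project's exact settings):

```python
# Minimal PEFT/LoRA setup sketch; hyperparameters are assumptions for illustration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "microsoft/Phi-3.5-mini-instruct"  # or a Llama 3.1 checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Attach small trainable LoRA adapters; the base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```

Because only the adapter weights are trained, fine-tuning fits on much more modest hardware than full fine-tuning would.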

 <div align="center">
 <br/>
@@ -46,6 +97,16 @@ GameSense leverages Parameter-Efficient Fine-Tuning (PEFT) techniques to customi
 - Python 3.8+
 - GPU with at least 24GB VRAM (for full model training)
 - ZenML installed and configured
+- Neptune.ai account for experiment tracking (optional)
+
+### Environment Setup
+
+1. Set up your Neptune.ai credentials if you want to use Neptune for experiment tracking:
+```bash
+# Set your Neptune project name and API token as environment variables
+export NEPTUNE_PROJECT="your-neptune-workspace/your-project-name"
+export NEPTUNE_API_TOKEN="your-neptune-api-token"
+```

 ### Quick Setup

@@ -95,6 +156,17 @@ python run.py --config configs/llama3-1_finetune_local.yaml
 > - For remote finetuning: [`llama3-1_finetune_remote.yaml`](configs/llama3-1_finetune_remote.yaml)
 > - For local finetuning: [`llama3-1_finetune_local.yaml`](configs/llama3-1_finetune_local.yaml)

+### Dataset Configuration
+
+By default, GameSense uses the gem/viggo dataset, which contains structured gaming information like:
+
+| gem_id | meaning_representation | target | references |
+|--------|------------------------|--------|------------|
+| viggo-train-0 | inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+ (for Everyone 10 and Older)], genres[driving/racing, sport], platforms[PlayStation, Xbox, PC], available_on_steam[no], has_linux_release[no], has_mac_release[no]) | Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac. | [Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac.] |
+| viggo-train-1 | inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+...]) | Dirt: Showdown is a sport racing game... | [Dirt: Showdown is a sport racing game...] |
+
+You can also train on your own gaming conversations by formatting them in a similar structure and updating the configuration.
+
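To peek at the data before training, something along these lines should work (the exact Hugging Face dataset identifier is an assumption and may differ from the `gem/viggo` shorthand used in the config):

```python
# Sketch: inspecting the viggo data; the dataset id/config name is an assumption.
from datasets import load_dataset

dataset = load_dataset("gem", "viggo")
sample = dataset["train"][0]
print(sample["meaning_representation"])  # the structured inform(...) target
print(sample["target"])                  # the natural-language sentence
```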
 ### Training Acceleration

 For faster training on high-end hardware:
@@ -148,7 +220,7 @@ For detailed instructions on data preparation, see our [data customization guide

 GameSense includes built-in evaluation using industry-standard metrics:

-- **ROUGE Scores**: Measure response quality and relevance
+- **ROUGE Scores**: Measure how well the model can generate natural language from structured data
 - **Gaming-Specific Benchmarks**: Evaluate understanding of gaming terminology
 - **Automatic Model Promotion**: Only deploy models that meet quality thresholds
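A back-of-the-envelope version of that ROUGE check, sketched with the `evaluate` library (the project's own evaluation step may use different tooling):

```python
# Rough sketch of a ROUGE-based quality check; the tooling choice is an assumption.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["Dirt: Showdown is a sport racing game for PlayStation, Xbox and PC."],
    references=["Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC."],
)
print(scores["rouge2"])  # promotion can then key off a threshold on this value
```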

@@ -192,7 +264,7 @@ GameSense follows a modular architecture for easy customization:

 To fine-tune GameSense on your specific gaming platform's data:

-1. **Format your dataset**: Prepare your gaming conversations in a structured format
+1. **Format your dataset**: Prepare your gaming conversations in a structured format similar to gem/viggo
 2. **Update the configuration**: Point to your dataset in the config file
 3. **Run the pipeline**: GameSense will automatically process and learn from your data

@@ -203,6 +275,55 @@ The [`prepare_data` step](steps/prepare_datasets.py) handles:

 For custom data sources, you'll need to prepare the splits in a Hugging Face dataset format. The step returns paths to the stored datasets (`train`, `val`, and `test_raw` splits), with the test set tokenized later during evaluation.

+You can structure conversations from:
+- Game forums
+- Support tickets
+- Discord chats
+- Streaming chats
+- Reviews
+- Social media posts
+
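As a minimal sketch (column names and the output path are assumptions; align them with your config and the `prepare_data` step), packaging custom conversations into those splits might look like:

```python
# Sketch: packaging custom conversations into the splits the pipeline expects.
from datasets import Dataset, DatasetDict


def make_split(pairs):
    """Turn (sentence, meaning_representation) pairs into a Dataset split."""
    return Dataset.from_list(
        [{"target": target, "meaning_representation": mr} for target, mr in pairs]
    )


dataset = DatasetDict(
    {
        "train": make_split([
            ("Dirt: Showdown from 2012 is a sport racing game for PlayStation, Xbox and PC.",
             "inform(name[Dirt: Showdown], release_year[2012], platforms[PlayStation, Xbox, PC])"),
        ]),
        "val": make_split([
            ("It's not available on Steam, Linux, or Mac.",
             "inform(name[Dirt: Showdown], available_on_steam[no], has_linux_release[no], has_mac_release[no])"),
        ]),
        "test_raw": make_split([
            ("Dirt: Showdown is rated E 10+.",
             "inform(name[Dirt: Showdown], esrb[E 10+ (for Everyone 10 and Older)])"),
        ]),
    }
)
dataset.save_to_disk("data/my_gaming_conversations")  # illustrative path
```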
 ## 📚 Documentation

 For learning more about how to use ZenML to build your own MLOps pipelines, refer to our comprehensive [ZenML documentation](https://docs.zenml.io/).
+
+## Running on CPU-only Environment
+
+If you don't have access to a GPU, you can still run this project with the CPU-only configuration. We've made several optimizations to make this project work on CPU, including:
+
+- Smaller batch sizes for reduced memory footprint
+- Fewer training steps
+- Disabled GPU-specific features (quantization, bf16, etc.)
+- Using smaller test datasets for evaluation
+- Special handling for Phi-3.5 model caching issues on CPU
+
+To run the project on CPU:
+
+```bash
+python run.py --config phi3.5_finetune_cpu.yaml
+```
+
+Note that training on CPU will be significantly slower than training on a GPU. The CPU configuration uses:
+
+1. A smaller model (`phi-3.5-mini-instruct`) which is more CPU-friendly
+2. Reduced batch size and increased gradient accumulation steps
+3. Fewer total training steps (50 instead of 300)
+4. Half-precision (float16) where possible to reduce memory usage
+5. Smaller dataset subsets (100 training samples, 20 validation samples, 10 test samples)
+6. Special compatibility settings for Phi models running on CPU
+
+For best results, we recommend:
+- Using a machine with at least 16GB of RAM
+- Being patient! LLM training on CPU is much slower than on GPU
+- If you still encounter memory issues, try reducing the `max_train_samples` parameter even further in the config file
+
+### Known Issues and Workarounds
+
+Some large language models like Phi-3.5 have caching mechanisms that are optimized for GPU usage and may encounter issues when running on CPU. Our CPU configuration includes several workarounds:
+
+1. Disabling KV caching for model generation
+2. Using the `torch.float16` data type to reduce memory usage
+3. Disabling flash attention, which isn't needed on CPU
+4. Using the standard AdamW optimizer instead of 8-bit optimizers that require a GPU
+
+These changes allow the model to run on CPU with less memory and avoid compatibility issues, although at the cost of some performance.
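To make these workarounds concrete, here is a hedged sketch of the kind of loading code they imply; the parameter choices are illustrative rather than the project's exact implementation, and Phi models may additionally need `trust_remote_code=True`:

```python
# Illustrative CPU-friendly loading; not the project's exact code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,     # half precision to reduce memory usage
    attn_implementation="eager",   # skip flash attention, which CPU doesn't need
    use_cache=False,               # disable KV caching for generation
    device_map="cpu",
)
# On the training side, the config swaps the 8-bit optimizer for the standard
# one, e.g. optim="adamw_torch" in the Trainer arguments.
```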
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# Apache Software License 2.0
+#
+# Copyright (c) ZenML GmbH 2024. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+model:
+  name: llm-peft-phi-3.5-mini-instruct-cpu
+  description: "Fine-tune Phi-3.5-mini-instruct on CPU."
+  tags:
+    - llm
+    - peft
+    - phi-3.5
+    - cpu
+  version: 100_steps
+
+settings:
+  docker:
+    parent_image: pytorch/pytorch:2.2.2-runtime
+    requirements: requirements.txt
+    python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages:
+      - git
+    environment:
+      MKL_SERVICE_FORCE_INTEL: "1"
+      # Explicitly disable MPS
+      PYTORCH_ENABLE_MPS_FALLBACK: "0"
+      PYTORCH_MPS_HIGH_WATERMARK_RATIO: "0.0"
+
+parameters:
+  # Uses a smaller model for CPU training
+  base_model_id: microsoft/Phi-3.5-mini-instruct
+  use_fast: False
+  load_in_4bit: False
+  load_in_8bit: False
+  cpu_only: True  # Enable CPU-only mode
+  # Extra conservative dataset size for CPU
+  max_train_samples: 50
+  max_val_samples: 10
+  max_test_samples: 5
+  system_prompt: |
+    Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
+    This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
+    The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']
+
+
+steps:
+  prepare_data:
+    parameters:
+      dataset_name: gem/viggo
+      # These settings are now defined at the pipeline level
+      # max_train_samples: 100
+      # max_val_samples: 20
+      # max_test_samples: 10
+
+  finetune:
+    parameters:
+      max_steps: 25  # Further reduced steps for CPU training
+      eval_steps: 5  # More frequent evaluation
+      bf16: False  # Disable bf16 for CPU compatibility
+      per_device_train_batch_size: 1  # Smallest batch size for CPU
+      gradient_accumulation_steps: 2  # Reduced for CPU
+      optimizer: "adamw_torch"  # Use standard AdamW rather than 8-bit for CPU
+      logging_steps: 2  # More frequent logging
+      save_steps: 25  # Save less frequently
+      save_total_limit: 1  # Keep only the best model
+      evaluation_strategy: "steps"
+
+  promote:
+    parameters:
+      metric: rouge2
+      target_stage: staging

gamesense/pipelines/train.py

Lines changed: 35 additions & 8 deletions
@@ -33,6 +33,10 @@ def llm_peft_full_finetune(
     use_fast: bool = True,
     load_in_8bit: bool = False,
     load_in_4bit: bool = False,
+    cpu_only: bool = False,
+    max_train_samples: int = None,
+    max_val_samples: int = None,
+    max_test_samples: int = None,
 ):
     """Pipeline for finetuning an LLM with peft.

@@ -42,20 +46,39 @@ def llm_peft_full_finetune(
     - finetune: finetune the model
     - evaluate_model: evaluate the base and finetuned model
     - promote: promote the model to the target stage, if evaluation was successful
+
+    Args:
+        system_prompt: The system prompt to use.
+        base_model_id: The base model id to use.
+        use_fast: Whether to use the fast tokenizer.
+        load_in_8bit: Whether to load in 8-bit precision (requires GPU).
+        load_in_4bit: Whether to load in 4-bit precision (requires GPU).
+        cpu_only: Whether to force using CPU only and disable quantization.
+        max_train_samples: Maximum number of training samples to use (for CPU or testing).
+        max_val_samples: Maximum number of validation samples to use (for CPU or testing).
+        max_test_samples: Maximum number of test samples to use (for CPU or testing).
     """
-    if not load_in_8bit and not load_in_4bit:
-        raise ValueError(
-            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
-        )
-    if load_in_4bit and load_in_8bit:
-        raise ValueError(
-            "Only one of `load_in_8bit` and `load_in_4bit` can be True."
-        )
+    if not cpu_only:
+        if not load_in_8bit and not load_in_4bit:
+            raise ValueError(
+                "At least one of `load_in_8bit` and `load_in_4bit` must be True when not in CPU-only mode."
+            )
+        if load_in_4bit and load_in_8bit:
+            raise ValueError(
+                "Only one of `load_in_8bit` and `load_in_4bit` can be True."
+            )
+
+    if cpu_only:
+        load_in_8bit = False
+        load_in_4bit = False

     datasets_dir = prepare_data(
         base_model_id=base_model_id,
         system_prompt=system_prompt,
         use_fast=use_fast,
+        max_train_samples=max_train_samples,
+        max_val_samples=max_val_samples,
+        max_test_samples=max_test_samples,
     )

     evaluate_model(
@@ -66,6 +89,7 @@ def llm_peft_full_finetune(
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
+        cpu_only=cpu_only,
         id="evaluate_base",
     )
     log_metadata_from_step_artifact(
@@ -82,6 +106,8 @@ def llm_peft_full_finetune(
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
         use_accelerate=False,
+        cpu_only=cpu_only,
+        bf16=not cpu_only,
     )

     evaluate_model(
@@ -92,6 +118,7 @@ def llm_peft_full_finetune(
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
+        cpu_only=cpu_only,
         id="evaluate_finetuned",
     )
     log_metadata_from_step_artifact(
