157 changes: 139 additions & 18 deletions gamesense/README.md
@@ -1,27 +1,78 @@
# 🎮 GameSense: The LLM That Understands Gamers
# 🎮 GameSense: An LLM That Transforms Gaming Conversations into Structured Data

Elevate your gaming platform with an AI that translates player language into actionable data. A model that understands gaming terminology, extracts key attributes, and structures conversations for intelligent recommendations and support.
GameSense is a specialized language model that converts unstructured gaming conversations into structured, actionable data. It listens to how gamers talk and extracts valuable information that can power recommendations, support systems, and analytics.

## 🚀 Product Overview
## 🎯 What GameSense Does

GameSense is a specialized language model designed specifically for gaming platforms and communities. By fine-tuning powerful open-source LLMs on gaming conversations and terminology, GameSense can:
**Input**: Gamers' natural language about games from forums, chats, reviews, etc.

- **Understand Gaming Jargon**: Recognize specialized terms across different game genres and communities
- **Extract Player Sentiment**: Identify frustrations, excitement, and other emotions in player communications
- **Structure Unstructured Data**: Transform casual player conversations into structured, actionable data
- **Generate Personalized Responses**: Create contextually appropriate replies that resonate with gamers
- **Power Intelligent Recommendations**: Suggest games, content, or solutions based on player preferences and history
**Output**: Structured data with categorized information about games, platforms, preferences, etc.

Built on ZenML's enterprise-grade MLOps framework, GameSense delivers a production-ready solution that can be deployed, monitored, and continuously improved with minimal engineering overhead.
Here's a concrete example from our training data:

## 💡 How It Works
### Input Example (Gaming Conversation)
```
"Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac."
```

### Output Example (Structured Information)
```
inform(
name[Dirt: Showdown],
release_year[2012],
esrb[E 10+ (for Everyone 10 and Older)],
genres[driving/racing, sport],
platforms[PlayStation, Xbox, PC],
available_on_steam[no],
has_linux_release[no],
has_mac_release[no]
)
```

This structured output can be used to:
- Answer specific questions about games ("Is Dirt: Showdown available on Mac?")
- Track trends in gaming discussions
- Power recommendation engines
- Extract user opinions and sentiment
- Build gaming knowledge graphs
- Enhance customer support
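
As a rough illustration of how downstream code might consume this output, the sketch below parses the `function(attribute[value], ...)` format into a Python dictionary. The function name `parse_meaning_representation` is hypothetical and not part of GameSense; the sketch also assumes attribute values never contain a closing square bracket, which holds for the example above.

```python
import re

# One `attribute[value]` pair. Assumes values never contain "]".
_SLOT = re.compile(r"(\w+)\[([^\]]*)\]")
# The outer `function( ... )` wrapper; DOTALL lets the body span several lines.
_CALL = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)


def parse_meaning_representation(text):
    """Split `function(attr[value], ...)` into (function, {attr: value})."""
    match = _CALL.match(text)
    if match is None:
        raise ValueError(f"not a meaning representation: {text!r}")
    function, body = match.groups()
    return function, dict(_SLOT.findall(body))


function, slots = parse_meaning_representation(
    "inform(name[Dirt: Showdown], release_year[2012], "
    "esrb[E 10+ (for Everyone 10 and Older)], genres[driving/racing, sport], "
    "platforms[PlayStation, Xbox, PC], available_on_steam[no], "
    "has_linux_release[no], has_mac_release[no])"
)
print(function)
print(slots["platforms"].split(", "))
```

Multi-valued attributes such as `platforms` stay as a single string here; splitting on `", "` is left to the caller because some values (for example the ESRB rating) legitimately contain commas or parentheses.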

## 🚀 How GameSense Transforms Gaming Conversations

GameSense listens to gaming chats, forum posts, customer support tickets, social media, and other sources where gamers communicate. As gamers discuss different titles, features, opinions, and issues, GameSense:

1. **Recognizes gaming jargon** across different genres and communities
2. **Extracts key information** about games, platforms, features, and opinions
3. **Structures this information** into a standardized format
4. **Makes it available** for downstream applications

## 💡 Real-World Applications

GameSense leverages Parameter-Efficient Fine-Tuning (PEFT) techniques to customize powerful foundation models like Microsoft's Phi-2 or Llama 3.1 for gaming-specific applications. The system follows a streamlined pipeline:
### Community Analysis
Monitor conversations across Discord, Reddit, and other platforms to track which games are being discussed, which features players care about, and which trends are emerging.

1. **Data Preparation**: Gaming conversations are processed and tokenized
2. **Model Fine-Tuning**: The base model is efficiently customized using LoRA adapters
3. **Evaluation**: The model is rigorously tested against gaming-specific benchmarks
4. **Deployment**: High-performing models are automatically promoted to production
### Intelligent Customer Support
When a player says, "I can't get Dirt: Showdown to run on my Mac," GameSense identifies:
- The specific game (Dirt: Showdown)
- The platform in question (Mac)
- The fact that the game doesn't support Mac (from structured knowledge)

It can then immediately inform the player about the platform incompatibility.
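
The support lookup described here could be sketched as follows, assuming the structured record is already available as a plain dictionary keyed by the dataset's attribute names. `PLATFORM_ATTRIBUTES` and `compatibility_reply` are hypothetical names used only for illustration.

```python
# Hypothetical mapping from a platform mentioned by the player to the
# dataset attribute that records support for it.
PLATFORM_ATTRIBUTES = {"mac": "has_mac_release", "linux": "has_linux_release"}


def compatibility_reply(record, platform):
    """Return a short support message for `platform` from a structured record."""
    attribute = PLATFORM_ATTRIBUTES.get(platform.lower())
    supported = record.get(attribute) if attribute else None
    name = record["name"]
    if supported == "no":
        return f"{name} has no {platform} release."
    if supported == "yes":
        return f"{name} has a {platform} release; let's troubleshoot the install."
    # Unknown platform or missing attribute: do not guess.
    return f"No {platform} compatibility data is available for {name}."


record = {"name": "Dirt: Showdown", "has_mac_release": "no", "has_linux_release": "no"}
print(compatibility_reply(record, "Mac"))
```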

### Smart Recommendations
When a player has been discussing racing games for PlayStation with family-friendly ratings, GameSense can help power recommendations for similar titles they might enjoy.

### Automated Content Moderation
By understanding the context of gaming conversations, GameSense can better identify toxic behavior while recognizing harmless gaming slang.

## 🧠 Technical Approach

GameSense uses Parameter-Efficient Fine-Tuning (PEFT) to customize powerful foundation models for understanding gaming language:

1. We start with a base model like Microsoft's Phi-2 or Llama 3.1
2. Fine-tune on the gem/viggo dataset containing structured gaming conversations
3. Use LoRA adapters for efficient training
4. Evaluate on gaming-specific benchmarks
5. Deploy to production environments
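
To give a feel for why LoRA adapters make training efficient, the arithmetic below compares the number of trainable parameters in a full weight matrix with the two low-rank factors LoRA trains in its place. The layer size (4096 x 4096) and rank (8) are illustrative assumptions, not values taken from the GameSense configs.

```python
def full_params(d_in, d_out):
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative values only.
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumptions the adapter trains well under 1% of the parameters of the layer it adapts, which is what keeps memory use low during fine-tuning.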

<div align="center">
<br/>
@@ -46,6 +97,16 @@ GameSense leverages Parameter-Efficient Fine-Tuning (PEFT) techniques to customi
- Python 3.8+
- GPU with at least 24GB VRAM (for full model training)
- ZenML installed and configured
- Neptune.ai account for experiment tracking (optional)

### Environment Setup

1. Set up your Neptune.ai credentials if you want to use Neptune for experiment tracking:
```bash
# Set your Neptune project name and API token as environment variables
export NEPTUNE_PROJECT="your-neptune-workspace/your-project-name"
export NEPTUNE_API_TOKEN="your-neptune-api-token"
```
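
If you want to confirm the variables are visible to Python before launching a run, a small check like the one below may help. The helper name `missing_neptune_vars` is hypothetical; only the two variable names come from the setup step above.

```python
import os

REQUIRED_VARS = ("NEPTUNE_PROJECT", "NEPTUNE_API_TOKEN")


def missing_neptune_vars(env=None):
    """Return the names of required Neptune variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_neptune_vars()
if missing:
    print("Neptune tracking is not fully configured; missing:", ", ".join(missing))
else:
    print("Neptune credentials found.")
```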

### Quick Setup

@@ -95,6 +156,17 @@ python run.py --config configs/llama3-1_finetune_local.yaml
> - For remote finetuning: [`llama3-1_finetune_remote.yaml`](configs/llama3-1_finetune_remote.yaml)
> - For local finetuning: [`llama3-1_finetune_local.yaml`](configs/llama3-1_finetune_local.yaml)

### Dataset Configuration

By default, GameSense uses the gem/viggo dataset, which contains structured gaming information like:

| gem_id | meaning_representation | target | references |
|--------|------------------------|--------|------------|
| viggo-train-0 | inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+ (for Everyone 10 and Older)], genres[driving/racing, sport], platforms[PlayStation, Xbox, PC], available_on_steam[no], has_linux_release[no], has_mac_release[no]) | Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac. | [Dirt: Showdown from 2012 is a sport racing game for the PlayStation, Xbox, PC rated E 10+ (for Everyone 10 and Older). It's not available on Steam, Linux, or Mac.] |
| viggo-train-1 | inform(name[Dirt: Showdown], release_year[2012], esrb[E 10+...]) | Dirt: Showdown is a sport racing game... | [Dirt: Showdown is a sport racing game...] |

You can also train on your own gaming conversations by formatting them in a similar structure and updating the configuration.
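
A minimal sketch of what "a similar structure" could look like in code is shown below. It builds one row with the four columns from the table above and rejects function or attribute names that are not in the lists used by the system prompt in `configs/phi3.5_finetune_cpu.yaml`. The helper `build_row` and the `custom-train-0` id are hypothetical.

```python
# Names copied from the system prompt in configs/phi3.5_finetune_cpu.yaml.
ALLOWED_FUNCTIONS = {
    "inform", "request", "give_opinion", "confirm", "verify_attribute",
    "suggest", "request_explanation", "recommend", "request_attribute",
}
ALLOWED_ATTRIBUTES = {
    "name", "exp_release_date", "release_year", "developer", "esrb", "rating",
    "genres", "player_perspective", "has_multiplayer", "platforms",
    "available_on_steam", "has_linux_release", "has_mac_release", "specifier",
}


def build_row(row_id, function, slots, sentence):
    """Build one viggo-style row from a function name, attributes, and text."""
    if function not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unknown function: {function}")
    unknown = set(slots) - ALLOWED_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    body = ", ".join(f"{key}[{value}]" for key, value in slots.items())
    return {
        "gem_id": row_id,
        "meaning_representation": f"{function}({body})",
        "target": sentence,
        "references": [sentence],
    }


row = build_row(
    "custom-train-0",
    "inform",
    {"name": "Dirt: Showdown", "has_mac_release": "no"},
    "Dirt: Showdown isn't available on Mac.",
)
print(row["meaning_representation"])
```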

### Training Acceleration

For faster training on high-end hardware:
@@ -148,7 +220,7 @@ For detailed instructions on data preparation, see our [data customization guide

GameSense includes built-in evaluation using industry-standard metrics:

- **ROUGE Scores**: Measure response quality and relevance
- **ROUGE Scores**: Measure how well the model can generate natural language from structured data
- **Gaming-Specific Benchmarks**: Evaluate understanding of gaming terminology
- **Automatic Model Promotion**: Only deploy models that meet quality thresholds
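
For intuition about what a ROUGE-2 score measures (the CPU config in this change promotes on `rouge2`), here is a simplified bigram-overlap F1 in plain Python. It is only a sketch: it uses whitespace tokenization and no stemming, so its numbers will not match those of a full ROUGE implementation, and it is not the code the pipeline runs.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent token pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate, reference):
    """F1 over clipped bigram overlap between a candidate and a reference."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge2_f1("Dirt: Showdown is a racing game", "Dirt: Showdown is a sport racing game"))
```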

@@ -192,7 +264,7 @@ GameSense follows a modular architecture for easy customization:

To fine-tune GameSense on your specific gaming platform's data:

1. **Format your dataset**: Prepare your gaming conversations in a structured format
1. **Format your dataset**: Prepare your gaming conversations in a structured format similar to gem/viggo
2. **Update the configuration**: Point to your dataset in the config file
3. **Run the pipeline**: GameSense will automatically process and learn from your data

@@ -203,6 +275,55 @@ The [`prepare_data` step](steps/prepare_datasets.py) handles:

For custom data sources, you'll need to prepare the splits in a Hugging Face dataset format. The step returns paths to the stored datasets (`train`, `val`, and `test_raw` splits), with the test set tokenized later during evaluation.

You can structure conversations from:
- Game forums
- Support tickets
- Discord chats
- Streaming chats
- Reviews
- Social media posts

## 📚 Documentation

To learn more about building your own MLOps pipelines with ZenML, see the [ZenML documentation](https://docs.zenml.io/).

## Running on CPU-only Environment

If you don't have access to a GPU, you can still run this project with the CPU-only configuration. We've made several optimizations to make this project work on CPU, including:

- Smaller batch sizes for reduced memory footprint
- Fewer training steps
- Disabled GPU-specific features (quantization, bf16, etc.)
- Using smaller test datasets for evaluation
- Special handling for Phi-3.5 model caching issues on CPU
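
The way CPU-only mode switches off GPU-specific features can be sketched as a small standalone function. It mirrors the checks added to `pipelines/train.py` in this change (quantization flags are forced off and `bf16` is set to `not cpu_only`), but the function name `resolve_precision_flags` and the dictionary it returns are hypothetical.

```python
def resolve_precision_flags(cpu_only, load_in_8bit=False, load_in_4bit=False):
    """Return the quantization and bf16 flags implied by the pipeline inputs."""
    if cpu_only:
        # Quantized loading requires a GPU, so both flags are forced off.
        return {"load_in_8bit": False, "load_in_4bit": False, "bf16": False}
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True "
            "when not in CPU-only mode."
        )
    if load_in_8bit and load_in_4bit:
        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")
    return {"load_in_8bit": load_in_8bit, "load_in_4bit": load_in_4bit, "bf16": True}


print(resolve_precision_flags(cpu_only=True, load_in_4bit=True))
print(resolve_precision_flags(cpu_only=False, load_in_4bit=True))
```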

To run the project on CPU:

```bash
python run.py --config phi3.5_finetune_cpu.yaml
```

Note that training on CPU will be significantly slower than training on a GPU. The CPU configuration uses:

1. A smaller model (`phi-3.5-mini-instruct`) which is more CPU-friendly
2. Reduced batch size and increased gradient accumulation steps
3. Fewer total training steps (`max_steps: 25` in the CPU config)
4. Half-precision (float16) where possible to reduce memory usage
5. Smaller dataset subsets (50 training samples, 10 validation samples, and 5 test samples in the CPU config)
6. Special compatibility settings for Phi models running on CPU

For best results, we recommend:
- Using a machine with at least 16GB of RAM
- Being patient! LLM training on CPU is much slower than on GPU
- If you still encounter memory issues, try reducing the `max_train_samples` parameter even further in the config file
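
When adjusting `max_train_samples` or the step count, it can help to work out how much data a run actually processes. The sketch below does that arithmetic with the values from `configs/phi3.5_finetune_cpu.yaml` in this change, assuming a single device; `training_budget` is a hypothetical helper.

```python
def training_budget(max_steps, per_device_batch_size, grad_accum_steps,
                    num_train_samples, eval_steps):
    """Estimate samples processed, epochs covered, and evaluation count."""
    effective_batch = per_device_batch_size * grad_accum_steps
    samples_seen = max_steps * effective_batch
    return {
        "effective_batch_size": effective_batch,
        "samples_seen": samples_seen,
        "epochs": samples_seen / num_train_samples,
        "evaluations": max_steps // eval_steps,
    }


# Values taken from the CPU config: 25 steps, batch size 1, 2 accumulation
# steps, 50 training samples, evaluation every 5 steps.
budget = training_budget(
    max_steps=25,
    per_device_batch_size=1,
    grad_accum_steps=2,
    num_train_samples=50,
    eval_steps=5,
)
print(budget)
```

With those values the run makes roughly one pass over the training subset, so halving `max_train_samples` without also lowering `max_steps` means the model sees each sample about twice.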

### Known Issues and Workarounds

Some large language models like Phi-3.5 have caching mechanisms that are optimized for GPU usage and may encounter issues when running on CPU. Our CPU configuration includes several workarounds:

1. Disabling KV caching for model generation
2. Using the `torch.float16` data type to reduce memory usage
3. Disabling flash attention which isn't needed on CPU
4. Using standard AdamW optimizer instead of 8-bit optimizers that require GPU

These changes allow the model to run on CPU with less memory and avoid compatibility issues, although at the cost of some performance.
85 changes: 85 additions & 0 deletions gamesense/configs/phi3.5_finetune_cpu.yaml
@@ -0,0 +1,85 @@
# Apache Software License 2.0
#
# Copyright (c) ZenML GmbH 2024. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

model:
  name: llm-peft-phi-3.5-mini-instruct-cpu
  description: "Fine-tune Phi-3.5-mini-instruct on CPU."
  tags:
    - llm
    - peft
    - phi-3.5
    - cpu
  version: 100_steps

settings:
  docker:
    parent_image: pytorch/pytorch:2.2.2-runtime
    requirements: requirements.txt
    python_package_installer: uv
    python_package_installer_args:
      system: null
    apt_packages:
      - git
    environment:
      MKL_SERVICE_FORCE_INTEL: "1"
      # Explicitly disable MPS
      PYTORCH_ENABLE_MPS_FALLBACK: "0"
      PYTORCH_MPS_HIGH_WATERMARK_RATIO: "0.0"

parameters:
  # Uses a smaller model for CPU training
  base_model_id: microsoft/Phi-3.5-mini-instruct
  use_fast: False
  load_in_4bit: False
  load_in_8bit: False
  cpu_only: True # Enable CPU-only mode
  # Extra conservative dataset size for CPU
  max_train_samples: 50
  max_val_samples: 10
  max_test_samples: 5
  system_prompt: |
    Given a target sentence construct the underlying meaning representation of the input sentence as a single function with attributes and attribute values.
    This function should describe the target string accurately and the function must be one of the following ['inform', 'request', 'give_opinion', 'confirm', 'verify_attribute', 'suggest', 'request_explanation', 'recommend', 'request_attribute'].
    The attributes must be one of the following: ['name', 'exp_release_date', 'release_year', 'developer', 'esrb', 'rating', 'genres', 'player_perspective', 'has_multiplayer', 'platforms', 'available_on_steam', 'has_linux_release', 'has_mac_release', 'specifier']


steps:
  prepare_data:
    parameters:
      dataset_name: gem/viggo
      # These settings are now defined at the pipeline level
      # max_train_samples: 100
      # max_val_samples: 20
      # max_test_samples: 10

  finetune:
    parameters:
      max_steps: 25 # Further reduced steps for CPU training
      eval_steps: 5 # More frequent evaluation
      bf16: False # Disable bf16 for CPU compatibility
      per_device_train_batch_size: 1 # Smallest batch size for CPU
      gradient_accumulation_steps: 2 # Reduced for CPU
      optimizer: "adamw_torch" # Use standard AdamW rather than 8-bit for CPU
      logging_steps: 2 # More frequent logging
      save_steps: 25 # Save less frequently
      save_total_limit: 1 # Keep only the most recent checkpoint
      evaluation_strategy: "steps"

  promote:
    parameters:
      metric: rouge2
      target_stage: staging
43 changes: 35 additions & 8 deletions gamesense/pipelines/train.py
@@ -33,6 +33,10 @@ def llm_peft_full_finetune(
    use_fast: bool = True,
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
    cpu_only: bool = False,
    max_train_samples: int = None,
    max_val_samples: int = None,
    max_test_samples: int = None,
):
    """Pipeline for finetuning an LLM with peft.

@@ -42,20 +46,39 @@ def llm_peft_full_finetune(
    - finetune: finetune the model
    - evaluate_model: evaluate the base and finetuned model
    - promote: promote the model to the target stage, if evaluation was successful

    Args:
        system_prompt: The system prompt to use.
        base_model_id: The base model id to use.
        use_fast: Whether to use the fast tokenizer.
        load_in_8bit: Whether to load in 8-bit precision (requires GPU).
        load_in_4bit: Whether to load in 4-bit precision (requires GPU).
        cpu_only: Whether to force using CPU only and disable quantization.
        max_train_samples: Maximum number of training samples to use (for CPU or testing).
        max_val_samples: Maximum number of validation samples to use (for CPU or testing).
        max_test_samples: Maximum number of test samples to use (for CPU or testing).
    """
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
        )
    if load_in_4bit and load_in_8bit:
        raise ValueError(
            "Only one of `load_in_8bit` and `load_in_4bit` can be True."
        )
    if not cpu_only:
        if not load_in_8bit and not load_in_4bit:
            raise ValueError(
                "At least one of `load_in_8bit` and `load_in_4bit` must be True when not in CPU-only mode."
            )
        if load_in_4bit and load_in_8bit:
            raise ValueError(
                "Only one of `load_in_8bit` and `load_in_4bit` can be True."
            )

    if cpu_only:
        load_in_8bit = False
        load_in_4bit = False

    datasets_dir = prepare_data(
        base_model_id=base_model_id,
        system_prompt=system_prompt,
        use_fast=use_fast,
        max_train_samples=max_train_samples,
        max_val_samples=max_val_samples,
        max_test_samples=max_test_samples,
    )

    evaluate_model(
@@ -66,6 +89,7 @@ def llm_peft_full_finetune(
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        cpu_only=cpu_only,
        id="evaluate_base",
    )
    log_metadata_from_step_artifact(
@@ -82,6 +106,8 @@ def llm_peft_full_finetune(
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        use_accelerate=False,
        cpu_only=cpu_only,
        bf16=not cpu_only,
    )

    evaluate_model(
@@ -92,6 +118,7 @@ def llm_peft_full_finetune(
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        cpu_only=cpu_only,
        id="evaluate_finetuned",
    )
    log_metadata_from_step_artifact(