## 📚 Documentation

To learn more about how to use ZenML to build your own MLOps pipelines, refer to our comprehensive [ZenML documentation](https://docs.zenml.io/).
## Running on a CPU-only Environment

If you don't have access to a GPU, you can still run this project with the CPU-only configuration. We've made several optimizations so that it works on CPU, including:

- Smaller batch sizes for a reduced memory footprint
- Fewer training steps
- Disabled GPU-specific features (quantization, bf16, etc.)
- Smaller test datasets for evaluation
- Special handling for Phi-3.5 model caching issues on CPU

To run the project on CPU:

```bash
python run.py --config phi3.5_finetune_cpu.yaml
```

Note that training on CPU will be significantly slower than training on a GPU. The CPU configuration uses the following settings (sketched in code after this list):

1. A smaller model (Phi-3.5-mini-instruct), which is more CPU-friendly
2. A reduced batch size with increased gradient accumulation steps
3. Fewer total training steps (50 instead of 300)
4. Half-precision (float16) where possible to reduce memory usage
5. Smaller dataset subsets (100 training samples, 20 validation samples, 10 test samples)
6. Special compatibility settings for Phi models running on CPU
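
As a rough illustration of how these settings translate into code, here is a minimal sketch using Hugging Face `TrainingArguments`. The real settings live in `phi3.5_finetune_cpu.yaml`; the output path and accumulation-step value below are assumptions for illustration:

```python
# Illustrative sketch only -- the actual settings live in
# phi3.5_finetune_cpu.yaml and may use different names and values.
from transformers import TrainingArguments

cpu_training_args = TrainingArguments(
    output_dir="output/phi35-cpu",   # hypothetical output path
    per_device_train_batch_size=1,   # smaller batch size to fit in RAM
    gradient_accumulation_steps=8,   # assumed value; restores a larger effective batch
    max_steps=50,                    # fewer steps than the 300 used on GPU
    bf16=False,                      # GPU-specific mixed precision stays off
    fp16=False,                      # Trainer mixed precision also requires a GPU
    optim="adamw_torch",             # standard AdamW; 8-bit optimizers need a GPU
    use_cpu=True,                    # keep training on CPU
)
```

The dataset subsets (100/20/10 samples) would typically be applied in the data-loading step rather than in the training arguments.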

For best results, we recommend:

- Using a machine with at least 16 GB of RAM
- Being patient! LLM training on CPU is much slower than on GPU
- Reducing the max_train_samples parameter in the config file even further if you still run into memory issues

### Known Issues and Workarounds

Some large language models, such as Phi-3.5, have caching mechanisms that are optimized for GPU usage and can run into problems on CPU. Our CPU configuration includes several workarounds (illustrated in the sketch after this list):

1. Disabling KV caching during model generation
2. Using the torch.float16 data type to reduce memory usage
3. Disabling flash attention, which isn't needed on CPU
4. Using the standard AdamW optimizer instead of 8-bit optimizers that require a GPU
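
Here is a minimal sketch of how these workarounds map onto the Hugging Face transformers API. This is assumed wiring for illustration; the project's pipeline code may differ, and the learning rate is a placeholder:

```python
# Sketch of the CPU workarounds above, assuming the standard
# Hugging Face transformers API; the project's code may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,    # (2) half precision to reduce memory usage
    attn_implementation="eager",  # (3) flash attention off; not needed on CPU
)
model.config.use_cache = False    # (1) disable KV caching for generation

# (4) standard AdamW instead of 8-bit optimizers that require a GPU
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative lr
```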

These changes let the model run on CPU with less memory and avoid compatibility issues, though at the cost of some performance.