@@ -170,7 +170,7 @@ This asymmetry allows for more efficient memory usage without compromising model
 ./llama.cpp/build/bin/llama-cli -m models/your-model.gguf -p "Your prompt" \
   -t 8 --flash-attn
 
-# ⭐ RECOMMENDED: 8-bit keys, 4-bit values (K8V4)
+# ⭐ RECOMMENDED: 8-bit keys, 4-bit values (K8V4)
 # Best balance of quality and memory savings
 ./llama.cpp/build/bin/llama-cli -m models/your-model.gguf -p "Your prompt" \
   -t 8 --flash-attn --kvq 8
@@ -262,14 +262,14 @@ You can visualize memory savings with our capture tool:
 ## 🍎 Apple Silicon Optimization
 
 - **Metal Performance**: Fully optimized for Apple's Metal framework
-- **Memory Efficiency**: Critical for memory-constrained M1/M2/M3 devices
+- **Memory Efficiency**: Critical for memory-constrained M series Apple silicon devices
 - **Activity Monitor**: Use our `capture_memory.sh` script to visualize real-time memory reductions
 - **Alignment**: 256B page alignment in llama.cpp means actual memory savings might differ slightly from theoretical calculations
 
 ## ⭐ Key Features
 
 - **Differentiated Precision**: Independent key and value bit precision (K8V4, K4V8, etc)
-- **Apple Silicon Optimization**: Full Metal support for M1/M2/M3 chips
+- **Apple Silicon Optimization**: Full Metal support for M1/M2/M3/M4 chips
 - **Comprehensive Benchmarking**: Memory, speed, and quality metrics
 - **Publication-Quality Visualization**: Beautiful plots for analysis
 - **Simple User Interface**: One-command install and quick comparison tools
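
Note on the **Alignment** bullet in the hunk above: it contrasts theoretical savings with what llama.cpp actually allocates. As a rough illustration of the theoretical side only, here is a minimal Python sketch. The helper name `kv_cache_bytes`, the model shape (32 layers, 8 KV heads, head dimension 128, 8192-token context), and the way the 256-byte rounding is applied per buffer are all assumptions made for illustration; none of them come from this repository or from llama.cpp. The sketch also uses nominal bit widths and ignores the per-block scale overhead that quantized formats carry, so real numbers will be somewhat higher.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx,
                   k_bits, v_bits, align=256):
    """Theoretical KV cache size in bytes (hypothetical helper).

    Assumes one K buffer and one V buffer, each rounded up to a
    multiple of `align` bytes. Nominal bit widths only: quantization
    block scales are NOT counted.
    """
    # Number of stored elements for keys (values have the same count).
    elems = n_layers * n_kv_heads * head_dim * n_ctx

    def buffer_bytes(bits):
        raw = (elems * bits + 7) // 8          # bits -> bytes, rounded up
        return -(-raw // align) * align        # round up to alignment

    return buffer_bytes(k_bits) + buffer_bytes(v_bits)


# Hypothetical model shape, chosen only to make the arithmetic concrete.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192)

fp16 = kv_cache_bytes(**shape, k_bits=16, v_bits=16)
k8v4 = kv_cache_bytes(**shape, k_bits=8, v_bits=4)

print(f"FP16: {fp16 / 2**20:.0f} MiB")        # FP16: 1024 MiB
print(f"K8V4: {k8v4 / 2**20:.0f} MiB")        # K8V4: 384 MiB
print(f"Savings: {1 - k8v4 / fp16:.1%}")      # Savings: 62.5%
```

For large caches the alignment term is negligible. It only becomes visible for very small buffers, which is consistent with the bullet's wording that measured savings "might differ slightly" from the calculation.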