# QEfficient Memory Profiling

A memory profiler for QEfficient workflows. It samples memory (RSS), CPU, and disk I/O in the background and attributes usage to operations that you mark manually.

## Quick Start

```python
from profiler import QEffMemoryProfiler
from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer

# Initialize the profiler; verbose=True prints detailed memory-tracking information
profiler = QEffMemoryProfiler(verbose=True)
# Start background monitoring of memory usage
profiler.start_monitoring()

# Mark the start of model loading; each mark begins a new stage in the report and in the output graph
profiler.mark_operation("Loading model")

model = QEFFAutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Mark the export operation
profiler.mark_operation("Export")
model.export()

# Mark the compilation operation
profiler.mark_operation("Compile")
model.compile(prefill_seq_len=128, ctx_len=256, num_cores=16)

# Mark the text generation operation
profiler.mark_operation("Generation")
output = model.generate(prompts=["Hello world"], tokenizer=tokenizer, generation_len=100)

# Stop monitoring and mark the workflow as complete
profiler.stop_monitoring()

# Optional: print a text report with peak memory and a per-operation breakdown
print(profiler.get_memory_report())

# Save a graph of memory usage over time as an image file
profiler.generate_memory_graph("profile.png")
```

## Configuration

### Basic Configuration

```python
from profiler import QEffMemoryProfiler

profiler = QEffMemoryProfiler(
    sampling_interval=0.1,         # Sample every 100 ms
    output_file="my_profile.png",  # Custom output file
    verbose=True,                  # Enable detailed logging
    enable_cpu_monitoring=True,    # Monitor CPU usage
    enable_disk_monitoring=True,   # Monitor disk I/O
)
```
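
The `sampling_interval` setting controls how often the background monitor takes a reading. As a rough illustration of that mechanism, here is a minimal standard-library sketch of a sampler: a daemon thread calls a metric function every `interval` seconds until it is stopped. `BackgroundSampler` and its `read_metric` callable are hypothetical names for illustration only; they are not part of `QEffMemoryProfiler`, whose internals may differ.

```python
import threading
import time
from typing import Callable, List, Tuple


class BackgroundSampler:
    """Hypothetical sketch: calls read_metric() every `interval` seconds on a daemon thread."""

    def __init__(self, read_metric: Callable[[], float], interval: float = 0.05) -> None:
        self.read_metric = read_metric
        self.interval = interval
        self.samples: List[Tuple[float, float]] = []  # (seconds since start(), metric value)
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._t0 = 0.0

    def _run(self) -> None:
        # The effective period is `interval` plus the time read_metric() takes (no drift correction)
        while True:
            self.samples.append((time.monotonic() - self._t0, self.read_metric()))
            # wait() returns True as soon as stop() sets the event, so shutdown is prompt
            if self._stop_event.wait(self.interval):
                break

    def start(self) -> None:
        self._t0 = time.monotonic()
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        self._thread.join()
```

With `psutil` installed, `read_metric` could be `lambda: psutil.Process().memory_info().rss / 2**20` to record RSS in MB. A smaller interval gives a finer timeline at the cost of more samples.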

### Manual Operation Marking

Call `mark_operation()` at the start of each stage you want reported separately. The names you pass label the stages in the text report and in the graph.

```python
from profiler import QEffMemoryProfiler

profiler = QEffMemoryProfiler()
profiler.start_monitoring()

# Each mark starts a new named operation
profiler.mark_operation("Custom Operation 1")
# ... your code ...

profiler.mark_operation("Custom Operation 2")
# ... more code ...

profiler.stop_monitoring()
```
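
Judging by the operations timeline in the sample console report (see Output), each mark appears to start a phase that lasts until the next mark. The sketch below models that assumed behavior with plain timestamps and closes the final phase at an explicit `end_time`, using the start times from that sample report. `phases_from_marks` is a hypothetical helper, not a profiler method.

```python
from typing import List, Tuple

Mark = Tuple[str, float]          # (operation name, start time in seconds)
Phase = Tuple[str, float, float]  # (operation name, start time, duration)


def phases_from_marks(marks: List[Mark], end_time: float) -> List[Phase]:
    """Hypothetical helper: each phase runs until the next mark; the last one ends at end_time."""
    ordered = sorted(marks, key=lambda m: m[1])
    ends = [start for _, start in ordered[1:]] + [end_time]
    return [(name, start, end - start) for (name, start), end in zip(ordered, ends)]


# Start times taken from the sample console report; end_time = 161.4 s + 18.7 s
marks = [("Model Loading", 0.0), ("Export", 25.2), ("Compilation", 40.6), ("Generation", 161.4)]
for i, (name, start, duration) in enumerate(phases_from_marks(marks, end_time=180.1), 1):
    print(f"{i}. {start:.1f}s - {name} ({duration:.1f}s)")
```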

## API Reference

### QEffMemoryProfiler

#### Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sampling_interval` | `float` | `0.05` | Time between samples, in seconds |
| `output_file` | `str` | `"qeff_memory_profile.png"` | Output path for the memory graph |
| `verbose` | `bool` | `False` | Enable verbose logging |
| `enable_cpu_monitoring` | `bool` | `True` | Monitor CPU usage |
| `enable_disk_monitoring` | `bool` | `True` | Monitor disk I/O |

#### Methods

- **`start_monitoring()`**: Start background monitoring
- **`stop_monitoring()`**: Stop monitoring and mark completion
- **`mark_operation(name: str)`**: Mark the start of a named operation
- **`get_memory_report() -> str`**: Return a text report of peak memory, statistics, and the operations timeline
- **`generate_memory_graph(filename: str)`**: Save the multi-panel visualization to `filename`
- **`stop_and_save(filename: str) -> str`**: Convenience method that stops monitoring, saves the graph, and returns the text report
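
If any step in the workflow raises (a failed compile, for example), a `stop_monitoring()` call at the end of the script is never reached. A small context manager is one way to guarantee that monitoring stops. The hypothetical `monitoring` helper below works with any object that has `start_monitoring()` and `stop_monitoring()` methods; it is not part of the profiler itself.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def monitoring(profiler: Any) -> Iterator[Any]:
    """Hypothetical helper: start monitoring on entry, always stop on exit (even on exceptions)."""
    profiler.start_monitoring()
    try:
        yield profiler
    finally:
        profiler.stop_monitoring()


# Usage with the profiler from this README (commented out so the sketch runs standalone):
# with monitoring(QEffMemoryProfiler(verbose=True)) as profiler:
#     profiler.mark_operation("Export")
#     model.export()
# print(profiler.get_memory_report())
```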

#### Properties

- **`peak_rss`**: Peak RSS memory usage (MB)
- **`peak_operation`**: Name of the operation active when peak memory was reached
- **`samples`**: List of collected profiling samples
- **`operations`**: List of marked operations with timestamps

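`peak_rss` and `peak_operation` can be read as the largest RSS reading and the marked operation that was active when it was taken. The sketch below computes both from plain `(timestamp, rss_mb)` pairs and `(name, start_time)` marks. These tuple shapes and the `find_peak` helper are assumptions for illustration; the profiler's own sample objects may differ.

```python
import bisect
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, rss_mb): assumed shape, not the profiler's sample type
Mark = Tuple[str, float]      # (operation name, start timestamp): assumed shape


def find_peak(samples: List[Sample], marks: List[Mark]) -> Tuple[float, Optional[str]]:
    """Return (peak RSS in MB, operation active at the peak). `samples` must be non-empty."""
    peak_time, peak_rss = max(samples, key=lambda s: s[1])
    ordered = sorted(marks, key=lambda m: m[1])
    starts = [t for _, t in ordered]
    # Index of the last mark at or before the peak; -1 means the peak came before any mark
    idx = bisect.bisect_right(starts, peak_time) - 1
    return peak_rss, (ordered[idx][0] if idx >= 0 else None)
```

With the real profiler these values are already available as properties; a helper like this is mainly useful for finding a peak within a subset of samples you select yourself.
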
## Operation Types

The profiler supports marking these common QEfficient operations:

- **Model Loading**: `from_pretrained`, `AutoModel`, `AutoTokenizer`
- **Export**: `model.export()`, ONNX transforms, PyTorch transforms
- **Compilation**: `model.compile()`, QNN compilation
- **Generation**: `model.generate()`, inference execution
- **Cleanup**: Memory cleanup, garbage collection

## Output

### Console Report

```
QEFFICIENT PERFORMANCE MONITORING REPORT
============================================================
Peak Memory Usage:
  • RSS (Physical): 18.7 GB at 14:23:45
  • Peak during: Compilation

Memory Statistics:
  • Current RSS: 16.2 GB (Delta: +15.8 GB)
  • Duration: 185.3 seconds
  • Operations: 4

QEfficient Operations Timeline:
  1. 0.0s - Model Loading (25.2s) [+8.2 GB]
  2. 25.2s - Export (15.4s) [+2.1 GB]
  3. 40.6s - Compilation (120.8s) [+6.3 GB] <- Peak
  4. 161.4s - Generation (18.7s) [+1.2 GB]
```

### Visualization

The profiler generates a four-panel visualization:

1. **Memory Timeline**: RSS usage with colored operation phases
2. **CPU Usage**: CPU utilization with performance zones
3. **Disk I/O**: Read/write activity per operation phase
4. **Phase Duration**: Timing analysis with duration labels

#### Sample Output

*Example profiling output for a QEfficient workflow, showing memory, CPU, and disk I/O metrics across the model loading, ONNX transform, compilation, and generation phases.*

## Advanced Usage

### Accessing Raw Data

```python
# Get synchronized data arrays
data = profiler.get_synchronized_data()
timestamps = data['timestamps']
memory_usage = data['rss_memory']
cpu_usage = data['cpu_usage']

# Access individual samples
for sample in profiler.samples:
    print(f"Time: {sample.timestamp}, RSS: {sample.rss_mb} MB")
```
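
To analyze the raw arrays in another tool (a spreadsheet or pandas, for instance), they can be written to CSV with the standard library. This sketch assumes `get_synchronized_data()` returns equal-length sequences under the `'timestamps'`, `'rss_memory'`, and `'cpu_usage'` keys shown above; `save_profile_csv` is a hypothetical helper.

```python
import csv
from typing import Mapping, Sequence

COLUMNS = ("timestamps", "rss_memory", "cpu_usage")


def save_profile_csv(data: Mapping[str, Sequence[float]], path: str) -> int:
    """Write the synchronized arrays as CSV columns; returns the number of data rows written."""
    lengths = {len(data[key]) for key in COLUMNS}
    if len(lengths) != 1:
        raise ValueError(f"arrays have different lengths: {sorted(lengths)}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        # zip() turns the per-column arrays into per-sample rows
        writer.writerows(zip(*(data[key] for key in COLUMNS)))
    return lengths.pop()
```

For example, call `save_profile_csv(profiler.get_synchronized_data(), "profile.csv")` after `stop_monitoring()`. NumPy arrays also work, since the helper only uses `len()` and iteration.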

## Integration Examples

### With Existing QEfficient Scripts

```python
from profiler import QEffMemoryProfiler
from QEfficient import QEFFAutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

# Add to an existing QEfficient workflow
profiler = QEffMemoryProfiler(output_file="workflow_profile.png")
profiler.start_monitoring()

# Existing QEfficient code stays unchanged
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)
# ... rest of workflow ...

# Add at the end: stop monitoring, save the graph, and get the text report
report = profiler.stop_and_save("workflow_profile.png")
print(report)
```

## Limitations

### Disk I/O Tracking

**Subprocess I/O Limitation**: Disk I/O tracking captures I/O performed by the profiled (parent) process only. I/O performed by subprocesses, such as the compiler reading ONNX files when launched via `subprocess.run()`, is not captured because of Linux I/O accounting limitations. During compilation phases, expect I/O readings lower than the actual file activity of those subprocesses.
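
If you need at least a rough figure for subprocess I/O, one standard-library option on Unix is `resource.getrusage(resource.RUSAGE_CHILDREN)`, which reports block input and output counts for child processes that have terminated and been waited for. The counts are in block units rather than bytes, and reads served from the page cache do not appear, so treat them as a coarse indicator. The `child_block_io` helper is a hypothetical sketch, not something the profiler provides.

```python
import resource
import subprocess
import sys
from typing import Callable, Tuple


def child_block_io(run: Callable[[], object]) -> Tuple[int, int]:
    """Hypothetical helper: run `run()` and return (input blocks, output blocks) for reaped children.

    Unix-only. Only children that terminated and were waited for during `run()` are counted.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    run()
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_inblock - before.ru_inblock, after.ru_oublock - before.ru_oublock


# Example with a trivial child process; subprocess.run() waits for the child to exit
in_blocks, out_blocks = child_block_io(
    lambda: subprocess.run([sys.executable, "-c", "pass"], check=True)
)
print(f"child block input: {in_blocks}, block output: {out_blocks}")
```

Wrapping the compile step, as in `child_block_io(lambda: model.compile(...))`, would report blocks charged to compiler subprocesses that finished during the call, assuming the compile step launches and waits for them as described above.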

## Compatibility

- **Python**: 3.7+
- **Dependencies**: `psutil`, `matplotlib`, `numpy`
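
To confirm an environment meets these requirements before profiling, a quick check can use `importlib.util.find_spec`, which locates a module without importing it. The `missing_dependencies` helper below is illustrative and not part of the package.

```python
import importlib.util
import sys
from typing import List, Sequence

REQUIRED = ("psutil", "matplotlib", "numpy")  # from the Compatibility list above


def missing_dependencies(modules: Sequence[str] = REQUIRED) -> List[str]:
    """Return the module names that cannot be found on the current import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if sys.version_info < (3, 7):
    raise RuntimeError("Python 3.7+ is required")

missing = missing_dependencies()
if missing:
    print("Missing dependencies:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
```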