Commit 03becf0

Adding memory profiling (quic#674)
Added memory profiling tool (scripts/memory_profiling) that tracks memory, CPU, and disk I/O usage across QEfficient workflow stages. The profiler supports manual operation marking, child process tracking for accurate compilation metrics, and generates 4-panel visualizations with detailed performance reports to help identify bottlenecks and optimize resource usage.

Signed-off-by: Rishin Raj <rishinr@qti.qualcomm.com>
1 parent 46ed92b commit 03becf0

5 files changed, +1585 -0 lines changed

scripts/memory_profiling/README.md

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
# QEfficient Memory Profiling

A memory profiling solution for QEfficient workflows with manual operation marking.

## Quick Start

```python
from profiler import QEffMemoryProfiler
from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer

# Initialize the profiler with verbose output to see detailed memory tracking information
profiler = QEffMemoryProfiler(verbose=True)
# Start monitoring; this begins tracking memory consumption in the background
profiler.start_monitoring()

# Mark the start of model loading; operation markers partition the output graph into stages
profiler.mark_operation("Loading model")

model = QEFFAutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Mark the export operation
profiler.mark_operation("Export")
model.export()

# Mark the compilation operation
profiler.mark_operation("Compile")
model.compile(prefill_seq_len=128, ctx_len=256, num_cores=16)

# Mark the text generation operation
profiler.mark_operation("Generation")
output = model.generate(prompts=["Hello world"], tokenizer=tokenizer, generation_len=100)

# Stop memory monitoring
profiler.stop_monitoring()

# Optional: print a detailed report to the console showing peak memory and a per-operation breakdown
print(profiler.get_memory_report())

# Generate a graph of memory usage over time and save it as an image file
profiler.generate_memory_graph("profile.png")
```

## Configuration

### Basic Configuration

```python
profiler = QEffMemoryProfiler(
    sampling_interval=0.1,          # Sample every 100 ms
    output_file="my_profile.png",   # Custom output file
    verbose=True,                   # Enable detailed logging
    enable_cpu_monitoring=True,     # Monitor CPU usage
    enable_disk_monitoring=True,    # Monitor disk I/O
)
```

### Manual Operation Marking

```python
profiler = QEffMemoryProfiler()
profiler.start_monitoring()

# Manual operation marking
profiler.mark_operation("Custom Operation 1")
# ... your code ...

profiler.mark_operation("Custom Operation 2")
# ... more code ...

profiler.stop_monitoring()
```

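Since `mark_operation()` marks only the start of an operation, it can help to keep each marker visually attached to the code it describes. One possible way is a small context manager. The sketch below is illustrative only: `marked_operation` and `RecordingProfiler` are hypothetical names that are not part of the profiler, and the stub stands in for `QEffMemoryProfiler` so the example runs on its own.

```python
from contextlib import contextmanager


class RecordingProfiler:
    """Hypothetical stand-in for QEffMemoryProfiler that only records marker names."""

    def __init__(self):
        self.marks = []

    def mark_operation(self, name: str) -> None:
        self.marks.append(name)


@contextmanager
def marked_operation(profiler, name: str):
    """Mark the start of `name`, then run the enclosed block."""
    profiler.mark_operation(name)
    yield profiler


profiler = RecordingProfiler()
with marked_operation(profiler, "Export"):
    pass  # ... model.export() would go here ...
with marked_operation(profiler, "Compile"):
    pass  # ... model.compile(...) would go here ...
print(profiler.marks)
```

The helper relies only on the `mark_operation(name)` method, so under that assumption it should work unchanged with the real profiler object.
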
## API Reference

### QEffMemoryProfiler

#### Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sampling_interval` | `float` | `0.05` | Time between samples (seconds) |
| `output_file` | `str` | `"qeff_memory_profile.png"` | Output file path |
| `verbose` | `bool` | `False` | Enable verbose logging |
| `enable_cpu_monitoring` | `bool` | `True` | Monitor CPU usage |
| `enable_disk_monitoring` | `bool` | `True` | Monitor disk I/O |

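The defaults in the table can be captured in a small configuration object. The package exports a `ProfilerConfig` class, but its fields are not shown in this commit view, so the dataclass below is a hypothetical sketch (named `SketchProfilerConfig` to avoid confusion with the real class) that mirrors the table and adds one basic validation check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchProfilerConfig:
    """Hypothetical config mirroring the constructor parameters in the table."""

    sampling_interval: float = 0.05  # seconds between samples
    output_file: str = "qeff_memory_profile.png"
    verbose: bool = False
    enable_cpu_monitoring: bool = True
    enable_disk_monitoring: bool = True

    def __post_init__(self):
        # A non-positive interval would make a sampling loop spin without pause.
        if self.sampling_interval <= 0:
            raise ValueError("sampling_interval must be positive")


default = SketchProfilerConfig()
custom = SketchProfilerConfig(sampling_interval=0.1, output_file="my_profile.png", verbose=True)
print(default.sampling_interval, custom.sampling_interval)
```

Freezing the dataclass keeps the settings immutable once monitoring has started, which is a design choice of this sketch rather than a documented property of the profiler.
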
#### Methods

- **`start_monitoring()`**: Start background monitoring
- **`stop_monitoring()`**: Stop monitoring and mark completion
- **`mark_operation(name: str)`**: Manually mark the start of an operation
- **`get_memory_report() -> str`**: Generate a comprehensive text report
- **`generate_memory_graph(filename: str)`**: Create the visualization
- **`stop_and_save(filename: str) -> str`**: Convenience method to stop and save

#### Properties

- **`peak_rss`**: Peak RSS memory usage (MB)
- **`peak_operation`**: Operation active when peak memory was reached
- **`samples`**: List of collected profiling samples
- **`operations`**: List of marked operations with timestamps

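To make the relationship between these methods and properties concrete, here is a minimal, standard-library sketch of a background sampler with operation marking. It is not the profiler's implementation: `MiniProfiler`, its `read_rss_mb` parameter, and the tuple layout of `samples` and `operations` are assumptions made for illustration. The memory reader is injected as a callable so the example runs without `psutil`.

```python
import threading
import time


class MiniProfiler:
    """Illustrative sampler: a thread polls a memory reader at a fixed interval."""

    def __init__(self, read_rss_mb, sampling_interval=0.05):
        self._read_rss_mb = read_rss_mb      # callable returning current RSS in MB
        self._interval = sampling_interval
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._thread = None
        self._current_op = "Initialization"
        self.samples = []                    # (timestamp, rss_mb, operation)
        self.operations = []                 # (timestamp, name)

    def _sample_once(self):
        with self._lock:
            self.samples.append((time.monotonic(), self._read_rss_mb(), self._current_op))

    def _run(self):
        while not self._stop.is_set():
            self._sample_once()
            self._stop.wait(self._interval)  # sleeps, but wakes early on stop

    def start_monitoring(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def mark_operation(self, name):
        with self._lock:
            self._current_op = name
            self.operations.append((time.monotonic(), name))
        self._sample_once()                  # at least one sample per operation

    def stop_monitoring(self):
        self._stop.set()
        self._thread.join()
        self._sample_once()                  # final sample after the thread exits

    @property
    def peak_rss(self):
        return max((rss for _, rss, _ in self.samples), default=0.0)

    @property
    def peak_operation(self):
        if not self.samples:
            return None
        return max(self.samples, key=lambda sample: sample[1])[2]


# Usage with a fake memory reader whose value is changed by hand.
state = {"rss": 10.0}
profiler = MiniProfiler(lambda: state["rss"], sampling_interval=0.01)
profiler.start_monitoring()
profiler.mark_operation("Loading")
state["rss"] = 100.0
time.sleep(0.02)
profiler.mark_operation("Compile")
state["rss"] = 300.0
profiler.stop_monitoring()
print(profiler.peak_rss, profiler.peak_operation)
```

Each sample is tagged with the operation that was current when it was taken, which is what lets a peak be attributed to a stage afterwards.
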
## Operation Types

The profiler supports marking these common QEfficient operations:

- **Model Loading**: `from_pretrained`, `AutoModel`, `AutoTokenizer`
- **Export**: `model.export()`, ONNX transforms, PyTorch transforms
- **Compilation**: `model.compile()`, QNN compilation
- **Generation**: `model.generate()`, inference execution
- **Cleanup**: Memory cleanup, garbage collection

## Output

### Console Report
```
QEFFICIENT PERFORMANCE MONITORING REPORT
============================================================
Peak Memory Usage:
• RSS (Physical): 18.7 GB at 14:23:45
• Peak during: Compilation

Memory Statistics:
• Current RSS: 16.2 GB (Delta: +15.8 GB)
• Duration: 185.3 seconds
• Operations: 4

QEfficient Operations Timeline:
1. 0.0s - Model Loading (25.2s) [+8.2 GB]
2. 25.2s - Export (15.4s) [+2.1 GB]
3. 40.6s - Compilation (120.8s) [+6.3 GB] <- Peak
4. 161.4s - Generation (18.7s) [+1.2 GB]
```

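The durations in the timeline follow from the operation start times if each operation is treated as lasting until the next marker, and the last one until it ends. The sketch below reproduces that arithmetic for the sample report. The function name `operation_durations` and the `(start, name)` tuple layout are assumptions, and the end time of 180.1 s is inferred from the last timeline entry (161.4 s + 18.7 s) rather than taken from the profiler.

```python
def operation_durations(operations, end_time):
    """Return (name, start, duration) for each marked operation.

    `operations` is a list of (start_seconds, name) sorted by start time.
    """
    result = []
    for index, (start, name) in enumerate(operations):
        # An operation ends where the next one starts; the last ends at end_time.
        if index + 1 < len(operations):
            next_start = operations[index + 1][0]
        else:
            next_start = end_time
        result.append((name, start, round(next_start - start, 1)))
    return result


timeline = [
    (0.0, "Model Loading"),
    (25.2, "Export"),
    (40.6, "Compilation"),
    (161.4, "Generation"),
]
for name, start, duration in operation_durations(timeline, end_time=180.1):
    print(f"{start:6.1f}s - {name} ({duration}s)")
```
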
### Visualization

The profiler generates a comprehensive 4-panel visualization:

1. **Memory Timeline**: RSS usage with colored operation phases
2. **CPU Usage**: CPU utilization with performance zones
3. **Disk I/O**: Read/write activity per operation phase
4. **Phase Duration**: Timing analysis with duration labels

#### Sample Output

![Sample Memory Profile](memory_profile_llama3.2.png)

*Example memory profiling output showing QEfficient workflow phases including model loading, ONNX transforms, compilation, and generation phases with detailed memory, CPU, and disk I/O metrics.*

## Advanced Usage

### Accessing Raw Data

```python
# Get synchronized data arrays
data = profiler.get_synchronized_data()
timestamps = data['timestamps']
memory_usage = data['rss_memory']
cpu_usage = data['cpu_usage']

# Access individual samples
for sample in profiler.samples:
    print(f"Time: {sample.timestamp}, RSS: {sample.rss_mb} MB")
```

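Once the arrays are in hand, simple post-processing needs no extra dependencies. The following sketch finds the peak RSS and when it occurred. The dictionary is hand-written stand-in data that uses the `timestamps` and `rss_memory` keys from the snippet above; the units (seconds and MB) and the helper name `find_peak` are assumptions.

```python
def find_peak(data):
    """Return (timestamp, rss) of the sample with the highest RSS."""
    pairs = zip(data["timestamps"], data["rss_memory"])
    return max(pairs, key=lambda pair: pair[1])


# Stand-in for the result of profiler.get_synchronized_data()
data = {
    "timestamps": [0.0, 0.1, 0.2, 0.3, 0.4],
    "rss_memory": [410.0, 2048.5, 8192.0, 7900.25, 6144.0],
}
peak_time, peak_rss = find_peak(data)
print(f"Peak RSS {peak_rss:.1f} MB at t={peak_time:.1f}s")
```
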
## Integration Examples

### With Existing QEfficient Scripts

```python
# Add to existing QEfficient workflow
profiler = QEffMemoryProfiler(output_file="workflow_profile.png")
profiler.start_monitoring()

# Existing QEfficient code unchanged
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)
# ... rest of workflow ...

# Add at end
report = profiler.stop_and_save()
print(report)
```

## Limitations

### Disk I/O Tracking

**Subprocess I/O Limitation**: Disk I/O tracking captures only the parent process's I/O. I/O performed by subprocesses (e.g., compilation reading ONNX files via `subprocess.run()`) is not captured due to Linux I/O accounting limitations. During compilation phases, expect I/O readings lower than the actual file operations performed by subprocesses.

## Compatibility

- **Python**: 3.7+
- **Dependencies**: `psutil`, `matplotlib`, `numpy`

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# -----------------------------------------------------------------------------
#
# Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------

"""
QEfficient Memory Profiling

A production-ready memory profiling solution specifically designed for QEfficient workflows.
Provides manual operation marking, comprehensive metrics collection, and professional visualization.

Usage Example:

```python
from scripts.memory_profiling import QEffMemoryProfiler

profiler = QEffMemoryProfiler(verbose=True)
profiler.start_monitoring()
# ... your QEfficient code ...
profiler.stop_monitoring()
print(profiler.get_memory_report())
profiler.generate_memory_graph()
```
"""

__version__ = "2.0.0"
__author__ = "Qualcomm Technologies, Inc."

# Core profiler components
from .profiler import (
    MetricsCollector,
    ProfilerConfig,
    ProfileSample,
    QEffMemoryProfiler,
)

# Visualization component (optional; falls back to None if the import fails)
try:
    from .visualizer import QEffMemoryVisualizer
except ImportError:
    # Handle case where matplotlib is not available
    QEffMemoryVisualizer = None

__all__ = [
    "QEffMemoryProfiler",
    "ProfilerConfig",
    "ProfileSample",
    "MetricsCollector",
    "QEffMemoryVisualizer",
    "__version__",
]