Commit 63d8202

chore: update profiling context/instructions (#725)
1 parent 3d7e811 commit 63d8202

1 file changed: +167 -46 lines changed

contrib/PROFILING.md

Lines changed: 167 additions & 46 deletions
@@ -1,95 +1,173 @@
1-
# Profiling Tools
2-
3-
This document explains the profiling infrastructure set up for our indexer-service and tap-agent services. The profiling setup enables developers to diagnose performance issues, memory leaks, and analyze runtime behavior in both development and production environments.
1+
# Load Testing and Profiling Guide for Indexer Service
42

53
## Overview
64

7-
Our project includes an integrated profiling system for the indexer services. The system supports multiple profiling methods through:
5+
This guide explains how to perform load testing and profiling on the indexer service and tap-agent components. The system includes built-in load testing tools and various profiling methods to help identify performance bottlenecks and optimize the services.
6+
7+
Our project includes an integrated profiling system that supports multiple profiling methods through:
88

99
1. A custom `profiler` library (included in the workspace)
1010
2. Docker-based profiling environments
1111
3. Various third-party profiling tools
1212

13-
## Available Profiling Methods
13+
## Important Note About Load Limits
1414

15-
### Built-in Profiler (pprof-based Flamegraphs)
15+
⚠️ **Important**: The current indexer-service implementation has a built-in protection mechanism against high load from a single sender. When it receives too many receipts from the same sender (approximately 1000), that sender is marked as denied. This is a security feature to prevent abuse.
1616

17-
A Rust library that uses [pprof](https://crates.io/crates/pprof) to continuously profile the application and generate flamegraphs at specified intervals.
18-
This solution was particularly suitable because tools like `perf`, while powerful, often pose configuration challenges or require specific capabilities (like CAP_SYS_ADMIN) that complicate their deployment within standard Docker containers.
17+
## Load Testing
18+
19+
### Prerequisites
1920

20-
- **Configuration**: Set in code with the `setup_profiling` function
21-
- **Activation**: Enabled via the `profiling` feature flag
22-
- **Output**: Flamegraphs (SVG) and protobuf profiles in `/opt/profiling/{service-name}/`
21+
1. Set up the local test network:
2322

24-
### External Profiling Tools
23+
```bash
24+
just setup
25+
```
2526

26-
The profiling environment also supports the following tools:
27+
2. Fund the escrow account (required for testing):
2728

28-
| Tool | Description | Output |
29-
| ------------- | ---------------------------------------- | --------------------------------------------- |
30-
| **strace** | Traces system calls with detailed timing | `/opt/profiling/{service-name}/strace.log` |
31-
| **valgrind** | Memory profiling with Massif | `/opt/profiling/{service-name}/massif.out` |
32-
| **callgrind** | CPU profiling (part of valgrind) | `/opt/profiling/{service-name}/callgrind.out` |
29+
```bash
30+
cd integration-tests
31+
./fund_escrow.sh
32+
```
3333

34-
## How to Use
34+
### Running Load Tests
3535

36-
### Prerequisites
36+
The integration tests include a dedicated load testing tool that communicates directly with the indexer-service, bypassing the gateway. Measurements therefore reflect the indexer-service itself rather than gateway overhead.
3737

38-
Run the setup command first to prepare the testing environment:
38+
To run a load test:
3939

4040
```bash
41-
just setup
41+
# Run with 1000 receipts
42+
cargo run -- load --num-receipts 1000
43+
44+
# Run with custom number of receipts
45+
cargo run -- load --num-receipts <number>
4246
```
4347

44-
### Profiling Commands
48+
The load test will:
49+
50+
- Send the specified number of receipts concurrently
51+
- Use all available CPU cores for concurrency
52+
- Report success/failure rates
53+
- Show average processing time per request
54+
- Display total duration
55+
56+
### Understanding Load Test Results
57+
58+
The load test output includes:
59+
60+
- Total number of receipts processed
61+
- Processing duration
62+
- Average time per request
63+
- Number of successful receipts
64+
- Number of failed receipts
65+
66+
Example output:
67+
68+
```
69+
Completed processing 1000 requests in 2.5s
70+
Average time per request: 2.5ms
71+
Successfully sent receipts: 998
72+
Failed receipts: 2
73+
```
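
A small helper can turn that summary into a single success-rate figure, which is handy when comparing runs. This is only a sketch: `success_rate` is a hypothetical function, not part of the repository, and it assumes the tool prints the `Successfully sent receipts:` and `Failed receipts:` lines exactly as in the example output above.

```bash
# Hypothetical helper: read load test output on stdin and print the
# percentage of receipts that were sent successfully.
# Assumes the summary line format shown in the example output.
success_rate() {
  awk -F': ' '
    /^Successfully sent receipts/ { ok = $2 }
    /^Failed receipts/            { failed = $2 }
    END {
      total = ok + failed
      # Exit non-zero when no summary lines were found in the input.
      if (total == 0) { print "no receipts found"; exit 1 }
      printf "%.1f%%\n", 100 * ok / total
    }'
}
```

Piping the sample output above through `success_rate` prints `99.8%`. If the tool writes its summary to stderr rather than stdout, add `2>&1` before the pipe.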
74+
75+
## Profiling
76+
77+
### Available Profiling Methods
78+
79+
The system supports multiple profiling tools:
80+
81+
1. **Flamegraph** (Default)
82+
83+
- Visual representation of CPU usage
84+
- Shows function call stacks
85+
- Based on [pprof](https://crates.io/crates/pprof)
86+
- Output: SVG files and protobuf profiles in `/opt/profiling/{service-name}/`
87+
88+
2. **Valgrind Massif**
89+
90+
- Memory profiling
91+
- Tracks heap usage
92+
- Output: `massif.out` files
93+
94+
3. **Callgrind**
95+
96+
- Detailed CPU profiling
97+
- Cache and branch prediction analysis
98+
- Output: `callgrind.out` files
99+
100+
4. **Strace**
101+
- System call tracing
102+
- Detailed timing information
103+
- Output: `strace.log` files
104+
105+
### Running Profiling
45106

46-
Use the following commands to profile specific services:
107+
1. Start profiling with your chosen tool:
47108

48109
```bash
49-
# Profile with flamegraph (default)
110+
# For flamegraph (default)
50111
just profile-flamegraph
51112

52-
# Profile with valgrind
113+
# For memory profiling
53114
just profile-valgrind
54115

55-
# Profile with strace
116+
# For system call tracing
56117
just profile-strace
57118

58-
# Profile with callgrind
119+
# For CPU profiling
59120
just profile-callgrind
121+
```
122+
123+
2. Run your load test while profiling:
124+
125+
```bash
126+
cargo run -- load --num-receipts 1000
127+
```
128+
129+
3. Stop profiling to generate results:
60130

61-
# Stop profiling (gracefully terminate to generate output)
131+
```bash
62132
just stop-profiling
133+
```
63134

64-
# Restore normal service without profiling
135+
4. Restore normal service:
136+
137+
```bash
65138
just profile-restore
66139
```
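
The four steps above can be chained in one shell function so that profiling is always stopped and the normal service restored, even when the load test fails. This is a sketch under assumptions: `profile_session` is a hypothetical name, and it assumes the `just` recipes and the `cargo run -- load` command can both be invoked from the current directory.

```bash
# Hypothetical wrapper around the documented steps.
# Usage: profile_session [tool] [num-receipts]
#   tool is one of: flamegraph, valgrind, strace, callgrind
profile_session() {
  tool=${1:-flamegraph}
  receipts=${2:-1000}

  # Step 1: start profiling; give up early if the recipe fails.
  just "profile-$tool" || return 1

  # Step 2: run the load test and remember its exit status.
  cargo run -- load --num-receipts "$receipts"
  status=$?

  # Steps 3 and 4: always stop profiling and restore the service.
  just stop-profiling
  just profile-restore

  return "$status"
}
```

For example, `profile_session strace 200` would trace system calls during a 200-receipt run and return the exit status of the load test.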
67140

68-
### Viewing Results
141+
### Viewing Profiling Results
69142

70143
Profiling data is stored in:
71144

72-
- `contrib/profiling/indexer-service/`
73-
- `contrib/profiling/tap-agent/`
145+
- `contrib/profiling/indexer-service/` (for indexer-service)
146+
- `contrib/profiling/tap-agent/` (for tap-agent)
74147

75-
#### Visualization Tools
148+
#### Flamegraph Analysis
76149

77-
- **Flamegraphs**: Open the SVG files in any web browser
78-
- **Callgrind**: Use `callgrind_annotate` or KCachegrind for visualization:
150+
- Open the SVG files in a web browser
151+
- Look for wide bars indicating high CPU usage
152+
- Identify hot paths in the code
79153

80-
```bash
81-
callgrind_annotate contrib/profiling/tap-agent/callgrind.out
82-
```
154+
#### Memory Profiling (Massif)
83155

84-
- **Massif**: Use `ms_print` to view memory profiling results:
156+
```bash
157+
ms_print contrib/profiling/indexer-service/massif.out
158+
```
85159

86-
```bash
87-
ms_print contrib/profiling/tap-agent/massif.out
88-
```
160+
#### CPU Profiling (Callgrind)
89161

90-
- **Protobuf Profiles**: View with Go pprof tools:
162+
```bash
163+
callgrind_annotate contrib/profiling/indexer-service/callgrind.out
164+
```
91165

92-
```go
166+
#### Protobuf Profiles
167+
168+
View with Go pprof tools:
169+
170+
```bash
93171
# Install Go pprof tools if needed
94172
go install github.com/google/pprof@latest
95173

@@ -100,6 +178,44 @@ pprof -http=:8080 contrib/profiling/indexer-service/profile-*.pb
100178
pprof -flamegraph contrib/profiling/indexer-service/profile-*.pb > custom_flamegraph.svg
101179
```
102180

181+
## Best Practices
182+
183+
1. **Load Testing**
184+
185+
- Start with small numbers (100-500 receipts)
186+
- Gradually increase load to find breaking points
187+
- Monitor system resources during tests
188+
- Be aware of the 1000 receipt limit per sender
189+
190+
2. **Profiling**
191+
192+
- Use flamegraph for general performance analysis
193+
- Use Massif for heap usage analysis (steady heap growth can point to a memory leak)
194+
- Use Callgrind for detailed CPU optimization
195+
- Use strace for system call bottlenecks
196+
- For production use, prefer the built-in profiler over external tools to minimize performance impact
197+
198+
3. **Environment**
199+
- Ensure clean state before testing
200+
- Monitor system resources
201+
- Check logs for errors
202+
- Verify escrow funding before testing
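
The "start small and gradually increase" advice under Load Testing can be scripted as a simple ramp. The sketch below is illustrative only: `ramp_steps` and `run_ramp` are hypothetical helpers, and the command to run is passed in by the caller rather than hard-coded.

```bash
# Print receipt counts from START up to MAX (inclusive) in increments of STEP.
ramp_steps() {
  n=$1
  while [ "$n" -le "$2" ]; do
    echo "$n"
    n=$((n + $3))
  done
}

# Run the given command once per step, appending the receipt count as the
# last argument. Stops at the first failing run.
# Usage: run_ramp START MAX STEP command [args...]
run_ramp() {
  start=$1 max=$2 step=$3
  shift 3
  for count in $(ramp_steps "$start" "$max" "$step"); do
    echo "== load test with $count receipts =="
    "$@" "$count" || return 1
  done
}
```

A possible invocation is `run_ramp 100 500 100 cargo run -- load --num-receipts`, which keeps every run below the approximate 1000-receipt threshold described earlier. Whether that threshold applies per run or accumulates across runs is not stated here, so treat the totals with care.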
203+
204+
## Troubleshooting
205+
206+
1. If load tests fail:
207+
208+
- Check escrow funding
209+
- Verify service health
210+
- Check logs for errors
211+
- Ensure proper network setup
212+
213+
2. If profiling fails:
214+
- Check container permissions
215+
- Verify profiling tool installation
216+
- Check available disk space
217+
- Ensure proper service shutdown
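
For the "verify profiling tool installation" check, a quick way to see which command-line tools are missing from the host `PATH` is sketched below. `missing_tools` is a hypothetical helper; it only inspects the machine it runs on, so tools that live inside the profiling containers would need to be checked there.

```bash
# Hypothetical helper: print the name of every given tool that is not
# found on PATH. Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools just cargo ms_print callgrind_annotate pprof` lists whichever of the commands used in this guide are not installed locally.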
218+
103219
## Implementation Details
104220

105221
### Profiler Integration
@@ -135,11 +251,16 @@ security_opt:
135251
## Notes
136252
137253
- Flamegraph profiling is enabled whenever you use any of the profiling commands in the Justfile, because the binaries are compiled with the `profiling` feature flag.
138-
- For production use, prefer the built-in profiler over the external tools to minimize performance impact.
254+
- The built-in profiler was chosen because tools like `perf`, while powerful, are often hard to configure or require specific capabilities (such as CAP_SYS_ADMIN) that complicate deployment in standard Docker containers.
139255
- When using callgrind, consider enabling debug information and frame pointers in your Cargo.toml for better output:
140256

141257
```toml
142258
[profile.release]
143259
debug = true
144260
force-frame-pointers = true
145261
```
262+
263+
## Additional Resources
264+
265+
- [Developer Setup Guide](README.md)
266+
- [Integration Tests](../integration-tests/testing_plan.md)
