# Load Testing and Profiling Guide for Indexer Service

## Overview

This guide explains how to perform load testing and profiling on the indexer service and tap-agent components. The system includes built-in load testing tools and various profiling methods to help identify performance bottlenecks and optimize the services.

Our project includes an integrated profiling system that supports multiple profiling methods through:

1. A custom `profiler` library (included in the workspace)
2. Docker-based profiling environments
3. Various third-party profiling tools

## Important Note About Load Limits

⚠️ **Important**: The current indexer-service implementation has a built-in protection mechanism against high load from a single sender. When it receives too many receipts from the same sender (approximately 1000), that sender is marked as denied. This is a security feature to prevent abuse.

## Load Testing

### Prerequisites

1. Set up the local test network:

```bash
just setup
```

2. Fund the escrow account (required for testing):

```bash
cd integration-tests
./fund_escrow.sh
```

### Running Load Tests

The integration tests include a dedicated load testing tool that communicates directly with the indexer-service (bypassing the gateway). This allows for more accurate performance measurements.

To run a load test:

```bash
# Run with 1000 receipts
cargo run -- load --num-receipts 1000

# Run with custom number of receipts
cargo run -- load --num-receipts <number>
```

The load test will:

- Send the specified number of receipts concurrently
- Use all available CPU cores for concurrency
- Report success/failure rates
- Show average processing time per request
- Display total duration

### Understanding Load Test Results

The load test output includes:

- Total number of receipts processed
- Processing duration
- Average time per request
- Number of successful receipts
- Number of failed receipts

Example output:

```
Completed processing 1000 requests in 2.5s
Average time per request: 2.5ms
Successfully sent receipts: 998
Failed receipts: 2
```
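
To keep the summary from each run for later comparison, you can capture it to a file. A minimal sketch (the log file name is arbitrary, and the `--release` flag is optional but gives more representative timings):

```bash
# Run from the integration-tests directory and save the summary for later comparison
cd integration-tests
cargo run --release -- load --num-receipts 500 | tee load-500.log
```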

## Profiling

### Available Profiling Methods

The system supports multiple profiling tools:

1. **Flamegraph** (Default)

- Visual representation of CPU usage
- Shows function call stacks
- Based on [pprof](https://crates.io/crates/pprof)
- Output: SVG files and protobuf profiles in `/opt/profiling/{service-name}/`

2. **Valgrind Massif**

- Memory profiling
- Tracks heap usage
- Output: `massif.out` files

3. **Callgrind**

- Detailed CPU profiling
- Cache and branch prediction analysis
- Output: `callgrind.out` files

4. **Strace**

- System call tracing
- Detailed timing information
- Output: `strace.log` files

### Running Profiling

1. Start profiling with your chosen tool:

```bash
# For flamegraph (default)
just profile-flamegraph

# For memory profiling
just profile-valgrind

# For system call tracing
just profile-strace

# For CPU profiling
just profile-callgrind
```

2. Run your load test while profiling:

```bash
cargo run -- load --num-receipts 1000
```

3. Stop profiling to generate results (the services are terminated gracefully so the profiler can write its output):

```bash
just stop-profiling
```

4. Restore normal service:

```bash
just profile-restore
```
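
Putting the four steps together, a typical session might look like this (a sketch that simply combines the commands above; the load test is run from `integration-tests`, as in the prerequisites):

```bash
# 1. Start the indexer services under the chosen profiler
just profile-flamegraph

# 2. Generate load while the profiler is running
(cd integration-tests && cargo run -- load --num-receipts 1000)

# 3. Stop profiling so the output files are written
just stop-profiling

# 4. Bring the regular, unprofiled services back up
just profile-restore
```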

### Viewing Profiling Results

Profiling data is stored in:

- `contrib/profiling/indexer-service/` (for indexer-service)
- `contrib/profiling/tap-agent/` (for tap-agent)

#### Flamegraph Analysis

- Open the SVG files in a web browser
- Look for wide bars indicating high CPU usage
- Identify hot paths in the code
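
For example, to open the most recently generated flamegraph from the command line (a sketch for Linux; the exact SVG file names depend on the profiler's output interval, and macOS users would use `open` instead of `xdg-open`):

```bash
# Open the newest indexer-service flamegraph in the default SVG viewer/browser
xdg-open "$(ls -t contrib/profiling/indexer-service/*.svg | head -n 1)"
```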

#### Memory Profiling (Massif)

```bash
ms_print contrib/profiling/indexer-service/massif.out
```
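
If you prefer a graphical view of the heap profile, the same file can be opened with `massif-visualizer` (a separate tool that may need to be installed):

```bash
# Graphical alternative to ms_print
massif-visualizer contrib/profiling/indexer-service/massif.out
```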

#### CPU Profiling (Callgrind)

```bash
callgrind_annotate contrib/profiling/indexer-service/callgrind.out
```
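
For interactive exploration of the call graph, the same file can also be opened in KCachegrind (or QCachegrind), if installed:

```bash
# Interactive call-graph browser for callgrind output
kcachegrind contrib/profiling/indexer-service/callgrind.out
```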

#### Protobuf Profiles

View with Go pprof tools:

```bash
# Install Go pprof tools if needed
go install github.com/google/pprof@latest

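# Launch an interactive web UI for browsing the profile (serves on port 8080)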
pprof -http=:8080 contrib/profiling/indexer-service/profile-*.pb
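
# Render a standalone flamegraph SVG from a protobuf profile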
pprof -flamegraph contrib/profiling/indexer-service/profile-*.pb > custom_flamegraph.svg
```
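
For a quick text-only summary of the hottest functions, `pprof` can also print a table instead of starting the web UI:

```bash
# Print the top consumers from the collected profile as plain text
pprof -top contrib/profiling/indexer-service/profile-*.pb
```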

## Best Practices

1. **Load Testing**

- Start with small numbers (100-500 receipts)
- Gradually increase load to find breaking points (see the sketch after this list)
- Monitor system resources during tests
- Be aware of the 1000 receipt limit per sender

2. **Profiling**

- Use flamegraph for general performance analysis
- Use Massif for memory leak detection
- Use Callgrind for detailed CPU optimization
- Use strace for system call bottlenecks
- For production use, prefer the built-in profiler over external tools to minimize performance impact

3. **Environment**

- Ensure clean state before testing
- Monitor system resources
- Check logs for errors
- Verify escrow funding before testing
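
A minimal ramp-up sketch for the load-testing practice above (it repeats the load command from earlier with increasing receipt counts, staying at or below the ~1000-receipt denial threshold):

```bash
# Gradually increase the load to find the breaking point
cd integration-tests
for n in 100 250 500 1000; do
  echo "=== load test with $n receipts ==="
  cargo run -- load --num-receipts "$n"
done
```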

## Troubleshooting

1. If load tests fail:

- Check escrow funding
- Verify service health
- Check logs for errors (see the sketch after this list)
- Ensure proper network setup

2. If profiling fails:

- Check container permissions
- Verify profiling tool installation
- Check available disk space
- Ensure proper service shutdown
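
A few quick checks when a run fails (a sketch; it assumes the docker-compose based local network from `just setup`, and the container name `indexer-service` is only an example, so adjust it to whatever `docker ps` reports):

```bash
docker ps                                      # are the expected containers running?
docker logs indexer-service 2>&1 | tail -n 50  # recent service logs
df -h .                                        # enough disk space for profiling output?
ls -lh contrib/profiling/indexer-service/      # did the profiler actually write anything?
```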

## Implementation Details

### Profiler Integration
## Notes

- The flamegraph profiling is enabled whenever using any of the profiling commands through the Justfile, as the binaries are compiled with the `profiling` feature flag.
- The built-in profiler was chosen because tools like `perf`, while powerful, often pose configuration challenges or require specific capabilities (like CAP_SYS_ADMIN) that complicate their deployment within standard Docker containers.
- When using callgrind, consider enabling debug information and frame pointers in your Cargo.toml for better output:

```toml
[profile.release]
debug = true
force-frame-pointers = true
```

## Additional Resources

- [Developer Setup Guide](README.md)
- [Integration Tests](../integration-tests/testing_plan.md)