17 changes: 14 additions & 3 deletions .github/workflows/build.yml
@@ -38,12 +38,17 @@ jobs:
- uses: Swatinem/rust-cache@v2
- name: Build
shell: bash
run: cargo build --all-features --verbose --example simple
- name: Upload artifact for testing
run: cargo build --all-features --verbose --example simple --example pollcatch-without-agent
- name: Upload example-simple for testing
uses: actions/upload-artifact@v4
with:
name: example-simple
path: ./target/debug/examples/simple
- name: Upload example-pollcatch-without-agent for testing
uses: actions/upload-artifact@v4
with:
name: example-pollcatch-without-agent
path: ./target/debug/examples/pollcatch-without-agent
build-decoder:
name: Build Decoder
runs-on: ubuntu-latest
@@ -86,11 +91,17 @@ jobs:
with:
name: example-simple
path: ./tests
- name: Download example-pollcatch-without-agent
uses: actions/download-artifact@v4
with:
name: example-pollcatch-without-agent
path: ./tests
- name: Download async-profiler
shell: bash
working-directory: tests
run: wget https://github.com/async-profiler/async-profiler/releases/download/v4.1/async-profiler-4.1-linux-x64.tar.gz -O async-profiler.tar.gz && tar xvf async-profiler.tar.gz && mv -vf async-profiler-*/lib/libasyncProfiler.so .
- name: Run integration test
shell: bash
working-directory: tests
run: chmod +x simple pollcatch-decoder && LD_LIBRARY_PATH=$PWD ./integration.sh && LD_LIBRARY_PATH=$PWD ./separate_runtime_integration.sh
# The `ls -l` is there to help debugging CI, I see no reason to remove it
run: ls -l && chmod +x simple pollcatch-without-agent pollcatch-decoder && LD_LIBRARY_PATH=$PWD ./integration.sh && LD_LIBRARY_PATH=$PWD ./separate_runtime_integration.sh && LD_LIBRARY_PATH=$PWD ./test_pollcatch_without_agent.sh
249 changes: 249 additions & 0 deletions DESIGN.md
@@ -0,0 +1,249 @@
# Design Document: async-profiler Rust Agent

## Overview

The async-profiler Rust agent is an in-process profiling library that integrates with [async-profiler](https://github.com/async-profiler/async-profiler) to collect performance data and upload it to various backends. The agent is designed to run continuously in production environments with minimal overhead.

For a more how-to-focused guide on running the profiler in various contexts, read the README.

This document began as an AI-generated summary, but it includes many comments from the development team.

This is a *design* document. It does not make stability promises and can change at any time.

## Architecture

The async-profiler agent runs as an agent within a Rust process and profiles it using [async-profiler].

async-profiler must be loaded into the process. Currently the agent only supports loading a `libasyncProfiler.so` dynamically
via [libloading], but future versions might also support statically or conventionally dynamically
linking against it.

The async-profiler configuration is controlled by the user, though only a limited set of configurations
is made available to control the support burden.

The agent collects periodic profiling [JFR]s and sends them to a reporter, which uploads them to
some location. The library supports a file-based reporter that stores the JFRs on the filesystem,
and an S3-based reporter that wraps the JFRs from async-profiler into a `zip` and uploads them.
The library also allows users to implement their own reporters.

The agent can also perform autodetection of AWS IMDS metadata, which is passed to the reporter
as an argument, and in the S3-based reporter, used to determine the name of the uploaded files.

In addition, the library includes a Tokio integration for pollcatch, which allows detecting
long polls in Tokio applications. That integration uses the same `libasyncProfiler.so`
as the rest of the agent but is otherwise independent.

[async-profiler]: https://github.com/async-profiler/async-profiler
[libloading]: https://crates.io/crates/libloading
[JFR]: https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/about.htm

## Code Architecture

The crate follows a modular architecture with clear separation of concerns:

```
async-profiler-agent/
├── src/
│ ├── lib.rs # Public API and documentation
│ ├── profiler.rs # Core profiler orchestration
│ ├── asprof/ # async-profiler FFI bindings
│ ├── metadata/ # Host and report metadata
│ ├── pollcatch/ # Tokio poll time tracking
│ └── reporter/ # Data upload backends
├── examples/ # Sample applications
├── decoder/ # JFR analysis tool
└── tests/ # Integration tests
```

## Core Modules

### 1. Profiler (`profiler`)

**Purpose**: Central orchestration of profiling lifecycle and data collection.

**Key Components**:
- `Profiler` & `ProfilerBuilder`: Main entry point for starting profiling
- `ProfilerOptions`: Profiling behavior configuration
- `RunningProfiler`: Handle for controlling active profiler
- `ProfilerEngine` trait: allows mocking async-profiler (the C library) during tests

#### Profiler lifecycle management

As of async-profiler version 4.1, async-profiler does not have a mode where it can run continuously
with bounded memory usage and periodically collect samples.

Therefore, every [`reporting_interval`] seconds, the async-profiler agent restarts async-profiler by sending a `stop` command (which flushes the JFR file) followed by a `start` command.

This is managed by `Profiler` (see the [`profiler_tick`] function).

This is a supported async-profiler operation mode.

[`reporting_interval`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerBuilder.html#method.with_reporting_interval
[`profiler_tick`]: https://github.com/async-profiler/rust-agent/blob/506718fff274b49cf2eb03305a4f9547b61720e3/src/profiler.rs#L1083

#### Agent lifecycle management

The async-profiler agent can be stopped and started at run-time.

When stopped, the async-profiler agent stops async-profiler, flushes the last profile to the reporter, and then
releases the stop handle from waiting. After the stop is done, it is possible to start a different instance of
the async-profiler agent on the same process.

The start/stop functionality is useful for several purposes:

1. "Chicken bit" stopping of the profiler if it causes application issues.
2. Stopping and starting a profiler with new configuration.
3. Stopping the profiler and uploading the last sample before application exit.

The profiler intentionally does *not* automatically flush the last profile on `Drop`. This is because
reporters can take an arbitrary amount of time to finish, and slowing an application on exit is likely
to be a worse default than missing some profiling samples.

#### Profiler configuration

async-profiler is configured via [`ProfilerOptions`] and [`ProfilerOptionsBuilder`]. You
should read these docs along with the [async-profiler options docs] for more details.

[`ProfilerOptions`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptions.html
[`ProfilerOptionsBuilder`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html
[async-profiler options docs]: https://github.com/async-profiler/async-profiler/blob/v4.0/docs/ProfilerOptions.md

#### JFR file rotation

async-profiler expects to write the current JFR to a "fresh" file path. To that
end, the agent creates 2 unnamed temporary files via `JfrFile`, and gives
async-profiler alternating paths of the form `/proc/self/fd/<N>` to write the
JFRs into.

### 2. async-profiler FFI (`asprof`)

**Purpose**: Safe Rust bindings to the native async-profiler library.

**Key Components**:
- `AsProf`: Safe interface to async-profiler
- `raw`: Low-level FFI declarations

**Responsibilities**:
- Dynamic loading of `libasyncProfiler.so` using [libloading]
- Safe, Rust-native wrappers around C API calls

[libloading]: https://crates.io/crates/libloading

### 3. Metadata (`metadata/`)

**Purpose**: Host identification and report context information.

**Key Components**:
- `AgentMetadata`: Host identification (EC2, Fargate, or generic)
- `aws`: AWS-specific metadata autodetection via IMDS

The metadata is sent to the [`Reporter`] implementation, and can be used to
identify the host that generated a particular profiling report. In the local reporter,
it is ignored. In the S3 reporter, it is used to determine the uploaded file name.

### 4. Reporters (`reporter/`)

**Purpose**: Pluggable backends for uploading profiling data.

**Key Components**:
- [`Reporter`] trait: Common interface for all backends
- [`LocalReporter`]: Filesystem output for development/testing
- [`S3Reporter`]: AWS S3 upload with metadata
- [`MultiReporter`]: Composition of multiple reporters

[`Reporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/trait.Reporter.html
[`LocalReporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/local/struct.LocalReporter.html
[`S3Reporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/s3/struct.S3Reporter.html
[`MultiReporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/multi/struct.MultiReporter.html

The reporter trait is as follows:

```rust
#[async_trait]
pub trait Reporter: fmt::Debug {
async fn report(
&self,
jfr: Vec<u8>,
metadata: &ReportMetadata,
) -> Result<(), Box<dyn std::error::Error + Send>>;
}
```

Customers whose needs are not met by the built-in reporters can write their
own reporters.

### 5. PollCatch (`pollcatch/`)

**Purpose**: Tokio-specific instrumentation for detecting long poll times.

**Key Components**:
- `before_poll_hook()`: Pre-poll timestamp capture
- `after_poll_hook()`: Post-poll analysis and reporting
- `tsc.rs`: CPU timestamp counter utilities, works on x86 and ARM

The idea of pollcatch is that if a wall-clock profiling event happens in the middle of a Tokio poll,
when that Tokio poll *ends*, a `tokio.PollCatchV1` event is emitted that contains the start and
end times of that poll, and therefore it is possible to correlate long polls with stack traces
that happen within them.

The way this works is that `before_poll_hook` saves the before-timestamp and the async-profiler
`sample_counter` from `asprof_thread_local_data` into a (private) thread-local variable, and
`after_poll_hook` checks whether the `sample_counter` has changed; if it has, it emits a
`tokio.PollCatchV1` event containing the stored before-timestamp and the current timestamp as the
after-timestamp.

By emitting only one `tokio.PollCatchV1` event per wall-clock profiling event, the pollcatch profiling overhead
is kept bounded and low.

By only emitting the event in `after_poll_hook`, which normally runs as a Tokio after-poll hook,
the event is emitted essentially "at the Tokio main loop": in a context where no locks are held
and which is outside of a signal handler.

The `tokio.PollCatchV1` event contains the following payload:

```rust
before_timestamp: LittleEndianU64,
after_timestamp: LittleEndianU64,
```

Both timestamps come from the TSC. The pollcatch decoder uses the fact that async-profiler's profiling samples
contain a clock based on the same TSC to correlate profiling samples with a single Tokio poll (though since the
wall-clock interval is normally 1/second, unless a Tokio poll is *horribly* slow it will bracket at most a
single sample), and also to determine how long that particular poll took by taking the difference between
the timestamps.

## The decoder (`decoder/`)

The decoder is a JFR decoder using `jfrs` that can decode the JFRs from async-profiler and display pollcatch
metadata in a nice format.

The decoder implementation is currently quite rough.

## Data Flow

1. **Initialization**: Profiler loads `libasyncProfiler.so` and initializes
2. **Session Start**: Creates temporary JFR files and starts async-profiler
3. **Continuous Profiling**: async-profiler collects samples to active JFR file
4. **Periodic Reporting**:
- Stop profiler and rotate JFR files
- Read completed JFR data
- Package with metadata
- Upload via configured reporters
- Restart profiler with new JFR file
5. **Shutdown**: Stop profiler and perform final report

## Feature Flags

All AWS dependencies are optional and only enabled if an AWS feature flag is passed.

In addition, for every AWS feature flag, there is an "X-no-defaults" version of that flag
that does not enable default features for the AWS libraries.

The main reason for this design is that the AWS SDK needs a selected TLS backend
in order to connect to HTTPS services, but users might want to enable a backend other
than the default one and not have the default backend linked into their executable.

- `s3`: Full S3 reporter with default AWS SDK features
- `s3-no-defaults`: S3 reporter without default features (for custom TLS)
- `aws-metadata`: AWS metadata detection with default features
- `aws-metadata-no-defaults`: AWS metadata without default features
- `__unstable-fargate-cpu-count`: Experimental Fargate CPU metrics
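
A consumer's `Cargo.toml` might then look like the following sketch (the feature names come from the list above; how the application selects its TLS backend on the SDK crates is up to the user and not shown):

```toml
[dependencies]
# Enable the S3 reporter and AWS metadata detection without the AWS SDK's
# default features, so the default TLS backend is not linked in.
async-profiler-agent = { version = "0.1", features = ["s3-no-defaults", "aws-metadata-no-defaults"] }
```
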
58 changes: 57 additions & 1 deletion README.md
@@ -50,7 +50,7 @@ The S3 reporter uploads each report in a `zip` file that currently contains 2 files:
2. metadata as `metadata.json`, in format `reporter::s3::MetadataJson`.

The `zip` file is uploaded to the bucket under the path `profile_{profiling_group_name}_{machine}_{pid}_{time}.zip`,
where `{machine}` is either `ec2_{ec2_instance_id}_`, `ecs_{cluster_arn}_{task_arn}`, or `onprem__`.
where `{machine}` is either `ec2_{ec2_instance_id}_`, `ecs_{cluster_arn}_{task_arn}`, or `unknown__`.

In addition to the S3 reporter, `async-profiler-agent` also includes `LocalReporter` that writes to a directory, and a `MultiReporter` that allows combining reporters. You can also write your own reporter (via the `Reporter` trait) to upload the profile results to your favorite profiler backend.

@@ -105,6 +105,35 @@ Memory samples are not enabled by default, but can be enabled by [`with_native_m
[`ProfilerOptionsBuilder`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html
[`with_native_mem_bytes`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html#method.with_native_mem_bytes

### Non-standard runtime configurations

The profiler always profiles an entire process: if a program has multiple Tokio (or non-Tokio, or non-Rust)
runtimes, it will profile all of them without problems.

The most-often used [`Profiler::spawn`] and [`Profiler::spawn_controllable`] functions assume that they are run within
a Tokio runtime. The S3 reporter performs AWS SDK calls within that runtime, and therefore it
assumes that the runtime is appropriate for performing AWS SDK calls. Most Tokio applications should just
spawn async-profiler within their (primary and only) Tokio runtime.

Some services have especially strict (tens-of-microseconds or less) latency requirements, and therefore have
a "data plane" (either non-Tokio or weirdly-configured Tokio) runtime, and in addition to that a
normally-configured (or normally-configured but high-[niceness]) "control plane" runtime that
is suitable for performing AWS SDK calls. In these cases, it makes sense to spawn the async-profiler agent
within the "control plane" runtime.

Other applications just don't use Tokio for their main code. These applications can use
[`Profiler::spawn_thread`] and its variants to spawn async-profiler in a separate thread
that will come with a Tokio runtime managed by `async-profiler-agent`.

In all of these cases, the [pollcatch](#pollcatch) hooks should be enabled on the
runtime *where you intend to catch long polls* - presumably your data-plane runtime. They do not
introduce much overhead or unpredictable latency.

[`Profiler::spawn`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.Profiler.html#method.spawn
[`Profiler::spawn_controllable`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.Profiler.html#method.spawn_controllable
[`Profiler::spawn_thread`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.Profiler.html#method.spawn_thread
[niceness]: https://linux.die.net/man/2/nice

### PollCatch

If you want to find long poll times, and you have `RUSTFLAGS="--cfg tokio_unstable"`, you can
@@ -123,6 +152,33 @@ If you can't use `tokio_unstable`, it is possible to wrap your tasks by instrume
runs the risk of forgetting to instrument the task that is actually causing the high latency,
and therefore it is strongly recommended to use `on_before_task_poll`/`on_after_task_poll`.

#### Using pollcatch without the agent

The recommended way of using pollcatch is via the agent in async-profiler-agent. However, in case your
application already runs async-profiler through some other mechanism, the
`on_before_task_poll`/`on_after_task_poll` hooks just call the async-profiler [JFR Event API]. They can be used
even if async-profiler is run via a mechanism different from the async-profiler Rust agent (for example, a
Java-native async-profiler integration), though currently the results from the JFR Event API are only exposed in
async-profiler's JFR-format output mode.

See `test_pollcatch_without_agent.sh` for an example that uses pollcatch with just async-profiler's
`LD_PRELOAD` mode.

In that case, the only requirement is that the pollcatch hooks refer to the same `libasyncProfiler.so` that is
being used as the profiler, since the JFR Event API is based on global variables that must match. async-profiler-agent
uses [libloading], which uses [dlopen(3)] (currently passing [`RTLD_LOCAL | RTLD_LAZY`][libloadingflags]), which
performs [deduplication based on inode]. Therefore, if your system only has a single `libasyncProfiler.so`
on the search path, it will be shared and pollcatch will work.

The async-profiler-agent crate currently does not expose the JFR Event API to users, for stability
reasons. As a user, using `libloading` to open `libasyncProfiler.so` and calling the API yourself
will work, but if you have a use case for the JFR Event API, consider opening an issue.

[deduplication based on inode]: https://stackoverflow.com/questions/45954861/how-to-circumvent-dlopen-caching/45955035#45955035
[JFR Event API]: https://github.com/async-profiler/async-profiler/blob/master/src/asprof.h#L99
[libloading]: https://crates.io/crates/libloading
[libloadingflags]: https://docs.rs/libloading/latest/libloading/os/unix/struct.Library.html#method.new
[dlopen(3)]: https://man7.org/linux/man-pages/man3/dlopen.3.html

### Not enabling the AWS SDK / Reqwest default features

The `aws-metadata-no-defaults` and `s3-no-defaults` feature flags do not enable feature flags for the AWS SDK and `reqwest`.
18 changes: 18 additions & 0 deletions examples/pollcatch-without-agent.rs
@@ -0,0 +1,18 @@
extern crate async_profiler_agent;
use std::time::{Duration, Instant};

// Simple test without a Tokio runtime, to just have an integration
// test of the pollcatch hooks on async-profiler without involving
// Tokio

fn main() {
let start = Instant::now();
while start.elapsed() < Duration::from_secs(1) {
async_profiler_agent::pollcatch::before_poll_hook();
let mid = Instant::now();
while mid.elapsed() < Duration::from_millis(10) {
// spin, there will be a profiler sample here
}
async_profiler_agent::pollcatch::after_poll_hook();
}
}