Commit 20c5c22

Merge pull request #116 from arielb1/design-docs
doc: add design docs, document running without the agent
2 parents 248eafa + 6675a18 commit 20c5c22

9 files changed

+409
-6
lines changed

.github/workflows/build.yml

Lines changed: 14 additions & 3 deletions
@@ -38,12 +38,17 @@ jobs:
3838
- uses: Swatinem/rust-cache@v2
3939
- name: Build
4040
shell: bash
41-
run: cargo build --all-features --verbose --example simple
42-
- name: Upload artifact for testing
41+
run: cargo build --all-features --verbose --example simple --example pollcatch-without-agent
42+
- name: Upload example-simple for testing
4343
uses: actions/upload-artifact@v4
4444
with:
4545
name: example-simple
4646
path: ./target/debug/examples/simple
47+
- name: Upload example-pollcatch-without-agent for testing
48+
uses: actions/upload-artifact@v4
49+
with:
50+
name: example-pollcatch-without-agent
51+
path: ./target/debug/examples/pollcatch-without-agent
4752
build-decoder:
4853
name: Build Decoder
4954
runs-on: ubuntu-latest
@@ -86,11 +91,17 @@ jobs:
8691
with:
8792
name: example-simple
8893
path: ./tests
94+
- name: Download example-pollcatch-without-agent
95+
uses: actions/download-artifact@v4
96+
with:
97+
name: example-pollcatch-without-agent
98+
path: ./tests
8999
- name: Download async-profiler
90100
shell: bash
91101
working-directory: tests
92102
run: wget https://github.com/async-profiler/async-profiler/releases/download/v4.1/async-profiler-4.1-linux-x64.tar.gz -O async-profiler.tar.gz && tar xvf async-profiler.tar.gz && mv -vf async-profiler-*/lib/libasyncProfiler.so .
93103
- name: Run integration test
94104
shell: bash
95105
working-directory: tests
96-
run: chmod +x simple pollcatch-decoder && LD_LIBRARY_PATH=$PWD ./integration.sh && LD_LIBRARY_PATH=$PWD ./separate_runtime_integration.sh
106+
# The `ls -l` is there to help debugging CI, I see no reason to remove it
107+
run: ls -l && chmod +x simple pollcatch-without-agent pollcatch-decoder && LD_LIBRARY_PATH=$PWD ./integration.sh && LD_LIBRARY_PATH=$PWD ./separate_runtime_integration.sh && LD_LIBRARY_PATH=$PWD ./test_pollcatch_without_agent.sh

DESIGN.md

Lines changed: 249 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,249 @@
1+
# Design Document: async-profiler Rust Agent
2+
3+
## Overview
4+
5+
The async-profiler Rust agent is an in-process profiling library that integrates with [async-profiler](https://github.com/async-profiler/async-profiler) to collect performance data and upload it to various backends. The agent is designed to run continuously in production environments with minimal overhead.
6+
7+
For a more how-to-focused guide on running the profiler in various contexts, read the README.
8+
9+
This guide is based on an AI-driven summary, but it includes many comments from the development team.
10+
11+
This is a *design* document. It does not make stability promises and can change at any time.
12+
13+
## Architecture
14+
15+
The async-profiler agent runs as an agent within a Rust process and profiles it using [async-profiler].
16+
17+
Currently, the agent only supports loading async-profiler dynamically, from a `libasyncProfiler.so`, via [libloading]. Future versions might also be able to link against it statically or plain-dynamically.
20+
21+
The async-profiler configuration is controlled by the user, though only a limited set of configurations
22+
is made available to control the support burden.
23+
24+
The agent collects periodic profiling [JFR]s and sends them to a reporter, which uploads them to some location. The library supports a file-based reporter, which stores the JFRs on the filesystem, and S3-based reporters, which upload the JFRs from async-profiler after wrapping them in a `zip`. The library also allows users to implement their own reporters.
28+
29+
The agent can also perform autodetection of AWS IMDS metadata, which is passed to the reporter
30+
as an argument, and in the S3-based reporter, used to determine the name of the uploaded files.
31+
32+
In addition, the library includes a Tokio integration for pollcatch, which allows detecting
33+
long polls in Tokio applications. That integration uses the same `libasyncProfiler.so`
34+
as the rest of the agent but is otherwise independent.
35+
36+
[async-profiler]: https://github.com/async-profiler/async-profiler
37+
[libloading]: https://crates.io/crates/libloading
38+
[JFR]: https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/about.htm
39+
40+
## Code Architecture
41+
42+
The crate follows a modular architecture with clear separation of concerns:
43+
44+
```
async-profiler-agent/
├── src/
│   ├── lib.rs              # Public API and documentation
│   ├── profiler.rs         # Core profiler orchestration
│   ├── asprof/             # async-profiler FFI bindings
│   ├── metadata/           # Host and report metadata
│   ├── pollcatch/          # Tokio poll time tracking
│   └── reporter/           # Data upload backends
├── examples/               # Sample applications
├── decoder/                # JFR analysis tool
└── tests/                  # Integration tests
```
57+
58+
## Core Modules
59+
60+
### 1. Profiler (`profiler`)
61+
62+
**Purpose**: Central orchestration of profiling lifecycle and data collection.
63+
64+
**Key Components**:
65+
- `Profiler` & `ProfilerBuilder`: Main entry point for starting profiling
66+
- `ProfilerOptions`: Profiling behavior configuration
67+
- `RunningProfiler`: Handle for controlling active profiler
68+
- `ProfilerEngine` trait: used to allow mocking async-profiler (the C library) during tests
69+
70+
#### Profiler lifecycle management
71+
72+
As of async-profiler version 4.1, async-profiler does not have a mode where it can run continuously
73+
with bounded memory usage and periodically collect samples.
74+
75+
Therefore, every [`reporting_interval`] seconds, the async-profiler agent restarts async-profiler by sending it a `stop` command (which flushes the JFR file) followed by a `start` command.
76+
77+
This is managed by `Profiler` (see the [`profiler_tick`] function).
78+
79+
Restarting in this way is a supported async-profiler operation mode.
80+
81+
[`reporting_interval`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerBuilder.html#method.with_reporting_interval
82+
[`profiler_tick`]: https://github.com/async-profiler/rust-agent/blob/506718fff274b49cf2eb03305a4f9547b61720e3/src/profiler.rs#L1083
83+
84+
#### Agent lifecycle management
85+
86+
The async-profiler agent can be stopped and started at run-time.
87+
88+
When stopped, the async-profiler agent stops async-profiler, flushes the last profile to the reporter, and then
89+
releases the stop handle from waiting. After the stop is done, it is possible to start a different instance of
90+
the async-profiler agent on the same process.
91+
92+
The start/stop functionality is useful for several purposes:
93+
94+
1. "Chicken bit" stopping of the profiler if it causes application issues.
95+
2. Stopping and starting a profiler with new configuration.
96+
3. Stopping the profiler and uploading the last sample before application exit.
97+
98+
The profiler intentionally does *not* automatically flush the last profile on `Drop`. This is because
99+
reporters can take an arbitrary amount of time to finish, and slowing an application on exit is likely
100+
to be a worse default than missing some profiling samples.
101+
102+
#### Profiler configuration
103+
104+
async-profiler is configured via [`ProfilerOptions`] and [`ProfilerOptionsBuilder`]. For more details, read their documentation along with the [async-profiler options docs].
106+
107+
[`ProfilerOptions`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html
108+
[`ProfilerOptionsBuilder`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html
109+
[async-profiler options docs]: https://github.com/async-profiler/async-profiler/blob/v4.0/docs/ProfilerOptions.md
110+
111+
#### JFR file rotation
112+
113+
async-profiler expects to write the current JFR to a "fresh" file path. To that end, the agent creates 2 unnamed temporary files via `JfrFile` and gives async-profiler alternating paths of the form `/proc/self/fd/<N>` to write the JFRs into.
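
The alternation can be pictured with a small sketch. This is not the crate's `JfrFile` implementation: `JfrRotation`, its fields, and the descriptor numbers are hypothetical, and the real code owns actual unnamed temporary files rather than bare integers.

```rust
/// Hypothetical stand-in for the two temporary files: only their file
/// descriptor numbers are tracked here.
struct JfrRotation {
    fds: [i32; 2],
    active: usize,
}

impl JfrRotation {
    fn new(fd_a: i32, fd_b: i32) -> Self {
        Self { fds: [fd_a, fd_b], active: 0 }
    }

    /// Path handed to async-profiler for the JFR currently being written.
    fn active_path(&self) -> String {
        format!("/proc/self/fd/{}", self.fds[self.active])
    }

    /// Switch to the other file and return the path of the one just completed.
    fn rotate(&mut self) -> String {
        let finished = self.active_path();
        self.active = 1 - self.active;
        finished
    }
}

fn main() {
    // The descriptor numbers 7 and 8 are made up for illustration.
    let mut rotation = JfrRotation::new(7, 8);
    println!("writing to {}", rotation.active_path());
    let finished = rotation.rotate();
    println!("finished {}, now writing to {}", finished, rotation.active_path());
}
```

With two files, the agent can read the completed JFR from one while async-profiler writes the next one to the other.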
117+
118+
### 2. async-profiler FFI (`asprof`)
119+
120+
**Purpose**: Safe Rust bindings to the native async-profiler library.
121+
122+
**Key Components**:
123+
- `AsProf`: Safe interface to async-profiler
124+
- `raw`: Low-level FFI declarations
125+
126+
**Responsibilities**:
127+
- Dynamic loading of `libasyncProfiler.so` using [libloading]
128+
- Safe, Rust-native wrappers around C API calls
129+
130+
[libloading]: https://crates.io/crates/libloading
131+
132+
### 3. Metadata (`metadata/`)
133+
134+
**Purpose**: Host identification and report context information.
135+
136+
**Key Components**:
137+
- `AgentMetadata`: Host identification (EC2, Fargate, or generic)
138+
- `aws`: AWS-specific metadata autodetection via IMDS
139+
140+
The metadata is sent to the [`Reporter`] implementation, and can be used to
141+
identify the host that generated a particular profiling report. In the local reporter,
142+
it is ignored. In the S3 reporter, it is used to determine the uploaded file name.
143+
144+
### 4. Reporters (`reporter/`)
145+
146+
**Purpose**: Pluggable backends for uploading profiling data.
147+
148+
**Key Components**:
149+
- [`Reporter`] trait: Common interface for all backends
150+
- [`LocalReporter`]: Filesystem output for development/testing
151+
- [`S3Reporter`]: AWS S3 upload with metadata
152+
- [`MultiReporter`]: Composition of multiple reporters
153+
154+
[`Reporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/trait.Reporter.html
155+
[`LocalReporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/local/struct.LocalReporter.html
156+
[`S3Reporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/s3/struct.S3Reporter.html
157+
[`MultiReporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/multi/struct.MultiReporter.html
158+
159+
The reporter trait is as follows:
160+
161+
```rust
#[async_trait]
pub trait Reporter: fmt::Debug {
    async fn report(
        &self,
        jfr: Vec<u8>,
        metadata: &ReportMetadata,
    ) -> Result<(), Box<dyn std::error::Error + Send>>;
}
```
171+
172+
Users whose needs are not met by the built-in reporters can write their own reporters.
174+
175+
### 5. PollCatch (`pollcatch/`)
176+
177+
**Purpose**: Tokio-specific instrumentation for detecting long poll times.
178+
179+
**Key Components**:
180+
- `before_poll_hook()`: Pre-poll timestamp capture
181+
- `after_poll_hook()`: Post-poll analysis and reporting
182+
- `tsc.rs`: CPU timestamp counter utilities, works on x86 and ARM
183+
184+
The idea of pollcatch is that if a wall-clock profiling event happens in the middle of a Tokio poll, then when that poll *ends*, a `tokio.PollCatchV1` event is emitted containing the poll's start and end times. This makes it possible to correlate long polls with the stack traces sampled within them.
188+
189+
The way this is done is that `before_poll_hook` saves the before-timestamp and the async-profiler `sample_counter` from `asprof_thread_local_data` into a (private) thread-local variable. `after_poll_hook` then checks whether the `sample_counter` has changed and, if so, emits a `tokio.PollCatchV1` event containing the stored before-timestamp and the current timestamp as an after-timestamp.
193+
194+
By emitting only 1 `tokio.PollCatchV1` event per wall-clock profiling event, the pollcatch profiling overhead
195+
is kept bounded and low.
196+
197+
Because the event is only emitted from `after_poll_hook`, which normally runs as a Tokio after-poll hook, it is effectively emitted "at the Tokio main loop": outside of a signal handler, in a context where "no locks are held".
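
The hook logic described above can be sketched roughly as follows. This is a simplified model, not the crate's code: the thread-local `SAMPLE_COUNTER` stands in for the `sample_counter` in async-profiler's thread-local data, `simulate_sample` is a hypothetical helper, the timestamps are passed in as arguments instead of being read from the TSC, and the payload is returned instead of being emitted through async-profiler. The payload is assumed to be the two timestamps as little-endian `u64`s.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the `sample_counter` in async-profiler's thread-local data.
    static SAMPLE_COUNTER: Cell<u64> = Cell::new(0);
    // Private state saved by the before-poll hook: (timestamp, sample counter).
    static BEFORE_POLL: Cell<(u64, u64)> = Cell::new((0, 0));
}

/// Hypothetical helper: pretends a wall-clock sample landed on this thread.
fn simulate_sample() {
    SAMPLE_COUNTER.with(|c| c.set(c.get() + 1));
}

/// Saves the before-timestamp and the current sample counter.
fn before_poll_hook(now: u64) {
    let counter = SAMPLE_COUNTER.with(|c| c.get());
    BEFORE_POLL.with(|b| b.set((now, counter)));
}

/// Returns the 16-byte event payload if a sample happened during the poll,
/// and `None` otherwise, so at most one event is produced per poll.
fn after_poll_hook(now: u64) -> Option<[u8; 16]> {
    let (before, counter_before) = BEFORE_POLL.with(|b| b.get());
    let counter_now = SAMPLE_COUNTER.with(|c| c.get());
    if counter_now == counter_before {
        return None;
    }
    let mut payload = [0u8; 16];
    payload[..8].copy_from_slice(&before.to_le_bytes());
    payload[8..].copy_from_slice(&now.to_le_bytes());
    Some(payload)
}

fn main() {
    // A poll that no sample interrupted: nothing is emitted.
    before_poll_hook(100);
    assert!(after_poll_hook(150).is_none());

    // A poll interrupted by a sample: one event is emitted.
    before_poll_hook(200);
    simulate_sample();
    let payload = after_poll_hook(900).expect("a sample happened during the poll");
    println!("payload: {:?}", payload);
}
```

The common path, where no sample arrived during the poll, costs only a thread-local read and a comparison.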
200+
201+
The `tokio.PollCatchV1` event contains the following payload:
202+
203+
```rust
before_timestamp: LittleEndianU64,
after_timestamp: LittleEndianU64,
```
207+
208+
Both timestamps come from the TSC. async-profiler's profiling samples carry a clock that uses the same TSC, so the pollcatch decoder can match profiling samples to the Tokio poll that brackets them. (Since the wall-clock interval is normally one sample per second, a Tokio poll will bracket at most a single sample unless it is *horribly* slow.) The difference between the two timestamps gives the duration of that particular poll.
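
A minimal sketch of that correlation, assuming the samples have already been extracted from the JFR. The `PollEvent` and `Sample` types and the function names are hypothetical and are not taken from the decoder. Durations are left in TSC ticks, because converting them to time needs the TSC frequency, which this sketch does not model.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct PollEvent {
    before_timestamp: u64,
    after_timestamp: u64,
}

/// Hypothetical profiling sample: a TSC timestamp plus a label standing in
/// for the stack trace.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    timestamp: u64,
    stack: &'static str,
}

/// Decodes the payload: two little-endian u64 timestamps.
fn parse_payload(bytes: &[u8; 16]) -> PollEvent {
    let mut before = [0u8; 8];
    let mut after = [0u8; 8];
    before.copy_from_slice(&bytes[..8]);
    after.copy_from_slice(&bytes[8..]);
    PollEvent {
        before_timestamp: u64::from_le_bytes(before),
        after_timestamp: u64::from_le_bytes(after),
    }
}

/// Returns the samples whose timestamp falls within the poll.
fn samples_in_poll<'a>(poll: &PollEvent, samples: &'a [Sample]) -> Vec<&'a Sample> {
    samples
        .iter()
        .filter(|s| poll.before_timestamp <= s.timestamp && s.timestamp <= poll.after_timestamp)
        .collect()
}

/// Poll duration in TSC ticks.
fn poll_duration_ticks(poll: &PollEvent) -> u64 {
    poll.after_timestamp.saturating_sub(poll.before_timestamp)
}

fn main() {
    let mut payload = [0u8; 16];
    payload[..8].copy_from_slice(&200u64.to_le_bytes());
    payload[8..].copy_from_slice(&900u64.to_le_bytes());
    let poll = parse_payload(&payload);

    let samples = [
        Sample { timestamp: 50, stack: "idle" },
        Sample { timestamp: 500, stack: "slow_handler" },
        Sample { timestamp: 1500, stack: "other" },
    ];
    for s in samples_in_poll(&poll, &samples) {
        println!("{} ran inside a poll of {} ticks", s.stack, poll_duration_ticks(&poll));
    }
}
```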
213+
214+
## The decoder (`decoder/`)
215+
216+
The decoder is a JFR decoder using `jfrs` that can decode the JFRs from async-profiler and display pollcatch
217+
metadata in a nice format.
218+
219+
The decoder implementation is quite ugly currently.
220+
221+
## Data Flow
222+
223+
1. **Initialization**: Profiler loads `libasyncProfiler.so` and initializes it
224+
2. **Session Start**: Creates temporary JFR files and starts async-profiler
225+
3. **Continuous Profiling**: async-profiler collects samples to active JFR file
226+
4. **Periodic Reporting**:
227+
- Stop profiler and rotate JFR files
228+
- Read completed JFR data
229+
- Package with metadata
230+
- Upload via configured reporters
231+
- Restart profiler with new JFR file
232+
5. **Shutdown**: Stop profiler and perform final report
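
The steps above can be sketched as a synchronous loop over mock components. Everything here is a hypothetical simplification: the `Engine` and `SyncReporter` traits, `MockEngine`, `CollectingReporter`, and the `tick` and `shutdown` functions are illustrative names, the crate's real `ProfilerEngine` and `Reporter` traits may look quite different (the real `Reporter` is async), and metadata is reduced to a string.

```rust
/// Hypothetical, simplified engine interface.
trait Engine {
    /// Starts profiling into the JFR file in the given slot (0 or 1).
    fn start(&mut self, slot: usize);
    /// Stops profiling and returns the flushed JFR bytes.
    fn stop(&mut self) -> Vec<u8>;
}

/// Hypothetical synchronous reporter.
trait SyncReporter {
    fn report(&mut self, jfr: Vec<u8>, metadata: &str);
}

/// One periodic report: stop, rotate, upload, restart.
fn tick<E: Engine, R: SyncReporter>(engine: &mut E, reporter: &mut R, active: &mut usize, metadata: &str) {
    let jfr = engine.stop(); // stop the profiler, flushing the JFR
    *active = 1 - *active; // rotate to the other JFR file
    reporter.report(jfr, metadata); // package with metadata and upload
    engine.start(*active); // restart with the fresh file
}

/// Shutdown: stop the profiler and perform the final report.
fn shutdown<E: Engine, R: SyncReporter>(engine: &mut E, reporter: &mut R, metadata: &str) {
    let jfr = engine.stop();
    reporter.report(jfr, metadata);
}

/// Mock engine that logs commands and fabricates JFR contents.
struct MockEngine {
    log: Vec<String>,
    flushed: u32,
}

impl Engine for MockEngine {
    fn start(&mut self, slot: usize) {
        self.log.push(format!("start slot {}", slot));
    }
    fn stop(&mut self) -> Vec<u8> {
        self.log.push("stop".to_string());
        self.flushed += 1;
        format!("jfr-{}", self.flushed).into_bytes()
    }
}

/// Mock reporter that keeps every report in memory.
struct CollectingReporter {
    reports: Vec<(Vec<u8>, String)>,
}

impl SyncReporter for CollectingReporter {
    fn report(&mut self, jfr: Vec<u8>, metadata: &str) {
        self.reports.push((jfr, metadata.to_string()));
    }
}

fn main() {
    let mut engine = MockEngine { log: Vec::new(), flushed: 0 };
    let mut reporter = CollectingReporter { reports: Vec::new() };
    let mut active = 0;

    engine.start(active); // session start
    tick(&mut engine, &mut reporter, &mut active, "host-a"); // first report
    tick(&mut engine, &mut reporter, &mut active, "host-a"); // second report
    shutdown(&mut engine, &mut reporter, "host-a"); // final report

    println!("engine commands: {:?}", engine.log);
    println!("reports uploaded: {}", reporter.reports.len());
}
```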
233+
234+
## Feature Flags
235+
236+
All AWS dependencies are optional and only enabled if an AWS feature flag is passed.
237+
238+
In addition, for every AWS feature flag, there is an "X-no-defaults" version of that flag
239+
that does not enable default features for the AWS libraries.
240+
241+
The main reason for this design is that the AWS SDK needs to have a selected TLS backend in order to connect to HTTPS services, but users might want to enable a backend other than the default one and not have the default backend linked into their executable.
244+
245+
- `s3`: Full S3 reporter with default AWS SDK features
246+
- `s3-no-defaults`: S3 reporter without default features (for custom TLS)
247+
- `aws-metadata`: AWS metadata detection with default features
248+
- `aws-metadata-no-defaults`: AWS metadata without default features
249+
- `__unstable-fargate-cpu-count`: Experimental Fargate CPU metrics

README.md

Lines changed: 68 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ The S3 reporter uploads each report in a `zip` file, that currently contains 2 f
5050
2. metadata as `metadata.json`, in format `reporter::s3::MetadataJson`.
5151

5252
The `zip` file is uploaded to the bucket under the path `profile_{profiling_group_name}_{machine}_{pid}_{time}.zip`,
53-
where `{machine}` is either `ec2_{ec2_instance_id}_`, `ecs_{cluster_arn}_{task_arn}`, or `onprem__`.
53+
where `{machine}` is either `ec2_{ec2_instance_id}_`, `ecs_{cluster_arn}_{task_arn}`, or `unknown__`.
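
As a rough illustration of the naming scheme, the following sketch assembles such a path by plain string substitution. The `Machine` enum and both functions are hypothetical and are not the reporter's code. The format of `{time}` is not specified here, so it is passed in as an already-formatted string, and any escaping the real reporter applies is not modeled.

```rust
/// Hypothetical description of where the process runs.
enum Machine {
    Ec2 { instance_id: String },
    Ecs { cluster_arn: String, task_arn: String },
    Unknown,
}

/// Builds the `{machine}` component of the path.
fn machine_component(machine: &Machine) -> String {
    match machine {
        Machine::Ec2 { instance_id } => format!("ec2_{}_", instance_id),
        Machine::Ecs { cluster_arn, task_arn } => format!("ecs_{}_{}", cluster_arn, task_arn),
        Machine::Unknown => "unknown__".to_string(),
    }
}

/// Builds `profile_{profiling_group_name}_{machine}_{pid}_{time}.zip`.
fn object_key(group: &str, machine: &Machine, pid: u32, time: &str) -> String {
    format!("profile_{}_{}_{}_{}.zip", group, machine_component(machine), pid, time)
}

fn main() {
    // All values below are made up for illustration.
    let ec2 = Machine::Ec2 { instance_id: "i-0123".to_string() };
    let ecs = Machine::Ecs { cluster_arn: "cluster".to_string(), task_arn: "task".to_string() };
    println!("{}", object_key("web", &ec2, 42, "TIME"));
    println!("{}", object_key("web", &ecs, 42, "TIME"));
    println!("{}", object_key("web", &Machine::Unknown, 42, "TIME"));
}
```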
5454

5555
In addition to the S3 reporter, `async-profiler-agent` also includes `LocalReporter` that writes to a directory, and a `MultiReporter` that allows combining reporters. You can also write your own reporter (via the `Reporter` trait) to upload the profile results to your favorite profiler backend.
5656

@@ -105,6 +105,46 @@ Memory samples are not enabled by default, but can be enabled by [`with_native_m
105105
[`ProfilerOptionsBuilder`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html
106106
[`with_native_mem_bytes`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html#method.with_native_mem_bytes
107107

108+
### Non-standard runtime configurations
109+
110+
The profiler always profiles an entire process. If a program has multiple Tokio (or non-Tokio, or non-Rust)
111+
runtimes, it will profile all of them mostly without problems.
112+
113+
Even mixing native code and JVM in the same process works, no matter whether
114+
async-profiler is started from Rust or Java, though there are occasionally bugs
115+
there - if you are mixing native code and JVM and are encountering weird
116+
problems, you should [report an issue to async-profiler].
117+
118+
Since async-profiler uses process-global resources such as signal handlers, a
119+
process can only have one active instance of async-profiler at a time. This
120+
applies across languages as well - if you have both native code and JVM code in
121+
your process, only one of them should be starting async-profiler.
122+
123+
The most commonly used functions, [`Profiler::spawn`] and [`Profiler::spawn_controllable`], assume that they are run within
124+
a Tokio runtime. The S3 reporter performs AWS SDK calls within that runtime, and therefore it
125+
assumes that the runtime is appropriate for performing AWS SDK calls. Most Tokio applications should just
126+
spawn async-profiler within their (primary and only) Tokio runtime.
127+
128+
Some services have especially strict (tens-of-microseconds or less) latency requirements, and therefore have
129+
a "data plane" (either non-Tokio or weirdly-configured Tokio) runtime, and in addition to that a
130+
normally-configured (or normally-configured but high-[niceness]) "control plane" runtime that
131+
is suitable for performing AWS SDK calls. In these cases, it makes sense to spawn the async-profiler agent
132+
within the "control plane" runtime.
133+
134+
Other applications just don't use Tokio for their main code. These applications can use
135+
[`Profiler::spawn_thread`] and its variants to spawn async-profiler in a separate thread
136+
that will come with a Tokio runtime managed by `async-profiler-agent`.
137+
138+
In all of these cases, the [pollcatch](#pollcatch) hooks should be enabled on the
139+
runtime where you *intend to catch long polls* (presumably your data-plane runtime). They do not
140+
introduce much overhead or unpredictable latency.
141+
142+
[report an issue to async-profiler]: https://github.com/async-profiler/async-profiler/issues
143+
[`Profiler::spawn`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.Profiler.html#method.spawn
144+
[`Profiler::spawn_controllable`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.Profiler.html#method.spawn_controllable
145+
[`Profiler::spawn_thread`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.Profiler.html#method.spawn_thread
146+
[niceness]: https://linux.die.net/man/2/nice
147+
108148
### PollCatch
109149

110150
If you want to find long poll times, and you have `RUSTFLAGS="--cfg tokio_unstable"`, you can
@@ -123,6 +163,33 @@ If you can't use `tokio_unstable`, it is possible to wrap your tasks by instrume
123163
runs the risk of forgetting to instrument the task that is actually causing the high latency,
124164
and therefore it is strongly recommended to use `on_before_task_poll`/`on_after_task_poll`.
125165

166+
#### Using pollcatch without the agent
167+
168+
The recommended way of using `async-profiler-agent` is via its agent. However, the `on_before_task_poll`/`on_after_task_poll` hooks just call the async-profiler [JFR Event API], so they can be used even if your application already runs async-profiler via a mechanism other than the async-profiler Rust agent (for example, a Java-native async-profiler integration). Currently, the results from the JFR Event API are only exposed in async-profiler's JFR-format output mode.
174+
175+
See `test_pollcatch_without_agent.sh` for an example that uses pollcatch with just async-profiler's `LD_PRELOAD` mode.
177+
178+
In that case, the only requirement is that the pollcatch hooks refer to the same `libasyncProfiler.so` that is
179+
being used as a profiler, since the JFR Event API is based on global variables that must match. async-profiler-agent
180+
uses [libloading] which uses [dlopen(3)] (currently passing [`RTLD_LOCAL | RTLD_LAZY`][libloadingflags]), which
181+
performs [deduplication based on inode]. Therefore, if your system only has a single `libasyncProfiler.so`
182+
on the search path, it will be shared and pollcatch will work.
183+
184+
The async-profiler-agent crate currently does not expose the JFR Event API to users, due to stability
185+
reasons. As a user, using `libloading` to open `libasyncProfiler.so` and calling the API yourself
186+
will work, but if you have a use case for the JFR Event API, consider opening an issue.
187+
188+
[deduplication based on inode]: https://stackoverflow.com/questions/45954861/how-to-circumvent-dlopen-caching/45955035#45955035
189+
[JFR Event API]: https://github.com/async-profiler/async-profiler/blob/master/src/asprof.h#L99
190+
[libloading]: https://crates.io/crates/libloading
191+
[libloadingflags]: https://docs.rs/libloading/latest/libloading/os/unix/struct.Library.html#method.new
192+
126193
### Not enabling the AWS SDK / Reqwest default features
127194

128195
The `aws-metadata-no-defaults` and `s3-no-defaults` feature flags do not enable feature flags for the AWS SDK and `reqwest`.
