doc: add design docs, document running without the agent

Ariel Ben-Yehuda · Ariel Ben-Yehuda · commit 32e9bed2582d · 2025-12-14T16:04:54.000Z
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
@@ -38,12 +38,17 @@ jobs:
       - uses: Swatinem/rust-cache@v2
       - name: Build
         shell: bash
-        run: cargo build --all-features --verbose --example simple
-      - name: Upload artifact for testing
+        run: cargo build --all-features --verbose --example simple --example pollcatch-without-agent
+      - name: Upload example-simple for testing
         uses: actions/upload-artifact@v4
         with:
           name: example-simple
           path: ./target/debug/examples/simple
+      - name: Upload example-pollcatch-without-agent for testing
+        uses: actions/upload-artifact@v4
+        with:
+          name: example-pollcatch-without-agent
+          path: ./target/debug/examples/pollcatch-without-agent
   build-decoder:
     name: Build Decoder
     runs-on: ubuntu-latest
@@ -86,11 +91,16 @@ jobs:
         with:
           name: example-simple
           path: ./tests
+      - name: Download example-pollcatch-without-agent
+        uses: actions/download-artifact@v4
+        with:
+          name: example-pollcatch-without-agent
+          path: ./tests
       - name: Download async-profiler
         shell: bash
         working-directory: tests
         run: wget https://github.com/async-profiler/async-profiler/releases/download/v4.1/async-profiler-4.1-linux-x64.tar.gz -O async-profiler.tar.gz && tar xvf async-profiler.tar.gz && mv -vf async-profiler-*/lib/libasyncProfiler.so .
       - name: Run integration test
         shell: bash
         working-directory: tests
-        run: chmod +x simple pollcatch-decoder && LD_LIBRARY_PATH=$PWD ./integration.sh && LD_LIBRARY_PATH=$PWD ./separate_runtime_integration.sh
+        run: ls -l && chmod +x simple pollcatch-without-agent pollcatch-decoder && LD_LIBRARY_PATH=$PWD ./integration.sh && LD_LIBRARY_PATH=$PWD ./separate_runtime_integration.sh && LD_LIBRARY_PATH=$PWD ./test_pollcatch_without_agent.sh
diff --git a/DESIGN.md b/DESIGN.md
@@ -0,0 +1,219 @@
+# Design Document: async-profiler Rust Agent
+
+## Overview
+
+The async-profiler Rust agent is an in-process profiling library that integrates with [async-profiler](https://github.com/async-profiler/async-profiler) to collect performance data and upload it to various backends. The agent is designed to run continuously in production environments with minimal overhead.
+
+For a more how-to-focused guide on running the profiler in various contexts, read the README.
+
+This guide is based on an AI-driven summary, but it includes many comments from the development team.
+
+## Architecture
+
+The async-profiler agent runs inside a Rust process and profiles it using [async-profiler].
+
+Currently, the agent only supports loading `libasyncProfiler.so` dynamically
+via [libloading], but future versions might also support linking against it
+statically or plain-dynamically.
+
+[async-profiler]: https://github.com/async-profiler/async-profiler
+[libloading]: https://crates.io/crates/libloading
+
+## Code Architecture
+
+The crate follows a modular architecture with clear separation of concerns:
+
+```
+async-profiler-agent/
+├── src/
+│   ├── lib.rs              # Public API and documentation
+│   ├── profiler.rs         # Core profiler orchestration
+│   ├── asprof/             # async-profiler FFI bindings
+│   ├── metadata/           # Host and report metadata
+│   ├── pollcatch/          # Tokio poll time tracking
+│   └── reporter/           # Data upload backends
+├── examples/               # Sample applications
+├── decoder/                # JFR analysis tool
+└── tests/                  # Integration tests
+```
+
+## Core Modules
+
+### 1. Profiler (`profiler`)
+
+**Purpose**: Central orchestration of profiling lifecycle and data collection.
+
+**Key Components**:
+- `Profiler` & `ProfilerBuilder`: Main entry point for starting profiling
+- `ProfilerOptions`: Profiling behavior configuration
+- `RunningProfiler`: Handle for controlling active profiler
+- `ProfilerEngine` trait: used to allow mocking async-profiler (the C library) during tests
+
+#### Profiler lifecycle management
+
+As of version 4.1, async-profiler does not have a mode where it can run continuously
+with bounded memory usage and periodically collect samples.
+
+Therefore, once per [`reporting_interval`], the async-profiler agent restarts async-profiler by sending it a `stop` command (which flushes the JFR file) followed by a `start` command.
+
+This is managed by `Profiler` (see the [`profiler_tick`] function).
+
+This is a supported async-profiler operation mode.
+
+[`reporting_interval`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerBuilder.html#method.with_reporting_interval
+[`profiler_tick`]: https://github.com/async-profiler/rust-agent/blob/506718fff274b49cf2eb03305a4f9547b61720e3/src/profiler.rs#L1083
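+
+The restart cycle described above can be sketched as follows. This is a minimal illustration, not the agent's implementation: the `Engine` trait, `RecordingEngine`, `tick`, and `run_cycles` are hypothetical names (the crate's own `ProfilerEngine` trait and `profiler_tick` function may look different), and the file names are placeholders.
+
+```rust
+use std::time::Duration;
+
+/// Hypothetical stand-in for the profiler engine; not the crate's `ProfilerEngine` trait.
+trait Engine {
+    fn start(&mut self, jfr_path: &str) -> Result<(), String>;
+    fn stop(&mut self) -> Result<(), String>;
+}
+
+/// Mock engine that records the commands it receives so the cycle can be inspected.
+#[derive(Default)]
+struct RecordingEngine {
+    commands: Vec<String>,
+}
+
+impl Engine for RecordingEngine {
+    fn start(&mut self, jfr_path: &str) -> Result<(), String> {
+        self.commands.push(format!("start,file={jfr_path}"));
+        Ok(())
+    }
+
+    fn stop(&mut self) -> Result<(), String> {
+        self.commands.push("stop".to_string());
+        Ok(())
+    }
+}
+
+/// One tick: stop (which flushes the JFR), then start again on the next path.
+fn tick<E: Engine>(engine: &mut E, next_path: &str) -> Result<(), String> {
+    engine.stop()?;
+    engine.start(next_path)
+}
+
+/// Starts the engine, runs `ticks` restart cycles spaced by `interval`, then stops.
+fn run_cycles<E: Engine>(engine: &mut E, interval: Duration, ticks: usize) -> Result<(), String> {
+    // Placeholder paths; the two entries alternate between ticks.
+    let paths = ["a.jfr", "b.jfr"];
+    engine.start(paths[0])?;
+    for i in 1..=ticks {
+        std::thread::sleep(interval);
+        tick(engine, paths[i % 2])?;
+    }
+    engine.stop()
+}
+```
+
+Calling `run_cycles(&mut RecordingEngine::default(), Duration::from_millis(1), 2)` records alternating `start` and `stop` commands, which is also roughly how a mock engine can stand in for the C library during tests.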
+
+#### Agent lifecycle management
+
+The async-profiler agent can be stopped and started at run-time.
+
+Trying to start an async-profiler session when async-profiler is already running leads to an error from
+async-profiler, so to restart the profiler (possibly with a different configuration), you must
+stop it before starting it again.
+
+When stopped, the async-profiler agent stops async-profiler, flushes the last profile to the reporter, and then signals
+that it has finished. After that signal, it is possible to start a different instance of the async-profiler
+agent on the same process.
+
+#### Profiler configuration
+
+async-profiler is configured via [`ProfilerOptions`] and [`ProfilerOptionsBuilder`]. Read
+their docs along with the [async-profiler options docs] for more details.
+
+[`ProfilerOptions`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptions.html
+[`ProfilerOptionsBuilder`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/profiler/struct.ProfilerOptionsBuilder.html
+[async-profiler options docs]: https://github.com/async-profiler/async-profiler/blob/v4.0/docs/ProfilerOptions.md
+
+#### JFR file rotation
+
+async-profiler expects to write the current JFR to a "fresh" file path. To that
+end, the async-profiler agent creates 2 unnamed temporary files via `JfrFile`, and gives
+async-profiler alternating paths of the form `/proc/self/fd/<N>` to write the
+JFRs into.
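+
+The alternation between the two files can be sketched as below. This is an illustrative assumption rather than the crate's `JfrFile` code: the `Rotation` type and its methods are hypothetical, and the sketch takes two already-open files instead of showing how the unnamed temporary files are created.
+
+```rust
+use std::fs::File;
+use std::os::unix::io::AsRawFd;
+
+/// Hypothetical two-slot rotation; the crate's `JfrFile` may be structured differently.
+struct Rotation {
+    files: [File; 2],
+    active: usize,
+}
+
+impl Rotation {
+    fn new(a: File, b: File) -> Self {
+        Rotation { files: [a, b], active: 0 }
+    }
+
+    /// Path the profiler would be told to write the current JFR into.
+    fn active_path(&self) -> String {
+        format!("/proc/self/fd/{}", self.files[self.active].as_raw_fd())
+    }
+
+    /// Switches to the other file and returns the index of the one just completed,
+    /// which can then be read and reported while the other receives new samples.
+    fn rotate(&mut self) -> usize {
+        let done = self.active;
+        self.active = 1 - self.active;
+        done
+    }
+}
+```
+
+Because the files stay open for the lifetime of the rotation, the `/proc/self/fd/<N>` paths remain valid even though the files have no name on disk.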
+
+### 2. async-profiler FFI (`asprof`)
+
+**Purpose**: Safe Rust bindings to the native async-profiler library.
+
+**Key Components**:
+- `AsProf`: Safe interface to async-profiler
+- `raw`: Low-level FFI declarations
+
+**Responsibilities**:
+- Dynamic loading of `libasyncProfiler.so` using [libloading]
+- Safe, Rust-native wrappers around C API calls
+
+[libloading]: https://crates.io/crates/libloading
+
+### 3. Metadata (`metadata/`)
+
+**Purpose**: Host identification and report context information.
+
+**Key Components**:
+- `AgentMetadata`: Host identification (EC2, Fargate, or generic)
+- `aws`: AWS-specific metadata autodetection via IMDS
+
+The metadata is sent to the [`Reporter`] implementation, and can be used to
+identify the host that generated a particular profiling report. The local reporter
+ignores it. The S3 reporter attaches it to the zip uploaded
+to S3 as `metadata.json`.
+
+### 4. Reporters (`reporter/`)
+
+**Purpose**: Pluggable backends for uploading profiling data.
+
+**Key Components**:
+- [`Reporter`] trait: Common interface for all backends
+- [`LocalReporter`]: Filesystem output for development/testing
+- [`S3Reporter`]: AWS S3 upload with metadata
+- [`MultiReporter`]: Composition of multiple reporters
+
+[`Reporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/trait.Reporter.html
+[`LocalReporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/local/struct.LocalReporter.html
+[`S3Reporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/s3/struct.S3Reporter.html
+[`MultiReporter`]: https://docs.rs/async-profiler-agent/0.1/async_profiler_agent/reporter/multi/struct.MultiReporter.html
+
+The reporter trait is as follows:
+
+```rust
+#[async_trait]
+pub trait Reporter: fmt::Debug {
+    async fn report(
+        &self,
+        jfr: Vec<u8>,
+        metadata: &ReportMetadata,
+    ) -> Result<(), Box<dyn std::error::Error + Send>>;
+}
+```
+
+Customers whose needs are not met by the built-in reporters can write their
+own reporters by implementing this trait.
+
+### 5. PollCatch (`pollcatch/`)
+
+**Purpose**: Tokio-specific instrumentation for detecting long poll times.
+
+**Key Components**:
+- `before_poll_hook()`: Pre-poll timestamp capture
+- `after_poll_hook()`: Post-poll analysis and reporting
+- `tsc.rs`: CPU timestamp counter utilities
+
+**Responsibilities**:
+- Minimal-overhead poll time tracking
+- Integration with Tokio's task hooks
+- JFR event emission for long polls
+- CPU timestamp correlation with profiler samples
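+
+The before/after hook pairing can be illustrated with a simplified sketch. It is an assumption-laden stand-in, not the crate's code: it uses `std::time::Instant` instead of the CPU timestamp counter, returns the duration instead of emitting a JFR event, and the `before_poll`/`after_poll` names and the `threshold` parameter are hypothetical.
+
+```rust
+use std::cell::Cell;
+use std::time::{Duration, Instant};
+
+thread_local! {
+    // Start time of the poll currently running on this thread, if any.
+    static POLL_START: Cell<Option<Instant>> = Cell::new(None);
+}
+
+/// Hypothetical pre-poll hook: remember when the poll started.
+fn before_poll() {
+    POLL_START.with(|s| s.set(Some(Instant::now())));
+}
+
+/// Hypothetical post-poll hook: return the poll duration if it reached `threshold`.
+/// Returns `None` for short polls or when no poll was in progress.
+fn after_poll(threshold: Duration) -> Option<Duration> {
+    let start = POLL_START.with(|s| s.take())?;
+    let elapsed = start.elapsed();
+    (elapsed >= threshold).then_some(elapsed)
+}
+```
+
+Keeping the state in a thread-local cell means the common case costs only a timestamp read and a store per poll, with no locking.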
+
+## Data Flow
+
+1. **Initialization**: Profiler loads `libasyncProfiler.so` and initializes
+2. **Session Start**: Creates temporary JFR files and starts async-profiler
+3. **Continuous Profiling**: async-profiler collects samples to active JFR file
+4. **Periodic Reporting**: 
+   - Stop profiler and rotate JFR files
+   - Read completed JFR data
+   - Package with metadata
+   - Upload via configured reporters
+   - Restart profiler with new JFR file
+5. **Shutdown**: Stop profiler and perform final report
+
+## Key Design Decisions
+
+### Dual JFR File Strategy
+Uses two temporary files to enable continuous profiling during report uploads. While one file receives new samples, the other is being processed and uploaded.
+
+### Builder Pattern Configuration
+Provides type-safe, ergonomic configuration with sensible defaults while supporting advanced customization.
+
+### Trait-Based Reporters
+Enables pluggable upload destinations without coupling core profiling logic to specific backends.
+
+### Optional AWS Integration
+AWS-specific features are behind feature flags, allowing use in non-AWS environments without unnecessary dependencies.
+
+### Thread Safety
+Designed for multi-threaded environments with careful synchronization around profiler state and file operations.
+
+## Feature Flags
+
+- `s3`: Full S3 reporter with default AWS SDK features
+- `s3-no-defaults`: S3 reporter without default features (for custom TLS)
+- `aws-metadata`: AWS metadata detection with default features
+- `aws-metadata-no-defaults`: AWS metadata without default features
+- `__unstable-fargate-cpu-count`: Experimental Fargate CPU metrics
+
+## Error Handling
+
+The design emphasizes resilience:
+- Reporter errors don't stop profiling
+- Profiler errors are logged but allow graceful degradation
+- Resource cleanup is guaranteed via RAII patterns
+- Temporary file management prevents resource leaks
+
+## Performance Considerations
+
+- Minimal overhead during normal operation
+- JFR file I/O is asynchronous and non-blocking
+- PollCatch hooks are optimized for the common case (no sample)
+- Memory allocation is minimized in hot paths
+- Background reporting doesn't interfere with application performance
diff --git a/README.md b/README.md
@@ -123,6 +123,33 @@ If you can't use `tokio_unstable`, it is possible to wrap your tasks by instrume
 runs the risk of forgetting to instrument the task that is actually causing the high latency,
 and therefore it is strongly recommended to use `on_before_task_poll`/`on_after_task_poll`.
 
+#### Using pollcatch without the agent
+
+The recommended way of using the async-profiler-agent crate is via its built-in agent. However, if your
+application is already integrated with some other mechanism that calls `async-profiler`, note that the
+`on_before_task_poll`/`on_after_task_poll` hooks just call the async-profiler [JFR Event API]. They can be used
+even if async-profiler is run via a mechanism different from the async-profiler Rust agent (for example, a
+Java-native async-profiler integration), though currently, the results from the JFR Event API are only exposed in
+async-profiler's JFR-format output mode.
+
+See `tests/test_pollcatch_without_agent.sh` for an example that uses pollcatch with just async-profiler's
+`LD_PRELOAD` mode.
+
+In that case, the only requirement is that the pollcatch hooks refer to the same `libasyncProfiler.so` that is
+being used as a profiler, since the JFR Event API is based on global variables that must match. async-profiler-agent
+uses [libloading], which uses [dlopen(3)] (currently passing [`RTLD_LOCAL | RTLD_LAZY`][libloadingflags]), which
+performs [deduplication based on inode]. Therefore, if your system only has a single `libasyncProfiler.so`
+on the search path, it will be shared and pollcatch will work.
+
+The async-profiler-agent crate currently does not expose the JFR Event API to users, for stability
+reasons. As a user, using `libloading` to open `libasyncProfiler.so` and calling the API yourself
+will work, but if you have a use case for the JFR Event API, consider opening an issue.
+
+[deduplication based on inode]: https://stackoverflow.com/questions/45954861/how-to-circumvent-dlopen-caching/45955035#45955035
+[dlopen(3)]: https://man7.org/linux/man-pages/man3/dlopen.3.html
+[JFR Event API]: https://github.com/async-profiler/async-profiler/blob/master/src/asprof.h#L99
+[libloading]: https://crates.io/crates/libloading
+[libloadingflags]: https://docs.rs/libloading/latest/libloading/os/unix/struct.Library.html#method.new
+
 ### Not enabling the AWS SDK / Reqwest default features
 
 The `aws-metadata-no-defaults` and `s3-no-defaults` feature flags do not enable feature flags for the AWS SDK and `reqwest`. 
diff --git a/examples/pollcatch-without-agent.rs b/examples/pollcatch-without-agent.rs
@@ -0,0 +1,18 @@
+extern crate async_profiler_agent;
+use std::time::{Duration, Instant};
+
+// Simple test without a Tokio runtime, to just have an integration
+// test of the pollcatch hooks on async-profiler without involving
+// Tokio
+
+fn main() {
+    let start = Instant::now();
+    while start.elapsed() < Duration::from_secs(1) {
+        async_profiler_agent::pollcatch::before_poll_hook();
+        let mid = Instant::now();
+        while mid.elapsed() < Duration::from_millis(10) {
+            // spin, there will be a profiler sample here
+        }
+        async_profiler_agent::pollcatch::after_poll_hook();
+    }
+}
diff --git a/tests/test_pollcatch_without_agent.sh b/tests/test_pollcatch_without_agent.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+# This test needs the following resources:
+# 1. LD_LIBRARY_PATH set to an async-profiler with user JFR support
+# 2. executable `./pollcatch-decoder` from `cd decoder && cargo build`
+# 3. executable `./pollcatch-without-agent` from `cargo build --example pollcatch-without-agent`
+
+set -exuo pipefail
+
+dir="pollcatch-without-agent-jfr"
+
+mkdir -p $dir
+rm -f $dir/*.jfr
+
+# Test that the pollcatch functions work fine with async-profiler in non-agent mode (test LD_PRELOAD mode)
+LD_PRELOAD=libasyncProfiler.so ASPROF_COMMAND=start,event=cpu,jfr,file=$dir/output.jfr ./pollcatch-without-agent
+./pollcatch-decoder longpolls --include-non-pollcatch $dir/output.jfr > $dir/output.txt
+cat $dir/output.txt
+grep -q 'poll of' $dir/output.txt