
performance testing: number of subscriptions vs. function latency #111

Open
diana-qing wants to merge 34 commits into stanford-esrg:main from diana-qing:num-sub

Conversation

@diana-qing (Contributor):

This PR adds scripts that measure how the latency of a function in the `ip_subs` application changes as the number of subscriptions grows.

"examples/log_ssh",
"examples/streaming",
"examples/streaming",
"examples/ip_subs",
Collaborator:

Can you move this into the tests/perf folder?

@@ -0,0 +1,49 @@
use retina_core::{Runtime, config::load_config};
Collaborator:

Add a README for this example.

@@ -0,0 +1,31 @@
# Performance Testing
Collaborator:

Add an intro with the high-level motivation for this and what it does! What you shared at the EOQ lab meeting was great.

@thearossman (Collaborator), Jun 12, 2025:

Also mention the initial testing you did to ensure that this approach is accurate!

Here's my best understanding of what you found:

  • You compared results with Retina's current timing infrastructure, which inlines cycle counts. You found that the uprobes add a constant overhead. That is, this will accurately surface patterns for the use-case of comparing function latency across different implementations or applications.
  • You can't run this at super high throughputs. IIRC, we were able to handle ~5Gbps of live traffic (unless you got more on the passive box). This gives plenty of data points for saying something about function latency.
  • You confirmed this separates entry/exit points by thread, so it'll be accurate even if there are multiple cores. (IMO this was a bit unclear in the documentation.)
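The per-thread separation described in the last bullet can be sketched in plain Python (the event format and function name here are hypothetical; in practice the timestamps come from eBPF uprobe/uretprobe events keyed by thread id):

```python
from collections import defaultdict

def match_latencies(events):
    """Match entry/exit timestamps per thread.

    events: iterable of (tid, kind, ts_ns) tuples, where kind is
    "entry" or "exit". Returns a list of latencies in nanoseconds.
    Entries and exits are paired per thread, so interleaved events
    from different cores don't corrupt each other's measurements.
    """
    pending = defaultdict(list)  # tid -> stack of entry timestamps
    latencies = []
    for tid, kind, ts in events:
        if kind == "entry":
            pending[tid].append(ts)
        elif pending[tid]:
            # Pop the most recent entry on this thread (handles recursion).
            latencies.append(ts - pending[tid].pop())
    return latencies
```

Without the per-thread keying, an exit on one core could be paired with an entry on another, producing bogus latencies.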


## Number of Subscriptions vs. Function Latency
`generate_ip_subs.py` shards the IPv4 address space into `n` subnets to generate `n` Retina subscriptions, where `n` is passed in by the user. The subscriptions are written to `spec.toml`.
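The sharding step can be sketched with the standard-library `ipaddress` module (a simplified sketch; the real `generate_ip_subs.py` also writes the resulting subscriptions to `spec.toml`):

```python
import ipaddress

def shard_ipv4(n: int) -> list[str]:
    """Split 0.0.0.0/0 into n equal IPv4 subnets.

    n must be a power of two so that the address space divides evenly:
    n subnets of prefix length log2(n).
    """
    prefix = n.bit_length() - 1
    if 2 ** prefix != n:
        raise ValueError("n must be a power of two")
    net = ipaddress.ip_network("0.0.0.0/0")
    return [str(s) for s in net.subnets(new_prefix=prefix)]
```

Each resulting subnet then backs one subscription filter in `spec.toml`.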
Collaborator:

Maybe clarify that this is a sample / basic application and more can easily be added. The main goal of your project was to set up the infrastructure.


`run_app.py` runs the `ip_subs` application and measures how the latency of a function changes as the number of subscriptions changes. It generates subscriptions using `generate_ip_subs.py`, then runs `ip_subs` with those subscriptions and measures latency using `func_latency.py`. The latencies are written to `stats/ip_subs_latency_stats.csv`, and plots of latency vs. number of subscriptions for different statistics (e.g. average, 99th percentile) are saved in the `figs` directory. The `stats` and `figs` directories are created by the script if they don't already exist.
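The per-run summarization could look roughly like this (a sketch using nearest-rank percentiles; the script's exact method and column names may differ):

```python
def latency_stats(latencies):
    """Summarize a sample of latencies (in ns): average and 99th percentile.

    Uses the nearest-rank method on the sorted sample for the percentile.
    """
    if not latencies:
        raise ValueError("no latency samples recorded")
    s = sorted(latencies)
    avg = sum(s) / len(s)
    p99 = s[min(len(s) - 1, int(0.99 * len(s)))]
    return {"avg": avg, "p99": p99}
```

One such summary row per subscription count is what ends up in the CSV, which the plotting code then reads.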

When running `run_app.py`, you can specify which function to profile, the number of subscriptions, and the config file path. For example, to measure the latency of the `process_packet` function in online mode with 64 and 256 subscriptions, you can run:
Collaborator:

Probably mention that you can profile multiple functions, but because it just records entry/exit timestamps, keep in mind that profiling functions that overlap will cause interference. (You observed this!)


`func_latency.py` uses bcc to profile function latency while an application runs, attaching eBPF programs to uprobes at the entry and exit points of the target function. Latency is measured in nanoseconds by default. The profiling code is based on the [example provided by bcc](https://github.com/iovisor/bcc/blob/master/tools/funclatency.py).

`run_app.py` runs the `ip_subs` application and measures how the latency of a function changes as the number of subscriptions changes. It generates subscriptions using `generate_ip_subs.py`, then runs `ip_subs` with those subscriptions and measures latency using `func_latency.py`. The latencies are written to `stats/ip_subs_latency_stats.csv`, and plots of latency vs. number of subscriptions for different statistics (e.g. average, 99th percentile) are saved in the `figs` directory. The `stats` and `figs` directories are created by the script if they don't already exist.
Collaborator:

Should this be run from a specific directory within the Retina repo?

@@ -0,0 +1,206 @@
# code for profiling function latency with bcc based on https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
Collaborator:

A couple of weeks ago we talked about managing output in online mode by consuming the subprocess output and filtering it before printing:

  • Making it so that the "samples lost" alert isn't printed
  • Consuming the output and printing the updates on Gbps processed, packets lost, etc.

Did you try this and run into challenges? (I think this is not critical for accuracy, but it's extremely helpful for usability if it's reasonably easy to do.)

@@ -0,0 +1,49 @@
import argparse
Collaborator:

Add a file header comment.

@@ -0,0 +1,128 @@
import argparse
Collaborator:

Add a file header comment.
