Merged
14 changes: 13 additions & 1 deletion CHANGELOG.md
@@ -8,12 +8,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## Unreleased

### Features (User Facing)
* Introduce `max_sample_size` which guides how many samples will be gathered at most for a given scenario. This avoids a variety of issues when scenarios gather too many samples (memory consumption etc). Defaults to `1_000_000`, setting it to `nil` gathers unlimited samples again (behavior before this version).
* Introduce `max_sample_size` which guides how many samples will be gathered at most for a given scenario.
This avoids a variety of issues when scenarios gather too many samples (memory consumption, statistics taking long to calculate, formatters hanging/not working).
Defaults to `1_000_000`, setting it to `nil` gathers unlimited samples again (behavior before this version).
* Introduce `exclude_outliers` option which when set to `true` will automatically exclude outliers from the samples gathered.
Especially important for run time, you can remove samples caused by garbage collection or external factors.
Defaults to `false`.
Shout out to [@NickNeck](https://github.com/NickNeck) who implemented this long-wished-for feature over in `Statistex`.

### Bugfixes (User Facing)
* Fixed a bug where, if times were supplied as `0` instead of `0.0`, we'd sometimes gather a single measurement
* Elixir `1.19` compilation warnings have been removed

### Features (Plugins)
* The `%Benchee.Statistics{}` struct now comes with values to accompany the outlier exclusion feature:
  * `outliers` - if outlier exclusion was enabled, contains any samples that were found to be outliers, empty list otherwise
  * `lower_outlier_bound` - value below which samples are considered outliers
  * `upper_outlier_bound` - value above which samples are considered outliers

## 1.4.0 (2025-04-14)

Some nice features (`pre_check: :all_same` is cool) along with adding support for some new stuff (`tprof`) and fixing some bugs.
50 changes: 41 additions & 9 deletions README.md
@@ -26,9 +26,9 @@ Produces the following output on the console:
Operating System: Linux
CPU Information: AMD Ryzen 9 5900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.25 GB
Elixir 1.16.0-rc.1
Erlang 26.1.2
Available memory: 31.26 GB
Elixir 1.19.0
Erlang 28.1
JIT enabled: true

Benchmark suite executing with the following configuration:
@@ -39,25 +39,26 @@ reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 28 s
Excluding outliers: false

Benchmarking flat_map ...
Benchmarking map.flatten ...
Calculating statistics...
Formatting results...

Name ips average deviation median 99th %
flat_map 3.79 K 263.87 μs ±15.49% 259.47 μs 329.29 μs
map.flatten 1.96 K 509.19 μs ±51.36% 395.23 μs 1262.27 μs
flat_map 3.96 K 252.74 μs ±15.64% 247.61 μs 321.85 μs
map.flatten 1.84 K 543.57 μs ±44.18% 414.16 μs 1223.92 μs

Comparison:
flat_map 3.79 K
map.flatten 1.96 K - 1.93x slower +245.32 μs
flat_map 3.96 K
map.flatten 1.84 K - 2.15x slower +290.83 μs

Memory usage statistics:

Name Memory usage
flat_map 625 KB
map.flatten 781.25 KB - 1.25x memory usage +156.25 KB
flat_map 624.97 KB
map.flatten 781.25 KB - 1.25x memory usage +156.28 KB

**All measurements for memory usage were the same**
```
@@ -83,6 +84,7 @@ The aforementioned [plugins](#plugins) like [benchee_html](https://github.com/be
- [Formatters](#formatters)
- [Console Formatter options](#console-formatter-options)
- [Profiling after a run](#profiling-after-a-run)
- [Remove Outliers](#remove-outliers)
- [Saving, loading and comparing previous runs](#saving-loading-and-comparing-previous-runs)
- [Hooks (Setup, Teardown etc.)](#hooks-setup-teardown-etc)
- [Suite hooks](#suite-hooks)
@@ -115,6 +117,7 @@ The aforementioned [plugins](#plugins) like [benchee_html](https://github.com/be
* as precise as it can get, measure with up to nanosecond precision (Operating System dependent)
* nicely formatted console output with units scaled appropriately (nanoseconds to minutes)
* (optionally) measures the overhead of function calls so that the measured/reported times really are the execution time of _your_code_ without that overhead.
* (optionally) [removes outliers](#remove-outliers)
* [hooks](#hooks-setup-teardown-etc) to execute something before/after a benchmarking invocation, without it impacting the measured time
* execute benchmark jobs in parallel to gather more results in the same time, or simulate a system under load
* well documented & well tested
@@ -136,6 +139,8 @@ In addition, you can optionally output an extended set of statistics:
* **sample size** - the number of measurements taken
* **mode** - the measured values that occur the most. Often one value, but can be multiple values if they occur exactly as often. If no value occurs at least twice, this value will be `nil`.

Benchee can also [remove outliers](#remove-outliers).

## Installation

Add `:benchee` to your list of dependencies in `mix.exs`:
@@ -263,6 +268,11 @@ The available options are the following (also documented in [hexdocs](https://he
This is used to limit memory consumption and unnecessary processing - 1 Million samples is plenty.
This limit also applies to number of iterations done during warmup.
You can set your own number or set it to `nil` if you don't want any limit.
* `exclude_outliers` - whether or not statistical outliers should be removed for the calculated statistics.
Defaults to `false`.
This means that values that are far outside the usual range (as determined by the percentiles/quantiles) will
be left out when calculating statistics; the gathered samples themselves are kept. You might want to enable this if you
don't want things like garbage collection to influence your results as much.
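
As a small illustration of the two options above, here is a sketch of an options list. The value `100_000` is an arbitrary example, and `jobs` in the commented-out call is a placeholder for your own map of benchmark functions.

```elixir
# Illustrative options only; 100_000 is an arbitrary example value.
benchee_options = [
  # gather at most 100_000 samples per scenario (the default is 1_000_000)
  max_sample_size: 100_000,
  # leave statistical outliers out of the calculated statistics
  exclude_outliers: true
]

# Use `max_sample_size: nil` instead to gather unlimited samples again.
unlimited_options = Keyword.put(benchee_options, :max_sample_size, nil)

# With :benchee available and `jobs` defined (placeholder name):
# Benchee.run(jobs, benchee_options)
```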

### Metrics to measure

@@ -303,6 +313,7 @@ So, what happens if a function executes too fast for Benchee to measure? If Benc
* essentially every single measurement is now an average across 10 runs making lots of statistics less meaningful

Benchee will print a big warning when this happens.

#### Measuring Memory Consumption

Starting with version 0.13, users can now get measurements of how much memory their benchmarked scenarios use. The measurement is **limited to the process that Benchee executes your provided code in** - i.e. other processes (like worker pools)/the whole BEAM isn't taken into account.
@@ -542,6 +553,27 @@ Enum."-map/2-lists^map/1-0-"/2 10001 26.38 2282 0.23

**Note about after_each hooks:** `after_each` hooks currently don't work when profiling a function, as they are not passed the return value of the function after the profiling run. It's already fixed on the elixir side and is waiting for release, likely in 1.14. It should then just work.

### Remove Outliers

Benchee can remove outliers from the gathered samples while calculating statistics.
Outliers are determined via percentiles/quantiles (we follow [this approach](https://en.wikipedia.org/wiki/Interquartile_range#Outliers)).

You might consider excluding outliers for extreme micro/nano-benchmarks where individual results can be skewed a lot by garbage collection.

You can simply pass `exclude_outliers: true` to Benchee to trigger the removal of outliers.

```elixir
Benchee.run(jobs, exclude_outliers: true)
```

The outliers themselves (the samples that have been determined to be outliers),
as well as the lower/upper bounds beyond which samples are considered outliers, are accessible
in the `Benchee.Statistics` struct.
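
A minimal sketch of reading those values from the suite that `Benchee.run/2` returns might look like the following. `OutlierReport` and `summarize/1` are hypothetical names made up for this example and are not part of Benchee; the sketch only relies on the `outliers`, `lower_outlier_bound` and `upper_outlier_bound` fields of the run time statistics.

```elixir
defmodule OutlierReport do
  # Hypothetical helper, not part of Benchee.
  # Builds one summary map per scenario from the run time statistics.
  def summarize(suite) do
    Enum.map(suite.scenarios, fn scenario ->
      stats = scenario.run_time_data.statistics

      %{
        name: scenario.name,
        outlier_count: length(stats.outliers),
        lower_outlier_bound: stats.lower_outlier_bound,
        upper_outlier_bound: stats.upper_outlier_bound
      }
    end)
  end
end

# Usage, assuming :benchee is available and `jobs` is your map of benchmark functions:
# suite = Benchee.run(jobs, exclude_outliers: true)
# suite |> OutlierReport.summarize() |> IO.inspect()
```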

The samples themselves still include the outliers; they are only removed for calculating statistics.

Benchee doesn't print the outliers yet, but you can inspect the resulting data structures if you're interested (or send a PR :) ).
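
To make the interquartile range approach linked above more concrete, here is a rough sketch in plain Elixir. It is not the code Benchee or `Statistex` actually runs: `OutlierSketch` and its functions are hypothetical, and the linear-interpolation percentile used here is only one of several common definitions, so the exact bounds may differ from what Benchee reports.

```elixir
defmodule OutlierSketch do
  # Hypothetical illustration of the IQR rule, not Statistex's implementation.

  # Percentile via linear interpolation on an already sorted, non-empty list.
  def percentile(sorted, p) when sorted != [] and p >= 0 and p <= 100 do
    last_index = length(sorted) - 1
    rank = p / 100 * last_index
    lower_index = trunc(rank)
    upper_index = min(lower_index + 1, last_index)
    fraction = rank - lower_index

    low = Enum.at(sorted, lower_index)
    high = Enum.at(sorted, upper_index)
    low + fraction * (high - low)
  end

  # Lower and upper bound: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
  def bounds(samples) do
    sorted = Enum.sort(samples)
    q1 = percentile(sorted, 25)
    q3 = percentile(sorted, 75)
    iqr = q3 - q1
    {q1 - 1.5 * iqr, q3 + 1.5 * iqr}
  end

  # Returns {samples_within_bounds, outliers}.
  def split(samples) do
    {lower, upper} = bounds(samples)
    Enum.split_with(samples, fn sample -> sample >= lower and sample <= upper end)
  end
end

samples = [10, 11, 12, 13, 14, 15, 16, 17, 100]
{kept, outliers} = OutlierSketch.split(samples)
IO.inspect(outliers, label: "outliers")
IO.inspect(length(kept), label: "samples within bounds")
```

In this made-up sample set, only the value `100` falls outside the bounds, which mirrors how a single garbage collection pause could otherwise skew the average.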

### Saving, loading and comparing previous runs

Benchee can store the results of previous runs in a file and then load them again to compare them. For example this is useful to compare what was recorded on the main branch against a branch with performance improvements. You may also use this to benchmark across different Elixir/Erlang versions.
4 changes: 3 additions & 1 deletion lib/benchee/benchmark/runner.ex
@@ -6,8 +6,10 @@ defmodule Benchee.Benchmark.Runner do
# This module actually runs our benchmark scenarios, adding information about
# run time and memory usage to each scenario.

alias Benchee.Benchmark
alias Benchee.Benchmark.BenchmarkConfig
alias Benchee.{Benchmark, Scenario, Utility.Parallel}
alias Benchee.Scenario
alias Benchee.Utility.Parallel

alias Benchmark.{
Collect,
11 changes: 9 additions & 2 deletions lib/benchee/configuration.ex
@@ -48,7 +48,8 @@ defmodule Benchee.Configuration do
# It also generates less than 1GB in data (some of which is garbage collected/
# not necessarily all in RAM at the same time) - which seems reasonable enough.
# see `samples/statistics_performance.exs` and also maybe run it yourself.
max_sample_size: 1_000_000
max_sample_size: 1_000_000,
exclude_outliers: false

@typedoc """
The configuration supplied by the user as either a map or a keyword list
@@ -152,6 +153,11 @@ defmodule Benchee.Configuration do
This is used to limit memory consumption and unnecessary processing - 1 Million samples is plenty.
This limit also applies to number of iterations done during warmup.
You can set your own number or set it to `nil` if you don't want any limit.
* `exclude_outliers` - whether or not statistical outliers should be removed for the calculated statistics.
Defaults to `false`.
This means that values that are far outside the usual range (as determined by the percentiles/quantiles) will
be left out when calculating statistics; the gathered samples themselves are kept. You might want to enable this if you
don't want things like garbage collection to influence your results as much.
"""
@type user_configuration :: map | keyword

@@ -183,7 +189,8 @@ defmodule Benchee.Configuration do
measure_function_call_overhead: boolean,
title: String.t() | nil,
profile_after: boolean | atom | {atom, keyword},
max_sample_size: pos_integer()
max_sample_size: pos_integer(),
exclude_outliers: boolean()
}

@time_keys [:time, :warmup, :memory_time, :reduction_time]
2 changes: 1 addition & 1 deletion lib/benchee/formatters/console.ex
@@ -33,8 +33,8 @@ defmodule Benchee.Formatters.Console do

@behaviour Benchee.Formatter

alias Benchee.Suite
alias Benchee.Formatters.Console.{Memory, Reductions, RunTime}
alias Benchee.Suite

@doc """
Formats the benchmark statistics to a report suitable for output on the CLI.
4 changes: 3 additions & 1 deletion lib/benchee/output/benchmark_printer.ex
@@ -69,7 +69,8 @@ defmodule Benchee.Output.BenchmarkPrinter do
warmup: warmup,
inputs: inputs,
memory_time: memory_time,
reduction_time: reduction_time
reduction_time: reduction_time,
exclude_outliers: exclude_outliers
}) do
scenario_count = length(scenarios)
exec_time = warmup + time + memory_time + reduction_time
@@ -84,6 +85,7 @@ defmodule Benchee.Output.BenchmarkPrinter do
parallel: #{parallel}
inputs: #{inputs_out(inputs)}
Estimated total run time: #{Duration.format_human(total_time)}
Excluding outliers: #{exclude_outliers}
""")
end
