Commit cdc0dbd

readme: add section about performance and benchmarks
1 parent 4aaf389 commit cdc0dbd

1 file changed

+71
-0
lines changed

README.md

Lines changed: 71 additions & 0 deletions
@@ -219,6 +219,77 @@ The full set of features one can disable are
[in the "Crate features" section of the documentation](https://docs.rs/regex/1.*/#crate-features).

### Performance

One of the goals of this crate is for the regex engine to be "fast." While that
is a somewhat nebulous goal, it is usually interpreted in one of two ways.
First, it means that all searches take worst case `O(m * n)` time, where
`m` is proportional to `len(regex)` and `n` is proportional to `len(haystack)`.
Second, it means that even aside from the time complexity constraint, regex
searches are "fast" in practice.

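To make the `O(m * n)` bound concrete, here is a minimal sketch of a matcher
that tracks a set of active pattern positions instead of backtracking. It is a
toy and not how this crate is implemented: `Token`, `parse`, `add_state` and
`is_full_match` are hypothetical names, and the only syntax it assumes is
literal bytes, `.`, and a postfix `*`, matched against the whole haystack.

```rust
/// One pattern element: a literal byte or `.`, optionally followed by `*`.
/// (Hypothetical type for this sketch only.)
#[derive(Clone, Copy)]
struct Token {
    byte: Option<u8>, // `None` means `.` (matches any byte)
    star: bool,       // true if the element may repeat zero or more times
}

/// Splits a pattern into tokens. A leading `*` is treated as a literal.
fn parse(pattern: &str) -> Vec<Token> {
    let bytes = pattern.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let byte = if bytes[i] == b'.' { None } else { Some(bytes[i]) };
        let star = bytes.get(i + 1) == Some(&b'*');
        tokens.push(Token { byte, star });
        i += if star { 2 } else { 1 };
    }
    tokens
}

/// Marks state `i` active, plus every state reachable by skipping
/// starred tokens. Stops early at a state that is already active.
fn add_state(tokens: &[Token], active: &mut [bool], mut i: usize) {
    loop {
        if active[i] {
            return;
        }
        active[i] = true;
        if i < tokens.len() && tokens[i].star {
            i += 1;
        } else {
            return;
        }
    }
}

/// Returns true if `pattern` matches all of `haystack`.
/// Each haystack byte costs O(m) work, so the total is O(m * n).
fn is_full_match(pattern: &str, haystack: &str) -> bool {
    let tokens = parse(pattern);
    let m = tokens.len();
    let mut current = vec![false; m + 1];
    add_state(&tokens, &mut current, 0);
    for &b in haystack.as_bytes() {
        let mut next = vec![false; m + 1];
        for i in 0..m {
            if !current[i] {
                continue;
            }
            let t = tokens[i];
            if t.byte.map_or(true, |x| x == b) {
                // A starred token may repeat (stay at `i`); others advance.
                let target = if t.star { i } else { i + 1 };
                add_state(&tokens, &mut next, target);
            }
        }
        current = next;
    }
    current[m]
}
```

Under these assumptions, a pattern such as `a*a*a*a*b` is rejected against a
long run of `a`s after a single pass over the haystack, because the work per
byte is bounded by the number of pattern positions.
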
While the first interpretation is pretty unambiguous, the second one remains
nebulous. Even so, it guides this crate's architecture and the sorts of
trade-offs it makes. For example, here are some general architectural
statements that follow from the goal to be "fast":

* When given the choice between faster regex searches and faster Rust compile
times, this crate will generally choose faster regex searches.
* When given the choice between faster regex searches and faster regex compile
times, this crate will generally choose faster regex searches. That is, it is
generally acceptable for `Regex::new` to get a little slower if it means that
searches get faster. (This is a somewhat delicate balance to strike, because
the speed of `Regex::new` needs to remain somewhat reasonable. But this is why
one should avoid re-compiling the same regex over and over again.)
* When given the choice between faster regex searches and simpler API
design, this crate will generally choose faster regex searches. For example,
if one didn't care about performance, we could likely get rid of both
the `Regex::is_match` and `Regex::find` APIs and instead just rely on
`Regex::captures`.

There are perhaps more ways that being "fast" influences things.

While this repository used to provide its own benchmark suite, it has since
been moved to [rebar](https://github.com/BurntSushi/rebar). The benchmarks are
quite extensive, and there are many more than what is shown in rebar's README
(which is just limited to a "curated" set meant to compare performance between
regex engines). To run all of this crate's benchmarks, first start by cloning
and installing `rebar`:

```text
$ git clone https://github.com/BurntSushi/rebar
$ cd rebar
$ cargo install --path ./
```

Then build the benchmark harness for just this crate:

```text
$ rebar build -e '^rust/regex$'
```

Run all benchmarks for this crate as tests (each benchmark is executed once to
ensure it works):

```text
$ rebar measure -e '^rust/regex$' -t
```

Record measurements for all benchmarks and save them to a CSV file:

```text
$ rebar measure -e '^rust/regex$' | tee results.csv
```

Explore benchmark timings:

```text
$ rebar cmp results.csv
```

See the `rebar` documentation for more details on how it works and how to
compare results with other regex engines.

### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.60.0`.
