25 changes: 15 additions & 10 deletions source/benchmarking/odm-benchmarking.md
@@ -18,8 +18,8 @@ different ODMs, and the relative performance of ODMs and their associated langua

We expect substantial performance differences between ODMs based on both their language families (e.g. static vs.
dynamic or compiled vs. virtual-machine-based) as well as their inherent design (e.g. web frameworks such as Django vs.
application-agnostic such as Mongoose). However we still expect "vertical" comparison within families of ODMs to expose
outlier behavior that can be optimized away.
application-agnostic such as Mongoose). Within families of ODMs that share similar designs and language
characteristics, these comparisons could be used to identify potential areas of performance improvement.

### Task Hierarchy

@@ -34,9 +34,9 @@ The suite is intentionally kept small for several reasons:

- ODM feature sets vary significantly across libraries, limiting the number of benchmarks that can be run across the
entire collection of extant ODMs.
- Several popular \`MongoDB ODMs are actively maintained by third-parties, such as Mongoose. By limiting the
benchmarking suite to a minimal set of representative tests that are easy to implement, we encourage adoption of the
suite by these third-party maintainers.
- Several popular MongoDB ODMs are actively maintained by third-parties, such as Mongoose. By limiting the benchmarking
suite to a minimal set of representative tests that are easy to implement, we encourage adoption of the suite by
these third-party maintainers.

### Measurement

@@ -368,14 +368,19 @@ supports.
### Benchmark Server

The MongoDB ODM Performance Benchmark must be run against a MongoDB replica set of size 1 running the latest stable
database version without authentication or SSL enabled.
database version without authentication or SSL enabled. The Benchmark should be run on the established internal
performance distro for the sake of consistency.
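
As an illustration only, a pre-flight check along the following lines could confirm that the target deployment meets
these requirements before a benchmark run. This is a minimal sketch assuming a Python environment with pymongo and a
deployment listening at mongodb://localhost:27017; the URI and the fail-fast assertions are illustrative choices, not
part of this specification.

```python
# Hypothetical pre-flight check: confirm the benchmark target is a single-node
# replica set reachable without credentials or TLS (sketch, not normative).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", tls=False)

hello = client.admin.command("hello")  # server handshake/topology document
assert "setName" in hello, "benchmarks must run against a replica set"
assert len(hello.get("hosts", [])) == 1, "the replica set must have exactly one member"

version = client.server_info()["version"]
print(f"Benchmarking against replica set '{hello['setName']}' on MongoDB {version}")
```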

### Benchmark placement and scheduling

The MongoDB ODM Performance Benchmark should be placed within the ODM's test directory as an independent test suite. Due
to the relatively long runtime of the benchmarks, including them as part of an automated suite that runs against every
PR is not recommended. Instead, scheduling benchmark runs on a regular cadence is the recommended method of automating
this suite of tests.
The MongoDB ODM Performance Benchmark should live in one of two places. For first-party ODMs, the Benchmark should be
placed within the ODM's test directory as an independent test suite. For third-party ODMs, if the external maintainers
do not wish to include the Benchmark in the in-repo test suite, it should be added to the ODM performance testing
repository created explicitly for this purpose.

Due to the relatively long runtime of the benchmarks, including them as part of an automated suite that runs against
every PR is not recommended. Instead, scheduling benchmark runs on a regular cadence is the recommended method of
automating this suite of tests.
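
As one possible arrangement for a Python-based ODM, the benchmarks could live in their own directory and be guarded by
a custom pytest marker so that ordinary per-PR CI skips them and only the scheduled job opts in. The directory name,
marker name, and hooks below are hypothetical; this is a sketch of the approach, not a required layout.

```python
# conftest.py (repository root) -- hypothetical sketch for keeping the ODM
# benchmarks out of per-PR runs while letting a scheduled job opt in with:
#     pytest -m odm_benchmark benchmarks/
import pytest


def pytest_configure(config):
    # Register the custom marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers", "odm_benchmark: long-running ODM performance benchmarks"
    )


def pytest_collection_modifyitems(config, items):
    # If the caller passed an explicit -m expression, respect it as-is.
    if config.getoption("markexpr"):
        return
    skip = pytest.mark.skip(reason="run only on the scheduled benchmark cadence")
    for item in items:
        if "odm_benchmark" in item.keywords:
            item.add_marker(skip)
```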

Do we have a dedicated, isolated perf lab, with machines that won't get changes unless we know about it? My experience with perf testing over many years is that unless you have such a system, the noise makes it very difficult to see when things really change. For example, OS updates, platform/language changes, virus checking, auto-updates kicking in mid-run, and so on, all make the data hard to interpret.

How do you currently handle driver perf test machines? Can you point me to charts, or even raw data I guess, that show variation/noise over time? Also, how often do they run? Is there only a single change between each run so that it's feasible to trace back a perf difference to a single change, be that external or a code change?

Contributor Author

Here's an example of what the Python driver perf tests output. The driver perf tests have experienced all of the issues you've stated, but still provide useful metrics that let us catch regressions and identify places for improvement. Running on Evergreen doesn't allow us (AFAIK) to have our own dedicated set of machines.

The Python driver perf tests run weekly.

Contributor

Drivers do have access to a special host in Evergreen (rhel90-dbx-perf-large) for running dedicated performance tasks, to ensure stability and consistency.

Member

Should the dedicated Evergreen hosts be discussed in this document? Alternatively, if it's already discussed in the general driver benchmarks spec we can reference that.

Member

@NoahStapp let me know if this is also an internal document or something we can easily reference. Feel free to resolve accordingly.

Contributor Author

This is in an internal document titled "Drivers Performance Testing Infrastructure Guidelines".

## ODM-specific benchmarking
