Conversation

@NoahStapp
Contributor

@NoahStapp NoahStapp commented Aug 14, 2025

Purpose

This is the first draft of the Django implementation of the MongoDB ODM Benchmarking suite. It contains a small suite of benchmark tests designed to measure django-mongodb-backend performance across data sizes and structures for what we expect to be common user operations.

This is NOT intended to be a comprehensive test suite for every operation, only the most common and widely applicable ones. It is also not intended to be Django-specific: each of these tests must be implementable across all of our ODMs, up to the features supported by each library.

Structure

The benchmarks are contained within a separate Django application within tests/performance/perftest. This application exists only to execute the benchmarking suite.

Running Locally

After starting a local MongoDB server, run the tests with python manage.py test from the tests/performance directory. The full suite is expected to take approximately 10 minutes locally. For faster benchmarks, set the FASTBENCH=1 environment variable, which is read by tests/performance/perftest/tests.py.

Review Items

  • Do the Django operations being performed conform to our expectations of how django-mongodb-backend should be used?
  • Are any operations missing that we expect to be frequently performed by users?
  • Is the scope and breadth of the benchmarks sufficient for coverage without being cumbersome to implement across potentially dozens of libraries, first and third-party?
  • Are the data sizes and model structures being used representative of real-world situations?

@aclark4life
Collaborator

aclark4life commented Aug 18, 2025

In general, awesome!

Re:

Structure

The benchmarks are contained within a separate Django application within tests/performance/perftest. This application exists only to execute the benchmarking suite.

Can this be integrated with existing tests such that some established testing conventions are followed? For example,

  • tests/<testsuite>_ - contains test suites for various features
  • performance - Django project for perf testing that contains perf testing app?

IOW, if the performance directory "contained a test suite for testing performance", then we could follow the convention of adding the _ at the end so the Django tests could run the performance tests. Since performance appears to be a standalone Django project, I'm not sure what the best thing to do is (especially given the need to be able to run without the Django test runner). I wonder if a top-level perf_testing project that contains the perftest app may be better than including the perf tests in tests/. Or maybe a top-level perftest/perftest, project and app respectively. Also not sure if it helps, but apps don't have to be in projects …

@NoahStapp
Contributor Author


Running the performance tests alongside the Django tests would add too much runtime to be practical. Putting the perftest project somewhere not in the tests directory is fine with me, but having every test suite be within tests makes more sense from a consistency standpoint.

Is there a way to independently run the perftest app's tests without a full project? If we could do away with needing a full (if minimal) Django project here at all, that would be ideal, but I couldn't find a way to do so with how the other tests here are set up.
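
(For context, the usual pattern for running an app's test suite without a full project is to configure settings programmatically before invoking the test runner. A rough sketch, with placeholder settings rather than this PR's actual configuration:)

import django
from django.conf import settings
from django.test.utils import get_runner

# Placeholder settings; the real backend/connection options would come from the perf suite.
settings.configure(
    INSTALLED_APPS=["perftest"],
    DATABASES={"default": {"ENGINE": "django_mongodb_backend", "NAME": "perftest"}},
    DEFAULT_AUTO_FIELD="django_mongodb_backend.fields.ObjectIdAutoField",
)
django.setup()

TestRunner = get_runner(settings)  # DiscoverRunner by default
failures = TestRunner(verbosity=1).run_tests(["perftest"])
raise SystemExit(bool(failures))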

@timgraham
Collaborator

I'm inclined to agree that the performance tests should be in their own directory, performance_tests or test_performance? They're not meant to be run like the other tests, so intermingling them seems confusing (and requires workarounds in the test runner, as you added).

Probably integrating with Django's runtests.py isn't the way to go, but I'll think about it. At least the settings file you added could be minimized. I could do that without much effort. I have a few comments from a cursory glance and will dig in more later.

Comment on lines 191 to 175
class TestSmallFlatDocCreation(SmallFlatDocTest, TestCase):
    def do_task(self):
        for doc in self.documents:
            model = SmallFlatModel(**doc)
            model.save()

    def after(self):
        SmallFlatModel.objects.all().delete()

Collaborator

I'm a little uncertain how this whole thing fits together. On the one hand, you inherit django.test.TestCase, which has its own setUp/tearDown strategy; on the other hand, you seem to implement your own scheme with an after() method. Some classes use after(), others tearDown()... is it intentional? (edit: perhaps answered by the benchmark spec.)

Contributor Author

The ODM benchmark spec (draft PR here: mongodb/specifications#1828) is helpful here, but the short answer is yes: we need the ability to do setup or teardown both before and after every iteration of a test, as well as before and after the entire set of iterations.
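
For readers who don't have the spec open, the lifecycle being described is roughly the following (a simplified sketch; set_up_suite/tear_down_suite are illustrative names, since in the PR the suite-level steps map onto unittest's setUp/tearDown):

import time

NUM_ITERATIONS = 10  # placeholder; the spec defines the real iteration and time limits


class PerformanceTestSketch:
    """Simplified illustration of the lifecycle, not the PR's actual mixin."""

    def set_up_suite(self): ...     # once, before all iterations
    def before(self): ...           # before each timed iteration
    def do_task(self): ...          # the operation being measured
    def after(self): ...            # after each timed iteration
    def tear_down_suite(self): ...  # once, after all iterations

    def run(self):
        self.set_up_suite()
        timings = []
        for _ in range(NUM_ITERATIONS):
            self.before()
            start = time.monotonic()
            self.do_task()
            timings.append(time.monotonic() - start)  # per-iteration latency
            self.after()
        self.tear_down_suite()
        return timings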

Comment on lines 215 to 216
def after(self):
    SmallFlatModel.objects.all().delete()
Collaborator

I'm a little uncertain how this whole thing fits together. On the one hand, you inherit django.test.TestCase, which has its own setUp/tearDown strategy; on the other hand, you seem to implement your own scheme with an after() method. Reading the spec, it looks like some after() methods should be tearDown(). In this case, if you delete the objects after the first iteration, future iterations are actually creating new models rather than updating existing ones. Check your other tests too!

Contributor Author

Good catch! Definitely a bug here. I'll have to modify the benchmark to ensure that each iteration uses a unique updated_value so the database actually performs an update operation.
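
Something along these lines should work (a sketch only; the iteration counter and the field1 name are placeholders, not the final benchmark code, and the setUp() that creates the documents is omitted):

class TestSmallFlatDocUpdateSketch(SmallFlatDocTest, TestCase):
    """Sketch of the fix: keep the documents around and vary the value per iteration."""

    def before(self):
        # Incremented before each timed iteration so every update writes a new value.
        self.iteration = getattr(self, "iteration", 0) + 1

    def do_task(self):
        # "field1" stands in for whichever field the update benchmark modifies.
        SmallFlatModel.objects.update(field1=f"updated_value_{self.iteration}")

    def tearDown(self):
        super().tearDown()
        SmallFlatModel.objects.all().delete()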

Comment on lines 7 to 9
# Install django-mongodb-backend
/opt/python/3.10/bin/python3 -m venv venv
. venv/bin/activate
Contributor

NIT:
Whilst I'd rather not have a new dependency on drivers-evergreen-tools, to future-proof the binary usage, you could add in the find_python function

Contributor Author

I copied this over from the existing run-tests.sh script. Let's update them both in a separate ticket.

Collaborator

We're going to bring in drivers-evergreen-tools anyway for QE

name = self.__class__.__name__[4:]
median = self.percentile(50)
megabytes_per_sec = self.data_size / median / 1000000
print( # noqa: T201
Contributor

Can we use LOGGER here?

Contributor Author

logging.info() doesn't play nicely with the tests here. For perf tests, I don't think it's worth the effort to fix.

Collaborator

The output is awkward (dots from each test passing, mixed with these prints):

System check identified no issues (0 silenced).
Completed TestLargeFlatDocCreation 46.140 MB/s, MEDIAN=52.461s, total time=94.050s, iterations=1
.Completed TestLargeFlatDocFilterPkByIn 585.696 MB/s, MEDIAN=4.133s, total time=53.420s, iterations=2
.Completed TestLargeFlatDocUpdate 0.001 MB/s, MEDIAN=24.834s, total time=81.044s, iterations=1

The elegant solution might be to use a custom test runner / test result (which has access to the verbosity level) so that this debugging information can be output only at verbosity > 1.
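
For what it's worth, the result-class half of that idea might look roughly like this (a sketch; BenchmarkTextTestResult and the benchmark_summary attribute are hypothetical names, and wiring it into the runner is left out):

import unittest


class BenchmarkTextTestResult(unittest.TextTestResult):
    """Hypothetical: emit per-benchmark summary lines only at verbosity > 1."""

    def addSuccess(self, test):
        super().addSuccess(test)
        summary = getattr(test, "benchmark_summary", None)  # assumed to be set by the test
        if summary and self.showAll:  # showAll is True when verbosity > 1
            self.stream.writeln(summary)

Django's --debug-sql support plugs a custom result class into DiscoverRunner in a similar way, which is presumably the implementation detail referenced later in this thread.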

Contributor Author

Not to be flippant, but I don't think this really matters or should be a blocker for merging. This debugging output will only be relevant for local runs.

Collaborator

I agree, just mentioning it for posterity. Implementation details could be borrowed from Django's --debug-sql option.

@NoahStapp NoahStapp requested review from Jibola and timgraham August 20, 2025 13:51
@NoahStapp NoahStapp requested a review from Jibola December 17, 2025 20:10
self.documents = [self.document.copy() for _ in range(NUM_DOCS)]


class TestSmallFlatDocCreation(SmallFlatDocTest, TestCase):
Collaborator

Does this need to inherit django.test.TestCase? It's unclear to me because I believe PerformanceTest overrides many methods without calling super() which blocks at least some of TestCase's normal setup and teardown operations. I'll have to study this to check how it actually works.

Contributor Author

PerformanceTest does not inherit from any class, it's effectively a mixin.

Collaborator

What I meant is that all the benchmarks inherit django.test.TestCase, which has certain functionality such as cleaning the database between each test and calling a user-defined setUpTestData() (if it exists); however, it's unclear whether any of this functionality is used. Each test has a manual tearDown(), for example, and the benchmark seems to work just as well with:

diff --git a/performance_tests/perftest/tests.py b/performance_tests/perftest/tests.py
index 6f45511d..a11ed7e6 100644
--- a/performance_tests/perftest/tests.py
+++ b/performance_tests/perftest/tests.py
@@ -14,6 +14,10 @@ from pathlib import Path
 from bson import ObjectId, encode, json_util
 from django.test import TestCase
 
+import unittest
+
+TestCase = unittest.TestCase
+
 from .models import (
     ForeignKeyModel,
     IntegerEmbeddedModel,

Contributor Author

Good catch, I'll use unittest.TestCase instead of Django's.

Comment on lines +61 to +54
result_data: list = []


def tearDownModule():
    output = json.dumps(result_data, indent=4)
    if OUTPUT_FILE:
        with open(OUTPUT_FILE, "w") as opf:  # noqa: PTH123
            opf.write(output)
    else:
        print(output)  # noqa: T201
Collaborator

Uff... I wonder if this blocks splitting up the tests into multiple files? At least it may need to become "append aware."
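
If the tests ever are split across files, an append-aware version might look roughly like this (a sketch; it assumes each module funnels its results through its own result_data list and ignores concurrent writers):

import json
from pathlib import Path


def tearDownModule():
    # Merge this module's results into whatever other modules have already written.
    if not OUTPUT_FILE:
        print(json.dumps(result_data, indent=4))  # noqa: T201
        return
    path = Path(OUTPUT_FILE)
    existing = json.loads(path.read_text()) if path.exists() else []
    path.write_text(json.dumps(existing + result_data, indent=4))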

Contributor Author

Yes, splitting into multiple files is complicated by our infrastructure expecting a single JSON output file for the entire suite.

Collaborator

After I suggested a default for OUTPUT_FILE, I now see that would break this if/else. Maybe we just remove OUTPUT_FILE from the documentation on how to run the tests? And actually, perhaps it's appropriate to remove this print statement as well, as I imagine the test-by-test output would be more useful than the JSON version meant for machine consumption?

Contributor Author

The JSON version is a nice, simple summary that's useful for local use as well. Setting a default in the run-perf-tests.sh script that CI/CD uses makes sense, but we shouldn't set a default in the actual test suite itself.

Collaborator

I'm not going to argue, but my feeling is that:
TestSmallFlatDocUpdate 0.007 MB/s, MEDIAN=4.282s, total time=8.648s, iterations=2
is much easier to read than:

    {
        "info": {
            "test_name": "SmallFlatDocUpdate"
        },
        "metrics": [
            {
                "name": "megabytes_per_sec",
                "type": "MEDIAN",
                "value": 0.007472334166114841,
                "metadata": {
                    "improvement_direction": "up",
                    "measurement_unit": "megabytes_per_second"
                }
            }
        ]
    }

and to get to the former, I have to scroll up past the huge JSON dump. (Of course, nothing is stopping someone from adding or removing that print statement while testing their changes.)

Incidentally, the JSON results don't include the elapsed time. Is it correct that we won't track that metric? The spec says, "In addition to latency data, ..."

Where can details about the metadata format (improvement_direction, etc.) be found?

Contributor Author

We don't report elapsed time in the JSON metrics, correct. The JSON metrics are tracked by Evergreen's charting, and we want a single metric easily comparable across ODMs there. We record elapsed time individually as part of each suite run as a secondary metric if needed for investigation.

The metadata format is for Evergreen chart tracking.

display_name: Performance Benchmarks
run_on:
- rhel90-dbx-perf-large
batchtime: 1440
Collaborator

What does it mean?

Contributor Author

batchtime is specified in minutes, so 1440 ensures that the performance tests run at most once per day.

Collaborator

Got it, so the other thread where you said "these tests are going to run once for each commit merged into master" isn't correct? There could be multiple commits per day but just one run, right?

Contributor Author

Sorry, my earlier comment was misleading. That's correct: we expect one run of the perf tests per day on which a commit is merged into master.

Comment on lines +184 to +185
tasks:
- name: perf-tests
Collaborator

In the PR description you wrote, "The full suite run locally is expected to take approximately 10 minutes." On the latest Evergreen build for this PR, it took 35 minutes... is that expected? That's a bit slower than the rest of our builds (~20 minutes). Is this going to run for every push to every PR? It's probably unnecessary. Perhaps it could be a very fast version with very low iterations just to ensure these tests don't break.

Contributor Author

That was for an earlier iteration that didn't fully implement the spec.

No, these tests are going to run once for each commit merged into master, not once per PR push.

Collaborator

So they won't run at all on PRs? What do you think about my point about running with one/low iterations on PRs to ensure a PR doesn't break the suite?

Contributor Author

Since the suite only exercises public APIs, I don't think having an extra suite just to test that the suite itself doesn't break is worth it. Anything that breaks this suite should be caught by other tests failing.

Collaborator

But the performance tests are running on this PR... Is there some change we need to make before merging this so that they don't run on PRs? What happens when we need to make a change to the performance tests? Do we temporarily modify some trigger to have them run on that PR, then revert the trigger before merging?

Contributor Author

We'll have to update the Evergreen project settings to exclude the benchmark tests on PRs. Changes to the perf tests would require scheduling Evergreen patches to confirm changes work as expected.

@NoahStapp NoahStapp requested a review from timgraham January 7, 2026 17:01
Comment on lines 395 to 438
class TestSmallFlatDocFilterPkByIn(SmallFlatDocTest, TestCase):
    def setUp(self):
        super().setUp()
        self.ids = []
        models = []
        for doc in self.documents:
            models.append(SmallFlatModel(**doc))

        SmallFlatModel.objects.bulk_create(models)
        self.ids = [model.id for model in models]

    def do_task(self):
        list(SmallFlatModel.objects.filter(id__in=self.ids))

    def tearDown(self):
        super().tearDown()
        SmallFlatModel.objects.all().delete()


class TestLargeFlatDocFilterPkByIn(LargeFlatDocTest, TestCase):
    def setUp(self):
        super().setUp()
        models = []
        for doc in self.documents:
            models.append(LargeFlatModel(**doc))
        LargeFlatModel.objects.bulk_create(models)
        self.id = LargeFlatModel.objects.first().id

    def do_task(self):
        list(LargeFlatModel.objects.filter(id__in=[self.id]))

    def tearDown(self):
        super().tearDown()
        LargeFlatModel.objects.all().delete()
Collaborator

These classes have the same name except for LargeFlatDoc vs. SmallFlatDoc, but the queries are different: (id__in=self.ids) vs. (id__in=[self.id]). Is that intentional?

Contributor Author

Not intentional, the pattern used by SmallFlatDoc is the intended one.
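
i.e., something like this (a sketch mirroring the SmallFlatDoc version quoted above):

class TestLargeFlatDocFilterPkByIn(LargeFlatDocTest, TestCase):
    def setUp(self):
        super().setUp()
        models = [LargeFlatModel(**doc) for doc in self.documents]
        LargeFlatModel.objects.bulk_create(models)
        self.ids = [model.id for model in models]

    def do_task(self):
        # Filter on every created id, as the SmallFlatDoc test does.
        list(LargeFlatModel.objects.filter(id__in=self.ids))

    def tearDown(self):
        super().tearDown()
        LargeFlatModel.objects.all().delete()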

@NoahStapp NoahStapp requested a review from timgraham January 21, 2026 17:40
@timgraham timgraham changed the title from "INTPYTHON-847 - Standardize performance testing infrastructure" to "INTPYTHON-847 Add performance tests" Jan 27, 2026
Comment on lines +5 to +6
Django MongoDB Backend uses a benchmarking suite to catch performance regressions.
This suite is located in the top-level ``performance_tests`` directory of the Django MongoDB Backend repository.
Collaborator

I'd also like to see a discussion (linking to the benchmark spec as appropriate) about what this is benchmarking (both Python code and MongoDB queries), how to interpret the results (MB/sec??), where we view the ongoing results, etc.

Contributor Author (Jan 27, 2026)

I'm not sure what you mean here. We could add a link to the benchmark spec, which details the purpose and scope of the suite? There's already a link in the actual test file. Each run will have an Evergreen chart attached to it; there isn't a single place where all the results are added dynamically.

Collaborator

Basically, how can I use this tool to catch performance regressions? Am I supposed to run it locally and compare before and after? Is there a histogram of results? Will regressions be automatically detected? How will we know if the results are statistically significant?

Django has https://github.com/django/django-asv/tree/main and results displayed at https://django.github.io/django-asv/ but it's not obvious how to interpret and use the results, so I have similar concerns here.

Is this supposed to catch performance regressions in Django MongoDB Backend, MongoDB, or both? How can we differentiate?

The spec answers some questions, but it's a long document with a lot of unnecessary details. Maybe a high-level "cheat sheet" would be useful for other ODMs as well.

I suppose not all questions need to be answered before merging this (perhaps some answers are unknown), but this is what I've been thinking about.

Contributor Author

Short answer: https://jira.mongodb.org/browse/INTPYTHON-530 will set up automated alerting identical to our current PyMongo automated regression alerts. Here's a link to what the charting looks like for that; it'll look similar for Django: https://spruce.mongodb.com/task/mongo_python_driver_performance_benchmarks_perf_8.0_standalone_a89c5e3a89fa7e621f03fc9173067256a5fdc300_26_01_26_19_36_51/trend-charts?execution=0.

The expectation is that the benchmark suite will run up to once per day on commit merge and automatically alert us in Slack if any regressions are detected. No local runs are expected generally.


LOGGER = logging.getLogger("test")
logging.basicConfig(level=logging.INFO, format="%(levelname)-8s %(message)s")
OUTPUT_FILE = os.environ.get("OUTPUT_FILE")
Collaborator

Can we make "results.json" the default so it's not required to be set? Maybe the environment variable is useful for local debugging, but I don't see any other values set in this PR, right?
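
Presumably that would amount to a one-line default (sketch of the suggestion, not code in this PR):

OUTPUT_FILE = os.environ.get("OUTPUT_FILE", "results.json")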

Comment on lines 36 to 44
.. code-block:: bash

$ git clone --depth 1 https://github.com/mongodb/specifications.git
$ pushd specifications/source/benchmarking/odm-data
$ tar xf flat_models.tgz
$ tar xf nested_models.tgz
$ popd
$ export DJANGO_MONGODB_PERFORMANCE_TEST_DATA_PATH="specifications/source/benchmarking/odm-data"
$ export OUTPUT_FILE="results.json"
Collaborator

These instructions should put the data in performance_tests/odm-data so that the environment variable isn't required.
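
For example (a sketch of the fallback; DATA_PATH is a placeholder name, and the relative path assumes the module reading it lives one level below performance_tests/):

import os
from pathlib import Path

# Fall back to the in-repo data directory so the environment variable becomes optional.
DATA_PATH = Path(
    os.environ.get(
        "DJANGO_MONGODB_PERFORMANCE_TEST_DATA_PATH",
        Path(__file__).resolve().parent.parent / "odm-data",
    )
)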

@NoahStapp NoahStapp requested a review from timgraham January 28, 2026 15:14