Commit c544061

mlwellesall-seeing-codedhagrowajeetdsouzamartinmr
authored
feat: add stress tests and CI benchmark integration (#298)
## Summary

Adds comprehensive stress tests for sync/async clients and integrates pytest-benchmark into CI. Includes merge of `main` bringing in the picklable AbortedError fix (#299).

### Stress Tests

- Concurrent read queries and mutations using ThreadPoolExecutor (sync) and asyncio.gather (async)
- Mixed workload tests combining queries, mutations, commits, and discards
- Transaction conflict handling and upsert tests
- Retry utilities (`retry()`, `retry_async()`, `with_retry()`, `with_retry_async()`, `run_transaction()`, `run_transaction_async()`)
- Deadlock regression tests validating the fix from #296
- All tests have consistent `_sync` or `_async` suffixes for clear identification

### Targeted Benchmarks

Individual operation benchmarks to pinpoint regression root causes:

| Category | Operations Benchmarked |
|----------|------------------------|
| **Query** | Simple, with variables, best-effort |
| **Mutation** | commit_now, explicit commit, discard, N-Quads, delete |
| **Transaction** | Upsert, batch mutations, run_transaction helper |
| **Client** | check_version, alter schema |

**26 total benchmarks** (13 sync + 13 async): when a stress test regresses, compare individual operation times to identify the exact bottleneck.

### Test Resources

- Movie dataset (1million.schema, 1million.rdf.gz) downloaded on demand from dgraph-benchmarks repo
- Session-scoped fixtures: `movies_schema()`, `movies_rdf_gz()`, `movies_rdf()`
- Automatic decompression with temp directory cleanup

### CI Benchmark Integration

- pytest-benchmark fixtures added to stress tests
- New `benchmarks` job in PR/main CI workflow (STRESS_TEST_MODE=moderate)
- New workflow for semver tag releases
- JSON + SVG histogram artifacts uploaded

### Makefile Improvements

- `make test PYTEST_ARGS="..."` syntax (exports propagate to scripts)
- `make benchmark` delegates to test target
- Default `PYTEST_ARGS=-v --benchmark-disable`
- STRESS_TEST_MODE and DGRAPH_IMAGE_TAG exported

### Stress Test Modes

Each mode uses a `rounds` parameter that repeats each test's concurrent batch to create sustained load:

| Mode | Workers | Ops/round | Rounds | load_movies | Target duration |
|------|---------|-----------|--------|-------------|-----------------|
| `quick` (default) | 20 | 200 | 50 | No | ~30s |
| `moderate` | 10 | 200 | 8 | Yes (1M triples) | 5-8 min |
| `full` | 15 | 500 | 15 | Yes (1M triples) | 12-16 min |

### Housekeeping

- Removed deprecated `unittest.makeSuite` usage (removed in Python 3.13)
- Replaced manual `suite()` functions with `unittest.main()` in test files

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Anurag Sharma <anurags92@gmail.com>
Co-authored-by: Anurag <anurag@dgraph.io>
Co-authored-by: Miguel Turner <cymrow@gmail.com>
Co-authored-by: Ajeet D'Souza <98ajeet@gmail.com>
Co-authored-by: Martin Martinez Rivera <martinmr@dgraph.io>
Co-authored-by: Damián Parrino <bucanero@users.noreply.github.com>
Co-authored-by: aaroncarey <31550444+aaroncarey@users.noreply.github.com>
Co-authored-by: 0xflotus <0xflotus@gmail.com>
Co-authored-by: joaquin <joaquin@dgraph.io>
Co-authored-by: Michel Diz <MichelDiz@users.noreply.github.com>
Co-authored-by: Daniel Mai <daniel@dgraph.io>
Co-authored-by: mrwunderbar666 <paul.balluff@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marcelo Dalta <marcelo.dalta@gmail.com>
Co-authored-by: Joshua Goldstein <92491720+joshua-goldstein@users.noreply.github.com>
Co-authored-by: Aman Mangal <aman@dgraph.io>
Co-authored-by: Raphael <rderbier@gmail.com>
Co-authored-by: Matthew McNeely <matthew.mcneely@gmail.com>
Co-authored-by: Ruohao Zhang <72735001+seedoilz@users.noreply.github.com>
Co-authored-by: Rahul Arvikar <41536633+rarvikar@users.noreply.github.com>
Co-authored-by: Joshua Goldstein <jgoldstein345@gmail.com>
Co-authored-by: Joshua Goldstein <joshua@hypermode.com>
Co-authored-by: Ryan Fox-Tyler <60440289+ryanfoxtyler@users.noreply.github.com>
Co-authored-by: Gautam Bhat <gautam.bhat05@GMAIL.COM>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Shiva <shiva@Shivajis-MacBook-Pro.local>
Co-authored-by: Gary Wang <garylavayou@outlook.com>
Co-authored-by: Shaun Patterson <shaunpatterson@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent a662852 commit c544061
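
The "Stress Test Modes" table in the commit message describes a preset lookup plus an inner `rounds` loop that repeats a concurrent batch. The following is a minimal sketch of that idea, not the code from this commit: the numbers are copied from the table, while `STRESS_MODES`, `stress_config()` and `run_rounds()` are hypothetical names chosen for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Values copied from the "Stress Test Modes" table in the commit message.
# The dict layout and key names are assumptions made for this sketch.
STRESS_MODES = {
    "quick": {"workers": 20, "ops": 200, "rounds": 50, "load_movies": False},
    "moderate": {"workers": 10, "ops": 200, "rounds": 8, "load_movies": True},
    "full": {"workers": 15, "ops": 500, "rounds": 15, "load_movies": True},
}


def stress_config(mode=None):
    """Resolve a preset, falling back to STRESS_TEST_MODE and then 'quick'."""
    mode = mode or os.environ.get("STRESS_TEST_MODE", "quick")
    return STRESS_MODES[mode]


def run_rounds(operation, config):
    """Submit `ops` calls per round to a pool of `workers` threads.

    Each round waits for its whole batch before the next one starts, so the
    load is sustained for `rounds` consecutive batches.
    """
    completed = 0
    with ThreadPoolExecutor(max_workers=config["workers"]) as pool:
        for _ in range(config["rounds"]):
            futures = [pool.submit(operation, i) for i in range(config["ops"])]
            # result() re-raises any exception from the worker thread.
            completed += len([f.result() for f in futures])
    return completed
```

In a real stress test, `operation` would be a query or mutation against the client; here it can be any callable that accepts the operation index.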

21 files changed: +3621 -114 lines changed
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+name: ci-pydgraph-benchmarks
+on:
+  push:
+    branches:
+      - main
+    tags:
+      - v[0-9]+.[0-9]+.[0-9]+*
+
+permissions:
+  contents: read
+
+jobs:
+  benchmarks:
+    name: Release Benchmarks
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+      - name: Setup python runtime and tooling
+        uses: ./.github/actions/setup-python-and-tooling
+        with:
+          python-version: "3.13"
+      - name: Setup project dependencies
+        run: INSTALL_MISSING_TOOLS=true make setup
+      - name: Sync python virtualenv
+        run: make sync
+      - name: Run benchmarks
+        run: make benchmark
+      - name: Upload benchmark results
+        uses: actions/upload-artifact@v4
+        with:
+          name: benchmark-results-${{ github.ref_name }}
+          path: |
+            benchmark-results.json
+            benchmark-histogram.svg
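
The workflow above uploads `benchmark-results.json` as an artifact, and the commit message suggests comparing individual operation times when something regresses. A rough sketch of such a comparison follows. It assumes the pytest-benchmark JSON layout (a top-level `benchmarks` list whose entries carry `name` and `stats.mean`); the function names and the 20% threshold are illustrative and not part of this commit.

```python
import json


def means_from_report(report):
    """Map benchmark name -> mean seconds from a parsed pytest-benchmark report.

    Assumes the report has a top-level "benchmarks" list whose entries
    contain "name" and a "stats" dict with a "mean" value.
    """
    return {b["name"]: b["stats"]["mean"] for b in report["benchmarks"]}


def load_means(path):
    """Read a benchmark JSON file from disk and extract the per-test means."""
    with open(path) as fh:
        return means_from_report(json.load(fh))


def find_regressions(baseline, current, threshold=1.2):
    """Return {name: ratio} for benchmarks whose mean grew beyond `threshold`.

    Benchmarks missing from `current` are skipped rather than reported.
    """
    regressions = {}
    for name, base_mean in baseline.items():
        cur_mean = current.get(name)
        if cur_mean is None or base_mean <= 0:
            continue
        ratio = cur_mean / base_mean
        if ratio > threshold:
            regressions[name] = ratio
    return regressions
```

A typical use would be `find_regressions(load_means("old.json"), load_means("benchmark-results.json"))`, where `old.json` is a previously downloaded artifact.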

.gitignore

Lines changed: 13 additions & 0 deletions
@@ -41,3 +41,16 @@ venv
 pyvenv.cfg
 .DS_Store
 examples/notebook/RAG/.env
+.osgrep
+
+# Git worktrees
+.worktrees/
+
+# Benchmark outputs
+benchmark-results.json
+benchmark-histogram.svg
+stress-benchmark-results.json
+
+# Downloaded test data (fetched on demand)
+tests/resources/1million.rdf.gz
+tests/resources/1million.schema

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ repos:
         pass_filenames: false
         additional_dependencies:
           - pytest>=8.3.3
+          - pytest-benchmark>=4.0.0
           - grpcio>=1.65.1
           - protobuf>=4.23.0
   - repo: https://github.com/pre-commit/mirrors-mypy

CONTRIBUTING.md

Lines changed: 35 additions & 2 deletions
@@ -220,15 +220,48 @@ make test
 Run specific tests:
 
 ```sh
-bash scripts/local-test.sh -v tests/test_connect.py::TestOpen
+make test PYTEST_ARGS="-v tests/test_connect.py::TestOpen"
 ```
 
 Run a single test:
 
 ```sh
-bash scripts/local-test.sh -v tests/test_connect.py::TestOpen::test_connection_with_auth
+make test PYTEST_ARGS="-v tests/test_connect.py::TestOpen::test_connection_with_auth"
 ```
 
+### Stress Tests
+
+The project includes comprehensive stress tests that verify concurrent operations, transaction
+conflicts, deadlock prevention, and retry mechanisms for both sync and async clients.
+
+**Quick mode** (default, ~12 seconds) - 20 workers, 50 ops, 10 iterations:
+
+```sh
+make test PYTEST_ARGS="tests/test_stress_sync.py tests/test_stress_async.py -v"
+```
+
+**Moderate mode** (10x quick, includes movie dataset, ~60+ seconds) - 200 workers, 500 ops, 100
+iterations:
+
+```sh
+make test STRESS_TEST_MODE=moderate PYTEST_ARGS="tests/test_stress_sync.py tests/test_stress_async.py -v"
+```
+
+**Full mode** (10x moderate, maximum stress, ~10+ minutes) - 2000 workers, 5000 ops, 1000
+iterations:
+
+```sh
+make test STRESS_TEST_MODE=full PYTEST_ARGS="tests/test_stress_sync.py tests/test_stress_async.py -v"
+```
+
+The stress tests cover:
+
+- **Sync tests**: Run with `ThreadPoolExecutor` to test concurrent operations
+- **Async tests**: Use pure `asyncio.gather()` concurrency (no `concurrent.futures` mixing)
+- **Retry utilities**: Tests for `retry_async()`, `with_retry_async()`, and
+  `run_transaction_async()`
+- **Deadlock regression**: Validates the asyncio.Lock deadlock fix from PR #293
+
 ### Test Infrastructure
 
 The test script requires Docker and Docker Compose to be installed on your machine.
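
The CONTRIBUTING text above mentions pure `asyncio.gather()` concurrency combined with retry helpers. The sketch below illustrates that pattern in isolation, assuming a simulated conflict error. `SimulatedAbort`, `retry_async_sketch()` and `run_batch()` are hypothetical names for this example; they are not pydgraph's `AbortedError` or `retry_async()`, whose real signatures are not shown in this diff.

```python
import asyncio


class SimulatedAbort(Exception):
    """Stand-in for a transaction-conflict error (hypothetical)."""


async def retry_async_sketch(make_attempt, max_retries=5, base_delay=0.001):
    """Await make_attempt() until it succeeds, backing off exponentially."""
    for attempt in range(max_retries + 1):
        try:
            return await make_attempt()
        except SimulatedAbort:
            if attempt == max_retries:
                raise  # retries exhausted: surface the conflict to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))


async def run_batch(n_ops):
    """Run n_ops operations concurrently; even-numbered ones conflict once."""
    attempts = {}

    async def op(i):
        async def attempt():
            attempts[i] = attempts.get(i, 0) + 1
            if attempts[i] == 1 and i % 2 == 0:
                raise SimulatedAbort(f"conflict on op {i}")
            return i

        return await retry_async_sketch(attempt)

    # gather() preserves argument order in its result list.
    results = await asyncio.gather(*(op(i) for i in range(n_ops)))
    return results, attempts
```

Calling `asyncio.run(run_batch(6))` runs all six operations on one event loop; the even-numbered ones fail once and succeed on their second attempt.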

Makefile

Lines changed: 52 additions & 3 deletions
@@ -2,6 +2,25 @@
 SHELL := /bin/bash
 export PATH := $(HOME)/.local/bin:$(HOME)/.cargo/bin:$(PATH)
 
+# Export test configuration variables so they're available to child processes
+# Usage: make test STRESS_TEST_MODE=moderate PYTEST_ARGS="-v"
+#        make test LOG=info    (adds --log-cli-level=INFO to default PYTEST_ARGS)
+export STRESS_TEST_MODE
+export DGRAPH_IMAGE_TAG
+
+# When LOG is set (e.g., LOG=info), inject --log-cli-level into pytest flags.
+# Works with both the default PYTEST_ARGS and explicit overrides:
+#   make test LOG=info                    → -v --benchmark-disable --log-cli-level=INFO
+#   make benchmark LOG=warning            → --benchmark-only ... --log-cli-level=WARNING
+#   make test PYTEST_ARGS="-x" LOG=debug  → -x --log-cli-level=DEBUG
+PYTEST_ARGS ?= -v --benchmark-disable
+ifdef LOG
+  LOG_FLAG := --log-cli-level=$(shell echo '$(LOG)' | tr '[:lower:]' '[:upper:]')
+  PYTEST_ARGS += $(LOG_FLAG)
+endif
+export LOG
+export PYTEST_ARGS
+
 # Source venv if it exists and isn't already active
 PROJECT_VENV := $(CURDIR)/.venv
 ACTIVATE := $(wildcard .venv/bin/activate)
@@ -15,14 +34,21 @@ else
   RUN :=
 endif
 
-.PHONY: help setup sync deps deps-uv deps-trunk deps-docker test check protogen clean build publish
+.PHONY: help setup sync deps deps-uv deps-trunk deps-docker test benchmark check protogen clean build publish
 
 .DEFAULT_GOAL := help
 
 help: ## Show this help message
 	@echo ""
 	@echo "Environment Variables:"
 	@echo " INSTALL_MISSING_TOOLS=true Enable automatic installation of missing tools (default: disabled)"
+	@echo " LOG=<level> Add --log-cli-level to pytest (e.g., LOG=info, LOG=debug)"
+	@echo " Works with both 'test' and 'benchmark' targets"
+	@echo " STRESS_TEST_MODE=<mode> Stress test preset: quick (default), moderate, full"
+	@echo " PYTEST_ARGS=\"...\" Override default pytest flags (default: -v --benchmark-disable)"
+	@echo " Note: overrides LOG when set explicitly. 'benchmark' sets its own"
+	@echo " PYTEST_ARGS internally but still honours LOG"
+	@echo " DGRAPH_IMAGE_TAG=<tag> Override the Dgraph Docker image tag (default: latest)"
 	@echo ""
 	@echo "Available targets:"
 	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
@@ -51,8 +77,31 @@ clean: ## Cleans build artifacts
 build: deps-uv sync protogen ## Builds release package
 	$(RUN) uv build
 
-test: deps-uv sync ## Run tests
-	bash scripts/local-test.sh
+test: deps-uv sync ## Run tests (use PYTEST_ARGS to pass options, e.g., make test PYTEST_ARGS="-v tests/test_connect.py")
+	bash scripts/local-test.sh $(PYTEST_ARGS)
+
+benchmark: ## Run benchmarks (measures per-operation latency with pytest-benchmark)
+	@# Outputs (all .gitignored):
+	@#   benchmark-results.json         Phase 1 results (pytest-benchmark JSON)
+	@#   benchmark-histogram.svg        Phase 1 latency histogram
+	@#   stress-benchmark-results.json  Phase 2 results (pytest-benchmark JSON)
+	@#
+	@# Phase 1: Per-operation latency benchmarks against a clean database.
+	@#   Runs targeted benchmark tests (test_benchmark_*.py) which measure individual
+	@#   operations (query, mutation, upsert, etc.) in isolation. Each test creates a
+	@#   fresh schema via drop_all, so these MUST run on their own Dgraph cluster:
+	@#   the rapid schema churn destabilises the alpha for any tests that follow.
+	@echo "═══ Phase 1: Per-operation latency benchmarks ═══"
+	$(MAKE) test PYTEST_ARGS="--benchmark-only --benchmark-json=benchmark-results.json --benchmark-histogram=benchmark-histogram -v $(LOG_FLAG) tests/test_benchmark_async.py tests/test_benchmark_sync.py"
+	@# Phase 2: Stress-test benchmarks under sustained concurrent load.
+	@#   Runs stress tests (test_stress_*.py) with the 1-million-movie dataset loaded.
+	@#   Uses a separate Dgraph cluster (via a second 'make test' invocation) so the
+	@#   alpha starts fresh after Phase 1's drop_all churn.
+	@#   benchmark.pedantic(rounds=1) in each stress test prevents pytest-benchmark
+	@#   from compounding iterations; the stress_config["rounds"] inner loop
+	@#   (controlled by STRESS_TEST_MODE) handles repetition instead.
+	@echo "═══ Phase 2: Stress-test benchmarks (moderate load, 1M movies) ═══"
+	$(MAKE) test STRESS_TEST_MODE=moderate PYTEST_ARGS="--benchmark-only --benchmark-json=stress-benchmark-results.json -v $(LOG_FLAG) tests/test_stress_async.py tests/test_stress_sync.py"
 
 publish: clean build ## Publish a new release to PyPi (requires UV_PUBLISH_USERNAME and UV_PUBLISH_PASSWORD to be set)
 	$(RUN) uv publish
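
The `LOG` handling in the Makefile above upper-cases the level and appends `--log-cli-level` to the pytest flags. The Python sketch below restates the three examples from the Makefile comment so the intended mapping can be checked quickly. `build_pytest_args()` is a hypothetical helper, and the sketch follows the comment as written; it does not model how GNU make itself treats a `PYTEST_ARGS` value passed on the command line.

```python
import shlex

# Default taken from the Makefile: PYTEST_ARGS ?= -v --benchmark-disable
DEFAULT_PYTEST_ARGS = "-v --benchmark-disable"


def build_pytest_args(log=None, pytest_args=None):
    """Return the pytest argument list described by the Makefile comment.

    `log` plays the role of LOG, `pytest_args` the role of an explicit
    PYTEST_ARGS value; both parameter names are assumptions of this sketch.
    """
    source = DEFAULT_PYTEST_ARGS if pytest_args is None else pytest_args
    args = shlex.split(source)
    if log:
        # Mirrors: tr '[:lower:]' '[:upper:]' on the LOG value.
        args.append(f"--log-cli-level={log.upper()}")
    return args
```

For example, `build_pytest_args(log="info")` corresponds to `make test LOG=info` in the comment.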

PUBLISHING.md

Lines changed: 2 additions & 2 deletions
@@ -9,12 +9,12 @@ This document contains instructions to create a new pydgraph release and publish
 1. Create a new branch (prepare-for-release-vXX.X.X, for instance)
 1. Update the VERSION in pydgraph/meta.py
 1. Build pydgraph locally, see the [README](README.md#build-from-source)
-1. Run the tests (`bash scripts/local-test.sh`) to ensure everything works
+1. Run the tests (`make test`) to ensure everything works
 1. If you're concerned about incompatibilities with earlier Dgraph versions, invoke the test suite
    with earlier Dgraph versions
 
 ```sh
-DGRAPH_IMAGE_TAG=vX.X.X bash scripts/local-test.sh
+make test DGRAPH_IMAGE_TAG=vX.X.X
 ```
 
 1. If you happen to have the testpypi access token, try a test upload to testpypi:
