Get value from this toolkit in 30 minutes.
Open the audit template:
templates/environment-audit-template.md
Fill out the five gap sections:
- Traffic Realism
- Dependency Realism
- Data Realism
- Configuration Realism
- Temporal Realism
Don't overthink it. Just answer:
- What's different between staging and production?
- Could that difference hide a real problem?
- What are we going to do about it?
See a completed example: examples/completed-audit-example.md
```bash
# Compare production metrics to your load test config
python scripts/traffic_gap_detector.py \
  --prod-metrics examples/prod_metrics_example.json \
  --load-test your_load_test_config.yaml
```

This will tell you if your load test is missing critical patterns.
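The core of the comparison can be sketched in a few lines. This is not the shipped script, just a minimal illustration of the idea; the field names (`peak_rps`, `target_rps`, `spike_multiplier`, `spike_phase`) are assumptions and the real script's metrics schema may differ.

```python
def find_traffic_gaps(prod_metrics: dict, load_test: dict) -> list[str]:
    """Flag load-test settings that understate production traffic.

    Field names here are illustrative, not the real script's schema.
    """
    gaps = []
    # Gap 1: the test's target throughput is below the production peak.
    if load_test["target_rps"] < prod_metrics["peak_rps"]:
        gaps.append(
            f"Load test targets {load_test['target_rps']} RPS but production "
            f"peaks at {prod_metrics['peak_rps']} RPS"
        )
    # Gap 2: production spikes well above baseline, but the test is steady-state.
    if prod_metrics.get("spike_multiplier", 1.0) > 1.5 and not load_test.get("spike_phase"):
        gaps.append("Production spikes >1.5x baseline but the test is steady-state")
    return gaps

prod = {"peak_rps": 31500, "spike_multiplier": 2.1}
test = {"target_rps": 15000}
for gap in find_traffic_gaps(prod, test):
    print("[GAP]", gap)
```

The real detector reads the JSON metrics file and YAML config shown above, but the decision logic is the same shape: compare what production actually does to what the test simulates.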
```bash
# Compare critical DB settings between environments
./scripts/db_config_diff.sh \
  --prod-host prod-db.example.com \
  --staging-host staging-db.example.com
```

Finds timeout and connection pool mismatches that invalidate tests.
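Conceptually, the diff is just a comparison over a short list of settings that matter. A hedged sketch, assuming the shell script collects each host's settings into key-value pairs; the setting names below are examples, and the real script may check a different list.

```python
# Settings whose mismatch commonly invalidates test results (illustrative list).
CRITICAL_SETTINGS = ["max_connections", "statement_timeout", "connection_pool_size"]

def diff_db_configs(prod: dict, staging: dict) -> dict:
    """Return critical settings whose values differ, as (prod, staging) pairs."""
    return {
        key: (prod.get(key), staging.get(key))
        for key in CRITICAL_SETTINGS
        if prod.get(key) != staging.get(key)
    }

mismatches = diff_db_configs(
    {"max_connections": 500, "statement_timeout": "30s", "connection_pool_size": 50},
    {"max_connections": 100, "statement_timeout": "30s", "connection_pool_size": 50},
)
print(mismatches)
```

A staging pool a fifth the size of production's will exhaust under load the test claims to survive, which is exactly the kind of mismatch this surfaces.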
If you're testing against mocked dependencies, replace perfect stubs with realistic behavior.
Before (useless):

```python
import time

# Always succeeds in 50ms
def mock_payment():
    time.sleep(0.05)
    return {"status": "success"}
```

After (realistic):
```bash
# Run our realistic mock instead
python mocks/payment_gateway_realistic.py
```

This mock includes:
- Rate limiting (429 errors at 100 TPS)
- Tail latency (p99 = 2 seconds)
- Realistic error rate (2%)
- Timeout behavior
It will expose issues your perfect mock hides.
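To make the contrast concrete, here is a minimal sketch of such a mock. The thresholds mirror the bullet list above (429s past 100 TPS, ~2% errors, ~2s p99), but this is an illustration, not the shipped `mocks/payment_gateway_realistic.py`, which may implement these behaviors differently.

```python
import random
import time

def realistic_mock_payment(current_tps: float) -> dict:
    """Payment-gateway mock with the failure modes a perfect stub hides."""
    # Rate limiting: reject outright above the gateway's throttle point.
    if current_tps > 100:
        return {"status": "error", "code": 429}
    # Baseline error rate: ~2% of calls fail even at low load.
    if random.random() < 0.02:
        return {"status": "error", "code": 503}
    # Tail latency: ~1% of calls take 2 seconds instead of 50ms.
    latency = 2.0 if random.random() < 0.01 else 0.05
    time.sleep(latency)
    return {"status": "success"}
```

Point your retry logic and circuit breakers at something like this and you will see behavior, retry storms, breaker trips, timeout cascades, that the always-succeeds stub can never trigger.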
After running this audit, you'll typically find:
- Your load test runs steady-state, production spikes → Add spike simulation
- Your mocks are perfect, real dependencies push back → Make mocks meaner
- Your timeouts don't match production → Match them or document the gap
- You're testing at 2 PM, production breaks at 10 PM during batch jobs → Test during real failure windows
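The first fix above, adding spike simulation, amounts to replacing a flat load profile with a stepped one. A sketch of the idea; the function and parameters are illustrative, and real load tools (k6, Locust, Gatling) each have their own way to express stepped or ramped load.

```python
def spike_profile(baseline_rps: int, spike_multiplier: float,
                  duration_s: int, spike_start_s: int, spike_len_s: int) -> list[int]:
    """Per-second RPS targets: steady state with one production-like spike."""
    return [
        int(baseline_rps * spike_multiplier)
        if spike_start_s <= t < spike_start_s + spike_len_s
        else baseline_rps
        for t in range(duration_s)
    ]

# A 10-minute run at 15K RPS with a 60-second 2.1x spike in the middle,
# matching the production pattern the audit surfaced.
profile = spike_profile(15000, 2.1, duration_s=600, spike_start_s=300, spike_len_s=60)
print(max(profile))
```

The spike window is where connection pools, queues, and autoscaling actually get tested; the steady-state portions only confirm what you already knew.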
Here's what gaps look like when you find them:

```
[HIGH] Traffic Spike Gap
  Production spikes to 2.1x baseline during campaigns
  Test runs steady state at 15K RPS
  Impact: May miss connection pool exhaustion during spikes
  Action: Add spike config to load test

[HIGH] Dependency Mock Gap
  Payment gateway throttles at 100 TPS in production
  Mock never throttles
  Impact: Won't see retry storms or circuit breaker behavior
  Action: Replace with realistic mock

[MEDIUM] Data Volume Gap
  Production: 14M rows
  Staging: 400K rows
  Impact: Query performance may differ
  Action: Document gap, consider seeding more data
```
You have four options for each gap:
- Fix it - Match staging to production (best option if feasible)
- Document it - Write down the delta and adjust your conclusions
- Test differently - If the gap is too big, test in production instead (with guardrails)
- Accept it - Some gaps don't matter for your specific test
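One lightweight way to keep track of each gap and the option you chose is a simple record per gap. This is a hypothetical format for your own gap library, not something the toolkit defines; all field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Gap:
    severity: str     # "HIGH" | "MEDIUM" | "LOW"
    category: str     # "Traffic" | "Dependency" | "Data" | "Config" | "Temporal"
    description: str
    decision: str     # "fix" | "document" | "test-differently" | "accept"
    action: str

gap = Gap(
    severity="MEDIUM",
    category="Data",
    description="Production has 14M rows, staging has 400K",
    decision="document",
    action="Note in test conclusions; consider seeding more data",
)
print(f"[{gap.severity}] {gap.category}: {gap.decision}")
```

Recording the decision alongside the gap is what turns a one-off audit into the reusable gap library mentioned below.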
Don't let perfect be the enemy of good. You're not trying to make staging identical to production. You're trying to understand where they differ and what that means.
After your first audit:
- Run your chaos experiment with eyes open about the gaps
- Come back to the template and fill out the "Post-Test Follow-up" section
- Build a gap library for your team (which gaps matter for which tests?)
- Automate what you can using the scripts in this repo
- See a real example: examples/completed-audit-example.md
- Read the articles:
- Fork and customize these templates for your own environment
The goal is simple: stop trusting test results from environments that haven't earned that trust.