Skip to content

Commit 84acb4c

Browse files
Enabled support for multiple collection workloads
1 parent 09e1084 commit 84acb4c

File tree

20 files changed

+2435
-201
lines changed

20 files changed

+2435
-201
lines changed

README.md

Lines changed: 42 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,26 @@ Quick access:
9999
* Includes `executionStats` vs `queryPlanner` guidance, severity-based explain eligibility, filtering counters, and expected outcomes
100100
* Includes query traceability fields so each finding can be mapped to a concrete query source/definition
101101
* Includes concise findings cards + Explore drawer workflow with execution-plan stage flow and technical details tabs
102+
* Includes mixed sharded/non-sharded runtime support with per-collection auto detection and explicit sharding strategy controls in the UI
103+
104+
### 6. Workload Lifecycle & Timing Semantics
105+
106+
PLGM now exposes an explicit lifecycle so users can distinguish preparation from measured workload execution:
107+
108+
* `initializing`: collection setup, index creation, sharding/runtime preparation, optional seeding
109+
* `running`: actual query workload execution
110+
* `completed` / `failed`: terminal states
111+
112+
Important timing behavior:
113+
114+
* Initialization time is tracked separately and is **not** treated as execution runtime
115+
* Execution timer and progress bar start from the execution phase
116+
* During initialization, the UI shows progress context (step name, progress %, and recent activity events)
117+
118+
Current limitation:
119+
120+
* Initialization progress is structured and user-visible, but still coarse-grained (step/event based)
121+
* It is not yet per-index/per-shard real-time streaming
102122

103123
## The Interactive UI
104124

@@ -681,6 +701,24 @@ If you point to a folder, plgm will scan and merge **all** `.json` files found i
681701
export PLGM_COLLECTIONS_PATH="./resources/custom_collections/"
682702
```
683703
704+
#### Single-File Multi-Definition Format
705+
When using a single file, plgm supports both legacy array format and wrapped multi-definition format:
706+
707+
- Collections:
708+
- `[{...}]`
709+
- `{"collections":[...]}`
710+
- Queries:
711+
- `[{...}]`
712+
- `{"queries":[...]}`
713+
714+
For multi-collection workloads, each query must identify its target collection (and database when needed). plgm validates query-to-collection mapping before execution and fails early on unknown or ambiguous references.
715+
Using the same collection name in different databases is supported (for example `db1.orders` and `db2.orders`).
716+
717+
Example files:
718+
719+
- [`resources/collections/multi_collections.json`](./resources/collections/multi_collections.json)
720+
- [`resources/queries/multi_queries.json`](./resources/queries/multi_queries.json)
721+
684722
#### Default Workload Filtering
685723
When using **Directory Mode**, the behavior depends on the `PLGM_DEFAULT_WORKLOAD` setting:
686724
@@ -734,6 +772,9 @@ You can override any setting in `config.yaml` using environment variables. This
734772
| `insights_explain_workers` | `PLGM_INSIGHTS_EXPLAIN_WORKERS` | Post-run explain worker count | `1` |
735773
| `insights_explain_retries` | `PLGM_INSIGHTS_EXPLAIN_RETRIES` | Retry count for explain timeout/failure | `1` |
736774
| `insights_explain_backoff_ms` | `PLGM_INSIGHTS_EXPLAIN_BACKOFF_MS` | Backoff between explain retries (ms) | `150` |
775+
| **Sharding Strategy** | | | |
776+
| `sharding_mode` | `PLGM_SHARDING_MODE` | Per-collection sharding behavior: `auto`, `force_on`, `force_off` | `auto` |
777+
| `sharding_skip_generic_without_config` | `PLGM_SHARDING_SKIP_GENERIC_WITHOUT_CONFIG` | Skip generic sharding setup for collections without `shardConfig` (recommended for mixed workloads) | `true` |
737778
| **Workload Control** | | | |
738779
| `concurrency` | `PLGM_CONCURRENCY` | Number of active worker goroutines | `50` |
739780
| `duration` | `PLGM_DURATION` | Test duration (Go duration string) | `5m`, `60s` |
@@ -789,7 +830,7 @@ When executed, plgm performs the following steps:
789830
* Spawns the configured number of **Active Workers**.
790831
* Continuously generates and executes queries (Find, Insert, Update, Delete, Aggregate, Upsert) based on your configured ratios.
791832
* Generates realistic BSON data for Inserts and Updates (supports recursion and complex schemas).
792-
* Workers pick a random collection from the provided list for every operation.
833+
* Workers pick a random collection from the provided list for every operation, then route query execution using queries bound to that same database+collection.
793834
4. **Reporting:**
794835
* Outputs a real-time status report every N seconds (configurable).
795836
* Prints a detailed summary table at the end of the run.

TESTING.md

Lines changed: 51 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,8 @@ Integration tests are behind the `integration` build tag and are skipped from de
2222
Current integration smoke test:
2323

2424
- `internal/mongo/TestRunWorkloadIntegration_OneShotExecutesFindUpdateAndSkipsInsert`
25+
- `internal/mongo/TestRunWorkloadIntegration_OneShotMultiDBMixedShardConfig`
26+
- `internal/mongo/TestRunWorkloadIntegration_MongosMixedShardedAndUnshardedCollections` (mongos-only, gated)
2527

2628
### Option A: Start MongoDB with Docker Compose (recommended)
2729

@@ -37,6 +39,12 @@ Then run the integration smoke test:
3739
go test -tags=integration ./internal/mongo -run TestRunWorkloadIntegration_OneShotExecutesFindUpdateAndSkipsInsert -v
3840
```
3941

42+
Run all integration tests available for your endpoint:
43+
44+
```bash
45+
go test -tags=integration ./internal/mongo -v
46+
```
47+
4048
Stop MongoDB when done:
4149

4250
```bash
@@ -54,14 +62,44 @@ go test -tags=integration ./internal/mongo -run TestRunWorkloadIntegration_OneSh
5462

5563
If your MongoDB requires authentication, include credentials in `PLGM_IT_MONGO_URI`.
5664

65+
### Mongos Mixed Sharding Integration Test (CI/profiled env)
66+
67+
This test validates a real mongos path with one sharded and one unsharded collection in the same run:
68+
69+
- `TestRunWorkloadIntegration_MongosMixedShardedAndUnshardedCollections`
70+
71+
It is intentionally gated and will skip unless explicitly enabled:
72+
73+
```bash
74+
PLGM_IT_MONGO_URI='mongodb://<mongos-host>:27017' \
75+
PLGM_IT_ENABLE_MONGOS_SHARD_TEST=true \
76+
go test -tags=integration ./internal/mongo \
77+
-run TestRunWorkloadIntegration_MongosMixedShardedAndUnshardedCollections \
78+
-v -count=1
79+
```
80+
81+
Recommended CI/profiled environment requirements:
82+
83+
- URI must point to a real `mongos` router
84+
- test user must have privileges for:
85+
- `listShards`
86+
- `enableSharding`
87+
- `shardCollection`
88+
- normal read/write on test databases
89+
- ideally run in a dedicated test cluster/profile to avoid interference with production workloads
90+
5791
## Run Everything Available
5892

5993
```bash
6094
# 1) Unit tests
6195
go test ./...
6296

6397
# 2) Integration smoke test(s)
64-
go test -tags=integration ./internal/mongo -run TestRunWorkloadIntegration_OneShotExecutesFindUpdateAndSkipsInsert -v
98+
go test -tags=integration ./internal/mongo -v
99+
100+
# 3) Optional: mongos-only mixed sharding coverage
101+
PLGM_IT_ENABLE_MONGOS_SHARD_TEST=true \
102+
go test -tags=integration ./internal/mongo -run TestRunWorkloadIntegration_MongosMixedShardedAndUnshardedCollections -v -count=1
65103
```
66104

67105
## Insights Feature Test Focus (Unit)
@@ -74,3 +112,15 @@ go test ./internal/webui -run Insights -v
74112
```
75113

76114
These cover filtering/severity behavior, `/api/insights` lifecycle, and export payload parity with insights output.
115+
116+
## Workload Lifecycle Observability
117+
118+
`/api/stats` now includes explicit lifecycle state and structured initialization progress fields:
119+
120+
- `lifecyclePhase` (`initializing`, `running`, `completed`, `failed`)
121+
- `lifecycleStep`, `lifecycleStepIndex`, `lifecycleStepTotal`
122+
- `lifecycleStepDone`, `lifecycleStepWork`, `lifecycleStepProgressPct`
123+
- `lifecycleRecentEvents` (bounded recent activity list)
124+
- `initializationDurationSec` and `executionElapsedSec`
125+
126+
This enables UI/automation to distinguish preparation time from measured execution time.

cmd/plgm/main.go

Lines changed: 23 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -269,18 +269,18 @@ func main() {
269269
log.Fatal(err)
270270
}
271271

272-
validCollections := make(map[string]bool)
273-
for _, col := range collectionsCfg.Collections {
274-
validCollections[col.Name] = true
272+
if err := config.ValidateCollectionDefinitions(collectionsCfg.Collections); err != nil {
273+
log.Fatal(err)
275274
}
276275

277-
var filteredQueries []config.QueryDefinition
278-
for _, q := range queriesCfg.Queries {
279-
if validCollections[q.Collection] {
280-
filteredQueries = append(filteredQueries, q)
281-
}
276+
if err := config.NormalizeAndValidateQueries(queriesCfg.Queries); err != nil {
277+
log.Fatal(err)
282278
}
283-
queriesCfg.Queries = filteredQueries
279+
boundQueries, err := config.ValidateAndBindQueriesToCollections(queriesCfg.Queries, collectionsCfg.Collections)
280+
if err != nil {
281+
log.Fatal(err)
282+
}
283+
queriesCfg.Queries = boundQueries
284284

285285
dbName := collectionsCfg.Collections[0].DatabaseName
286286
stats.PrintConfiguration(appCfg, collectionsCfg.Collections, version)
@@ -291,27 +291,41 @@ func main() {
291291
}
292292
defer conn.Disconnect(ctx)
293293

294+
initStart := time.Now()
295+
log.Printf("[Lifecycle] Phase=initializing: preparing collections, indexes, sharding setup, and optional seed data")
296+
log.Printf("[Lifecycle] Init Step 1/3: preparing collections")
294297
if err := mongo.CreateCollectionsFromConfig(ctx, conn.Database, collectionsCfg, appCfg.DropCollections); err != nil {
295298
log.Fatal(err)
296299
}
300+
totalIndexes := 0
301+
for _, col := range collectionsCfg.Collections {
302+
totalIndexes += len(col.Indexes)
303+
}
304+
log.Printf("[Lifecycle] Init Step 2/3: creating indexes (declared indexes=%d)", totalIndexes)
297305
if err := mongo.CreateIndexesFromConfig(ctx, conn.Database, collectionsCfg); err != nil {
298306
log.Fatal(err)
299307
}
300308

309+
log.Printf("[Lifecycle] Init Step 3/3: optional seed (skip_seed=%v, documents_count=%d)", appCfg.SkipSeed, appCfg.DocumentsCount)
301310
if !appCfg.SkipSeed && appCfg.DocumentsCount > 0 {
302311
for _, col := range collectionsCfg.Collections {
303312
if err := mongo.InsertRandomDocuments(ctx, conn.Database, col, appCfg.DocumentsCount, appCfg); err != nil {
304313
log.Fatal(err)
305314
}
306315
}
307316
}
317+
initDuration := time.Since(initStart)
318+
log.Printf("[Lifecycle] Initialization completed in %s", initDuration.Round(10*time.Millisecond))
319+
log.Printf("[Lifecycle] Phase=running: execution timers now track query workload only")
308320

309321
intervalDuration, _ := time.ParseDuration(appCfg.IntervalDelay)
310322
for i := 1; i <= appCfg.Iterations; i++ {
323+
iterStart := time.Now()
311324
log.Printf("Starting Standard Workload iteration %d of %d", i, appCfg.Iterations)
312325
if err := mongo.RunWorkload(ctx, conn.Database, collectionsCfg.Collections, queriesCfg.Queries, appCfg); err != nil {
313326
log.Fatal(err)
314327
}
328+
log.Printf("[Lifecycle] Iteration %d execution duration: %s", i, time.Since(iterStart).Round(10*time.Millisecond))
315329
if i < appCfg.Iterations && intervalDuration > 0 {
316330
log.Printf("Waiting %s before next iteration...", appCfg.IntervalDelay)
317331
time.Sleep(intervalDuration)

config.yaml

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -79,6 +79,20 @@ insights_explain_workers: 1
7979
insights_explain_retries: 1
8080
insights_explain_backoff_ms: 150
8181

82+
# ==============================================================================
83+
# Sharding Strategy
84+
# ==============================================================================
85+
# Controls how generic sharding setup is applied per collection.
86+
# Modes:
87+
# - auto: detect per collection (recommended)
88+
# - force_on: require mongos and apply sharding setup
89+
# - force_off: never apply sharding setup
90+
sharding_mode: "auto"
91+
92+
# When true, collections without explicit shardConfig are skipped by generic sharding setup.
93+
# Recommended for mixed workloads where some collections are intentionally unsharded.
94+
sharding_skip_generic_without_config: true
95+
8296
# ==============================================================================
8397
# Workload Definition
8498
# ==============================================================================

internal/config/collections.go

Lines changed: 59 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -108,12 +108,8 @@ func LoadCollections(path string, loadDefault bool) (*CollectionsFile, error) {
108108
allCollections = append(allCollections, loaded.Collections...)
109109
}
110110

111-
// --- VALIDATION ---
112-
// Ensure we didn't load an empty config due to JSON key mismatch
113-
for i, col := range allCollections {
114-
if col.DatabaseName == "" || col.Name == "" {
115-
return nil, fmt.Errorf("loaded collection at index %d has empty 'database' or 'collection' name. Check your JSON keys: must be 'database' and 'collection' (lowercase)", i)
116-
}
111+
if err := ValidateCollectionDefinitions(allCollections); err != nil {
112+
return nil, err
117113
}
118114

119115
return &CollectionsFile{Collections: allCollections}, nil
@@ -149,3 +145,60 @@ func parseCollectionsBytes(b []byte) (*CollectionsFile, error) {
149145

150146
return nil, fmt.Errorf("invalid collections format")
151147
}
148+
149+
func ParseCollectionsBytes(b []byte) (*CollectionsFile, error) {
150+
return parseCollectionsBytes(b)
151+
}
152+
153+
func ValidateCollectionDefinitions(cols []CollectionDefinition) error {
154+
seenNamespaces := make(map[string]int, len(cols))
155+
156+
for i, col := range cols {
157+
if strings.TrimSpace(col.DatabaseName) == "" || strings.TrimSpace(col.Name) == "" {
158+
return fmt.Errorf("loaded collection at index %d has empty 'database' or 'collection' name. Check your JSON keys: must be 'database' and 'collection' (lowercase)", i)
159+
}
160+
161+
nsKey := strings.ToLower(strings.TrimSpace(col.DatabaseName)) + "." + strings.ToLower(strings.TrimSpace(col.Name))
162+
if prev, ok := seenNamespaces[nsKey]; ok {
163+
return fmt.Errorf("duplicate namespace %q at index %d (already declared at index %d)", col.DatabaseName+"."+col.Name, i, prev)
164+
}
165+
seenNamespaces[nsKey] = i
166+
167+
for idxPos, idx := range col.Indexes {
168+
if len(idx.Keys) == 0 {
169+
return fmt.Errorf("collection %q has invalid index at position %d: index keys cannot be empty", col.Name, idxPos)
170+
}
171+
}
172+
if err := validateFieldConstraints(col.Fields, ""); err != nil {
173+
return fmt.Errorf("collection %q field validation failed: %w", col.DatabaseName+"."+col.Name, err)
174+
}
175+
}
176+
177+
return nil
178+
}
179+
180+
func validateFieldConstraints(fields map[string]CollectionField, prefix string) error {
181+
for name, field := range fields {
182+
path := name
183+
if prefix != "" {
184+
path = prefix + "." + name
185+
}
186+
if field.Min != nil && field.Max != nil && *field.Min > *field.Max {
187+
return fmt.Errorf("field %q has invalid min/max: min (%d) is greater than max (%d)", path, *field.Min, *field.Max)
188+
}
189+
if field.MinLength > 0 && field.MaxLength > 0 && field.MinLength > field.MaxLength {
190+
return fmt.Errorf("field %q has invalid minLength/maxLength: minLength (%d) is greater than maxLength (%d)", path, field.MinLength, field.MaxLength)
191+
}
192+
if len(field.Fields) > 0 {
193+
if err := validateFieldConstraints(field.Fields, path); err != nil {
194+
return err
195+
}
196+
}
197+
if field.Items != nil {
198+
if err := validateFieldConstraints(map[string]CollectionField{"[]": *field.Items}, path); err != nil {
199+
return err
200+
}
201+
}
202+
}
203+
return nil
204+
}

0 commit comments

Comments
 (0)