Commit e2fcfd9
Enhance broadcast threshold diagnostics
Implements priority #2 from the CLAUDE.md backlog: "Better broadcast-threshold
diagnostics (include k × dim and configured threshold)".
Changes:
1. **AutoAssignment** (Strategies.scala:316-377)
- Added formatBroadcastSize() helper for human-readable sizes (B/KB/MB/GB)
- Enhanced the BroadcastUDF selection log to report k, dim, and the broadcast size in elements and bytes
- Expanded the warning logged when the chunked strategy is selected to include:
* k × dim compared with the threshold, with the overage as a percentage
* Number of data scans required (Math.ceil(k / chunkSize))
* 4 actionable suggestions for improving performance
* Maximum k supported by the current configuration
2. **BroadcastUDFAssignment** (Strategies.scala:45-95)
- Added formatBroadcastSize() helper
- Enhanced debug logging with k, dim, and broadcast size
- Warning when the broadcast exceeds 100MB (~12.5M elements at 8 bytes each)
- The warning lists potential issues and 4 actionable mitigations
3. **BroadcastDiagnosticsSuite.scala** (new)
- 7 tests validating the diagnostic messages:
* AutoAssignment threshold exceeded → chunked selection
* AutoAssignment below threshold → broadcast selection
* BroadcastUDFAssignment large broadcast warning (>100MB)
* formatBroadcastSize correctness across scales
* Chunk count calculation (k=250, chunkSize=100 → 3 passes)
* Threshold increase suggestions
* Max k calculation for given dimensionality
4. **README.md** (lines 130-185)
- Enhanced "Scaling & Assignment Strategy" section
- Documented all assignment strategy options (auto/crossJoin/broadcastUDF/chunked)
- Added "Broadcast Diagnostics" subsection with example warning output
- Guidance on interpreting warnings and tuning configurations
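The diagnostic arithmetic described above can be sketched as follows. This is not the code in Strategies.scala: only `formatBroadcastSize` is named in this commit, and everything else (`BroadcastDiagnosticsSketch`, `elements`, `overagePercent`, `dataScans`, `maxK`, `exceedsLargeBroadcast`, `describeSelection`, and the message wording) is hypothetical. It assumes the threshold is measured in elements (since it is compared with k × dim), that each element is an 8-byte Double, and that sizes use decimal units, which is what makes 100MB correspond to ~12.5M elements. The scan count divides as Double before rounding up; with two Int operands, `k / chunkSize` would truncate first and give 2 rather than 3 for k=250, chunkSize=100.

```scala
import java.util.Locale

// Hypothetical sketch of the broadcast diagnostics arithmetic; not the
// project's actual implementation. Names other than formatBroadcastSize
// are illustrative.
object BroadcastDiagnosticsSketch {

  // Assumption: cluster centers are dense Double vectors (8 bytes/element).
  val BytesPerElement: Long = 8L

  // Assumption: decimal units, so 100 MB = 10^8 bytes = 12.5M Doubles.
  val LargeBroadcastBytes: Long = 100L * 1000L * 1000L

  /** Human-readable size using B/KB/MB/GB with decimal (1000-based) units. */
  def formatBroadcastSize(bytes: Long): String = {
    val unit = 1000.0
    // Locale.ROOT keeps "." as the decimal separator regardless of JVM locale.
    def fmt(value: Double, suffix: String): String =
      "%.1f %s".formatLocal(Locale.ROOT, value, suffix)
    if (bytes < unit) s"$bytes B"
    else if (bytes < unit * unit) fmt(bytes / unit, "KB")
    else if (bytes < unit * unit * unit) fmt(bytes / (unit * unit), "MB")
    else fmt(bytes / (unit * unit * unit), "GB")
  }

  /** Number of elements broadcast for k centers of dimension dim. */
  def elements(k: Int, dim: Int): Long = k.toLong * dim

  /** Percentage by which k * dim exceeds the threshold (in elements). */
  def overagePercent(k: Int, dim: Int, threshold: Long): Double =
    (elements(k, dim) - threshold).toDouble / threshold * 100.0

  /** Passes over the data when centers are processed chunkSize at a time. */
  def dataScans(k: Int, chunkSize: Int): Int =
    math.ceil(k.toDouble / chunkSize).toInt // divide as Double, then round up

  /** Largest k whose k * dim still fits within the threshold. */
  def maxK(dim: Int, threshold: Long): Long = threshold / dim

  /** True when the broadcast would exceed the ~100 MB warning level. */
  def exceedsLargeBroadcast(k: Int, dim: Int): Boolean =
    elements(k, dim) * BytesPerElement > LargeBroadcastBytes

  /** Illustrative selection message; the real log wording is not shown here. */
  def describeSelection(k: Int, dim: Int, threshold: Long, chunkSize: Int): String = {
    val n = elements(k, dim)
    val size = formatBroadcastSize(n * BytesPerElement)
    if (n <= threshold) // assumption: the boundary case stays on broadcastUDF
      s"broadcastUDF: k=$k, dim=$dim, k*dim=$n elements ($size), threshold=$threshold"
    else {
      val over = "%.1f".formatLocal(Locale.ROOT, overagePercent(k, dim, threshold))
      s"chunked: k*dim=$n elements ($size) exceeds threshold=$threshold by $over%; " +
        s"${dataScans(k, chunkSize)} data scans at chunkSize=$chunkSize; " +
        s"max k at dim=$dim is ${maxK(dim, threshold)}"
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values only; these are not the library's defaults.
    println(describeSelection(k = 50, dim = 1000, threshold = 200000L, chunkSize = 100))
    println(describeSelection(k = 250, dim = 1000, threshold = 200000L, chunkSize = 100))
  }
}
```

With these illustrative values, `main` prints one broadcastUDF line for k=50 and one chunked line for k=250 that reports a 25.0% overage, 3 data scans, and a max k of 200 at dim=1000. Keeping the figures in pure helper functions lets each one be checked without a SparkSession, which mirrors the kinds of cases listed for BroadcastDiagnosticsSuite.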
Validation:
- All 7 new tests pass
- Existing tests pass (verified with sbt test)
- Diagnostic messages confirmed in test output
- README examples match actual log output format
Risk: Low - diagnostic messages only, no algorithm changes
Compatibility: No API surface or persistence changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Parent: c0af6c0
3 files changed: 352 additions, 14 deletions
Directories under src:
- main/scala/com/massivedatascience/clusterer/ml/df
- test/scala/com/massivedatascience/clusterer/ml/df
Strategies.scala lines changed: 81 additions, 6 deletions