Commit fd643d5
refactor: Introduce ClusteringKernel type hierarchy, shared orchestration, and center initialization
L1Kernel incorrectly extended BregmanKernel despite having no valid gradient
or inverse gradient, which could silently produce wrong centers when paired
with GradMeanUDAFUpdate. This refactor introduces a proper type hierarchy
and centralizes duplicated factory/initialization code across estimators.
Type hierarchy:
- New ClusteringKernel root trait (divergence, validate, name)
- BregmanKernel extends ClusteringKernel (adds grad/invGrad)
- L1Kernel reclassified to extend ClusteringKernel directly
- SparseClusteringKernel trait for sparse-optimized non-Bregman kernels
- GradMeanUDAFUpdate now has runtime require guard for BregmanKernel
- All consumer signatures widened from BregmanKernel to ClusteringKernel
Shared orchestration (ClusteringOps):
- Single source of truth for createKernel, createAssignmentStrategy,
createUpdateStrategy, createEmptyClusterHandler, validateDomain
- 10 estimators updated to delegate to ClusteringOps
Center initialization (CenterInitializer):
- Extracted k-means++ and random initialization into shared utility
- GeneralizedKMeans and BalancedKMeans now share initialization code
44 files changed, 235 insertions, 569 deletions. All 249 non-Spark tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>1 parent 4bb21d5 commit fd643d5
File tree
44 files changed
+698
-569
lines changed- src
- main/scala/com/massivedatascience/clusterer/ml
- df
- kernels
- strategies
- impl
- test/scala/com/massivedatascience/clusterer/ml/df
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
44 files changed
+698
-569
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
10 | 18 | | |
11 | 19 | | |
12 | 20 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
143 | 143 | | |
144 | 144 | | |
145 | 145 | | |
| 146 | + | |
| 147 | + | |
146 | 148 | | |
147 | 149 | | |
148 | 150 | | |
| |||
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
20 | | - | |
| 20 | + | |
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| |||
434 | 434 | | |
435 | 435 | | |
436 | 436 | | |
437 | | - | |
| 437 | + | |
438 | 438 | | |
439 | 439 | | |
440 | 440 | | |
| |||
481 | 481 | | |
482 | 482 | | |
483 | 483 | | |
484 | | - | |
| 484 | + | |
485 | 485 | | |
486 | 486 | | |
487 | 487 | | |
| |||
Lines changed: 15 additions & 66 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| |||
191 | 191 | | |
192 | 192 | | |
193 | 193 | | |
194 | | - | |
| 194 | + | |
195 | 195 | | |
196 | 196 | | |
197 | | - | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
198 | 209 | | |
199 | 210 | | |
200 | 211 | | |
| |||
239 | 250 | | |
240 | 251 | | |
241 | 252 | | |
242 | | - | |
| 253 | + | |
243 | 254 | | |
244 | 255 | | |
245 | 256 | | |
| |||
502 | 513 | | |
503 | 514 | | |
504 | 515 | | |
505 | | - | |
506 | | - | |
507 | | - | |
508 | | - | |
509 | | - | |
510 | | - | |
511 | | - | |
512 | | - | |
513 | | - | |
514 | | - | |
515 | | - | |
516 | | - | |
517 | | - | |
518 | | - | |
519 | | - | |
520 | | - | |
521 | | - | |
522 | | - | |
523 | | - | |
524 | | - | |
525 | | - | |
526 | | - | |
527 | | - | |
528 | | - | |
529 | | - | |
530 | | - | |
531 | | - | |
532 | | - | |
533 | | - | |
534 | | - | |
535 | | - | |
536 | | - | |
537 | | - | |
538 | | - | |
539 | | - | |
540 | | - | |
541 | | - | |
542 | | - | |
543 | | - | |
544 | | - | |
545 | | - | |
546 | | - | |
547 | | - | |
548 | | - | |
549 | | - | |
550 | | - | |
551 | | - | |
552 | | - | |
553 | | - | |
554 | | - | |
555 | | - | |
556 | | - | |
557 | | - | |
558 | | - | |
559 | | - | |
560 | | - | |
561 | | - | |
562 | | - | |
563 | | - | |
564 | | - | |
565 | | - | |
566 | | - | |
567 | 516 | | |
568 | 517 | | |
569 | 518 | | |
| |||
Lines changed: 9 additions & 44 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
144 | 144 | | |
145 | 145 | | |
146 | 146 | | |
147 | | - | |
| 147 | + | |
148 | 148 | | |
149 | 149 | | |
150 | 150 | | |
151 | | - | |
| 151 | + | |
152 | 152 | | |
153 | 153 | | |
154 | 154 | | |
155 | | - | |
| 155 | + | |
156 | 156 | | |
157 | 157 | | |
158 | 158 | | |
| |||
203 | 203 | | |
204 | 204 | | |
205 | 205 | | |
206 | | - | |
| 206 | + | |
207 | 207 | | |
208 | 208 | | |
209 | 209 | | |
| |||
324 | 324 | | |
325 | 325 | | |
326 | 326 | | |
327 | | - | |
| 327 | + | |
328 | 328 | | |
329 | 329 | | |
330 | 330 | | |
| |||
348 | 348 | | |
349 | 349 | | |
350 | 350 | | |
351 | | - | |
352 | | - | |
| 351 | + | |
| 352 | + | |
353 | 353 | | |
354 | 354 | | |
355 | 355 | | |
| |||
389 | 389 | | |
390 | 390 | | |
391 | 391 | | |
392 | | - | |
| 392 | + | |
393 | 393 | | |
394 | 394 | | |
395 | | - | |
| 395 | + | |
396 | 396 | | |
397 | 397 | | |
398 | 398 | | |
| |||
404 | 404 | | |
405 | 405 | | |
406 | 406 | | |
407 | | - | |
408 | | - | |
409 | | - | |
410 | | - | |
411 | | - | |
412 | | - | |
413 | | - | |
414 | | - | |
415 | | - | |
416 | | - | |
417 | | - | |
418 | | - | |
419 | | - | |
420 | | - | |
421 | | - | |
422 | | - | |
423 | | - | |
424 | | - | |
425 | | - | |
426 | | - | |
427 | | - | |
428 | | - | |
429 | | - | |
430 | | - | |
431 | | - | |
432 | | - | |
433 | | - | |
434 | | - | |
435 | | - | |
436 | | - | |
437 | | - | |
438 | | - | |
439 | | - | |
440 | | - | |
441 | | - | |
442 | 407 | | |
443 | 408 | | |
444 | 409 | | |
| |||
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
233 | 233 | | |
234 | 234 | | |
235 | 235 | | |
236 | | - | |
| 236 | + | |
237 | 237 | | |
238 | 238 | | |
239 | 239 | | |
| |||
270 | 270 | | |
271 | 271 | | |
272 | 272 | | |
273 | | - | |
| 273 | + | |
274 | 274 | | |
275 | 275 | | |
276 | 276 | | |
| |||
Lines changed: 5 additions & 5 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
296 | 296 | | |
297 | 297 | | |
298 | 298 | | |
299 | | - | |
| 299 | + | |
300 | 300 | | |
301 | 301 | | |
302 | 302 | | |
| |||
343 | 343 | | |
344 | 344 | | |
345 | 345 | | |
346 | | - | |
| 346 | + | |
347 | 347 | | |
348 | 348 | | |
349 | 349 | | |
| |||
373 | 373 | | |
374 | 374 | | |
375 | 375 | | |
376 | | - | |
| 376 | + | |
377 | 377 | | |
378 | 378 | | |
379 | 379 | | |
| |||
413 | 413 | | |
414 | 414 | | |
415 | 415 | | |
416 | | - | |
| 416 | + | |
417 | 417 | | |
418 | 418 | | |
419 | 419 | | |
| |||
471 | 471 | | |
472 | 472 | | |
473 | 473 | | |
474 | | - | |
| 474 | + | |
475 | 475 | | |
476 | 476 | | |
477 | 477 | | |
| |||
0 commit comments