Commit d5c40d6
feat: Implement Bregman-native k-means++ initialization
Updates k-means++ initialization to use proper D^2 weighting with the
actual Bregman divergence instead of simplified random sampling:
- Proper probability-proportional sampling using D(x, nearest_center)
- Works correctly with all Bregman divergences (KL, Itakura-Saito, etc.)
- Improved numerical stability with NaN/Inf handling
- Falls back to uniform random selection when all divergences are zero
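For context, the divergences named above can be sketched as plain functions. This is an illustrative sketch only: the object name `Divergences` and the method names are hypothetical and are not taken from the repository, and the KL and Itakura-Saito forms assume strictly positive inputs.

```scala
// Illustrative sketch; names are hypothetical, not the project's kernel API.
object Divergences {
  type Point = Array[Double]

  // Squared Euclidean distance: sum_i (x_i - y_i)^2
  def squaredEuclidean(x: Point, y: Point): Double =
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum

  // Generalized KL divergence: sum_i x_i * log(x_i / y_i) - x_i + y_i
  // Assumes all coordinates are strictly positive.
  def generalizedKL(x: Point, y: Point): Double =
    x.zip(y).map { case (a, b) => a * math.log(a / b) - a + b }.sum

  // Itakura-Saito divergence: sum_i x_i / y_i - log(x_i / y_i) - 1
  // Assumes all coordinates are strictly positive.
  def itakuraSaito(x: Point, y: Point): Double =
    x.zip(y).map { case (a, b) => a / b - math.log(a / b) - 1.0 }.sum
}
```

Unlike squared Euclidean distance, the KL and Itakura-Saito divergences are not symmetric, so the argument order D(x, center) matters when computing sampling weights.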
Algorithm:
1. Select first center uniformly at random
2. For each subsequent center:
- Compute D(x, nearest_center) for all points using the kernel
- Select the next center with probability proportional to D(x, nearest_center)
3. Repeat until k centers are selected
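The steps above can be sketched in a few lines of local (non-distributed) Scala. This is a minimal sketch under stated assumptions, not the code from this commit: `BregmanSeeding`, `sanitize`, `sampleIndex`, and `seed` are hypothetical names, the divergence is passed in as a plain function rather than the project's kernel, and mapping NaN, infinite, or negative values to zero weight is an assumed policy for the NaN/Inf handling.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Illustrative sketch; names and the NaN/Inf policy are assumptions.
object BregmanSeeding {
  type Point = Array[Double]

  // Assumed policy: NaN, infinite, or negative divergences get zero weight.
  def sanitize(d: Double): Double =
    if (d.isNaN || d.isInfinite || d < 0.0) 0.0 else d

  // Draws an index with probability proportional to its weight.
  // Falls back to a uniform draw when every weight is zero.
  def sampleIndex(weights: Array[Double], rng: Random): Int = {
    val total = weights.sum
    if (total <= 0.0) rng.nextInt(weights.length)
    else {
      val target = rng.nextDouble() * total
      var cumulative = 0.0
      var chosen = -1
      var i = 0
      while (i < weights.length && chosen < 0) {
        cumulative += weights(i)
        if (weights(i) > 0.0 && cumulative > target) chosen = i
        i += 1
      }
      // Guard against floating-point round-off at the upper end.
      if (chosen >= 0) chosen else weights.lastIndexWhere(_ > 0.0)
    }
  }

  def seed(
      points: IndexedSeq[Point],
      k: Int,
      divergence: (Point, Point) => Double,
      rng: Random): IndexedSeq[Point] = {
    require(k >= 1 && k <= points.length, "k must be in [1, number of points]")
    // Step 1: first center chosen uniformly at random.
    val first = points(rng.nextInt(points.length))
    // nearest(i) holds D(points(i), nearest chosen center).
    val nearest = points.map(p => sanitize(divergence(p, first))).toArray
    val centers = ArrayBuffer(first)
    // Steps 2 and 3: sample proportionally to the divergence until k centers exist.
    while (centers.length < k) {
      val next = points(sampleIndex(nearest, rng))
      centers += next
      var i = 0
      while (i < points.length) {
        nearest(i) = math.min(nearest(i), sanitize(divergence(points(i), next)))
        i += 1
      }
    }
    centers.toIndexedSeq
  }
}
```

Keeping a running array of nearest-center divergences means each new center costs one pass over the data instead of a pass per existing center. With a fixed seed the sketch is deterministic, which is the property a determinism test would check.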
This should give better initial centers for non-Euclidean divergences,
which typically leads to faster convergence and better local optima.
Also updates determinism test to validate proper k-means++ behavior
on more ambiguous data where different seeds can lead to different
local optima.
Reference: Nock, Luosto & Kivinen (2008) "Mixed Bregman Clustering"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 98df68c commit d5c40d6
File tree
3 files changed (+109, -67 lines)
- src
  - main/scala/com/massivedatascience/clusterer/ml
  - test/scala/com/massivedatascience/clusterer/ml
Lines changed: 73 additions & 52 deletions
Lines changed: 35 additions & 14 deletions