Commit 45a54fa
feat: Add Mini-Batch K-Means implementation
Implements the Mini-Batch K-Means algorithm for efficient clustering of large datasets:
- MiniBatchKMeans estimator with Spark ML API compatibility
- Incremental center updates with a per-center learning rate η = 1/(count+1)
- Support for all of the library's divergences: squared Euclidean (SE), KL, Itakura-Saito, L1, and spherical
- Early stopping after N consecutive batches with no improvement (N = maxNoImprovement)
- Configurable batch size, reassignment ratio, and convergence tolerance
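A minimal sketch of one mini-batch step in the style of Sculley (2010), restricted to squared Euclidean distance. `MiniBatchStep`, `update`, `closest`, and `sqDist` are hypothetical names for illustration, not the `MiniBatchKMeans` API added by this commit. Each batch point is assigned to its nearest center first, then every center takes a gradient step with learning rate η = 1/(count+1), which keeps each center equal to the running mean of the points assigned to it. For any Bregman divergence the mean is the optimal center, so the same update form carries over there; L1, whose optimal center is a median, is a different case.

```scala
// Sketch of one Mini-Batch K-Means update (Sculley 2010), squared Euclidean only.
// Hypothetical helper names; this is not the MiniBatchKMeans code from the commit.
object MiniBatchStep {
  type Point = Array[Double]

  /** Squared Euclidean distance between two points of equal dimension. */
  def sqDist(a: Point, b: Point): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { val d = a(i) - b(i); s += d * d; i += 1 }
    s
  }

  /** Index of the center nearest to x. */
  def closest(centers: Array[Point], x: Point): Int =
    centers.indices.minBy(j => sqDist(centers(j), x))

  /** Updates `centers` and `counts` in place using one mini-batch. */
  def update(centers: Array[Point], counts: Array[Long], batch: Seq[Point]): Unit = {
    // Step 1: assign every batch point against the centers as they were before this batch.
    val assignments = batch.map(x => closest(centers, x))
    // Step 2: per-point gradient step with per-center learning rate eta = 1/(count+1).
    batch.zip(assignments).foreach { case (x, j) =>
      val eta = 1.0 / (counts(j) + 1)
      counts(j) += 1
      val c = centers(j)
      var i = 0
      while (i < c.length) { c(i) = (1 - eta) * c(i) + eta * x(i); i += 1 }
    }
  }
}
```

With one-dimensional centers at 0 and 10 and a batch of 1, 3, and 9, the first center becomes the mean of 1 and 3 (that is, 2.0) and the second becomes 9.0, because a center with count 0 takes η = 1 and jumps straight to its first assigned point.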
Key parameters:
- batchSize: samples per mini-batch (default: 1024)
- maxNoImprovement: early stopping patience (default: 10)
- reassignmentRatio: controls reassignment of empty clusters (default: 0.01)
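A sketch of how a patience parameter such as `maxNoImprovement` typically drives early stopping. It assumes that "improvement" means the monitored batch cost falls more than `tol` below the best cost seen so far; the commit may track a different quantity, and `EarlyStopping.stopBatch` is a hypothetical helper.

```scala
// Sketch of patience-based early stopping in the spirit of maxNoImprovement.
// Assumption: a batch "improves" when its cost drops below (best - tol).
object EarlyStopping {
  /** Returns the 1-based batch index at which training would stop,
    * or None if the patience is never exhausted. */
  def stopBatch(costs: IndexedSeq[Double], maxNoImprovement: Int, tol: Double = 0.0): Option[Int] = {
    var best = Double.PositiveInfinity
    var noImprovement = 0
    var i = 0
    while (i < costs.length) {
      if (costs(i) < best - tol) {
        // New best cost: reset the patience counter.
        best = costs(i)
        noImprovement = 0
      } else {
        noImprovement += 1
        if (noImprovement >= maxNoImprovement) return Some(i + 1)
      }
      i += 1
    }
    None
  }
}
```

For batch costs 10.0, 8.0, 8.5, 8.2, 7.9, 8.0, 8.1, a patience of 2 stops at batch 4, while a patience of 3 survives because batch 5 sets a new best and resets the counter.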
Test suite includes 13 tests covering:
- Basic clustering with various divergences
- Early stopping behavior
- Deterministic results with fixed seed
- Parameter validation
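Determinism with a fixed seed usually comes from drawing every mini-batch from a single seeded random generator, so the sequence of batches (and hence the fitted centers) repeats exactly. The sketch below shows that idea with plain `scala.util.Random`; `SeededBatches.sampleIndices` is a hypothetical helper. On Spark, the comparable building block is `RDD.sample(withReplacement, fraction, seed)`, though how this commit actually samples its batches is not shown here.

```scala
import scala.util.Random

// Sketch: one seeded RNG generates all mini-batches, making runs reproducible.
// Hypothetical helper; not taken from the commit.
object SeededBatches {
  /** Returns `numBatches` mini-batches of row indices in [0, n), sampled with replacement. */
  def sampleIndices(n: Int, batchSize: Int, numBatches: Int, seed: Long): Seq[Seq[Int]] = {
    val rng = new Random(seed)
    Seq.fill(numBatches)(Seq.fill(batchSize)(rng.nextInt(n)))
  }
}
```

Calling `sampleIndices` twice with the same arguments yields identical batches, which is the property a fixed-seed test can assert on.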
Reference: Sculley (2010) "Web-Scale K-Means Clustering"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent: fc4d8c0
3 files changed (+925, -1 lines) under:
- src/main/scala/com/massivedatascience/clusterer/ml
- src/test/scala/com/massivedatascience/clusterer/ml
Diff view (line content not captured): in one modified file, line 3 was replaced and 44 lines were inserted after original line 164 (new lines 165-208).