chore: pin java 17 and docs for CI

derrickburns · derrickburns · commit 1a7bc0fca5bf · 2025-12-15T18:20:09.000-08:00
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -208,7 +208,9 @@ jobs:
           path: target/**/scoverage-report/*
           retention-days: 7
       - name: Upload to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
         with:
           files: ./target/scala-2.12/scoverage-report/scoverage.xml
           fail_ci_if_error: false
diff --git a/.java-version b/.java-version
@@ -0,0 +1 @@
+17
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,34 @@
+# Repository Guidelines
+
+Use this guide to make concise, high-signal contributions to the generalized k-means clustering library.
+
+## Project Structure & Module Organization
+- Scala sources live in `src/main/scala` (DataFrame/ML API under `com.massivedatascience.clusterer.ml`), with version-specific shims in `src/main/scala-2.12` and `src/main/scala-2.13`. Legacy RDD code remains in `com.massivedatascience.clusterer`.
+- Tests use ScalaTest under `src/test/scala` with Spark-local fixtures; shared data is in `src/test/resources`. Executable examples sit in `src/main/scala/examples`.
+- Python wrapper lives in `python/` (`massivedatascience` package, examples, and tests). Docs and release notes are in `docs/`, `release-notes/`, `ARCHITECTURE.md`, and `DATAFRAME_API_EXAMPLES.md`.
+
+## Build, Test, and Development Commands
+- `sbt compile`: compile against the default Scala/Spark versions; prefix the command with `++2.13.14` or `++2.12.18` to pin a Scala version (e.g., `sbt ++2.13.14 compile`).
+- `sbt test`: full JVM suite (ScalaTest, Spark `local[2]`); CI mirrors this with multiple Scala/Spark combinations.
+- `sbt scalafmtAll` then `sbt scalastyle`: required format/lint gates (`.scalafmt.conf`, `scalastyle-config.xml`).
+- `sbt coverage test coverageReport`: generate coverage; keep kernels and persistence paths covered.
+- Python: `cd python && pip install -e ".[dev]" && pytest` (see `python/TESTING.md`); the quotes keep shells such as zsh from globbing `[dev]`.
+
+## Coding Style & Naming Conventions
+- Scalafmt enforces 2-space indent and 100-col limit; keep trailing commas and aligned parameters. Prefer immutable vals, small helpers, and Spark ML `Estimator/Model` patterns (`set*`/`get*`).
+- Naming: PascalCase classes/objects, camelCase methods/vals/params. Document public APIs with Scaladoc and mirror existing parameter docs.
+- In tests, disable the Spark UI and keep partitions small (follow existing suites) to avoid flakiness.
+
+## Testing Guidelines
+- Add ScalaTest `AnyFunSuite` cases under `src/test/scala`; keep seeds deterministic and assert numerical tolerances for divergences. Reuse existing fixtures/utilities.
+- Include persistence round-trips when adding models/params; CI validates cross-version save/load.
+- For Python changes, update `python/tests/test_generalized_kmeans.py` and run `pytest --cov=massivedatascience tests/`.
+
+## Commit & Pull Request Guidelines
+- Use conventional commits (`feat|fix|docs|style|refactor|perf|test|build|ci|chore`, optional scope): `type(scope): subject`.
+- PRs should summarize behavior changes, list executed commands (e.g., `sbt ++2.13.14 test`, `sbt scalafmtAll`, `pytest`), and link issues (`Closes #123`). Provide before/after snippets for API or doc updates; screenshots only when user-facing outputs change.
+- CI runs lint, Scala/Spark matrix tests, Python smoke tests, and CodeQL; run the same commands locally before pushing to reduce iteration.
+
+## Security & Configuration Notes
+- Target Java 17; avoid committing large datasets or credentials. Report vulnerabilities via `SECURITY.md`.
+- When modifying dependencies or persistence formats, consult `DEPENDENCY_MANAGEMENT.md` and `PERSISTENCE_COMPATIBILITY.md` to preserve cross-version compatibility.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -44,7 +44,7 @@ This project adheres to a code of conduct adapted from the [Contributor Covenant
 
 ### Requirements
 
-- **Java**: JDK 17
+- **Java**: JDK 17 (required; the build rejects any other JDK, and Spark/Hadoop are incompatible with JDK 21+)
 - **Scala**: 2.12.18 or 2.13.14 (managed by sbt)
 - **SBT**: 1.9.x or later
 - **Spark**: 3.4.0+ or 3.5.1+ (managed by sbt)
diff --git a/build.sbt b/build.sbt
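@@ -16,7 +16,21 @@
       if (isCi) base :+ "-Xfatal-warnings" else base
     }
 
-    javacOptions ++= Seq( "-source", "17.0", "-target", "17.0")
+// Enforce Java 17 runtime/toolchain (Spark/Hadoop incompatibilities on JDK 21+)
+val javaSpecVersion = sys.props.getOrElse("java.specification.version", "0")
+// `toIntOption` is Scala 2.13-only; sbt 1.x compiles build.sbt with Scala 2.12, so use Try.
+val javaMajor = scala.util.Try(javaSpecVersion.split('.').last.toInt).getOrElse(0)
+// Top-level code in build.sbt must be a definition or a setting, so the check runs via `initialize`.
+initialize := {
+  val _ = initialize.value // keep any earlier initialization
+  if (javaMajor != 17) {
+    throw new RuntimeException(
+      s"Incompatible JDK detected: $javaSpecVersion. Please run sbt with Java 17 (Spark/Hadoop require it)."
+    )
+  }
+}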
+
+javacOptions ++= Seq("-source", "17", "-target", "17")
     publishMavenStyle := true
     Test / publishArtifact := false
     pomIncludeRepository := { _ => false }
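
The JDK check in the `build.sbt` hunk derives the major version by splitting `java.specification.version` on `.` and taking the last field. The sketch below shows one way that parsing could be isolated and tested outside the build. It is an illustration only: `JavaVersionCheck`, `majorOf`, and `requireJava` are hypothetical names that do not appear in the commit, and it assumes the usual property values (`"1.8"` for Java 8, `"17"` or `"21"` for later releases).

```scala
import scala.util.Try

// Hypothetical helper, not part of the commit: parses java.specification.version.
object JavaVersionCheck {

  // Java 8 and earlier report "1.x" (e.g. "1.8"); Java 9+ report the major alone ("17").
  def majorOf(specVersion: String): Option[Int] = {
    val field = specVersion.split('.').toList match {
      case "1" :: minor :: _ => Some(minor) // legacy "1.x" scheme
      case first :: _        => Some(first) // modern scheme
      case Nil               => None
    }
    // Try(...).toOption works on Scala 2.12 as well as 2.13.
    field.flatMap(s => Try(s.toInt).toOption)
  }

  // Returns the major version when it matches, otherwise a readable error message.
  def requireJava(expected: Int, specVersion: String): Either[String, Int] =
    majorOf(specVersion) match {
      case Some(major) if major == expected => Right(major)
      case Some(major) => Left(s"Incompatible JDK $major (expected $expected)")
      case None        => Left(s"Unrecognized Java version: '$specVersion'")
    }
}
```

A build could call `JavaVersionCheck.requireJava(17, sys.props.getOrElse("java.specification.version", ""))` and fail on a `Left`. Returning an `Either` instead of throwing keeps the parsing testable without starting sbt.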