Commit 1a7bc0f (parent e2fcfd9)

chore: pin java 17 and docs for CI

5 files changed: +49 −3 lines

.github/workflows/ci.yml

Lines changed: 3 additions & 1 deletion

```diff
@@ -208,7 +208,9 @@ jobs:
           path: target/**/scoverage-report/*
           retention-days: 7
       - name: Upload to Codecov
-        uses: codecov/codecov-action@v4
+        uses: codecov/codecov-action@v3
+        env:
+          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
         with:
           files: ./target/scala-2.12/scoverage-report/scoverage.xml
           fail_ci_if_error: false
```

.java-version

Lines changed: 1 addition & 0 deletions

```diff
@@ -0,0 +1 @@
+17
```
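For context (not part of the commit itself): version managers such as jenv and asdf read a `.java-version` file like the one added above to select the JDK. A minimal, hypothetical Scala helper (`pinnedMajor` is not from this repo) sketching how such a pin could be parsed:

```scala
import java.nio.file.{Files, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical helper (not part of this commit): parse the pinned JDK
// major version from a .java-version file, e.g. a file containing "17".
def pinnedMajor(path: String): Option[Int] = {
  val p = Paths.get(path)
  if (!Files.exists(p)) None
  else Files.readAllLines(p).asScala.headOption.flatMap(_.trim.toIntOption)
}
```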

AGENTS.md

Lines changed: 34 additions & 0 deletions

```diff
@@ -0,0 +1,34 @@
+# Repository Guidelines
+
+Use this guide to make concise, high-signal contributions to the generalized k-means clustering library.
+
+## Project Structure & Module Organization
+- Scala sources live in `src/main/scala` (DataFrame/ML API under `com.massivedatascience.clusterer.ml`), with version-specific shims in `src/main/scala-2.12` and `src/main/scala-2.13`. Legacy RDD code remains in `com.massivedatascience.clusterer`.
+- Tests use ScalaTest under `src/test/scala` with Spark-local fixtures; shared data is in `src/test/resources`. Executable examples sit in `src/main/scala/examples`.
+- The Python wrapper lives in `python/` (`massivedatascience` package, examples, and tests). Docs and release notes are in `docs/`, `release-notes/`, `ARCHITECTURE.md`, and `DATAFRAME_API_EXAMPLES.md`.
+
+## Build, Test, and Development Commands
+- `sbt compile`: compile against the default Scala/Spark matrix; use `sbt ++2.13.14` or `sbt ++2.12.18` to pin versions.
+- `sbt test`: full JVM suite (ScalaTest, Spark local[2]); CI mirrors this with multiple Scala/Spark combos.
+- `sbt scalafmtAll` then `sbt scalastyle`: required format/lint gates (`.scalafmt.conf`, `scalastyle-config.xml`).
+- `sbt coverage test coverageReport`: generate coverage; keep kernels and persistence paths covered.
+- Python: `cd python && pip install -e .[dev] && pytest` (see `python/TESTING.md`).
+
+## Coding Style & Naming Conventions
+- Scalafmt enforces 2-space indent and a 100-column limit; keep trailing commas and aligned parameters. Prefer immutable vals, small helpers, and Spark ML `Estimator`/`Model` patterns (`set*`/`get*`).
+- Naming: PascalCase classes/objects, camelCase methods/vals/params. Document public APIs with Scaladoc and mirror existing parameter docs.
+- In tests, disable the Spark UI and keep partitions small (follow existing suites) to avoid flakiness.
+
+## Testing Guidelines
+- Add ScalaTest `AnyFunSuite` cases under `src/test/scala`; keep seeds deterministic and assert numerical tolerances for divergences. Reuse existing fixtures/utilities.
+- Include persistence round-trips when adding models/params; CI validates cross-version save/load.
+- For Python changes, update `python/tests/test_generalized_kmeans.py` and run `pytest --cov=massivedatascience tests/`.
+
+## Commit & Pull Request Guidelines
+- Use conventional commits (`feat|fix|docs|style|refactor|perf|test|build|ci|chore`, optional scope): `type(scope): subject`.
+- PRs should summarize behavior changes, list executed commands (e.g., `sbt ++2.13.14 test`, `sbt scalafmtAll`, `pytest`), and link issues (`Closes #123`). Provide before/after snippets for API or doc updates; screenshots only when user-facing outputs change.
+- CI runs lint, Scala/Spark matrix tests, Python smoke tests, and CodeQL; align local runs to reduce iteration.
+
+## Security & Configuration Notes
+- Target Java 17; avoid committing large datasets or credentials. Report vulnerabilities via `SECURITY.md`.
+- When modifying dependencies or persistence formats, consult `DEPENDENCY_MANAGEMENT.md` and `PERSISTENCE_COMPATIBILITY.md` to preserve cross-version compatibility.
```
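The testing guidance above (deterministic seeds, numerical tolerance assertions) can be sketched in plain Scala; the seed, sample size, and tolerance below are illustrative values, not taken from the repository's suites:

```scala
import scala.util.Random

// Illustrative sketch of the guideline: fix the seed so the test is
// deterministic, then assert within a numerical tolerance rather than
// expecting exact floating-point equality.
val rng = new Random(42L) // deterministic seed
val samples = Seq.fill(10000)(rng.nextGaussian())
val mean = samples.sum / samples.size

val tolerance = 0.05 // illustrative tolerance, not a repo convention
assert(math.abs(mean) < tolerance, s"sample mean $mean outside tolerance $tolerance")
```

The same pattern applies to divergence computations: seed any sampling, and compare results with `math.abs(actual - expected) < eps` rather than `==`.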

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -44,7 +44,7 @@ This project adheres to a code of conduct adapted from the [Contributor Covenant
 
 ### Requirements
 
-- **Java**: JDK 17
+- **Java**: JDK 17 (required; Spark/Hadoop will fail on newer JDKs)
 - **Scala**: 2.12.18 or 2.13.14 (managed by sbt)
 - **SBT**: 1.9.x or later
 - **Spark**: 3.4.0+ or 3.5.1+ (managed by sbt)
```

build.sbt

Lines changed: 10 additions & 1 deletion

```diff
@@ -16,7 +16,16 @@
   if (isCi) base :+ "-Xfatal-warnings" else base
 }
 
-javacOptions ++= Seq( "-source", "17.0", "-target", "17.0")
+// Enforce Java 17 runtime/toolchain (Spark/Hadoop incompatibilities on JDK 21+)
+val javaSpecVersion = sys.props.getOrElse("java.specification.version", "0")
+val javaMajor = javaSpecVersion.split('.').lastOption.flatMap(_.toIntOption).getOrElse(0)
+if (javaMajor != 17) {
+  throw new RuntimeException(
+    s"Incompatible JDK detected: $javaSpecVersion. Please run sbt with Java 17 (spark/hadoop require it)."
+  )
+}
+
+javacOptions ++= Seq("-source", "17.0", "-target", "17.0")
 publishMavenStyle := true
 Test / publishArtifact := false
 pomIncludeRepository := { _ => false }
```
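The guard added to `build.sbt` hinges on parsing `java.specification.version`, which is `"17"` on modern JDKs but had the legacy `"1.8"` form on Java 8; taking the last dot-separated segment handles both. A standalone sketch of that parsing, extracted into a hypothetical helper function (same logic as the diff, just named for illustration):

```scala
// Standalone sketch of the parsing used in the build.sbt guard: take the
// last dot-separated segment of java.specification.version, so "17" -> 17
// and the legacy "1.8" form -> 8; anything unparsable falls back to 0.
def javaMajor(spec: String): Int =
  spec.split('.').lastOption.flatMap(_.toIntOption).getOrElse(0)
```

The `getOrElse(0)` fallback means a malformed property fails the `javaMajor != 17` check rather than silently passing, which errs on the side of rejecting unknown JDKs.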
