feat(cts): add spec-driven contract test suite for Python, Java, and Rust clients #343

Open
XuQianJin-Stars wants to merge 5 commits into lance-format:main from XuQianJin-Stars:feat/spec-driven-contract-testing

Contributor

@XuQianJin-Stars commented May 16, 2026

This PR introduces a two-layer contract test suite (CTS) for the Lance Namespace REST API:

  1. A wire-level layer that exercises every generated client (Python urllib3, Java apache, Java async, Rust reqwest) against a WireMock standalone server, with both the per-client tests and the stub mappings code-generated from the OpenAPI spec.
  2. A behavioural layer that drives the in-process Rust harness (lance-namespace-cts) directly against DirectoryNamespace, replaying YAML-authored contracts from docs/src/cts-contracts/.

Both layers are produced by code generators and never mutate the source spec.

Summary

End-to-end contract testing pipeline driven entirely by code generators on top of the OpenAPI spec and an authoritative behavioural-contract YAML set:

  • Wire CTS — per-client test files + WireMock stub mappings are generated from the spec, then replayed against WireMock standalone to verify HTTP serialization/deserialization across all generated language clients.
  • Behavioural CTS — per-operation Rust test modules are generated from cts-contracts/*.yaml and run in-process against DirectoryNamespace (no JVM, no WireMock, no Spring Boot), capability-gated so impls advertise what they actually support.

Motivation

Multiple generated clients (Python urllib3, Java apache, Java async, Rust reqwest) and a Spring Boot server are all derived from a single OpenAPI spec. Until now there was no automated mechanism to ensure that:

  • Each client correctly serializes requests / deserializes responses for every operation.
  • All clients remain mutually consistent in wire format.
  • Spec changes don't silently break a particular language binding.
  • Server implementations actually satisfy the behavioural contract (pre-conditions, error mapping, idempotency), not just the wire shape.

This PR makes both kinds of regression mechanically detectable: adding a new operation to the spec automatically yields new wire-level tests in every client on the next make gen-cts-wiremock, and authoring a new entry in docs/src/cts-contracts/*.yaml automatically yields a new behavioural test module on the next make gen-cts-behavior.

Architecture

   OpenAPI spec ─────────────────────────────────────┐
        │                                            │
        ├── examples overlay ── WireMock mappings ──┐│
        │                                           ││
        └── per-client test generators              ││
                  │                                 ││
                  ▼                                 ▼▼
         tests (py / java / rust) ───►  WireMock standalone (replay)
                                                              [wire layer]

   docs/src/cts-contracts/*.yaml ──► gen_contract_tests.py
                                            │
                                            ▼
                       rust/lance-namespace-cts/tests/contracts/*.rs
                                            │
                                            ▼
                     DirectoryNamespace (in-process, capability-gated)
                                                              [behavioural layer]

The source spec is never mutated — example payloads are layered via an overlay file and applied at generation time only. The behavioural-contract YAML is the authoritative spec for everything beyond wire shape.
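
The "never mutated" property can be illustrated with a minimal sketch. It assumes the overlay is a plain nested mapping merged recursively into a deep copy of the spec; the real ci/cts/apply_overlay.py may work differently (for example with targeted overlay actions), and the `x-example` key and the payload below are hypothetical.

```python
import copy


def apply_overlay(spec: dict, overlay: dict) -> dict:
    """Return a new spec with `overlay` merged in; `spec` itself is left untouched."""
    merged = copy.deepcopy(spec)
    _merge(merged, overlay)
    return merged


def _merge(target: dict, patch: dict) -> None:
    # Recurse into nested mappings; any other value replaces the target's value.
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


# Hypothetical miniature spec and overlay, for illustration only.
spec = {"paths": {"/v1/table/{id}/create": {"post": {"operationId": "CreateTable"}}}}
overlay = {"paths": {"/v1/table/{id}/create": {"post": {"x-example": {"location": "memory://t"}}}}}

merged = apply_overlay(spec, overlay)
```

After the call, `merged` carries the example payload while `spec` is byte-for-byte what it was before, which is the invariant a target such as verify-spec-untouched would guard.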

Generators (the heart of the pipeline)

| File | LOC | Purpose |
| --- | --- | --- |
| ci/cts/gen_wiremock_tests.py | 2009 | Generates per-operation WireMock-driven tests for Python urllib3, Java apache, Java async, and Rust reqwest clients from Mustache templates |
| ci/cts/gen_wiremock_mappings.py | 389 | Generates WireMock stub mappings from spec example payloads |
| ci/cts/gen_examples_overlay.py | 281 | Builds an OpenAPI overlay carrying example payloads for stub generation |
| ci/cts/apply_overlay.py | 107 | Applies overlays without mutating the source spec |
| ci/cts/gen_contract_tests.py | 826 | Generates the per-operation behavioural-contract Rust test modules from cts-contracts/*.yaml, post-processed with rustfmt --edition 2024 |
| ci/cts/contract_loader.py | 407 | Parses, validates and resolves cts-contracts/*.yaml against the JSON Schema |
| ci/cts/lint_contracts.py | 317 | Strict structural / coverage / capability-gating lint over the contract YAML |
| ci/cts/capabilities.py | 69 | Capability registry shared by the loader, lint and the Rust harness |
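
To give a feel for what a stub-mapping generator emits, here is a small sketch that turns one operation plus an example payload into a WireMock mapping. The `request` / `response` field names (`method`, `urlPathPattern`, `status`, `headers`, `jsonBody`) are standard WireMock stub JSON; the helper name `build_mapping` and the response payload are assumptions for illustration, not the contents of gen_wiremock_mappings.py.

```python
import json
import re


def build_mapping(method: str, path_template: str, status: int, example_body: dict) -> dict:
    """Build one WireMock stub mapping for an operation and its example response."""
    # Replace each {param} path segment with a regex so that any id matches.
    url_pattern = re.sub(r"\{[^/}]+\}", "[^/]+", path_template)
    return {
        "request": {"method": method.upper(), "urlPathPattern": url_pattern},
        "response": {
            "status": status,
            "headers": {"Content-Type": "application/json"},
            "jsonBody": example_body,
        },
    }


# The response payload here is hypothetical, not taken from the spec.
mapping = build_mapping("post", "/v1/table/{id}/insert", 200, {"version": 2})
mapping_json = json.dumps(mapping, indent=2, sort_keys=True)
```

Serialising with `sort_keys=True` keeps the emitted files stable across runs, which matters when generated artifacts are diffed in CI.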

CI and Build Tooling

  • Makefile: new targets for the wire layer (gen-cts-wiremock / build-cts-wiremock / test-cts-wiremock) and the behavioural layer (gen-cts-behavior / test-cts-behavior); the umbrella test-cts runs test-spec-lint, test-cts-behavior and test-cts-wiremock, and verify-spec-untouched fences the source spec.
  • ci/spectral.yaml — Spectral lint rules for the OpenAPI spec.
  • ci/cts/schemathesis.toml — Schemathesis config for property-based contract testing.
  • ci/cts/cts-contracts.schema.json — JSON Schema for the behavioural-contract YAML.
  • .github/workflows/contract-tests.yml — CI workflow running the full CTS pipeline (253 lines) with PR-blocking limited to the behavioural job; the WireMock job runs alongside as a wire-level signal.

Generated Wire-Level Client Tests

Auto-generated; do not hand-edit. Re-run make gen-cts-wiremock instead.

| Client | File | LOC | Framework |
| --- | --- | --- | --- |
| Python urllib3 | python/lance_namespace_urllib3_client/tests/test_wiremock.py | 536 | pytest + WireMock (dynamic port) |
| Java apache | java/lance-namespace-apache-client/.../WireMockIT.java | 540 | JUnit 5 IT |
| Java async | java/lance-namespace-async-client/.../WireMockIT.java | 584 | JUnit 5 IT |
| Rust reqwest | rust/lance-namespace-reqwest-client/tests/wiremock.rs | 737 | tokio async + WireMock process lifecycle |

Each suite exercises every Namespace / Table / Transaction / Tag / Data API operation end-to-end. (Rename note: these files used to be test_contract.py / WireMockContractIT.java / contract.rs; they were renamed to *wiremock* so that the word "contract" can be reserved for the new behavioural layer.)

Generated Behavioural-Contract Tests

Auto-generated — do not hand-edit. Re-run make gen-cts-behavior instead.

  • Source of truth: docs/src/cts-contracts/{main,index,namespace,table,tag,transaction,data}.yaml — one entry per operation describing pre-conditions, request/response shape, expected outcomes (success, 4xx, 409, …) and required capabilities.
  • Harness crate: rust/lance-namespace-cts/ provides Fixtures, ContractCaller, Capabilities and assert_contract_{ok,error}. It uses DirectoryNamespace via a cargo path dependency on the sibling lance repository's lance-namespace-impls crate.
  • Generated tests: rust/lance-namespace-cts/tests/contracts/*.rs — 43 per-operation modules + a mod.rs wired into tests/cts.rs.
  • Local result: make test-cts-behavior → 170 passed.
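
Capability gating, as described in the list above, can be sketched as a simple partition of cases into runnable and skipped. The capability names come from the review discussion in this thread; the `Case` type, the case names and `partition_cases` are hypothetical and do not mirror the Rust harness API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    requires: frozenset = frozenset()  # capabilities the case needs


def partition_cases(cases, supported):
    """Split case names into (runnable, skipped) for one implementation."""
    runnable, skipped = [], []
    for case in cases:
        missing = case.requires - supported
        (skipped if missing else runnable).append(case.name)
    return runnable, skipped


cases = [
    Case("create_namespace_ok"),
    Case("create_child_namespace_ok", frozenset({"supports_two_level_namespace_path"})),
]
# A hypothetical implementation that only advertises one-level paths.
supported = frozenset({"supports_one_level_namespace_path"})

runnable, skipped = partition_cases(cases, supported)
```

Skipping rather than failing keeps the suite usable for implementations that legitimately support only part of the API surface.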

Dependencies

  • Python: pyproject.toml / uv.lock — test deps.
  • Java: parent pom.xml, apache-client/pom.xml, async-client/pom.xml — WireMock + JUnit 5.
  • Rust: rust/Cargo.toml, rust/lance-namespace-reqwest-client/Cargo.toml (async test deps for the WireMock layer), rust/lance-namespace-cts/Cargo.toml (in-process harness, depends on the sibling lance repo via path).

Documentation

  • AGENTS.md: top-level agent-role overview.
  • CONTRIBUTING.md: refreshed contributor guide covering both CTS layers.
  • New README.md files for apache-client, async-client, springboot-server, urllib3-client, and reqwest-client.
  • docs/src/cts-contracts/ is wired into the docs site so the behavioural contracts ship as published reference material.

How to Run Locally

# Wire layer — generated clients vs WireMock
make gen-cts-wiremock     # regenerate stubs + per-client wire tests from the spec
make build-cts-wiremock   # build all clients, then drop generated wire tests into the trees
make test-cts-wiremock    # run the full WireMock suite (Python + Java apache + Java async + Rust)

# Behavioural layer — in-process Rust harness vs DirectoryNamespace
make gen-cts-behavior     # regenerate per-operation Rust test modules from cts-contracts/*.yaml
make test-cts-behavior    # cargo test -p lance-namespace-cts (170 tests)

# Umbrella target: the full matrix (CI blocks PRs on the behavioural job only)
make test-cts             # = test-spec-lint + test-cts-behavior + test-cts-wiremock

Quality Gates

All green locally on the tip of this branch:

  • cargo fmt --check
  • cargo clippy -D warnings
  • gen_contract_tests.py --check (generator output is a fixed point) ✅
  • lint_contracts.py --strict
  • make test-cts-behavior → 170 passed ✅
  • make test-cts-wiremock → 49 (Rust) + 50 (Python) + 49 (Java) ✅

@github-actions (bot) added the python, java, and rust labels on May 16, 2026
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@XuQianJin-Stars changed the title from "feat(cts): add spec-driven, generator-based contract test suite for Python, Java, and Rust clients" to "feat(cts): add spec-driven contract test suite for Python, Java, and Rust clients" on May 16, 2026
@github-actions (bot) added the enhancement label on May 16, 2026
@XuQianJin-Stars force-pushed the feat/spec-driven-contract-testing branch 2 times, most recently from cf6f8be to b697d88, on May 17, 2026 04:25
…ython, Java, and Rust clients

Build an end-to-end Contract Testing Suite (CTS) driven entirely by code
generators on top of docs/src/spec.yaml. Per-client test files and WireMock
stub mappings are produced from the spec, then replayed against WireMock
standalone to verify HTTP serialization/deserialization across all generated
language clients — without ever modifying the source spec.

Highlights
----------

* Single source of truth: contract tests, WireMock stubs, and examples
  overlays are all generated from the OpenAPI spec. The spec file itself
  stays untouched; the generators read it and emit derived artifacts under
  generator-owned trees only.

* Four client surfaces covered:
    - Python    : python/lance_namespace_urllib3_client/tests/test_contract.py
    - Java sync : java/lance-namespace-apache-client/.../cts/WireMockContractIT.java
    - Java async: java/lance-namespace-async-client/.../cts/WireMockContractIT.java
    - Rust      : rust/lance-namespace-reqwest-client/tests/contract.rs

* CTS scripts under ci/cts/:
    - gen_client_tests.py        — emit per-language contract tests
    - gen_wiremock_mappings.py   — emit WireMock stub mappings from spec
    - gen_examples_overlay.py    — derive request/response examples overlay
    - apply_overlay.py           — apply overlay to the spec at build time

* Make targets to wire the whole flow (Makefile, java/Makefile):
    - merge-spec / clean / gen-cts / test-cts
    - Per-language gen/build/test entry points kept generator-owned, so
      `make clean && make gen` always reproduces the same artifacts.

* CI workflow .github/workflows/contract-tests.yml runs:
    1. spec               — Spectral lint + breaking-change check (strict)
    2. client-conformance — matrix over java / python / rust, generates
                            WireMock stubs from the merged spec and
                            replays them against each generated client
                            (strict).

  The originally planned server-conformance job (Schemathesis vs Spring Boot
  reference server) is intentionally NOT added in this PR: the
  lance-namespace-springboot-server module currently only ships generated
  API interfaces — there is no @SpringBootApplication, no controller
  implementations, and no application.yml, so the server cannot actually
  start. The job will be re-added in a follow-up PR once a runnable
  reference server lands.

* Schemathesis configuration (ci/schemathesis.toml) is added and prepared
  for that future server-conformance job. It already disables the three
  Arrow IPC endpoints whose request bodies use
  application/vnd.apache.arrow.stream, which Schemathesis cannot
  auto-serialize:
    - CreateTable           POST /v1/table/{id}/create
    - InsertIntoTable       POST /v1/table/{id}/insert
    - MergeInsertIntoTable  POST /v1/table/{id}/merge_insert

* Spec lint rules tightened: ci/spectral.yaml adds a project-specific
  ruleset; .github/workflows/spec.yml runs Spectral on every change to
  docs/src/spec.yaml.

* Generator-owned files isolated from hand-written CTS edits:
    - ci/patch_apache_pom.py and java/async-client-pom.xml keep
      OpenAPI-generated Maven POMs reproducible while letting CTS
      add WireMock/JUnit5 dependencies via a post-gen patch.
    - .gitignore updated so build/, merged spec, and generator output
      stay out of the tree.

Outcome
-------

`make clean && make gen && make test-cts` regenerates everything from
spec.yaml and runs the full contract test matrix locally. CI mirrors the
same flow: spec lint must pass, then java/python/rust contract tests must
all pass against WireMock stubs derived from the spec. Any future spec
change is immediately reflected in tests on the next `make gen`, making
client/spec drift impossible to merge silently.
@XuQianJin-Stars force-pushed the feat/spec-driven-contract-testing branch from b697d88 to 5deb9c0 on May 17, 2026 05:08
- ci/patch_reqwest_arrow_content_type.py (new) post-processes the
  Rust reqwest client to inject the Arrow Content-Type header on
  every operation declared with application/vnd.apache.arrow.stream
  in the spec. The stock OpenAPI Generator reqwest template emits
  req_builder.body(p_body) with no Content-Type, while Java and
  Python templates honour the spec's 'consumes'. The header is
  emitted in rustfmt's expected multi-line layout so the resulting
  files do not increase the existing cargo fmt --check diff. The
  patch is idempotent and derives the operation set from the spec,
  so future Arrow ops are covered automatically. Wired into
  rust/Makefile after gen-reqwest-client.

- ci/cts/gen_client_tests.py and ci/cts/gen_wiremock_mappings.py
  updated to keep the generated WireMock mappings and contract
  tests in sync with the spec (Arrow ops, request/response bodies
  and content types).

- Makefile: test-cts now depends on build-cts so that running
  'make test-cts' from a clean state regenerates contract test
  artifacts (e.g. tests/contract.rs) before invoking cargo test.
  Previously test-clients only depended on gen-wiremock, which
  produced 'no test target named contract' when the rust client
  had been re-cleaned.

- rust/Makefile: invoke patch_reqwest_arrow_content_type.py at the
  end of gen-reqwest-client.
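
Deriving the Arrow operation set from the spec, rather than hard-coding it, could look roughly like the sketch below. It assumes an OpenAPI 3 style `requestBody.content` layout; the helper name `arrow_operations` and the miniature spec (including the JSON-only DescribeTable entry and its path) are illustrative and are not the code in patch_reqwest_arrow_content_type.py.

```python
ARROW_STREAM = "application/vnd.apache.arrow.stream"
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def arrow_operations(spec: dict) -> list[str]:
    """Return the sorted operationIds whose request body is an Arrow IPC stream."""
    found = []
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            content = operation.get("requestBody", {}).get("content", {})
            if ARROW_STREAM in content:
                found.append(operation["operationId"])
    return sorted(found)


def _op(operation_id: str, media_type: str) -> dict:
    # Helper for building the toy spec below.
    return {"post": {"operationId": operation_id, "requestBody": {"content": {media_type: {}}}}}


# Miniature spec for illustration; the last entry is a hypothetical JSON operation.
spec = {
    "paths": {
        "/v1/table/{id}/create": _op("CreateTable", ARROW_STREAM),
        "/v1/table/{id}/insert": _op("InsertIntoTable", ARROW_STREAM),
        "/v1/table/{id}/merge_insert": _op("MergeInsertIntoTable", ARROW_STREAM),
        "/v1/table/{id}/describe": _op("DescribeTable", "application/json"),
    }
}

arrow_ops = arrow_operations(spec)
```

Because the set is recomputed from the spec on every run, a newly added Arrow endpoint is picked up without touching the patch script.
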
Regenerated artifacts produced by 'make build-cts' after the changes
in the previous commit:

- rust/lance-namespace-reqwest-client/src/apis/data_api.rs and
  table_api.rs now carry the Arrow Content-Type header injected by
  ci/patch_reqwest_arrow_content_type.py for every operation
  declared with application/vnd.apache.arrow.stream in the spec
  (insert_into_table, merge_insert_into_table, query_table, ...).
- rust/lance-namespace-reqwest-client/tests/contract.rs regenerated
  by ci/cts/gen_client_tests.py.
- java/lance-namespace-apache-client/.../WireMockContractIT.java
  and java/lance-namespace-async-client/.../WireMockContractIT.java
  regenerated by ci/cts/gen_client_tests.py.
- python/lance_namespace_urllib3_client/tests/test_contract.py
  regenerated by ci/cts/gen_client_tests.py.

Verification:
- make test-cts: 49 (java apache) + 49 (java async) + 50 (python) +
  49 (rust) all green.
- cargo fmt --check on rust/: 766 diffs, identical to baseline
  (pre-existing, unrelated to this change).
…atize generators

- Move ci/patch_apache_pom.py, ci/patch_reqwest_arrow_content_type.py, ci/schemathesis.toml into ci/cts/

- Refactor gen_client_tests.py to render Java/Python/Rust contract harnesses via Mustache templates

- Add ci/cts/render.py and ci/cts/templates/ (apache/async java, python, rust, shared partials)

- Update Makefile, java/Makefile, rust/Makefile, pyproject.toml, uv.lock to reflect new paths and deps
@jackye1995
Collaborator

Thanks for putting this together! The infrastructure work here (WireMock lifecycle, CI matrix, Makefile targets) is solid and could be reusable.

That said, I think we can get more value from a CTS by focusing on behavioral contract testing rather than serialization/deserialization. The generated clients are produced by OpenAPI codegen, so serde correctness is largely the code generator's responsibility — what we really need to validate is that implementations conform to the spec's behavioral contracts.

For example, for CreateNamespace:

  1. Creating the root namespace should fail — root always exists
  2. Creating an already-existing namespace should fail with ALREADY_EXISTS (error code 2)
  3. Creating a non-existing namespace should succeed and the namespace should then be describable

The spec already declares the expected error types for every operation in errors.md. For example, CreateNamespace can return NamespaceAlreadyExists (2), DropNamespace can return NamespaceNotFound (1) and NamespaceNotEmpty (3), etc. A CTS should codify when each of these error codes must be returned — e.g. CreateNamespace with an existing namespace path must return error code 2, DropNamespace on a namespace containing child namespaces must return error code 3. We may want to enrich the spec with these detailed per-operation error conditions first, then generate CTS tests that assert each condition against a real implementation.
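
As a rough illustration of the conditions above, the sketch below checks them against a toy in-memory namespace. `ToyNamespace` and `error_code` are hypothetical stand-ins, not a real implementation or harness. The comment does not say which code the root case should return, so the sketch only records that the call fails.

```python
NOT_FOUND, ALREADY_EXISTS, NOT_EMPTY = 1, 2, 3  # error codes quoted above


class NamespaceError(Exception):
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class ToyNamespace:
    """Hypothetical in-memory stand-in for a real implementation."""

    def __init__(self):
        self._paths = {()}  # the root is the empty path and always exists

    def create(self, path):
        if path in self._paths:
            raise NamespaceError(ALREADY_EXISTS, f"{path} already exists")
        if path[:-1] not in self._paths:
            raise NamespaceError(NOT_FOUND, f"parent of {path} not found")
        self._paths.add(path)

    def describe(self, path):
        if path not in self._paths:
            raise NamespaceError(NOT_FOUND, f"{path} not found")
        return {"id": list(path)}

    def drop(self, path):
        if path not in self._paths:
            raise NamespaceError(NOT_FOUND, f"{path} not found")
        if any(len(p) > len(path) and p[: len(path)] == path for p in self._paths):
            raise NamespaceError(NOT_EMPTY, f"{path} has child namespaces")
        self._paths.remove(path)


def error_code(call, *args):
    """Run call(*args) and return the error code it raised, or None on success."""
    try:
        call(*args)
    except NamespaceError as err:
        return err.code
    return None


ns = ToyNamespace()
root_result = error_code(ns.create, ())       # creating the root must fail
first_create = error_code(ns.create, ("a",))  # a new namespace is created
described = ns.describe(("a",))               # and is then describable
duplicate = error_code(ns.create, ("a",))     # existing namespace: code 2
ns.create(("a", "b"))
drop_parent = error_code(ns.drop, ("a",))     # has a child namespace: code 3
drop_missing = error_code(ns.drop, ("zzz",))  # unknown namespace: code 1
```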

Tests could also be parameterized by capability flags (supports_one_level_namespace_path, supports_two_level_namespace_path, etc.) to accommodate differences across implementations.

I'd suggest pivoting toward:

  • Defining expected behaviors per operation based on the spec (success cases, error conditions, state transitions)
  • Enriching the error spec with detailed conditions for when each error code should be returned
  • Generating tests that assert those behavioral contracts
  • Running against a reference server implementation rather than canned stubs

Happy to discuss further if you'd like to align on the direction before investing more time.

@XuQianJin-Stars force-pushed the feat/spec-driven-contract-testing branch 3 times, most recently from 15ab83e to c2f6706, on May 20, 2026 08:07
Introduce a second contract test suite for the Lance namespace REST API
that exercises operation *semantics* in-process, complementing — not
replacing — the existing wire-level WireMock suite. The two suites now
have distinct, non-overlapping concerns and a clean directory layout.

What's new
----------
* Authoritative spec under docs/src/cts-contracts/ split by domain:
  main.yaml, namespace.yaml, table.yaml, data.yaml, index.yaml,
  tag.yaml, transaction.yaml. Each case declares pre-conditions
  (`given`), the request (`when`) and the expected outcome (`then`:
  success / error_code / 4xx / 409 …) plus required capabilities.
* JSON Schema (ci/cts/cts-contracts.schema.json) and strict linter
  (ci/cts/lint_contracts.py) enforcing single-file ownership per
  operation, capability validity and shape correctness.
* Capability model: ci/cts/capabilities.py plus the per-impl manifest
  ci/cts/capabilities.directory.txt; cases requiring an unsupported
  capability are skipped at runtime instead of failing.
* ci/cts/contract_loader.py parses & validates contracts;
  ci/cts/gen_contract_tests.py renders one Rust test module per
  operation under rust/lance-namespace-cts/tests/contracts/ (43
  modules), post-processed with `rustfmt --edition 2024` so
  `cargo fmt --check` is a fixed point and `--check` mode catches
  drift in CI.
* New workspace member rust/lance-namespace-cts/ hosting the
  in-process harness — Fixtures, ContractCaller, Capabilities,
  assert_contract_{ok,error} — driving DirectoryNamespace from the
  sibling `lance` repo via a cargo path dependency. No cdylib, no
  network, runs as a normal `cargo test` target.
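
The `--check` idea (regenerate, compare against what is on disk, report drift instead of writing) can be sketched as follows. `render_module`, `sync` and the file and case names are hypothetical; the real gen_contract_tests.py renders Rust modules and runs rustfmt, which this sketch does not attempt.

```python
import tempfile
from pathlib import Path


def render_module(operation: str, cases: list[str]) -> str:
    # Hypothetical stand-in for a template renderer; output is deterministic.
    lines = [f"// @generated from the contract for {operation}; do not edit."]
    lines += [f"// case: {case}" for case in sorted(cases)]
    return "\n".join(lines) + "\n"


def sync(out_dir: Path, modules: dict[str, str], check: bool) -> list[str]:
    """Write modules to out_dir; in check mode only report the files that differ."""
    drifted = []
    for name, text in sorted(modules.items()):
        target = out_dir / name
        current = target.read_text() if target.exists() else None
        if current != text:
            drifted.append(name)
            if not check:
                target.write_text(text)
    return drifted


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    modules = {
        "create_namespace.rs": render_module("CreateNamespace", ["root_fails", "already_exists"])
    }
    written = sync(out_dir, modules, check=False)  # first run writes the file
    clean = sync(out_dir, modules, check=True)     # regenerated output matches
    (out_dir / "create_namespace.rs").write_text("// hand edit\n")
    drift = sync(out_dir, modules, check=True)     # the hand edit is reported
```

A CI job would exit non-zero whenever the check-mode list is non-empty, which is what makes hand edits to generated files visible in review.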

Disambiguation renames (the name "contract" now refers to the new
behavioural suite; the wire-level suite is consistently "wiremock"):
* ci/cts/gen_client_tests.py            -> gen_wiremock_tests.py
* templates/{rust,python,java_*}_contract.mustache
                                        -> *_wiremock.mustache
* rust   tests/contract.rs              -> tests/wiremock.rs
* python tests/test_contract.py         -> tests/test_wiremock.py
* java   WireMockContractIT.java        -> WireMockIT.java

Build / CI surface
------------------
* Make:
    - `make gen-cts-behavior`  / `make test-cts-behavior` drive the
      new suite (codegen + cargo test, no JVM).
    - `make gen-cts-wiremock`  / `make build-cts-wiremock` /
      `make test-cts-wiremock` keep the historical multi-language
      WireMock pipeline (renamed from the old gen-cts/build-cts/
      test-cts targets).
    - `make test-cts` is the umbrella target and runs the full matrix:
      test-spec-lint  +  test-cts-behavior  +  test-cts-wiremock.
* GitHub Actions (.github/workflows/contract-tests.yml):
    - `behavior-conformance` runs `make test-cts-behavior` on every
      push and pull_request — fast, no JVM, PR-blocking.
    - The WireMock matrix (java / python / rust) is opt-in: it only
      fires on push to long-lived branches and on manual
      workflow_dispatch with `run_wiremock=true`, keeping PR latency
      low.
* CONTRIBUTING.md gains a "Contract Tests (CTS)" section documenting
  the two suites, how to author a behavioural case, and the
  lint / codegen / run loop.

Quality gates (all green locally)
---------------------------------
  cargo fmt --check              OK
  cargo clippy -D warnings       OK
  gen_contract_tests --check     OK
  lint_contracts --strict        OK
  make test-cts-behavior         170 passed
  make test-cts-wiremock         49 (rust) + 50 (python) + 49 (java)
@XuQianJin-Stars force-pushed the feat/spec-driven-contract-testing branch from 0b3c5a2 to 7f5d303 on May 20, 2026 09:01
