
Comprehensive testing, Cassandra rebuild, Confluent conformance, and documentation overhaul#263

Closed
millerjp wants to merge 95 commits into main from feature/testing

Conversation

@millerjp

Summary

This is a major branch that brings the project to public-release quality. It spans 84 commits, 216 files changed, ~73,600 lines added across testing infrastructure, storage engine fixes, Confluent API conformance, and a complete documentation overhaul.

Merge strategy: This PR SHOULD be squash merged to keep main history clean.


What Changed

1. Comprehensive Test Suite (~50,000 lines of test code)

Built a multi-layered test suite from scratch, covering every component of the system:

Unit Tests (~900 test functions)

  • Added unit tests for all previously untested packages: handlers, admin, account, audit, tls, ldap, oidc, association, cluster, context, rules, exporter, factory, compatibility/checker, compatibility/modes, compatibility/result, schema/types, schema/protobuf/resolver
  • Expanded existing tests for auth, registry, config, avro/parser, jsonschema/parser, protobuf/parser, avro/checker, jsonschema/checker, protobuf/checker
  • Added TestOpenAPISpecMatchesRoutes — bidirectional sync test that fails the build if api/openapi.yaml and the chi router drift apart

Storage Conformance Suite (108 tests x 4 backends)

  • Shared test suite in tests/storage/conformance/ that runs identically against memory, PostgreSQL, MySQL, and Cassandra
  • Covers: schema CRUD, subject operations, config operations, auth/user/API key operations, import operations, and all sentinel error conditions
  • Ensures all backends implement the Storage interface with identical behavior

BDD Tests (76 feature files, ~1,400 scenarios, ~24,000 lines of Gherkin)

  • Complete Gherkin specification of system behavior using godog
  • Covers: all API endpoints, all 7 compatibility modes for Avro/Protobuf/JSON Schema, transitive compatibility (3+ version chains), schema references, import API, mode enforcement, deletion lifecycle, error codes, edge cases, operational resilience
  • Runs in-process (memory, no Docker) and against all backends via Docker Compose
  • Runs against Confluent Schema Registry 8.1.1 to verify behavioral parity
  • Docker Compose infrastructure for all backends including Confluent + Kafka (KRaft mode)
  • Webhook sidecar for operational resilience testing (service restart, crash recovery)

Integration Tests

  • Full HTTP API stack testing against real database backends
  • Auth integration tests: user management, API key management, password auth, RBAC enforcement, bootstrap admin workflow

Concurrency Tests (10 scenarios)

  • Multi-instance concurrent schema registration, version updates, mixed operations
  • Hot subject contention, schema idempotency, global ID uniqueness
  • Runs against PostgreSQL, MySQL, and Cassandra

Auth Integration Tests

  • LDAP tests with OpenLDAP container (bind auth, group-to-role mapping, RBAC)
  • OIDC tests with Keycloak 24.0 (bearer token auth, group claims, token validation)
  • Vault tests with HashiCorp Vault 1.15 (KV v2 as auth storage backend)

Confluent Wire-Compatibility Tests

  • Go, Java (4 Confluent versions: 8.1, 7.9, 7.7.4, 7.7.3), Python (3 versions: 2.8.0, 2.7.0, 2.6.1)
  • Tests actual Confluent serializer/deserializer round-trip compatibility for Avro, Protobuf, JSON Schema

Migration Tests

  • Import with preserved schema IDs, duplicate handling, reference preservation, multi-version import, schema type preservation

2. Cassandra Storage Engine Rebuild

Complete rewrite of the Cassandra storage backend for Cassandra 5.0+:

  • SAI (Storage Attached Indexes) replacing legacy secondary indexes
  • Batch writes for atomic multi-table operations
  • Block-based ID allocation for reduced LWT contention
  • Lightweight transactions for atomic fingerprint deduplication
  • Fixed: infinite recursion in GetSchemaByID, reference loading, orphaned schema cleanup, concurrent idempotency races, deleted schema handling in fingerprint lookup
  • CI upgraded from Cassandra 4.1 to 5.0

3. Confluent API Conformance

Extensive fixes to match Confluent Schema Registry behavior:

  • Compatibility checkers aligned with Confluent (Avro alias handling, 13 new JSON Schema check categories, structural comparison of Protobuf imports)
  • Error codes corrected (lookup, soft-deleted subjects, fetchMaxId)
  • Import mode enforcement, compare-and-set, readonly enforcement
  • Metadata/rulesets support, subject aliases, reserved fields validation
  • Schema fingerprint deduplication fixed (Avro default values in fingerprint)
  • Cross-subject schema reference resolution implemented

4. Storage Backend Fixes

  • All backends: Global schema ID deduplication, permanent delete validation, global config/mode defaults aligned, error handling standardized
  • PostgreSQL: Error codes corrected for lookup and soft-deleted subjects
  • MySQL: Added deleted filter to countSchemasBySubject
  • Cassandra: Major rebuild (see above)
  • Memory: Aligned with DB backends for global defaults

5. OpenAPI Specification & API Documentation

  • Expanded api/openapi.yaml from basic to comprehensive (3,186 lines) — all 47+ endpoints with parameters, request/response schemas, error codes, security schemes
  • Embedded into binary via api/embed.go
  • scripts/generate-api-docs.sh — generates Markdown API reference (7,114 lines) and ReDoc HTML from the spec
  • Auto-generated TOC injection for the Markdown output
  • Swagger UI served at GET /docs when server.docs_enabled: true
  • Bidirectional sync test prevents spec/router drift

6. CI Pipeline Overhaul

22 CI jobs covering every test layer:

  • Build + unit tests + coverage
  • Static analysis: golangci-lint, go vet, gosec, Trivy
  • Integration + concurrency tests per backend (PostgreSQL 15, MySQL 8, Cassandra 5.0)
  • Storage conformance per backend
  • BDD tests: in-process, memory, PostgreSQL, MySQL, Cassandra, Confluent 8.1.1
  • Auth tests: LDAP (OpenLDAP), Vault (HashiCorp Vault 1.15), OIDC (Keycloak 24.0)
  • Migration tests
  • Multi-language compatibility tests (Go/Java/Python)
  • Docker image build
  • Pre-compiled test binary artifact sharing between jobs

7. Makefile Overhaul

Self-contained test targets with automatic Docker container lifecycle:

  • make test-unit, test-bdd, test-integration, test-concurrency, test-conformance, test-api, test-ldap, test-vault, test-oidc, test-auth, test-migration, test-compatibility
  • All accept BACKEND=postgres|mysql|cassandra|all
  • make docs-api for API doc generation
  • make docker-build, make docker-run, make dev (hot reload)
  • Port allocation strategy to prevent conflicts between test layers

8. Documentation Overhaul (15 docs, ~14,000 lines)

Complete documentation suite for public release:

| Document | Lines | Description |
| --- | ---: | --- |
| README.md | 310 | Professional landing page with logo, badges, feature comparison, architecture diagrams |
| docs/getting-started.md | 353 | Quick start, Kafka client integration (Java/Go/Python) |
| docs/installation.md | 665 | Docker, APT, YUM, binary, Kubernetes, from-source |
| docs/configuration.md | 830 | Complete YAML reference with all options, env vars, defaults |
| docs/storage-backends.md | 327 | PostgreSQL, MySQL, Cassandra 5+, in-memory setup and tuning |
| docs/schema-types.md | 455 | Avro, Protobuf, JSON Schema support with reference examples |
| docs/compatibility.md | 405 | All 7 compatibility modes with per-type rules |
| docs/api-reference.md | 7,114 | Auto-generated from OpenAPI spec (all endpoints, examples, error codes) |
| docs/authentication.md | 802 | All 6 auth methods, RBAC, user management |
| docs/security.md | 339 | TLS, rate limiting, audit logging, hardening checklist |
| docs/deployment.md | 667 | Topologies, Docker Compose, Kubernetes, systemd, HA |
| docs/monitoring.md | 411 | Prometheus metrics, alerting rules, structured logging |
| docs/migration.md | 330 | Migrating from Confluent with preserved schema IDs |
| docs/testing.md | 701 | Comprehensive testing strategy guide |
| docs/development.md | 360 | Building, test suite, code conventions, contributing |
| docs/troubleshooting.md | 379 | Common issues, diagnostics, error code reference |

All docs follow RFC 2119 conventions, include a ## Contents TOC, and cross-reference each other.


Test Plan

  • All unit tests pass (make test-unit)
  • Storage conformance passes on all 4 backends (make test-conformance BACKEND=all)
  • BDD tests pass in-process (make test-bdd)
  • BDD tests pass on all backends (make test-bdd BACKEND=all)
  • BDD tests pass against Confluent 8.1.1 (make test-bdd BACKEND=confluent)
  • Integration tests pass on all backends (make test-integration BACKEND=all)
  • Concurrency tests pass on all backends (make test-concurrency BACKEND=all)
  • Auth tests pass (LDAP, Vault, OIDC) (make test-auth)
  • Migration tests pass (make test-migration)
  • Compatibility tests pass (Go/Java/Python) (make test-compatibility)
  • API endpoint tests pass (make test-api)
  • Lint passes (make lint)
  • Docker image builds (make docker-build)
  • OpenAPI spec matches router (TestOpenAPISpecMatchesRoutes)
  • API docs regenerate cleanly (make docs-api)

Add 50 new tests across schema parsers and compatibility checker:
- Avro parser: deeply nested records, logical types, recursive types,
  records with defaults, PaymentEvent, namespaces, complex collections/unions
- Protobuf parser: deeply nested messages, complex maps, multiple top-level
  messages, PaymentEvent, proto3 optional, streaming services
- JSON Schema parser: cross-$ref, PaymentEvent, composition, deeply nested,
  conditional if/then/else, standalone non-object types
- Compatibility checker: all 7 modes across 3 schema types, transitive
  chains, edge cases, ParseMode, 4-version evolution scenarios

Add reusable storage conformance test suite with 108 test cases that can
run against any storage backend via RunAll(t, factoryFunc):
- Schema CRUD (25 tests)
- Subject operations (9 tests)
- Config and mode management (16 tests)
- Users and API keys (21 tests)
- Import and ID management (8 tests)
- Sentinel error verification (30 tests)
…(Phase 4)

Set up godog BDD test framework with in-process and Docker-based modes:
- godog test runner with tag filtering (~@operational for in-process)
- Docker Compose split files (base + per-backend overrides)
- Webhook sidecar for Docker container control (kill, restart, pause/unpause)
- Backend config files (memory, postgres, mysql, cassandra)
- Step definitions: schema, import, mode, reference, infrastructure
- Fresh httptest server per scenario for isolation
- BDD_REGISTRY_URL/BDD_WEBHOOK_URL env vars for Docker-based runs
…ase 5)

Comprehensive Gherkin features covering all API functionality:
- Schema types: Avro (15), Protobuf (14), JSON Schema (18) scenarios
  covering all type variants, nesting, collections, round-trips
- Compatibility modes: all 7 levels across 3 schema types, transitive
  3-version chains, per-subject overrides, check endpoint
- Schema references: cross-subject Avro, internal JSON $ref
- Import: bulk import with ID preservation, all schema types
- Mode management: READWRITE/READONLY/IMPORT, per-subject isolation
- API errors: all Confluent error codes (40401-50001), invalid schemas
- Health/metadata: cluster ID, server version, contexts endpoint
- Configuration: global/per-subject, all 7 levels, delete/fallback
- Deletion: soft/permanent delete, version isolation, deleted=true

Docker-based operational tests requiring webhook sidecar infrastructure:
- Memory: data loss on restart, ID reset after restart (2 scenarios)
- PostgreSQL: persistence, health on DB kill, recovery, pause/unpause,
  ID consistency (5 scenarios)
- MySQL: persistence, recovery, pause/unpause (3 scenarios)
- Cassandra: persistence, recovery (longer timeouts), pause/unpause (3 scenarios)

Add per-backend BDD test targets to Makefile:
- test-bdd-memory, test-bdd-postgres, test-bdd-mysql, test-bdd-cassandra
- test-bdd-all (runs all backends sequentially)
- test-bdd-functional (functional only, skip operational)
- test-all (unit + conformance + BDD in-process)

Add tests/PROGRESS.md documenting full test inventory and phase status.

Redesign the BDD test infrastructure to run the webhook process
directly inside the schema registry container instead of as a separate
sidecar. This fixes operational resilience tests on Podman/macOS where
Docker socket access is unavailable.

Key changes:
- Add Dockerfile.registry that builds the registry + webhook into a
  single container with entrypoint managing both processes
- Rewrite all webhook scripts for PID-based process control (restart,
  stop, start, kill, pause, unpause) instead of Docker API calls
- Fix zombie process reaping: start registry via intermediate shell so
  it's reparented to tini (PID 1) for proper wait() handling
- Add include-command-output-in-response to hooks.json for synchronous
  webhook execution
- Redirect registry stdout to /proc/1/fd/1 to avoid blocking webhook
  response pipe
- Add 5s HTTP client timeout to TestContext for pause/unpause scenarios
- Fix cleanup between operational scenarios (ensureRegistryRunning)
- Fix memory store version counter reset on permanent delete
- Fix hardcoded schema IDs in feature files to use stored values
- Expand operational_memory.feature from 2 to 13 scenarios covering
  restart, stop/start, SIGKILL recovery, pause/unpause, config/mode
  reset, and multiple restart cycles

All 160 BDD scenarios pass (147 functional + 13 operational).

Add 5 BDD test jobs to CI pipeline:
- bdd-functional-tests: in-process, no Docker, fast gate
- bdd-memory-tests: Docker Compose, functional + operational
- bdd-postgres-tests: Docker Compose, functional + operational
- bdd-mysql-tests: Docker Compose, functional + operational
- bdd-cassandra-tests: Docker Compose, functional + operational

Backend jobs depend on functional tests passing first to avoid
wasting resources when tests are fundamentally broken.

Also trigger CI on feature/** branch pushes.

- Fix gofmt import ordering in bdd_test.go (stdlib before third-party)
- Fix MySQL healthcheck: use TCP query instead of socket-based ping
  that passes against MySQL's temporary init server before real server
  is ready. Add start_period and increase retries for CI runners.
- Fix Cassandra healthcheck: add start_period and increase interval
  for slower CI runners.
- Fix start-service.sh: send SIGCONT to paused (SIGSTOP'd) processes
  so ensureRegistryRunning works after pause scenarios.
- MySQL: backtick-quote table names in TRUNCATE (schemas is reserved)
- Cassandra: add retry loop in entrypoint.sh for DB connection timing
- PostgreSQL: fix health check scenario to use waitForUnhealthy
- All backends: remove register-during-pause step that causes timeouts
- PostgreSQL: fix stored key mismatch (before_id → schema_id)

Root cause: gocql CreateSession() fails with "no connections were made"
when cluster.Keyspace is set but the keyspace doesn't exist. The regular
CI pre-creates the keyspace before running tests, but the BDD Docker
Compose didn't.

Fix: Cassandra healthcheck now creates the keyspace (idempotent) so it
exists before the registry starts. Also add start_period: 90s to the
schema-registry healthcheck to give the entrypoint retry loop enough
time for slow-starting backends.

- GetSchemaBySubjectVersion: return ErrVersionNotFound for deleted versions
- GetSchemasBySubject: return ErrSubjectNotFound when subject has no versions
- DeleteSubject: return ErrSubjectNotFound when subject doesn't exist
- GetSubjectsBySchemaID: validate schema ID exists before scanning subjects
- GetVersionsBySchemaID: validate schema ID exists before scanning subjects

These bugs were uncovered by BDD tests running against the Cassandra
backend — the memory backend already handled these cases correctly.

Add PostgreSQL, MySQL, and Cassandra conformance tests that run the
same ~100 tests against each backend, ensuring identical Storage
interface behavior. Add storage-conformance CI job.

Run PostgreSQL, MySQL, and Cassandra conformance tests as independent
CI jobs so they execute in parallel rather than serially.

Conformance jobs now depend on postgres-tests, mysql-tests, and
cassandra-tests so they only run once all integration tests succeed.
All four conformance backends now appear as separate CI jobs.

GetSchemaByID calls GetVersionsBySchemaID, which was calling
GetSchemaByID to validate schema existence, creating infinite
recursion. Replace with direct schema_by_id table query in both
GetSubjectsBySchemaID and GetVersionsBySchemaID.

Each sub-test calls defer store.Close(). For DB backends sharing a
single connection, this killed the connection after the first test.
Wrap shared stores with noCloseStore so Close() is a no-op in
sub-tests; the real Close() happens in the parent TestXxxBackend.

G704 (SSRF): admin CLI uses user-provided --server flag, not tainted
G705 (XSS): schema content from storage, response has registry content type
G117 (secret): OIDC config struct field, not a hardcoded secret

G117 flags all config struct fields named Password/Secret — these are
legitimate config structs, not hardcoded secrets. G202 flags
parameterized SQL query building using $N placeholders. Both are
false positives introduced by a newer gosec version.

PostgreSQL/MySQL fixes:
- Fix column name typo 'schema' -> 'schema_text' in GetSchemaByGlobalFingerprint
- Fix missing backticks on reserved table name in MySQL GetSchemaByFingerprint
- Fix SubjectExists to filter deleted rows (PostgreSQL)
- Fix GetSchemasBySubject to return empty slice vs ErrSubjectNotFound when
  subject exists but all versions are soft-deleted
- Fix ListSchemas LatestOnly query args mismatch
- Re-insert default global config/mode after table truncation

Cassandra fixes:
- Fix DeleteConfig/DeleteMode to return ErrNotFound when key doesn't exist
- Fix SubjectExists to check for non-deleted versions
- Fix GetLatestSchema to skip soft-deleted versions
- Fix GetSchemasBySubject to handle includeDeleted correctly
- Fix ListUsers to sort by ID
- Fix UpdateUser to detect duplicate usernames
- Fix CreateAPIKey to detect duplicate hashes
- Add reference tracking (schema_references + references_by_target) in CreateSchema

Conformance test fixes:
- Create users before API keys in auth tests (FK constraint compliance)
- Adjust schema dedup tests to work with all backends

PostgreSQL/MySQL:
- Fix UpdateAPIKey to include key_hash in UPDATE statement
- Fix UpdateAPIKeyLastUsed to check RowsAffected and return ErrAPIKeyNotFound

MySQL:
- Add id_alloc table for sequential NextID/SetNextID (replaces AUTO_INCREMENT read)
- Fix NextID off-by-one: use atomic SELECT FOR UPDATE + UPDATE on id_alloc

Cassandra:
- Fix CreateSchema to return ErrSchemaExists for duplicate fingerprint in same subject
- Use user-provided fingerprint when set (matches PostgreSQL/MySQL behavior)
- Fix GetSchemaBySubjectVersion to distinguish ErrSubjectNotFound vs ErrVersionNotFound
- Fix DeleteSchema to check existence before delete with proper error types

Tests:
- Fix error_tests.go: create users before API keys (FK constraint)
- Set valid ExpiresAt on API keys for MySQL compatibility
- Add id_alloc to MySQL truncation and re-initialization

Match PostgreSQL behavior: only count non-deleted schemas when
checking if a subject exists.

…rt references

- GetSchemaByFingerprint: build result directly instead of calling
  GetSchemaBySubjectVersion, which rejects deleted versions even when
  includeDeleted=true
- ImportSchema: write references to both schema_references and
  references_by_target tables, matching CreateSchema behavior

Add comprehensive handler-level unit tests:
- handlers_test.go: ~65 tests covering schema, subject, config,
  mode, and compatibility endpoints
- admin_test.go: ~40 tests covering user and API key admin endpoints
- account_test.go: ~9 tests covering self-service account endpoints

Total: 119 handler tests covering request parsing, response format,
error codes, and Confluent API compatibility.

Schema references are a first-class Confluent feature (since Platform
5.5) but were not being resolved — any schema using cross-subject
references would fail to parse, breaking Confluent compatibility.

Changes:
- Add Schema field to storage.Reference for resolved content
- Add resolveReferences() to registry layer, wired into all Parse
  and compatibility check call sites
- Avro parser: use avro.ParseWithCache to pre-register referenced
  named types
- JSON Schema parser: use compiler.AddResource for external $ref
- Protobuf resolver: store actual reference content for imports
- Add SchemaWithRefs type to compatibility interface so checkers
  can parse schemas that have cross-subject references
- Avro checker: parse with reference cache
- Protobuf checker: replace simpleResolver with checkerResolver
  that handles references and well-known types
- Add cross-subject reference tests for all three parser types
- Update all compatibility checker tests for new interface

…rites, and block-based IDs

Replace RDBMS-style patterns with Cassandra-native approaches:

- Add SAI indexes on subject_versions (schema_id, deleted) and schemas_by_id
  (fingerprint), eliminating schemas_by_fingerprint and subjects tables
- Batch reference writes in CreateSchema/ImportSchema with logged batches
- Batch soft-deletes in DeleteSubject with unlogged batch (same partition)
- Block-based ID allocation (default block size 50) reduces LWT frequency ~50x
- IN-clause batch reads in GetSchemasBySubject (2N+1 → 3 queries)
- SAI queries replace O(S×V) full-table scans in GetSubjectsBySchemaID,
  GetVersionsBySchemaID, cleanupOrphanedSchema, findSchemaInSubject, etc.
- Propagate errors in cleanup methods via slog.Warn instead of silent discard
- Update conformance test to remove dropped tables from truncation list

Requires Cassandra 5.0+ for SAI support. Breaking change — drops legacy tables.
All 1353 BDD tests pass against Cassandra.

The Cassandra storage layer now requires SAI (Storage Attached Index)
which was introduced in Cassandra 5.0.

…d tables from cleanup

- Re-check findSchemaInSubject on CAS retry to detect concurrent
  registrations of the same schema (fixes TestSchemaIdempotency)
- Remove schemas_by_fingerprint and subjects from BDD truncation list
  (tables were dropped in SAI migration)

Block-based ID allocator caches IDs in-process, but GetMaxSchemaID
reads from id_alloc table. After truncation, the table is empty and
GetMaxSchemaID fails, causing fetchMaxId responses to omit maxId.

gocql sessions are expensive to create (~500-1000ms each due to
topology discovery and connection pool setup). Previously, each
BDD scenario cleanup created and closed a new session, adding
significant overhead across 1355 scenarios. Now we lazily create
a single long-lived session and reuse it for all cleanup operations.

All 4 optimization phases are complete and CI-verified (23/23 green).

…s, OpenAPI spec, and bug fixes

- Makefile: 16 test targets (test-unit, test-bdd, test-integration, test-conformance,
  test-concurrency, test-migration, test-api, test-ldap, test-vault, test-oidc,
  test-auth, test-compatibility) with BACKEND= variable support and auto-detected
  container runtime (docker/podman)
- Helper scripts: start-db.sh, stop-db.sh, setup-ldap.sh, setup-vault.sh, setup-oidc.sh
  for Docker lifecycle management with sr-test-* container naming
- OpenAPI spec: complete 3100+ line spec with embedded serving at /docs endpoint
- Fix LDAP bootstrap.ldif: reorder users before groups so memberOf overlay works
- Fix migrate-from-confluent.sh: empty array expansion with set -u, container networking
- Fix concurrency test port conflict: 18081 → 28181 to avoid BDD container collision
- Fix migration test: dedicated container network for Podman macOS compatibility

Replace the SAI-based fingerprint dedup in ensureGlobalSchema with a
Lightweight Transaction (INSERT IF NOT EXISTS) on a new schema_fingerprints
table where fingerprint is the partition key.

The previous approach used an eventually-consistent SAI index on
schemas_by_id (where schema_id is the PK) to detect duplicate fingerprints.
Under concurrent registration of the same schema, two writers could both
miss each other's SAI entries, allocate different schema_ids, and create
duplicate global schemas — causing TestSchemaIdempotency failures.

The new schema_fingerprints table provides a true CAS: exactly one writer
wins the fingerprint claim and all others receive the winning schema_id
in the LWT response. An ensureSchemaData helper handles crash recovery
(fingerprint claimed but schemas_by_id data missing) by inserting the
data on the next request.

Also updates ImportSchema to claim fingerprints for consistency, and adds
a migration backfill step that populates schema_fingerprints from existing
schemas_by_id data for production upgrades.

Import mode preserves external IDs, so the same schema content can
legitimately have different IDs across subjects/imports. The fingerprint
LWT claim should not reject these — it's for CreateSchema dedup only.

Also add schema_fingerprints to BDD Cassandra cleanup truncation list.

The schema_fingerprints table may not exist if the schema-registry
hasn't finished migrating when the first BDD scenario cleanup runs.
Handle the "unconfigured table" error gracefully instead of failing
hard, which was causing the BDD Cassandra tests to hang.

Remove orphaned tracking and analysis files that are no longer
relevant as the work they tracked has been completed and merged.

Add 14 documentation guides covering all aspects of the registry:
- Getting started, installation, and configuration reference
- Storage backends (PostgreSQL, MySQL, Cassandra, memory)
- Schema types (Avro, Protobuf, JSON Schema) with references
- Compatibility modes, migration from Confluent, deployment
- Authentication (6 methods), security hardening, RBAC
- Monitoring (Prometheus metrics, alerting, Grafana)
- Development guide, troubleshooting, and error code reference

Add auto-generated API reference from OpenAPI spec:
- docs/api-reference.md (markdown, 7002 lines via widdershins)
- docs/api/index.html (ReDoc interactive HTML)
- scripts/generate-api-docs.sh for regeneration
- GitHub Actions workflow (workflow_dispatch) for CI generation

Rebuild README.md as a focused landing page with feature comparison
table, architecture diagrams, and documentation index.

Add consistent "## Contents" section with anchor links to all 14 docs
and README. Update generate-api-docs.sh to auto-generate and inject a
TOC into the api-reference.md output, positioned right after the title.

Restyle README to match AxonOps Workbench branding with centered logo,
shield badges, quick-links bar, centered tables, section dividers,
legal notices, and "Made with love" footer. Add AxonOps logo to assets.

Confluent Schema Registry stores schemas in Kafka (the _schemas topic),
not ZooKeeper. ZooKeeper was only used for leader election and was
removed in Confluent Platform 7.0. Update messaging to accurately state
the distinction: we use databases instead of Kafka for storage.

Move Feature Comparison to directly after "Why AxonOps Schema Registry"
for immediate visual impact. Replace Yes/No text with emoji ticks and
crosses for scannability. Update copyright year to 2026.

Replace Confluent-centric subtitle with one that highlights the
product's own value proposition: multi-backend storage and enterprise
security.

Add docs/testing.md covering all test layers in detail: unit tests,
storage conformance, integration, concurrency, BDD (76 feature files,
~1400 scenarios), API endpoint, auth (LDAP/OIDC/Vault), migration,
Confluent wire-compatibility, and OpenAPI validation. Includes test
pyramid, quick reference table, pre-commit workflow, and guidance on
which tests to write for each type of change.

Also fix Karapace OIDC/OAuth2 in feature comparison (supports it),
add Confluent trademark to legal notices, update Overview link to
point AxonOps to axonops.com, and add testing doc to README table.

Strip v1.0.0 from the auto-generated api-reference.md title via the
generation script. Fix TOC generation by exporting the TOC env var.

Add built-in API documentation (OpenAPI/Swagger UI/ReDoc) to the
README "Why" section.

Expand the terse "Contexts are single-tenant" bullet in the README with
a full explanation of what Confluent contexts are (multi-tenancy
namespaces for Schema Linking) and why we return only the default
context. Also clarify the cluster coordination difference. Update the
OpenAPI spec /contexts endpoint description and regenerate API docs.

Create GitHub issue #264 for multi-tenant context support with detailed
requirements, acceptance criteria, use cases, and implementation hints.
Link to the issue from README known differences, OpenAPI spec /contexts
endpoint, and auto-generated API reference. Add Multi-Tenant Contexts
and Schema Linking rows to the feature comparison table.

Karapace does not support schema registry contexts — no evidence in
their README, API docs, or codebase. Change from tick to cross.

Create docs/fundamentals.md covering what a schema registry is, the
problem it solves, core concepts (schemas, subjects, versions, IDs,
compatibility, references), producer/consumer serialization flow with
Mermaid diagrams, wire format, subject naming strategies, schema
evolution, compatibility modes, ID allocation, deduplication, modes,
and architectural overview. Link from README with a callout above the
"Why" section and in the documentation table.
@millerjp
Author

Closing to address code review feedback. Will reopen with fixes.

millerjp closed this on Feb 16, 2026
millerjp added a commit that referenced this pull request Feb 16, 2026
…grity

Fixes 11 confirmed issues from PR review:

- Issues 1-2: Add schema_fingerprints table to PostgreSQL and MySQL for
  stable global schema IDs and reference preservation after permanent delete
- Issues 3-4: Enforce IMPORT mode for explicit ID registration and bulk
  import (error 42205)
- Issue 5: Propagate mode check errors instead of failing open
- Issue 7: Guard SetNextID against sequence rewind after import
- Issue 8: Include soft-deleted versions when computing next version in
  RegisterSchemaWithID
- Issue 9: Handle "latest" sentinel in findDeletedVersion for
  GET version?deleted=true
- Issue 10: Add external reference resolution to JSON Schema compatibility
  checker
- Issue 11: Fix Cassandra GetMaxSchemaID to query actual max instead of
  block allocator ceiling

Also adds BDD conformance tests covering all fixes (pr_fixes_conformance.feature)
and updates existing import feature files for IMPORT mode enforcement.
