@withzombies withzombies commented Dec 10, 2025

Summary

This PR refactors the CI infrastructure to properly run integration tests with osquery and generate coverage reports.

Key Changes

CI Workflow (ci.yml)

  • Split into two jobs: Build (format, clippy, build) and Test & Coverage
  • Install osquery directly from GitHub releases for native integration tests
  • Add coverage measurement with cargo-llvm-cov and dynamic badge updates
  • Use scripts/ci-test.sh for orchestrating osqueryd + extensions + tests

New CI Test Script (scripts/ci-test.sh)

  • Detects osqueryd in PATH or common install locations
  • Falls back to Docker if osqueryd not found locally
  • Builds and autoloads logger-file and config-static extensions
  • Waits for extensions to register via log grep (not osqueryi queries)
  • Runs tests with --all-features including osquery-tests feature
  • Generates lcov coverage report with cross-platform awk calculations
  • Cleans up with pkill -f, which matches osqueryd by its command line rather than by a saved PID (handles the tee pipe)
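
The detection step could look roughly like the sketch below. It is not the actual contents of scripts/ci-test.sh: the helper name `find_binary` is hypothetical, and the caller supplies whichever fallback directories it considers "common install locations".

```shell
# Hypothetical helper: print the path of NAME, looking on PATH first and
# then in each candidate directory given as an extra argument.
find_binary() {
  local name=$1
  shift
  local found dir
  # Prefer whatever is already on PATH.
  if found=$(command -v "$name" 2>/dev/null); then
    printf '%s\n' "$found"
    return 0
  fi
  # Otherwise probe each candidate directory for an executable file.
  for dir in "$@"; do
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  # Nothing found: the caller decides what the fallback is (e.g. Docker).
  return 1
}
```

A caller might run `find_binary osqueryd /usr/local/bin /opt/osquery/bin` (both directories are assumptions here) and treat a non-zero exit status as the signal to fall back to Docker.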

Bug Fixes

  • Fix cleanup hang: with osqueryd | tee &, $! captures tee's PID, not osqueryd's, so cleanup now uses pkill -f
  • Fix extension registration check: osqueryi --socket connects AS an extension, so the check now greps the osqueryd log
  • Fix coverage calculation: paste -sd+ and bc are not available on macOS, so the totals are now computed with awk
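
One way to do the coverage arithmetic in awk alone is sketched below. It assumes the script derives line coverage by summing the `LF` (lines found) and `LH` (lines hit) records of the lcov tracefile; the function name is hypothetical and the real script may compute the figure differently.

```shell
# Hypothetical helper: print line coverage (one decimal place) for an
# lcov tracefile by summing its LF (lines found) and LH (lines hit) records.
lcov_line_coverage() {
  awk -F: '
    $1 == "LF" { found += $2 }
    $1 == "LH" { hit += $2 }
    END {
      # Avoid dividing by zero when the tracefile has no line records.
      if (found == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * hit / found
    }
  ' "$1"
}
```

Because the summing and the division both happen inside awk, the same command line works with the awk shipped on Linux and on macOS.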

README Badges

  • Fix CI badge URL: "Rust CI" → "CI" to match actual workflow name
  • Fix coverage link: coverage.yml → ci.yml (no separate coverage workflow)

Coverage Badge Setup

The coverage badge uses schneegans/dynamic-badges-action to update a Gist. Requirements:

  • GIST_TOKEN secret with gist scope (verify in repo settings)
  • Gist 36626ec8e61a6ccda380befc41f2cae1 exists (currently empty {})
  • Badge will populate on first push to main after merge

Test Results

  • 161 tests passing (104 unit + 57 integration)
  • ~87% code coverage
  • CI completes in ~1.5 minutes

Test Plan

  • CI passes on PR branch
  • Local ./scripts/ci-test.sh works on macOS
  • Local ./scripts/ci-test.sh --coverage generates lcov.info
  • Verify GIST_TOKEN secret exists in repo settings
  • After merge: verify coverage badge updates

🤖 Generated with Claude Code

withzombies and others added 30 commits December 8, 2025 12:37
- Add mockall 0.13 as dev-dependency for mock generation
- Add 23 unit tests for table plugin functionality:
  - ReadOnlyTable: name(), columns(), generate(), routes()
  - Table: insert(), update(), delete() operations
  - TablePlugin enum dispatch (Readonly, Writeable variants)
  - Error paths: readonly errors, invalid action, bad JSON, missing params
  - Edge cases: empty rows, ping behavior
- Initialize bd (beads) for task tracking

Test coverage improved from ~15% to ~25% for table plugin module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Enable testing Server without a real osquery daemon by extracting
the client interface into a mockable trait.

Changes:
- Add OsqueryClient trait with register_extension, deregister_extension, ping
- Rename Client struct to ThriftClient, add type alias for backwards compat
- Use mockall's #[automock] to auto-generate MockOsqueryClient
- Make Server generic over client type: Server<P, C: OsqueryClient = ThriftClient>
- Add Server::with_client() constructor for injecting mock clients
- Add 7 new Server tests using MockOsqueryClient

This maintains full backwards compatibility - existing code using
Client::new() and Server::new() works unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…_shutdown, join_listener_thread, and wake_listener

- cleanup_socket: 3 tests (removes socket, handles missing, skips when no uuid)
- notify_plugins_shutdown: 3 tests (single, multiple, empty plugins)
- join_listener_thread: 2 tests (no thread, finished thread)
- wake_listener: 2 tests (no path, with path)

Coverage: server.rs ~59% (up from 37.57%)
All tests pass, pre-commit hooks pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add testcontainers v0.26 with blocking feature to dev-dependencies
- Create osquery-rust/tests/integration_test.rs scaffold
- Use osquery/osquery:5.17.0-ubuntu22.04 Docker image
- Verify container startup with test_osquery_container_starts

This enables Docker-based integration testing for client.rs ThriftClient
which requires real osquery socket communication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Integration tests for ThriftClient that:
- Check osquery availability on system
- Find osquery socket from common paths or OSQUERY_SOCKET env var
- Test ThriftClient connection and ping when socket is available
- Gracefully skip tests when no osquery socket available

These tests support running with local osqueryi or inside Docker
alongside osquery. Unix sockets cannot span Docker boundaries, so
tests skip when no socket is found rather than failing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Update integration tests to fail (not skip) when osquery unavailable
- Add pre-commit hook that runs integration tests with native osquery
  (falls back to Docker if osquery not installed locally)
- Add GitHub Actions workflow for integration tests
- Add unit tests for Plugin enum dispatch (config, logger)
- Add unit tests for Response variants
- Add unit tests for ConfigPlugin wrapper
- Add unit tests for LoggerPlugin wrapper and features
- Add unit tests for QueryConstraint and Operator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Enables integration tests to execute SQL queries against osquery by
expanding the OsqueryClient trait with query methods that were
previously only available via the internal TExtensionManagerSyncClient.

Changes:
- Add query(sql) and get_query_columns(sql) to OsqueryClient trait
- Implement both methods for ThriftClient (delegates to Thrift client)
- Add unit tests verifying MockOsqueryClient can mock these methods

This is the first task in the integration tests epic (bd-p6i).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Tests the OsqueryClient::query() method against osquery's built-in
osquery_info table. Verifies:
- Query executes successfully (status code 0)
- Response contains at least one row
- Test FAILS (not skips) when osquery unavailable

Part of bd-81n: Integration Tests for Full Thrift Coverage epic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Tests full Server lifecycle: create server, register table plugin,
run in background thread, and gracefully stop via ServerStopHandle.

This test verifies:
- Server::new() successfully connects to osquery socket
- server.register_plugin() accepts TablePlugin
- server.run() starts the extension
- ServerStopHandle.stop() triggers graceful shutdown
- server thread joins cleanly

Test correctly FAILS (panics) when osquery socket unavailable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
This test validates the complete end-to-end flow for table plugins:
1. Creates a test table returning known data (id=42, name=test_value)
2. Registers the table extension with osquery via Server
3. Queries the table through osquery using a separate ThriftClient
4. Verifies the query returns the expected row with correct values

This completes the Integration Tests epic by exercising the full path
from extension registration through table query execution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
This enables integration test coverage in CI by:
- Starting osquery container before running coverage
- Setting OSQUERY_SOCKET env var for test discovery
- Cleaning up container even on failure (if: always())

Integration tests exercise client.rs, server.rs, and plugin code
paths that were previously unmeasured, improving coverage accuracy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Adds scripts/coverage.sh for running coverage locally with integration tests.
- Prefers local osqueryi (works on all platforms including ARM)
- Falls back to Docker on amd64 only (osquery image is amd64-only)
- Supports --html flag for HTML report generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add a reusable OsqueryContainer struct that implements the testcontainers
Image trait, enabling Docker-based osquery instances for integration tests.
This provides a foundation for future tests that need isolated osquery
environments without requiring a local osquery installation.

The container builder supports:
- Config plugin configuration
- Logger plugin configuration
- Extension autoloading paths
- Environment variable injection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Extends OsqueryContainer with socket bind mount functionality to allow
host-built extensions to connect to osquery running in Docker containers.
This enables testing extensions against a real osquery instance.

Changes:
- Add socket_host_path and socket_mount fields to track bind mount config
- Add with_socket_path() builder to configure socket directory mounting
- Add socket_path() getter to get full socket path (dir + osquery.em)
- Implement Image::mounts() trait to provide bind mount to Docker
- Add wait_for_socket() helper with timeout for socket file polling

Note: On macOS with Colima/Docker Desktop, Unix domain sockets created
inside containers are visible on the host filesystem but not connectable
across the VM boundary. The test verifies socket file creation; full
end-to-end tests run inside Docker (see hooks/pre-commit).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…tests

Add comprehensive integration tests for logger and config plugins alongside
existing table plugin tests. Improve pre-commit hook to use osqueryd daemon
mode with autoload for full plugin lifecycle testing.

Key changes:
- Add test_logger_plugin_registers_successfully integration test
- Add test_logger_plugin_log_lifecycle integration test
- Add test_config_plugin_registers_successfully integration test
- Update pre-commit hook to use osqueryd with extension autoload
- Enhance coverage.sh script with --examples-only mode
- Update examples with improved error handling and CLI usage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
This enables testcontainers-based integration tests that build and run
Rust osquery extensions entirely inside Docker, avoiding Unix socket
limitations on macOS where sockets don't cross VM boundaries.

Key additions:
- docker/Dockerfile.test: Multi-stage build that compiles extensions
  with Rust 1.85+ and installs osquery 5.20.0 from GitHub releases
  (supports both amd64 and arm64 architectures)
- scripts/build-test-image.sh: Build script with verification
- .dockerignore: Excludes target/ and other build artifacts
- OsqueryTestContainer: testcontainers Image impl for pre-built image
- exec_query(): Helper to run SQL queries via osqueryi --connect

Usage: ./scripts/build-test-image.sh && cargo test osquery_container

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
This allows `cargo test` to run fast unit tests without requiring osquery
or Docker installed. Integration tests are gated behind feature flags:

- `osquery-tests`: Tests requiring osquery running (integration_test.rs)
- `docker-tests`: Tests that spawn Docker containers (test_*_docker.rs)

The pre-commit hook uses `--features osquery-tests` to run integration
tests with locally installed osquery. Docker wrapper tests can be run
manually with `--features docker-tests`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace apt-based osquery installation with Docker containers using
testcontainers-rs. This eliminates the need for managing local osquery
installation in CI and provides better test isolation.

Changes:
- Remove "Install osquery" step (apt install)
- Remove "Start osqueryd" step (manual process management)
- Remove "Stop osquery" cleanup step
- Add "Build test Docker image" step
- Use docker-tests feature flag for integration tests
- Increase timeout from 5 to 15 minutes for container startup

The Docker-based tests run all Category A (client), B (server), and
C (autoload) integration tests inside containers where both osquery
and extensions run together, avoiding Unix socket VM boundary issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Reduce pre-commit hook from ~300 lines of bash orchestration to 65 lines
- Remove local osquery process management (replaced by testcontainers)
- Pre-commit now runs: fmt, clippy, unit tests, doc tests, docker-tests
- Close epic osquery-rust-nf4: Migrate Integration Tests to Testcontainers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace fixed 15-second sleeps with osqueryi --connect active polling
for faster extension readiness detection in Docker tests. Add test for
log_snapshot callback using scheduled queries. Enable LOG_EVENT feature
in file_logger example to receive snapshot events.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add wait_for_extension_registered() helper that polls osquery_extensions
table via ThriftClient until extension appears. This replaces fixed
2-second sleeps with responsive polling (100ms intervals, 10s timeout)
in test_server_lifecycle, test_table_plugin_end_to_end, and
test_logger_plugin_registers_successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update test_autoloaded_logger_receives_logs to verify the logger receives
actual osquery core log messages (format: 'file.cpp:line - message')
instead of just counting entries with severity markers. This ensures the
test validates meaningful osquery output rather than just plugin activity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update test_autoloaded_config_provides_config to verify the scheduled
query content matches what our config plugin provides (contains
'file_events' table reference), not just that a row exists with that name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add a uniquely-named canary schedule (rust_config_canary_7f3d2a) to the
config-static plugin with a specific query value (canary_value_abc123).
The config test now verifies this canary exists in osquery_schedule,
proving the config was actually applied by osquery - not just that the
plugin was registered.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add test_writeable_table_crud_operations that verifies INSERT, UPDATE,
and DELETE operations work correctly on the writeable_table extension.
Each operation is strictly verified by querying for expected values:
- INSERT: verify new row exists with exact values
- UPDATE: verify only targeted column changed
- DELETE: verify row no longer exists
- Final state: verify table returned to initial 3 rows

Also add writeable-table.ext to Docker extensions.load for autoloading,
and use explicit column selection for rowid (hidden column).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace standalone osquery container with testcontainers-based Docker
tests for coverage. This exercises real osquery I/O paths in client.rs
and server.rs, providing more accurate coverage measurement that
includes integration test code paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add osquery installation step from official apt repository so integration
tests can run on the host and be instrumented by cargo-llvm-cov. Docker
tests running inside containers cannot be instrumented from the host,
so native osquery enables full coverage measurement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Three fixes for failing CI checks:

1. Docker Integration Tests: Remove Cargo.lock from .gitignore so it
   can be copied into the Docker build context

2. Coverage Workflow: Install osquery from GitHub releases tarball
   instead of apt repository (GPG key installation was failing).
   Run coverage measurement inside Docker container where osquery
   is available.

3. CI Workflow: Split into two jobs - basic build/unit tests on
   ubuntu/macos, and full integration tests inside Docker container
   where osquery socket is available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Delete Docker test files (test_integration_docker.rs, test_client_docker.rs,
osquery_container.rs) and remove docker-tests feature from Cargo.toml.
CI will now run integration tests against native osquery on Ubuntu rather
than in Docker containers, enabling proper coverage instrumentation.

Changes:
- Delete osquery-rust/tests/test_integration_docker.rs
- Delete osquery-rust/tests/test_client_docker.rs
- Delete osquery-rust/tests/osquery_container.rs
- Remove docker-tests feature and testcontainers dependency
- Update integration_test.rs header comments (remove Docker references)
- Update pre-commit hook (remove Docker test execution)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace three GitHub Actions workflows (ci.yml, coverage.yml,
integration.yml) with a single consolidated ci.yml that:

- Runs on Ubuntu only (no macOS runners per epic requirements)
- Installs osquery 5.20.0 from GitHub releases
- Starts osqueryd with extensions autoloaded
- Runs cargo-llvm-cov with 90% coverage threshold enforcement
- Updates GitHub Gist badge on main push

Key implementation details:
- Socket permissions: chmod 777 after osqueryd creates socket
- Socket readiness: 30-second timeout with explicit error message
- Coverage threshold: >= 90% required (89.9% fails, 90.0% passes)
- Extensions autoload: Absolute paths in /etc/osquery/extensions.load

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
withzombies and others added 7 commits December 10, 2025 12:41
Extensions are workspace members, not traditional cargo examples, so they
are built to target/debug/ (not target/debug/examples/). Also fix binary
names:
- config_static (underscore, from [[bin]] name in Cargo.toml)
- logger-file (hyphen)
- two-tables (hyphen)

Changed:
- cargo build --examples → cargo build --workspace
- target/debug/examples/config-static → target/debug/config_static
- target/debug/examples/logger-file → target/debug/logger-file
- target/debug/examples/two-tables → target/debug/two-tables

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
osquery refuses to autoload extensions from directories with non-root
ownership. In CI, target/debug/ is owned by the runner user, not root.
Add --allow_unsafe flag to osqueryd to permit loading extensions from
this directory in the CI environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
osquery requires autoloaded extensions to end in .ext suffix. Create
symlinks (config_static.ext, logger-file.ext, two-tables.ext) pointing
to the actual binaries in target/debug/.
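
As an illustration only, the symlink step could be scripted as below. The helper name `link_extensions` is hypothetical; it takes the binary directory (assumed to be an absolute path), the autoload file to write, and the binary names.

```shell
# Hypothetical helper: create NAME.ext symlinks next to each binary and
# list the symlink paths, one per line, in the given autoload file.
link_extensions() {
  local bin_dir=$1 load_file=$2
  shift 2
  local name
  # Start from an empty autoload file.
  : > "$load_file"
  for name in "$@"; do
    # -f replaces a stale symlink left over from a previous run.
    ln -sf "$bin_dir/$name" "$bin_dir/$name.ext"
    printf '%s\n' "$bin_dir/$name.ext" >> "$load_file"
  done
}
```

For the binaries named above, a call might look like `link_extensions "$PWD/target/debug" extensions.load config_static logger-file two-tables`.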

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The autoloaded extensions need TEST_LOGGER_FILE to know where to write
logs. Using sudo -E preserves environment variables so osqueryd passes
them to the extension processes it spawns.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The logger-file extension reads FILE_LOGGER_PATH to determine where
to write logs. The integration tests read TEST_LOGGER_FILE to check
the log output. These must point to the same file for tests to pass.
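
A small sketch of keeping the two variables in step by setting both from a single value. The wrapper name `run_with_logger_env` is hypothetical; only the variable names FILE_LOGGER_PATH and TEST_LOGGER_FILE come from this change.

```shell
# Hypothetical wrapper: run a command with both logger variables pointing
# at the same file, so the writer and the reader cannot drift apart.
run_with_logger_env() {
  local log_path=$1
  shift
  # Create (or truncate) the file up front so both sides see an empty log.
  : > "$log_path"
  FILE_LOGGER_PATH="$log_path" TEST_LOGGER_FILE="$log_path" "$@"
}
```

Any command started through the wrapper, and any child process that inherits its environment, sees the same path under both names.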

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Root cause: The file_logger extension was failing to start because
sudo -E wasn't properly passing FILE_LOGGER_PATH to child processes
spawned by osqueryd. The extension printed "Failed to create file
logger" and only 2/3 extensions registered.

Fix: Combine osqueryd startup and test execution into a single step
so environment variables stay in the same shell context. Also add:
- Pre-create log file with 666 permissions
- Verify osqueryd is still running before tests
- Show which extensions registered
- Show logger file contents for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Create scripts/ci-test.sh that handles both development (osqueryi-only)
and CI (full osqueryd with extension autoload) environments:

- When osqueryd available: builds extensions, sets up autoload via
  extensions.load file, waits for socket + extension registration,
  runs all 10 integration tests
- When only osqueryi available: starts simple socket mode for basic
  integration tests (6 non-autoload tests)

The script manages lifecycle, cleanup, and coverage generation.
CI workflow updated to use this unified script.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
withzombies and others added 7 commits December 10, 2025 14:24
When piping osqueryd to tee with &, $! captures tee's PID, not osqueryd's.
This caused cleanup to kill only tee while osqueryd kept running, hanging
CI. Now use pkill -f to kill osqueryd by pattern matching its arguments.
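
The PID mix-up can be reproduced without osquery. In the sketch below (bash; all function names are hypothetical) a stand-in producer records its own PID before being piped to tee, so the value of `$!` can be compared against it. `stop_by_pattern` then shows cleanup in the style this fix describes.

```shell
# Prints "<producer_pid> <value_of_$!>" for a backgrounded `producer | tee`.
show_pipeline_pids() {
  local dir bang producer i=0
  dir=$(mktemp -d)
  # The first stage writes its own PID, then becomes a long-running process.
  sh -c 'echo $$ > "$0/pid"; exec sleep 5' "$dir" | tee "$dir/log" >/dev/null &
  bang=$!   # PID of the LAST stage of the pipeline, i.e. tee
  # Wait briefly (up to ~5s) for the first stage to record its PID.
  while [ ! -s "$dir/pid" ] && [ "$i" -lt 50 ]; do
    sleep 0.1
    i=$((i + 1))
  done
  producer=$(cat "$dir/pid")
  echo "$producer $bang"
  # Tidy up both processes and the scratch directory.
  kill "$producer" "$bang" 2>/dev/null || true
  wait "$bang" 2>/dev/null || true
  rm -rf "$dir"
}

# Cleanup by command-line pattern instead of by a saved PID.
stop_by_pattern() {
  pkill -f -- "$1" 2>/dev/null || true
}
```

The two numbers printed by `show_pipeline_pids` differ, which is why killing `$!` alone leaves the producer running. Note that `pkill -f` matches any process whose command line fits the pattern, so the pattern should be specific enough not to hit unrelated processes.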

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
osqueryi --socket connects AS an extension client, not as a query
interface. The previous check using osqueryi queries was failing
silently. Replace with grep for "registered * plugin" messages in
osqueryd.log which reliably indicates extension registration.
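
A log-based readiness check might be sketched as follows (bash, since it relies on the `SECONDS` variable). The function name is hypothetical, and the pattern is passed in by the caller because the exact wording of osqueryd's registration messages is not shown here.

```shell
# Hypothetical helper: wait_for_log_lines LOGFILE PATTERN EXPECTED TIMEOUT
# Polls LOGFILE until at least EXPECTED lines match PATTERN (extended
# regex), or fails once TIMEOUT seconds have passed.
wait_for_log_lines() {
  local logfile=$1 pattern=$2 expected=$3 timeout=$4
  local deadline=$((SECONDS + timeout))
  local count
  while :; do
    # grep -c prints 0 (and exits 1) when nothing matches; if the file does
    # not exist yet it prints nothing, which is treated as 0 below.
    count=$(grep -Ec -- "$pattern" "$logfile" 2>/dev/null || true)
    if [ "${count:-0}" -ge "$expected" ]; then
      return 0
    fi
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out: ${count:-0}/$expected lines matched '$pattern'" >&2
      return 1
    fi
    sleep 1
  done
}
```

A call such as `wait_for_log_lines osqueryd.log 'registered .* plugin' 3 30` would wait up to 30 seconds for three registrations; the regex and the counts are assumptions to adapt to the real log output.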

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Change CI badge from "Rust CI" to "CI" to match actual workflow name
- Change coverage link from coverage.yml to ci.yml (no separate coverage workflow)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@withzombies withzombies merged commit 315f7f9 into main Dec 10, 2025
2 checks passed
@withzombies withzombies deleted the testing-refactor branch December 10, 2025 20:12