Overhaul test suite: shared fixtures, deterministic timing, and reduced boilerplate by wesm · Pull Request #17 · wesm/msgvault

wesm · 2026-02-02T22:31:32Z

This PR was the result of running roborev analyze refactor --per-file '*/*/*_test.go' and roborev analyze test-fixtures --per-file '*/*/*_test.go' and then using roborev fix to grind through the results. This is a very effective pattern for systematic refactoring.

Summary

Comprehensive overhaul of the test suite across 54 files (~12,600 lines added, ~12,000 removed). The goal is to make tests more reliable, readable, and maintainable as the codebase grows. No production code behavior changes.

Shared test infrastructure (`internal/testutil/`)

testutil/ptr — pointer/date helper functions used across packages
testutil/email — MakeRawEmail and slice assertions for MIME test data
testutil/dbtest — shared DB test helpers and typed TestDataBuilder replacing raw SQL fixtures
testutil/storetest — StoreFixture and MessageBuilder for store-layer tests
querytest.MockEngine — centralized query.Engine test double

Deterministic timing

Introduced a Clock interface on RateLimiter with a mock clock for tests
Replaced all time.Sleep-based synchronization with mock-clock timer polling
Eliminates flaky failures under CI load

Test pattern improvements

Converted many tests to table-driven format (search parser, aggregates, attachments, manifests, MCP error cases, etc.)
Extracted fluent builders: ManifestBuilder, GmailErrorBuilder, MessageBuilder (MIME), TestDataBuilder (Parquet)
Added assertion helpers (assertLevel, assertRow, assertSearchCount, etc.) to reduce repetitive check-and-fail blocks
Split the 5,800-line model_test.go into domain-specific test files (selection, search, navigation, view rendering)

Hardening

Fixed cross-test state leakage via defensive copies of shared slices and factory functions
Added duplicate-key detection in aggregate test helpers
Guarded fakeT against Fatal/FailNow/Skip/Skipf/SkipNow bypassing sentinel recovery
Added missing failure-path tests (nil summaries, DB errors, missing messages)
Fixed nondeterministic map iteration and header ordering in test fixtures

Developer tooling

Added .githooks/pre-commit hook that runs gofmt and golangci-lint on staged Go files
Enable with make setup-hooks — prevents lint failures from reaching CI

Test plan

make test — all tests pass
make lint — no new warnings
CI passes on all platforms

🤖 Generated with Claude Code

Move stubEngine and stubLLM to mocks_test.go and add newTestSession helper to reduce boilerplate in session tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…relocate mock tests - Merge ExecutorTestContext and DeletionMockTestContext into a single TestContext backed by gmail.DeletionMockAPI, eliminating ~100 lines of duplicate helpers - Remove local mockAPIWithErrors struct that reimplemented DeletionMockAPI functionality - Move TestDeletionMockAPI_* tests from deletion package to gmail package where they belong (internal/gmail/deletion_mock_test.go) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Combine six separate FormatSummary test functions into a single table-driven test and convert TestGenerateID to table-driven format, reducing boilerplate and improving test isolation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…rtions Remove tempPaths/att helpers, inline setup, add substring-based assertions, and cover additional cases (nil inputs, individual hash lengths). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add errorWithReason/errorWithDetail factory helpers to reduce boilerplate in test table entries - Panic on json.Marshal failure in gmailErrorBody instead of silently discarding the error Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace time.Sleep-based tests with a mock clock that allows advancing time deterministically, eliminating flaky timing-dependent tests. Replace drainBucket busy-loop with direct setTokens state manipulation. Consolidate throttle-related tests into subtests under a single TestRateLimiter_Throttle. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add generic runTool[T] helper that combines invocation, error checking, and JSON unmarshalling into a single call, eliminating repetitive boilerplate across all test functions - Add runToolExpectError helper for error-case assertions - Replace brittle switch-based callTool dispatcher with direct handler method references via toolHandler type - Convert procedural error sub-tests in TestGetMessage, TestGetAttachment, TestAggregate, TestListMessages, and TestAggregateInvalidDates to table-driven format - Extract createAttachmentFixture helper for file system test setup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Convert table-driven loops in TestExtractDomain, TestParseReferences, and TestParseDate to use t.Run() for better failure diagnostics. Sort header keys in makeRawEmail to ensure deterministic output. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Combine 11 individual TestDuckDBEngine_SearchFast_* functions into one table-driven test with subtests, reducing 159 lines of repetitive setup and assertion code to 52 lines. All test cases now use assertSubjects consistently instead of ad-hoc length checks and manual loops. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Load production schema.sql instead of hardcoded CREATE TABLE statements, preventing test/production schema drift - Add test data builder helpers (AddParticipant, AddMessage) on testEnv to reduce raw SQL in individual tests - Split monolithic sqlite_test.go (~2500 lines) into focused files: sqlite_testhelpers_test.go, sqlite_aggregate_test.go, sqlite_search_test.go, sqlite_crud_test.go Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Inline empty flag in addEmptyTable instead of mutating last table via withEmpty(), eliminating fragile state coupling - Extract createTempDirs and writeParquetFiles from build() for clarity Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…truct Replace parallel parseTestCase fields (wantFrom, wantTo, etc.) and check callback with a single `want Query` struct per test case. Consolidate TestParse_HasAttachment, TestParse_Dates, and TestParse_Sizes into the main TestParse table. Add assertQueryEqual helper that compares all Query fields with nil/empty slice equivalence. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… test Replace verbose if-err-Fatalf blocks with testutil.MustNoErr across all store tests for consistent, scannable error assertions. Combine UpsertMessage AllFields/MinimalFields into a single table-driven test. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ests Rewrite TestEnsureUTF8_AlreadyValid to use the shared runEncodingTests helper instead of reimplementing the iteration and assertion logic. Add a comment to TestEnsureUTF8_AsianEncodings documenting why exact string matching isn't feasible (chardet can't identify CJK encodings from short byte sequences without charset hints), and annotate the hex literals with the intended characters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move inline SQL verification into reusable helpers (assertRecipientCount, assertDisplayName, assertDeletedFromSource, assertBodyContains, etc.) in testenv_test.go, and extract MIME fixture data to fixtures_test.go. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Consolidate exported NewTestEnv and unexported newTestEnv into a single function that uses t.Cleanup for automatic resource cleanup instead of manual cleanup in error paths and returned cleanup functions - Accept variadic *Options parameter for configuration flexibility - Use testEmail constant instead of hardcoded string in profile setup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Split TestValidateRelativePath into two distinct loops: edge cases/invalid paths and shared valid paths fixture, removing the append-based merging. Rename helpers for clarity: writeAndVerify -> writeFileAndAssertExists, assertFileWritten -> writeFileAndAssertContent. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…alize assertions - Replace msgSummaries with msgSummary that takes explicit IDs, eliminating implicit sequential ID assignment and magic numbers in tests - Remove default view logic from stageForDeletion helper; all callers now explicitly pass the desired ViewType - Replace assertSingleFilter with generic assertStrings that supports arbitrary slice lengths - Simplify MultipleAggregates test assertion using assertStrings Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…st files Extract test infrastructure and tests from the monolithic model_test.go into focused files: setup_test.go (mocks, builder, helpers), selection_test.go, nav_test.go, search_test.go, and view_render_test.go. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…I with regex - Remove wantText from TestApplyHighlight table since it always equaled the input text; assertHighlight now enforces the invariant that highlighting never alters text content - Replace manual ANSI-stripping state machine with a simple regexp Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…n up test data - Add t.Parallel() to all test functions and subtests for faster execution - Add explicit name field to TestIsNewer cases documenting versioning rules - Extract repeated SHA256 hash strings into constants (testHash64) and use fmt.Sprintf to construct checksum test data - Add "fmt" import for checksum test data construction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ID assertion The setupManifest closures were capturing the parent test's t, which could abort the entire test on failure instead of just the subtest. Also restores the manifest ID assertion that was dropped during consolidation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The "mixed short hashes all reported" test case only checked for "No attachments exported" but did not verify that individual filenames appeared in the error output. Add wantAllInResult assertions for both "file.txt" and "file2.txt" error messages to prevent regressions in per-attachment error reporting. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The RateLimitExceeded case only tested reason-based detection with an empty message. Add a RateLimitExceededByMessage case that combines both quotaExceededMsg and the rateLimitExceeded reason to ensure the message-parsing path remains covered. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…against nil clock - Restore DeadlineExceeded coverage in TestRateLimiter_Acquire_ContextTimeout by using context.WithDeadline and advancing mock clock past the deadline - Add timeout context to TestRateLimiter_Acquire_Success to prevent hangs - Add ensureClock() helper that defaults nil clock to realClock{}, called in all public RateLimiter methods to prevent panics on zero-value structs Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add tool name parameter to callToolDirect, runTool, and runToolExpectError so tests populate req.Params.Name matching real MCP behavior. Fix createAttachmentFixture comment that incorrectly claimed it returns a hash. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… subtests The table-driven consolidation dropped per-result field checks for FromEmail and HasAttachments. Restore these as switch-case assertions so mapping regressions in result fields are caught even when the query filter works. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…d IDs, support NULL emails - Drop messages_fts in setupTestDB and EnableFTS to prevent schema drift from silently breaking FTS-related tests or skipping coverage - Use AddParticipant return value instead of hardcoded FromID: 100 in TestAggregateBySenderName_FallbackToEmail - Change participantOpts.Email to *string so AddParticipant can insert SQL NULL, restoring NULL-vs-empty-string coverage in TestMatchEmptySenderName_MixedFromRecipients Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Merge TestStore_Message_Upsert and TestStore_UpsertMessage_Variations into a unified TestStore_UpsertMessage that covers both field variations (AllFields, MinimalFields) and upsert-update idempotency with stats verification in each subtest. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…est helpers Replace string-concatenated SQL with parameterized queries and st.Rebind() for consistency with other helpers and to avoid fragile SQL construction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Prevents potential cross-test mutation if any test or model mutates the accounts slice via append or direct modification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… leakage The package-level pointer was shared across all tests, risking data races or state mutation if any test modified model.stats. Now each test gets a fresh copy via standardStats(). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When MessagesBySourceID map is nil, scan the Messages map for a matching SourceMessageID instead of silently returning nil. Updated the comment to accurately describe the fallback behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add existence check when looking up aggregate functions by name in the test table, so a typo in aggName produces a clear test failure instead of a nil function panic. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ssage When AddMessage infers source_id from a conversation that doesn't exist (common in tests with foreign keys disabled), fall back to 1 instead of calling Fatalf. This prevents test breakage for callers that don't set up conversation rows before adding messages. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Previously, when both SourceID and ConversationID were provided, a failed conversation lookup was silently ignored, allowing inconsistent test data. Now the lookup error is fatal, matching the behavior of other validation paths. Also widen TestDB.T to testing.TB and add tests covering the mismatch and missing-conversation cases. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…idation tests AddAttachment now calls Fatalf when the messageID doesn't match any existing message, preventing silent no-ops that mask test setup mistakes. Added unit tests covering AddLabel valid names, AddMessage source defaults, AddAttachment auto-setting HasAttachments, and Build with empty auxiliary tables. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace global msgCounter with per-Fixture counter via Fixture.NewMessage() method, keeping the standalone NewMessage() with a global counter for backward compatibility. Add tests validating unique IDs, per-fixture determinism, and message creation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Define EncodedSamplesT once and use it for both the unexported canonical variable and the public API, eliminating the duplicated struct definition. Add a test verifying that EncodedSamples() returns independent copies that are not affected by mutation in other tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Header() now uses last-write-wins semantics instead of always appending, preventing duplicate headers when the same key is set twice. The CRLF test now scans all bytes to reject bare \n, catching mixed line endings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…bs paths on Windows Add a unit test verifying PathTraversalCases() returns independent slices on each call (the core motivation for the variable-to-function change). Also include /abs/path in the Windows test vectors since Windows APIs accept forward slashes as path separators. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ourceID The fallback that scanned the Messages map for a matching SourceMessageID used randomized Go map iteration, which could produce flaky tests if multiple messages shared the same source ID. Remove the fallback so tests must explicitly populate MessagesBySourceID for deterministic behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The AddMessage helper was silently swallowing all DB errors when looking up a conversation's source_id, masking real issues like connection failures or schema problems. Now only sql.ErrNoRows triggers the fallback to source_id=1; other errors correctly fail the test. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Avoids future collisions if the seeded dataset grows to include IDs at or above the previously hardcoded values (1000/2000). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Rename TestAddMessage_FailsWithoutSources to test the actual failure path (SourceID=0 with no sources) using a fakeT that captures Fatalf - Add TestAddAttachment_FailsWithMissingMessage to cover the fatal guard when attaching to a nonexistent message ID - Rename old test to TestAddMessage_ExplicitSourceID_BypassesCheck - Widen TestDataBuilder and helpers to accept testing.TB for testability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ith package-level NewMessage Fixture.NewMessage now uses "fixture-msg-%d" prefix instead of "test-msg-%d", preventing duplicate SourceMessageID values when both builders are used against the same store. Adds a test covering the mixed-builder scenario. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…geBuilder Header field names are case-insensitive per RFC 2822, so setting "X-Custom" then "x-custom" should overwrite rather than create a duplicate. Uses strings.EqualFold for the comparison. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…asesIndependence Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…okup Closes testing gap: verifies that AddMessage correctly fails the test when QueryRow returns a DB error other than sql.ErrNoRows (e.g., closed connection), rather than silently falling back to source_id=1. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…rn semantics - fakeT now wraps a real testing.TB via newFakeT(t) so un-overridden methods don't nil-panic, making tests resilient to helper code changes. - Added comment clarifying that expectFatal's return value is always false and callers should check ft.failed instead. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…sageBuilder Header() now updates the key casing on overwrite (last-write-wins for both key and value). Added HeaderAppend() for intentional duplicate headers (e.g., multiple Received lines). Added tests for case-insensitive overwrite and HeaderAppend duplicate behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…sTest The test previously required both a recovered panic and a fatal message, coupling it to fakeT.Fatalf's panic implementation detail. Now it only asserts on ft.fatalMsg, making it resilient to changes in how fakeT signals fatal errors. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…l return value Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add BccIDs support to dbtest.MessageOpts and re-implement the BCC regression test from 9a9924c using the new test helpers and split file layout. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add Skipf and SkipNow overrides to prevent Goexit bypass - Set failed=true in Skip/Skipf/SkipNow so unexpected skips are caught by assertions checking ft.failed - Clear fatalMsg in FailNow to avoid stale state - Update comments to reflect all intercepted methods Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix unchecked recover() return value in dbtest_test.go - Remove unused assertDrillState helper in setup_test.go - Add .githooks/pre-commit that runs gofmt and golangci-lint To enable: git config core.hooksPath .githooks Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add make setup-hooks target and mention the hook in the Code Style section so contributors know to enable it. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm and others added 30 commits February 2, 2026 16:33

Refactor chat tests: extract mocks and add test setup helper

9fe7c4f

Move stubEngine and stubLLM to mocks_test.go and add newTestSession helper to reduce boilerplate in session tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Refactor attachment tests into table-driven format with stronger asse…

b5cd153

…rtions Remove tempPaths/att helpers, inline setup, add substring-based assertions, and cover additional cases (nil inputs, individual hash lengths). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm and others added 25 commits February 2, 2026 16:34

Defensively copy testAccounts slice in WithStandardAccounts

ab62ab3

Prevents potential cross-test mutation if any test or model mutates the accounts slice via append or direct modification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace manual level checks with assertLevel helper in nav_test.go

243113c

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use tdb.nextMessageID instead of hardcoded IDs in dbtest tests

ca4da91

Avoids future collisions if the seeded dataset grows to include IDs at or above the previously hardcoded values (1000/2000). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Guard against empty slice before accessing b[0] in TestPathTraversalC…

a0b4d13

…asesIndependence Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Override Fatal/FailNow/Skip on fakeT and remove misleading expectFata…

1957767

…l return value Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Port BCC RecipientName regression test into refactored test structure

84fef7d

Add BccIDs support to dbtest.MessageOpts and re-implement the BCC regression test from 9a9924c using the new test helpers and split file layout. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm force-pushed the test-suite-improvements branch from 3a5ac35 to 84fef7d Compare February 2, 2026 22:36

wesm and others added 3 commits February 2, 2026 16:39

Document pre-commit hook in Makefile and CLAUDE.md

2ffca64

Add make setup-hooks target and mention the hook in the Code Style section so contributors know to enable it. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm merged commit aaf1936 into main Feb 2, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overhaul test suite: shared fixtures, deterministic timing, and reduced boilerplate#17

Overhaul test suite: shared fixtures, deterministic timing, and reduced boilerplate#17
wesm merged 111 commits intomainfrom
test-suite-improvements

wesm commented Feb 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

wesm commented Feb 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Shared test infrastructure (internal/testutil/)

Deterministic timing

Test pattern improvements

Hardening

Developer tooling

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

wesm commented Feb 2, 2026 •

edited

Loading

Shared test infrastructure (`internal/testutil/`)