Skip to content

Overhaul test suite: shared fixtures, deterministic timing, and reduced boilerplate#17

Merged
wesm merged 111 commits intomainfrom
test-suite-improvements
Feb 2, 2026
Merged

Overhaul test suite: shared fixtures, deterministic timing, and reduced boilerplate#17
wesm merged 111 commits intomainfrom
test-suite-improvements

Conversation

@wesm
Copy link
Copy Markdown
Owner

@wesm wesm commented Feb 2, 2026

This PR was the result of running roborev analyze refactor --per-file '*/*/*_test.go' and roborev analyze test-fixtures --per-file '*/*/*_test.go' and then using roborev fix to grind through the results. This is a very effective pattern for systematic refactoring.

Summary

Comprehensive overhaul of the test suite across 54 files (~12,600 lines added, ~12,000 removed). The goal is to make tests more reliable, readable, and maintainable as the codebase grows. No production code behavior changes.

Shared test infrastructure (internal/testutil/)

  • testutil/ptr — pointer/date helper functions used across packages
  • testutil/emailMakeRawEmail and slice assertions for MIME test data
  • testutil/dbtest — shared DB test helpers and typed TestDataBuilder replacing raw SQL fixtures
  • testutil/storetestStoreFixture and MessageBuilder for store-layer tests
  • querytest.MockEngine — centralized query.Engine test double

Deterministic timing

  • Introduced a Clock interface on RateLimiter with a mock clock for tests
  • Replaced all time.Sleep-based synchronization with mock-clock timer polling
  • Eliminates flaky failures under CI load

Test pattern improvements

  • Converted many tests to table-driven format (search parser, aggregates, attachments, manifests, MCP error cases, etc.)
  • Extracted fluent builders: ManifestBuilder, GmailErrorBuilder, MessageBuilder (MIME), TestDataBuilder (Parquet)
  • Added assertion helpers (assertLevel, assertRow, assertSearchCount, etc.) to reduce repetitive check-and-fail blocks
  • Split the 5,800-line model_test.go into domain-specific test files (selection, search, navigation, view rendering)

Hardening

  • Fixed cross-test state leakage via defensive copies of shared slices and factory functions
  • Added duplicate-key detection in aggregate test helpers
  • Guarded fakeT against Fatal/FailNow/Skip/Skipf/SkipNow bypassing sentinel recovery
  • Added missing failure-path tests (nil summaries, DB errors, missing messages)
  • Fixed nondeterministic map iteration and header ordering in test fixtures

Developer tooling

  • Added .githooks/pre-commit hook that runs gofmt and golangci-lint on staged Go files
  • Enable with make setup-hooks — prevents lint failures from reaching CI

Test plan

  • make test — all tests pass
  • make lint — no new warnings
  • CI passes on all platforms

🤖 Generated with Claude Code

wesm and others added 30 commits February 2, 2026 16:33
Move stubEngine and stubLLM to mocks_test.go and add newTestSession
helper to reduce boilerplate in session tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…relocate mock tests

- Merge ExecutorTestContext and DeletionMockTestContext into a single TestContext
  backed by gmail.DeletionMockAPI, eliminating ~100 lines of duplicate helpers
- Remove local mockAPIWithErrors struct that reimplemented DeletionMockAPI functionality
- Move TestDeletionMockAPI_* tests from deletion package to gmail package where
  they belong (internal/gmail/deletion_mock_test.go)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Combine six separate FormatSummary test functions into a single
table-driven test and convert TestGenerateID to table-driven format,
reducing boilerplate and improving test isolation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rtions

Remove tempPaths/att helpers, inline setup, add substring-based assertions,
and cover additional cases (nil inputs, individual hash lengths).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add errorWithReason/errorWithDetail factory helpers to reduce
  boilerplate in test table entries
- Panic on json.Marshal failure in gmailErrorBody instead of
  silently discarding the error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace time.Sleep-based tests with a mock clock that allows
advancing time deterministically, eliminating flaky timing-dependent
tests. Replace drainBucket busy-loop with direct setTokens state
manipulation. Consolidate throttle-related tests into subtests under
a single TestRateLimiter_Throttle.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add generic runTool[T] helper that combines invocation, error checking,
  and JSON unmarshalling into a single call, eliminating repetitive
  boilerplate across all test functions
- Add runToolExpectError helper for error-case assertions
- Replace brittle switch-based callTool dispatcher with direct handler
  method references via toolHandler type
- Convert procedural error sub-tests in TestGetMessage, TestGetAttachment,
  TestAggregate, TestListMessages, and TestAggregateInvalidDates to
  table-driven format
- Extract createAttachmentFixture helper for file system test setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Convert table-driven loops in TestExtractDomain, TestParseReferences,
and TestParseDate to use t.Run() for better failure diagnostics. Sort
header keys in makeRawEmail to ensure deterministic output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Combine 11 individual TestDuckDBEngine_SearchFast_* functions into one
table-driven test with subtests, reducing 159 lines of repetitive setup
and assertion code to 52 lines. All test cases now use assertSubjects
consistently instead of ad-hoc length checks and manual loops.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Load production schema.sql instead of hardcoded CREATE TABLE statements,
  preventing test/production schema drift
- Add test data builder helpers (AddParticipant, AddMessage) on testEnv
  to reduce raw SQL in individual tests
- Split monolithic sqlite_test.go (~2500 lines) into focused files:
  sqlite_testhelpers_test.go, sqlite_aggregate_test.go,
  sqlite_search_test.go, sqlite_crud_test.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Inline empty flag in addEmptyTable instead of mutating last table via
  withEmpty(), eliminating fragile state coupling
- Extract createTempDirs and writeParquetFiles from build() for clarity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…truct

Replace parallel parseTestCase fields (wantFrom, wantTo, etc.) and check
callback with a single `want Query` struct per test case. Consolidate
TestParse_HasAttachment, TestParse_Dates, and TestParse_Sizes into the
main TestParse table. Add assertQueryEqual helper that compares all Query
fields with nil/empty slice equivalence.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… test

Replace verbose if-err-Fatalf blocks with testutil.MustNoErr across all
store tests for consistent, scannable error assertions. Combine
UpsertMessage AllFields/MinimalFields into a single table-driven test.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ests

Rewrite TestEnsureUTF8_AlreadyValid to use the shared runEncodingTests
helper instead of reimplementing the iteration and assertion logic. Add
a comment to TestEnsureUTF8_AsianEncodings documenting why exact string
matching isn't feasible (chardet can't identify CJK encodings from short
byte sequences without charset hints), and annotate the hex literals
with the intended characters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move inline SQL verification into reusable helpers (assertRecipientCount,
assertDisplayName, assertDeletedFromSource, assertBodyContains, etc.) in
testenv_test.go, and extract MIME fixture data to fixtures_test.go.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Consolidate exported NewTestEnv and unexported newTestEnv into a single
  function that uses t.Cleanup for automatic resource cleanup instead of
  manual cleanup in error paths and returned cleanup functions
- Accept variadic *Options parameter for configuration flexibility
- Use testEmail constant instead of hardcoded string in profile setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split TestValidateRelativePath into two distinct loops: edge cases/invalid
paths and shared valid paths fixture, removing the append-based merging.
Rename helpers for clarity: writeAndVerify -> writeFileAndAssertExists,
assertFileWritten -> writeFileAndAssertContent.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…alize assertions

- Replace msgSummaries with msgSummary that takes explicit IDs, eliminating
  implicit sequential ID assignment and magic numbers in tests
- Remove default view logic from stageForDeletion helper; all callers now
  explicitly pass the desired ViewType
- Replace assertSingleFilter with generic assertStrings that supports
  arbitrary slice lengths
- Simplify MultipleAggregates test assertion using assertStrings

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…st files

Extract test infrastructure and tests from the monolithic model_test.go into
focused files: setup_test.go (mocks, builder, helpers), selection_test.go,
nav_test.go, search_test.go, and view_render_test.go.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…I with regex

- Remove wantText from TestApplyHighlight table since it always equaled
  the input text; assertHighlight now enforces the invariant that
  highlighting never alters text content
- Replace manual ANSI-stripping state machine with a simple regexp

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n up test data

- Add t.Parallel() to all test functions and subtests for faster execution
- Add explicit name field to TestIsNewer cases documenting versioning rules
- Extract repeated SHA256 hash strings into constants (testHash64) and use
  fmt.Sprintf to construct checksum test data
- Add "fmt" import for checksum test data construction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ID assertion

The setupManifest closures were capturing the parent test's t, which could
abort the entire test on failure instead of just the subtest. Also restores
the manifest ID assertion that was dropped during consolidation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The "mixed short hashes all reported" test case only checked for
"No attachments exported" but did not verify that individual filenames
appeared in the error output. Add wantAllInResult assertions for both
"file.txt" and "file2.txt" error messages to prevent regressions in
per-attachment error reporting.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The RateLimitExceeded case only tested reason-based detection with an
empty message. Add a RateLimitExceededByMessage case that combines both
quotaExceededMsg and the rateLimitExceeded reason to ensure the
message-parsing path remains covered.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…against nil clock

- Restore DeadlineExceeded coverage in TestRateLimiter_Acquire_ContextTimeout
  by using context.WithDeadline and advancing mock clock past the deadline
- Add timeout context to TestRateLimiter_Acquire_Success to prevent hangs
- Add ensureClock() helper that defaults nil clock to realClock{}, called in
  all public RateLimiter methods to prevent panics on zero-value structs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tool name parameter to callToolDirect, runTool, and runToolExpectError
so tests populate req.Params.Name matching real MCP behavior. Fix
createAttachmentFixture comment that incorrectly claimed it returns a hash.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… subtests

The table-driven consolidation dropped per-result field checks for FromEmail
and HasAttachments. Restore these as switch-case assertions so mapping
regressions in result fields are caught even when the query filter works.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…d IDs, support NULL emails

- Drop messages_fts in setupTestDB and EnableFTS to prevent schema drift
  from silently breaking FTS-related tests or skipping coverage
- Use AddParticipant return value instead of hardcoded FromID: 100 in
  TestAggregateBySenderName_FallbackToEmail
- Change participantOpts.Email to *string so AddParticipant can insert
  SQL NULL, restoring NULL-vs-empty-string coverage in
  TestMatchEmptySenderName_MixedFromRecipients

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merge TestStore_Message_Upsert and TestStore_UpsertMessage_Variations
into a unified TestStore_UpsertMessage that covers both field variations
(AllFields, MinimalFields) and upsert-update idempotency with stats
verification in each subtest.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…est helpers

Replace string-concatenated SQL with parameterized queries and st.Rebind()
for consistency with other helpers and to avoid fragile SQL construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wesm and others added 25 commits February 2, 2026 16:34
Prevents potential cross-test mutation if any test or model mutates
the accounts slice via append or direct modification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… leakage

The package-level pointer was shared across all tests, risking data races
or state mutation if any test modified model.stats. Now each test gets a
fresh copy via standardStats().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When MessagesBySourceID map is nil, scan the Messages map for a matching
SourceMessageID instead of silently returning nil. Updated the comment to
accurately describe the fallback behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add existence check when looking up aggregate functions by name in the
test table, so a typo in aggName produces a clear test failure instead
of a nil function panic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ssage

When AddMessage infers source_id from a conversation that doesn't exist
(common in tests with foreign keys disabled), fall back to 1 instead of
calling Fatalf. This prevents test breakage for callers that don't set
up conversation rows before adding messages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, when both SourceID and ConversationID were provided, a failed
conversation lookup was silently ignored, allowing inconsistent test data.
Now the lookup error is fatal, matching the behavior of other validation
paths. Also widen TestDB.T to testing.TB and add tests covering the
mismatch and missing-conversation cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…idation tests

AddAttachment now calls Fatalf when the messageID doesn't match any
existing message, preventing silent no-ops that mask test setup mistakes.
Added unit tests covering AddLabel valid names, AddMessage source
defaults, AddAttachment auto-setting HasAttachments, and Build with
empty auxiliary tables.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace global msgCounter with per-Fixture counter via Fixture.NewMessage()
method, keeping the standalone NewMessage() with a global counter for
backward compatibility. Add tests validating unique IDs, per-fixture
determinism, and message creation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Define EncodedSamplesT once and use it for both the unexported canonical
variable and the public API, eliminating the duplicated struct definition.
Add a test verifying that EncodedSamples() returns independent copies that
are not affected by mutation in other tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Header() now uses last-write-wins semantics instead of always appending,
preventing duplicate headers when the same key is set twice. The CRLF
test now scans all bytes to reject bare \n, catching mixed line endings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…bs paths on Windows

Add a unit test verifying PathTraversalCases() returns independent slices
on each call (the core motivation for the variable-to-function change).
Also include /abs/path in the Windows test vectors since Windows APIs
accept forward slashes as path separators.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ourceID

The fallback that scanned the Messages map for a matching SourceMessageID
used randomized Go map iteration, which could produce flaky tests if
multiple messages shared the same source ID. Remove the fallback so tests
must explicitly populate MessagesBySourceID for deterministic behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The AddMessage helper was silently swallowing all DB errors when looking
up a conversation's source_id, masking real issues like connection
failures or schema problems. Now only sql.ErrNoRows triggers the
fallback to source_id=1; other errors correctly fail the test.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Avoids future collisions if the seeded dataset grows to include IDs
at or above the previously hardcoded values (1000/2000).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename TestAddMessage_FailsWithoutSources to test the actual failure
  path (SourceID=0 with no sources) using a fakeT that captures Fatalf
- Add TestAddAttachment_FailsWithMissingMessage to cover the fatal guard
  when attaching to a nonexistent message ID
- Rename old test to TestAddMessage_ExplicitSourceID_BypassesCheck
- Widen TestDataBuilder and helpers to accept testing.TB for testability

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ith package-level NewMessage

Fixture.NewMessage now uses "fixture-msg-%d" prefix instead of "test-msg-%d",
preventing duplicate SourceMessageID values when both builders are used against
the same store. Adds a test covering the mixed-builder scenario.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…geBuilder

Header field names are case-insensitive per RFC 2822, so setting
"X-Custom" then "x-custom" should overwrite rather than create a
duplicate. Uses strings.EqualFold for the comparison.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…asesIndependence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…okup

Closes testing gap: verifies that AddMessage correctly fails the test
when QueryRow returns a DB error other than sql.ErrNoRows (e.g., closed
connection), rather than silently falling back to source_id=1.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rn semantics

- fakeT now wraps a real testing.TB via newFakeT(t) so un-overridden
  methods don't nil-panic, making tests resilient to helper code changes.
- Added comment clarifying that expectFatal's return value is always false
  and callers should check ft.failed instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…sageBuilder

Header() now updates the key casing on overwrite (last-write-wins for both
key and value). Added HeaderAppend() for intentional duplicate headers
(e.g., multiple Received lines). Added tests for case-insensitive overwrite
and HeaderAppend duplicate behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…sTest

The test previously required both a recovered panic and a fatal message,
coupling it to fakeT.Fatalf's panic implementation detail. Now it only
asserts on ft.fatalMsg, making it resilient to changes in how fakeT
signals fatal errors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…l return value

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add BccIDs support to dbtest.MessageOpts and re-implement the BCC
regression test from 9a9924c using the new test helpers and split
file layout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm force-pushed the test-suite-improvements branch from 3a5ac35 to 84fef7d Compare February 2, 2026 22:36
wesm and others added 3 commits February 2, 2026 16:39
- Add Skipf and SkipNow overrides to prevent Goexit bypass
- Set failed=true in Skip/Skipf/SkipNow so unexpected skips are
  caught by assertions checking ft.failed
- Clear fatalMsg in FailNow to avoid stale state
- Update comments to reflect all intercepted methods

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix unchecked recover() return value in dbtest_test.go
- Remove unused assertDrillState helper in setup_test.go
- Add .githooks/pre-commit that runs gofmt and golangci-lint

To enable: git config core.hooksPath .githooks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add make setup-hooks target and mention the hook in the Code Style
section so contributors know to enable it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm merged commit aaf1936 into main Feb 2, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant