Skip to content

HTTP analytics API via query engine (aggregate, messages, search)#188

Closed
asaif wants to merge 17 commits intowesm:mainfrom
asaif:feat/query-engine-api
Closed

HTTP analytics API via query engine (aggregate, messages, search)#188
asaif wants to merge 17 commits intowesm:mainfrom
asaif:feat/query-engine-api

Conversation

@asaif
Copy link
Copy Markdown

@asaif asaif commented Mar 10, 2026

Summary

Adds three new HTTP endpoints backed by the query engine (SQLite/DuckDB), enabling the web UI to power its analytics views, message listing, and full-text search without coupling directly to SQLite queries.

Changes

New endpoints

  • GET /api/v1/aggregate — Returns grouped statistics by sender, sender name, recipient, domain, label, or time granularity (year/month/day). Supports drill-down via from, subject, domain, and account filter params.
  • GET /api/v1/engine/messages — Paginated message list with sort (date/size/subject), period= or after=/before= date range, account, has_attachment, and file_type= MIME category filters (Images, PDFs, Calendar invites, Documents, etc.).
  • GET /api/v1/engine/search — Full-text search with Gmail-style query syntax (from:, subject:, etc.). Returns HTTP 400 if q is missing.

All three endpoints return HTTP 503 when no engine is configured, allowing graceful degradation.

Server wiring (internal/api/server.go)

  • Added engine query.Engine field and WithEngine(e query.Engine) *Server method.
  • Added api_key query param fallback in authMiddleware for direct browser navigation.

Query package (internal/query/)

  • Added MimeCategory string field to MessageFilter for MIME-type-based filtering.
  • Added MimeCategoryPatterns() and MimeCategoryExistsSQL() helpers that map category names to SQL LIKE patterns and build EXISTS subqueries (used by both engines).
  • Added CountMessages(ctx, filter) (int64, error) to the Engine interface with implementations in both SQLiteEngine and DuckDBEngine (DuckDB delegates to SQLite when available, falls back to Parquet COUNT).

Testing

  • Unit tests added for all three handlers covering: valid requests, invalid params (HTTP 400), missing engine (HTTP 503), pagination, and filter combinations.
  • All existing internal/api and internal/query tests continue to pass.

asaif added 8 commits March 10, 2026 22:29
… routes

- Add engine field and WithEngine() method to Server struct
- Register GET /api/v1/aggregate, /engine/messages, /engine/search
- Add api_key query param fallback in authMiddleware for browser access
…rch handlers

- GET /api/v1/aggregate: grouped stats by sender/domain/label/time
  with drill-down support via from/subject/domain/account filter params
- GET /api/v1/engine/messages: paginated list with sort, period/date range,
  account, attachment and MIME category (file_type=) filters
- GET /api/v1/engine/search: full-text search with Gmail-style query syntax
- All endpoints return HTTP 503 when no engine is configured
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 10, 2026

roborev: Combined Review (b8d6eb5)

Summary Verdict: The PR introduces valuable engine-backed API endpoints, but contains a high-severity wiring gap and several medium-severity issues around data consistency, input validation, and security that must be addressed.

High

  • Missing engine initialization in production path

    cmd/msgvault/cmd/serve.go#L124 and [internal/api/server.go#L70](/home/roborev/repos
    /msgvault/internal/api/server.go#L70)
    The production serve path never calls WithEngine, so the new /api/v1/aggregate, /api/v1/engine/messages, and /api/v1/engine/search handlers will
    all hit s.engine == nil and return 503. The new tests only cover manually-constructed servers with a mock engine, missing this wiring gap. Build/attach the query engine in runServe before Start(), and add a serve-level integration test.

Medium

  • In
    consistent data sources for pagination

    internal/query/duckdb.go#L1023 and internal/query/duckdb.go#L11
    75

    DuckDBEngine.ListMessages reads from Parquet, but CountMessages switches to SQLite whenever a direct DB is available. When the Parquet cache
    is stale relative to SQLite, /engine/messages can return a page from one dataset and a total from another, breaking pagination semantics. CountMessages should use the same backing store as ListMessages, or only delegate to SQLite on paths where list queries do the same.

  • Unvalidated file_ type silently broadens search results
    internal/api/engine_handlers.go#L180 and internal/query/shared.go#L271

    file_type is copied directly into filter.MimeCategory, but the matcher only recognizes lowercase singular tokens (e.g., image, pdf). For unsupported
    values like Images, MimeCategoryExistsSQL returns false, yet WithAttachmentsOnly is still forced to true. This silently broadens the result set to "all messages with attachments" instead of the requested file type. Normalize accepted aliases/plurals or reject unsupported values with a 400 Bad Request.

  • API key exposure via query parameters
    /home/roborev/repos/msgvault/internal/api/server.go (lines 225-228,
    authMiddleware)
    The middleware now accepts the long-lived API key via ?api_key=. This exposes the bearer credential in browser history, bookmarks, pasted links, proxy logs, and Referer headers. Because this applies to all authenticated routes via the shared middleware, it also weakens CSRF resistance.

    Remediation: Do not accept the primary API key in query parameters globally. For download flows, issue a short-lived scoped token or signed URL. At a minimum, restrict query-parameter auth to specific read-only endpoints on GET requests rather than the shared middleware.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 11, 2026

roborev: Combined Review (c141125)

Verdict: The code requires fixes for medium-severity issues related to unhandled file_type parameters and hardcoded table references in the new filtering logic.

Medium

  • Unhandled file_type Parameters

    • Locations: internal/api/engine_ handlers.go:180, [internal/query/shared.go:271](/home/roborev/repos/msgvault/internal/query/
      shared.go#L271), internal/api/engine_handlers_test.go:238
    • Finding: file_ type is passed through verbatim, but MimeCategoryPatterns only recognizes canonical singular lowercase values like image, pdf, document. Unrecognized values like Images are silently dropped while WithAttachmentsOnly remains true. This can cause a request for a specific type to erroneously return all messages with
      any attachment.
    • Suggested Fix: Normalize UI values (e.g., Images -> image) or reject unsupported values with a 400 status. Update tests to assert the actual filtered result set rather than just checking for a 200 response.
  • Hardcoded sqlite _db.attachments in DuckDB

    • Location: internal/query/duckdb.go:685
    • Finding: The DuckDB
      MIME-category path hardcodes sqlite_db.attachments inside buildFilterConditions. This breaks DuckDB configurations that do not have sqlite_scanner attached, including the documented Windows/direct-SQLite fallback.
    • Suggested Fix: Implement the MIME-category EXISTS against Parquet attachment data,
      or only enable this filter when sqlite_db is actually attached.
  • Hardcoded Table Alias in SQLite Filter

    • Location: internal/query/sqlite.go (in buildFilterJoinsAndConditions)
    • Finding: The MimeCategory filter hardcodes
      the table alias as "m.id", ignoring the dynamically computed prefix derived from the tableAlias parameter. This will break queries if the caller uses a different alias.
    • Suggested Fix: Change "m.id" to prefix + "id" in the MimeCategory ExistsSQL call.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 11, 2026

roborev: Combined Review (2945d38)

The PR introduces several
medium-severity issues, including API key exposure via URLs, silent failures on count queries and invalid parameters, and potential database compatibility regressions.

Medium

  • API Key Exposure in URLs

    • Location: [internal/api/server.go](/home/roborev/repos/msgvault
      /internal/api/server.go)
    • Details: authMiddleware now accepts ?api_key= for all /api/v1 routes. This exposes the API key in browser history, intermediary access logs, clipboard sharing, and telemetry. It also weakens anti-CSRF protections
      , as leaked keys can be replayed against state-changing endpoints.
    • Suggested Fix: Keep API keys header-only. For browser downloads/navigation, use a short-lived signed token scoped to that specific GET route, or an HttpOnly SameSite cookie. If query-token auth is strictly required
      , limit it to the specific minimal download endpoint (not global middleware) and enforce strict referrer suppression.
  • Silent Failure on Count Query Errors

    • Location: [internal/api/engine_handlers.go:221](/home/roborev/repos/msgvault/internal/api
      /engine_handlers.go#L221)
    • Details: The /api/v1/engine/messages endpoint silently drops failures from CountMessages. If the list query succeeds but the count query fails, the endpoint returns a 200 status with total: 0 . This hides underlying database issues and will silently break client-side pagination.
    • Suggested Fix: Either fail the request entirely when the count fails, or make the total field explicitly nullable/unknown and log the failure.
  • Invalid Date Parameters Silently Ignored
    *
    Location: internal/api/engine_handlers.go:95 and [internal/api/engine_handlers.go:196](/home/roborev
    /repos/msgvault/internal/api/engine_handlers.go#L196)

    • Details: Invalid after / before query parameters are silently ignored instead of rejected. Requests like /aggregate?after=not-a-date will return a successful but unexpectedly
      broad result set. This is difficult for API clients to diagnose and inconsistent with existing CLI/MCP parsing, which correctly returns an error.
    • Suggested Fix: Return a 400 Bad Request on parse failure and add handler tests for malformed dates.
  • **Compatibility Regression in Cache Export
    **

    • Location: cmd/msgvault/cmd/build_cache.go:306 and cmd/msgvault/cmd/
      build_cache.go:600
    • Details: The cache/export path now hard-selects attachments.mime_type unconditionally.
      However, there are existing in-tree test fixtures (e.g., build_cache_test.go:1136, 133
      6
      , [1765](/home/roborev/repos/msgvault/cmd/msgvault/cmd/build_cache_test.
      go#L1765)) that use minimal schemas like attachments(message_id, size, filename). Any older databases shaped like these fixtures will fail with no such column: mime_type.
    • Suggested Fix: Detect column presence and project '' AS mime_type when absent
      , or explicitly migrate/retire that schema and update the corresponding tests.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 11, 2026

roborev: Combined Review (60741fe)

Verdict: The PR successfully implements query-engine
API routes and MIME filtering, but requires fixes for missing DuckDB fields and insecure API key transport.

Medium

1. Missing fields in DuckDB results

  • [internal/query/duckdb.go](/home/roborev/repos/msgvault/internal/query/duckdb. go):1079
  • [internal/api/engine_handlers.go](/home/roborev/repos/msgvault/internal/api/engine_handlers.go):298

/api/v1/engine/messages now serializes labels and
attachment_count, but DuckDBEngine.ListMessages never populates either field. The DuckDB query does not select attachment_count, and unlike the SQLite path it returns before any label-enrichment step. If the server is backed by DuckDBEngine, this endpoint will report empty labels and
attachment_count: 0 for every message. Fix by joining att in DuckDBEngine.ListMessages, scanning attachment_count, and populating labels the same way SQLite does or by aggregating them in-query. Add an integration test that exercises the endpoint with a DuckDB engine.

2. Credential exposure via query string

  • [internal/api/server.go:222](/home/roborev/repos/msgvault/internal/api/server.go#L222)

Accepting api_key via the query string expands credential
exposure significantly. Unlike Authorization or X-API-Key headers, URL parameters are routinely persisted in browser history, bookmarks, copied links, crash reports, and proxy/access logs. Because this is added in the shared auth middleware, it applies to every protected /api/v1/* endpoint,
not just the browser-download case mentioned in the comment.

Suggested remediation: remove query-param API-key support from the general middleware. For window.open()/download flows, prefer a same-origin cookie/session, or issue a short-lived, endpoint-scoped signed token specifically for the
download URL. If query transport must exist temporarily, restrict it to the minimal endpoint set and aggressively redact api_key from any logging/telemetry.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 11, 2026

roborev: Combined Review (91cb181)

Verdict: The PR successfully wires the query engine to new API endpoints, but introduces a high-severity SQL query error in DuckDB and medium-severity issues with SQLite backwards compatibility and API key exposure
in URLs.

High

  • Missing JOIN clauses in DuckDB CountMessages
    • Location: internal/query/duckdb.go (around line 1198)
    • Description: CountMessages queries FROM msg but lacks
      the LEFT JOIN msg_sender ms and LEFT JOIN conv c clauses that are present in ListMessages. Since buildFilterConditions(filter) likely returns SQL referencing the ms. and c. aliases (when filtering by sender or conversation), executing the count query with those filters will result in
      a SQL error (table/alias not found).
    • Suggested Fix: Update the CountMessages query string to include the missing joins before the WHERE clause:
      FROM msg
      LEFT JOIN msg_sender ms ON ms.message_id = msg.id
      
      LEFT JOIN conv c ON c.id = msg.conversation_id
      WHERE %s

Medium

  • SQLite schema incompatibility for MIME filtering

    • Location: [internal/query/sqlite.go:306](file:///home/robore
      v/repos/msgvault/internal/query/sqlite.go#L306), cmd/msgvault/cmd/serve.go:125
    • Description: The new production serve path uses query.NewSQLiteEngine, but the SQLite filter builder unconditionally emits a.mime_type LIKE ? whenever file_type is set. This breaks against older SQLite databases that do not yet have the attachments.mime _type column (resulting in no such column: a.mime_type), causing /api/v1/engine/messages?file_type=... to regress to a 500 error.
    • Suggested Fix: Make the SQLite engine schema-aware for MIME filtering the same way cache
      export is, or reject file_type with a clear 400 response when the column is absent.
  • API key exposure via query parameters

    • Location: [internal/api/server.go:222](/home/roborev/repos/
      msgvault/internal/api/server.go#L222)
    • Description: authMiddleware now accepts the API key from ?api_key= in the URL. Putting long-lived credentials in query strings exposes them via browser history, bookmarks, reverse-proxy/access
      logs, and same-origin referrers. Because this is in the global auth middleware, every authenticated endpoint can now be authorized with a URL-carried secret.
    • Suggested Fix: Keep API keys in Authorization/X-API-Key headers only. For browser download support, use
      a short-lived signed token or a same-origin session flow. At minimum, restrict query-param auth to the specific download route instead of the entire API surface.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@asaif asaif closed this Mar 13, 2026
@wesm
Copy link
Copy Markdown
Owner

wesm commented Mar 13, 2026

Would you like me to work on getting this PR in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants