
chore: production hardening — Go 1.26, security (SSRF, hide signing key), config, pagination#46

Merged
xernobyl merged 12 commits into main from chore/ci_config_security_new_pattern
Mar 9, 2026

@xernobyl (Contributor) commented Mar 6, 2026

What does this PR do?

Production hardening: CI upgrades, security hardening, config validation, a pagination invariant check, and hiding webhook signing keys.

1. CI and tooling

  • Upgrade to Go 1.26 and bump golangci-lint to a compatible version
  • Update tool versions (goose, River, etc.) in Makefile, Dockerfile, workflows
  • Add a test timeout to make tests
  • Scope gosec suppressions (G704/G706/G101) to test files

2. Config and security

  • Add and use database connection pool options (max/min connections, max connection lifetime, max idle time, health-check period, connect timeout)
  • SSRF hardening: reject private, loopback, and link-local IPs in webhook URLs; add an optional host blacklist
  • Config validation (e.g., embedding provider and model)

3. Pagination

  • Enforce pagination invariant: return ErrPaginationInvariantViolated when hasMore is true but encodeLast is nil
  • Align test expectations with repository ordering

4. Hide webhook signing key (security)

  • Omit signing_key from GET, List, and Update responses
  • Add WebhookPublic (no signing_key), ToWebhookPublic(), and ListWebhooksPublicResponse
  • Update OpenAPI with WebhookPublicData and point GET/PATCH webhook and List webhooks responses to it
  • Create response still returns full webhook (including signing_key) so clients can store it once

5. Other

  • Replace the ptrFloat64 helper with Go 1.26's new(expr) form for pointer fields
  • Improve tests (deterministic ordering, SSRF validation coverage)
  • Fix invalid new() usage in tests

Review Provenance

  • This PR includes AI-assisted review content.
  • The posted review comments were human-reviewed and approved before submission.

Base: main (head: chore/ci_config_security_new_pattern)

How should this be tested?

  • make build
  • make tests — all integration tests pass
  • make fmt and make lint — no new warnings
  • Verify webhook endpoints:
    • GET /v1/webhooks/{id} — response omits signing_key
    • GET /v1/webhooks — list items omit signing_key
    • PATCH /v1/webhooks/{id} — response omits signing_key
    • POST /v1/webhooks — response includes signing_key
  • (Optional) Test SSRF rejection with loopback/private IP webhook URLs

Checklist

Required

  • Filled out the "How to test" section
  • Read Repository Guidelines
  • Self-reviewed
  • Ran make build
  • Ran make tests
  • Ran make fmt and make lint; no new warnings
  • Removed debug prints / temporary logging
  • Merged latest from base branch
  • No schema changes (no migrations)

Appreciated

  • Updated OpenAPI spec for webhook responses
  • API behavior changes documented above
  • Updated docs in docs/ if applicable
  • Ran make tests-coverage for new logic

@xernobyl xernobyl marked this pull request as ready for review March 6, 2026 10:40

coderabbitai bot commented Mar 6, 2026

Walkthrough

This pull request upgrades the Go toolchain from 1.25.7 to 1.26.1 across all build configurations and dependencies. It introduces webhook URL blacklist configuration for SSRF protection and adds database connection pool parameters. The API key authentication now uses constant-time comparison. New configuration fields support webhook HTTP timeouts and enqueue retry behavior. Dependencies are updated including OpenAI client, River job queue, OpenTelemetry stack, and security libraries. Pagination validation is enhanced with an invariant check. Test code is modernized to use direct pointer creation instead of helper functions.
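Here is a minimal sketch of a constant-time API key check using crypto/subtle. apiKeyMatches is a hypothetical name. Hashing both keys with SHA-256 before comparing is an assumed design choice, not confirmed for the PR.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// apiKeyMatches compares a presented key with the configured key in constant time.
// Both sides are hashed first so the compared slices always have the same length.
func apiKeyMatches(presented, configured string) bool {
	if configured == "" {
		return false // never authenticate against an unset key
	}
	p := sha256.Sum256([]byte(presented))
	c := sha256.Sum256([]byte(configured))
	return subtle.ConstantTimeCompare(p[:], c[:]) == 1
}

func main() {
	fmt.Println(apiKeyMatches("s3cret", "s3cret")) // true
	fmt.Println(apiKeyMatches("s3cre", "s3cret"))  // false
	fmt.Println(apiKeyMatches("", ""))             // false
}
```

subtle.ConstantTimeCompare returns 0 immediately when the lengths differ. Comparing fixed-size digests therefore avoids leaking the configured key's length through timing.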

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 53.85%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title accurately summarizes the main changes: Go version upgrade to 1.26, security enhancements (SSRF mitigation and hiding signing key), configuration management improvements, and pagination safeguards.
Description check | ✅ Passed | The PR description comprehensively covers all required sections: a clear summary of changes organized by category, detailed testing instructions, and a completed checklist.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/service/pagination.go (1)

5-24: ⚠️ Potential issue | 🟡 Minor

Make the sentinel error describe the actual failure.

The new guard checks hasMore && encodeLast == nil, but ErrPaginationInvariantViolated says hasMore with empty list. Those are not the same condition, and the doc comment still talks about list emptiness instead of the missing encoder precondition. Rename the error/message, or move the empty-list invariant check to the caller.

Suggested cleanup
-// ErrPaginationInvariantViolated indicates hasMore was true with an empty list (repository invariant violation).
-var ErrPaginationInvariantViolated = errors.New("pagination invariant violated: hasMore with empty list")
+// ErrPaginationEncoderRequired indicates BuildListPaginationMeta was called with hasMore=true and no encoder.
+var ErrPaginationEncoderRequired = errors.New("pagination invariant violated: hasMore requires encodeLast")
@@
-// encodeLast is called only when hasMore is true to produce next_cursor. Callers must ensure
-// that when hasMore is true, the underlying list is non-empty so encodeLast can safely access the last item.
+// encodeLast is called only when hasMore is true to produce next_cursor.
+// When hasMore is true, callers must pass encodeLast and ensure it can safely encode the last item.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/service/pagination.go` around lines 5-24: the sentinel error
ErrPaginationInvariantViolated is misleading for the guard in
BuildListPaginationMeta that checks hasMore && encodeLast == nil; rename or
replace it with a clearer error (e.g., ErrMissingEncodeLast or
errors.New("pagination invariant violated: hasMore is true but encodeLast is
nil")) and update the BuildListPaginationMeta doc comment to reflect that
callers must supply a non-nil encodeLast when hasMore is true; keep the existing
empty-list invariant language only if you also add a separate check/caller
responsibility for non-empty lists.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.golangci.yml:
- Around lines 46-49: The gosec suppression currently lacks a path and therefore
disables G704/G706/G101 globally; restrict the suppression to test-only files by
adding a path filter limiting it to *_test.go and tests/** (or create two
entries) instead of the global rule — update the linters block that lists
"gosec" and the "text: 'G70[46]|G101'" to include a path: pattern matching test
files (e.g., path: "_test.go$" or path: "tests/") so only test code is excluded,
or alternatively remove the global suppression and add targeted `#nosec` comments
in the specific test files that triggered the false positives.

In `@internal/config/config.go`:
- Around lines 243-283: The config parsing added new Database* fields, but the
Postgres bootstrap (in pkg/database/postgres.go — look for functions like
NewPostgresDB, OpenPostgres, or NewPGPool that build the pool from the
connection string) never applies them; update that bootstrap to read the loaded
Config (e.g., fields DatabaseMaxConns, DatabaseMinConns,
DatabaseMaxConnLifetime, DatabaseMaxConnIdleTime, DatabaseHealthCheckPeriod,
DatabaseConnectTimeout) and set the corresponding pool options on the
pgxpool.Config (or whatever pool type is used): set MaxConns/MinConns, set
MaxConnLifetime and MaxConnIdleTime using time.Duration, set HealthCheckPeriod,
and apply ConnectTimeout on the connection config before creating the pool so
the environment variables actually change runtime behavior.

In `@internal/service/webhooks_service_test.go`:
- Around lines 67-68: Tests currently construct NewWebhooksService with a nil
blacklist so the new URL-host validation paths in CreateWebhook and
UpdateWebhook (loopback/private/blacklisted checks) are never exercised; update
webhooks_service_test.go to instantiate NewWebhooksService with a non-nil
blacklist and add test cases using mockWebhooksRepo/noopPublisher that call
CreateWebhook and UpdateWebhook with URLs that are loopback (e.g., 127.0.0.1),
private (e.g., 10.x.x.x), and entries present in the blacklist to assert the
methods return the expected validation errors/rejections, ensuring the SSRF
guard logic is covered.

In `@internal/service/webhooks_service.go`:
- Around lines 121-149: The validateWebhookURLHost function currently only
rejects literal IPs or exact blacklist hits and returns early when blacklist is
empty; change it to always perform address checks: after parsing and
canonicalizing the host (canonicalizeHost/u.Hostname), if host is a literal IP
keep the existing netip.ParseAddr checks, otherwise resolve the hostname (e.g.,
via net.LookupIP or net.DefaultResolver.LookupIPAddr) and iterate all returned
IPs, Unmap each and reject any that IsLoopback, IsPrivate, IsLinkLocalUnicast,
IsLinkLocalMulticast, or IsUnspecified; keep the blacklist check on the
canonical host but also consider blocking any resolved IP that matches a
blacklist entry (if blacklist stores IPs) and remove the early return when
blacklist is empty so literal IP checks still run; finally, duplicate the same
IP resolution+validation in the HTTP client's dial path (the DialContext
implementation used for webhook delivery) to prevent DNS rebinding during
connection establishment.

In `@Makefile`:
- Around lines 203-206: Make the goose and river versions consistent: update any
references to GOOSE_VERSION and RIVER_VERSION outside the Makefile (specifically
in the Dockerfile and CI workflow files tests.yml, migrations-validate.yml, and
api-contract-tests.yml) so they use the exact values from the Makefile
(GOOSE_VERSION := v3.27.0 and RIVER_VERSION := v0.31.0), and replace any
occurrences of older tags (v3.26.0, v0.30.2) or `@latest` with these pinned
versions to avoid environment drift.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34fba094-5275-4d4b-bef9-2e5b2b381c9c

📥 Commits

Reviewing files that changed from the base of the PR and between 50c8603 and 2ad31f8.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (18)
  • .env.example
  • .github/workflows/api-contract-tests.yml
  • .github/workflows/code-quality.yml
  • .github/workflows/migrations-validate.yml
  • .github/workflows/tests.yml
  • .golangci.yml
  • Dockerfile
  • Makefile
  • cmd/api/app.go
  • go.mod
  • internal/api/middleware/auth.go
  • internal/config/config.go
  • internal/service/embedding_provider_test.go
  • internal/service/pagination.go
  • internal/service/pagination_test.go
  • internal/service/webhooks_service.go
  • internal/service/webhooks_service_test.go
  • tests/integration_test.go


github-actions bot commented Mar 6, 2026

✱ Stainless preview builds

This PR will update the hub SDKs with the following commit message.

chore: CI, config, security, and new() pattern improvements
hub-openapi studio · code

Your SDK built successfully.
generate ✅

hub-typescript studio · code

Your SDK built successfully.
generate ✅ build ✅ lint ✅ test ✅

npm install https://pkg.stainless.com/s/hub-typescript/cdc3bfb3c8b18897716910e75cc4f422d9b80af0/dist.tar.gz

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-03-09 11:17:01 UTC

@xernobyl xernobyl changed the title chore: CI, config, security, and new() pattern improvements chore: production hardening — Go 1.26, security (SSRF, hide signing key), config, pagination Mar 6, 2026
@mattinannt (Member) left a comment


AI-assisted review, human-reviewed and approved before posting. Requesting changes for three issues that introduce dead configuration or make webhook delivery less reliable in production:

  1. WEBHOOK_ENQUEUE_* is parsed and documented but not wired into any retry path.
  2. WEBHOOK_HTTP_TIMEOUT_SECONDS is parsed and documented but the sender still uses a hard-coded timeout.
  3. The new SSRF-safe dialer now pins delivery to the first resolved IP, which can fail healthy multi-address webhook endpoints.

Recommended direction: either fully wire the new config through the runtime behavior in this PR, or remove the new knobs until the behavior exists. For the dialer, keep the SSRF validation but preserve multi-address connection attempts instead of hard-pinning allowed[0].

@xernobyl xernobyl enabled auto-merge March 9, 2026 10:47
@xernobyl xernobyl added this pull request to the merge queue Mar 9, 2026
Merged via the queue into main with commit 0949ba1 Mar 9, 2026
10 checks passed
@xernobyl xernobyl deleted the chore/ci_config_security_new_pattern branch March 9, 2026 11:15