
feat: Add HTTP client configuration options for token_hook to improve performance #4067

@kamalshkeir

Description

Ory Network Project

No response

Describe your problem

When using oauth2.token_hook in production (Kubernetes, sidecar container at 127.0.0.1), we observe ~100-120ms of overhead at p90 on /oauth2/token for JWT token generation with sporadic traffic patterns.

Production metrics (Hydra v2.3.0, EKS):

  • WITH token_hook: p50 ≈ 100ms, p90 ≈ 200-250ms, p99 ≈ 300ms
  • WITHOUT token_hook: p50 ≈ 80ms, p90 ≈ 100ms, p99 ≈ 120ms

Local Docker testing:

  • WITH token_hook: p50 = 57ms, p90 = 79ms
  • WITHOUT token_hook: p50 = 54ms, p90 = 76ms (only ~2-3ms overhead)

The large gap between local (~2-3ms) and production (~100ms) overhead points to HTTP connection management issues, not the hook logic itself. CPU profiling confirms the token hook uses virtually no CPU or memory — the overhead is purely network/connection setup.

Isolation tests performed:

  • ✅ Disabling token_hook → p90 drops from 200ms to 100ms (confirmed root cause)
  • ❌ Increasing Hydra CPU (500m → 1000m): no impact
  • ❌ Increasing token_hook CPU (100m → 500m): no impact
  • ❌ Increasing DB pool (26 → 50): no impact
  • ❌ Reducing tracing sampling (0.4 → 0.1): no impact

The token hook HTTP client in oauth2/token_hook.go uses the default HTTPClient from the registry without configurable connection pooling or keep-alive settings. With sporadic traffic, connections are closed between requests, causing re-establishment overhead on each call.
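
This is straightforward to observe outside Hydra with net/http/httptrace (minimal sketch below; the hook URL and the 2-minute gap are illustrative, not our exact setup). Because Go's default transport closes idle connections after 90s (IdleConnTimeout) and keeps at most 2 idle connections per host, any request arriving after a quiet period logs reused=false and pays a fresh TCP handshake:

package main

import (
    "context"
    "log"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    // Log whether each request got a pooled connection or had to dial.
    trace := &httptrace.ClientTrace{
        GotConn: func(info httptrace.GotConnInfo) {
            log.Printf("reused=%v wasIdle=%v idleTime=%v", info.Reused, info.WasIdle, info.IdleTime)
        },
    }
    client := &http.Client{Timeout: 5 * time.Second} // nil Transport: uses http.DefaultTransport

    for i := 0; i < 3; i++ {
        req, err := http.NewRequestWithContext(
            httptrace.WithClientTrace(context.Background(), trace),
            http.MethodGet, "http://127.0.0.1:8089/", nil)
        if err != nil {
            log.Fatal(err)
        }
        if resp, err := client.Do(req); err == nil {
            resp.Body.Close()
        }
        // Longer than the default IdleConnTimeout (90s), mimicking sporadic traffic.
        time.Sleep(2 * time.Minute)
    }
}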

Describe your ideal solution

Add optional configuration for the token hook HTTP client transport:

oauth2:
  token_hook: https://my-hook
  token_hook_http_client:
    timeout: 5s
    keep_alive: true
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: 90s

This would configure http.Transport with proper connection pooling:

transport := &http.Transport{
    MaxIdleConns:        config.MaxIdleConns,
    MaxIdleConnsPerHost: config.MaxIdleConnsPerHost,
    IdleConnTimeout:     config.IdleConnTimeout,
    DisableKeepAlives:   !config.KeepAlive,
}
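
To make the shape of the change concrete, here is a sketch of building the hook client once from that config and reusing it (package, type, and function names are hypothetical, not existing Hydra code; the fields mirror the proposed token_hook_http_client keys above):

// Sketch only: hypothetical package, not existing Hydra API.
package hookclient

import (
    "net/http"
    "time"
)

type Config struct {
    Timeout             time.Duration // token_hook_http_client.timeout
    KeepAlive           bool          // token_hook_http_client.keep_alive
    MaxIdleConns        int           // token_hook_http_client.max_idle_conns
    MaxIdleConnsPerHost int           // token_hook_http_client.max_idle_conns_per_host
    IdleConnTimeout     time.Duration // token_hook_http_client.idle_conn_timeout
}

// New builds the hook client once; reusing the same client across requests
// is what keeps the idle-connection pool warm between sporadic token calls.
func New(c Config) *http.Client {
    return &http.Client{
        Timeout: c.Timeout,
        Transport: &http.Transport{
            MaxIdleConns:        c.MaxIdleConns,
            MaxIdleConnsPerHost: c.MaxIdleConnsPerHost,
            IdleConnTimeout:     c.IdleConnTimeout,
            DisableKeepAlives:   !c.KeepAlive,
        },
    }
}

Defaults could mirror Go's http.DefaultTransport so existing deployments would be unaffected.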

We're happy to submit a PR if the approach is approved.

Workarounds or alternatives

Currently there is no workaround within Hydra's configuration. The only options are:
1. Remove the token hook entirely — loses the custom claims functionality we need
2. Increase traffic volume to keep connections warm — not feasible for sporadic B2B traffic patterns
3. Fork Hydra and patch the HTTP client — maintenance burden

Version

v2.3.0

Additional Context

Deployment: Kubernetes (AWS EKS), token_hook as sidecar container (http://127.0.0.1:8089)
Traffic pattern: Sporadic B2B bursts (~100 requests over 2-3 minutes)
Strategy: strategies.access_token: jwt with client_credentials grant
Local CPU profiling (pprof) confirms:

  • ~80% of Hydra CPU is in crypto/rsa.SignPKCS1v15 (JWT signing) — expected and normal
  • ~10% in pbkdf2.Key (client_secret hash verification) — expected
  • Token hook: virtually 0% CPU, 0 heap — the hook itself is not the bottleneck
  • The production overhead is purely HTTP connection management
