Description
Preflight checklist
- I could not find a solution in the existing issues, docs, nor discussions.
- I agree to follow this project's Code of Conduct.
- I have read and am following this repository's Contribution Guidelines.
- I have joined the Ory Community Slack.
- I am signed up to the Ory Security Patch Newsletter.
Ory Network Project
No response
Describe your problem
When using oauth2.token_hook in production (Kubernetes, sidecar container at 127.0.0.1), we observe ~100-120ms of overhead at p90 on /oauth2/token for JWT token generation with sporadic traffic patterns.
Production metrics (Hydra v2.3.0, EKS):
- WITH token_hook: p50 ≈ 100ms, p90 ≈ 200-250ms, p99 ≈ 300ms
- WITHOUT token_hook: p50 ≈ 80ms, p90 ≈ 100ms, p99 ≈ 120ms
Local Docker testing:
- WITH token_hook: p50 = 57ms, p90 = 79ms
- WITHOUT token_hook: p50 = 54ms, p90 = 76ms (only ~2-3ms overhead)
The large difference between the local (~2-3ms) and production (~100ms) overhead points to HTTP connection management, not the hook logic itself. CPU profiling confirms the token hook uses virtually no CPU or memory; the overhead is purely network/connection setup.
Isolation tests performed:
- ✅ Disabling token_hook → p90 drops from 200ms to 100ms (confirmed root cause)
- ❌ Increasing Hydra CPU (500m → 1000m): no impact
- ❌ Increasing token_hook CPU (100m → 500m): no impact
- ❌ Increasing DB pool (26 → 50): no impact
- ❌ Reducing tracing sampling (0.4 → 0.1): no impact
The token hook HTTP client in oauth2/token_hook.go uses the default HTTPClient from the registry without configurable connection pooling or keep-alive settings. With sporadic traffic, connections are closed between requests, causing re-establishment overhead on each call.
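To check the connection-reuse theory independently of Hydra, one can instrument a client with net/http/httptrace and watch whether GotConnInfo.Reused is set when calling the hook endpoint. A rough sketch, assuming the sidecar hook is reachable at 127.0.0.1:8089 as in our setup (this is a standalone diagnostic, not Hydra code):

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"
)

func main() {
	// Placeholder for the sidecar hook endpoint from this setup.
	const hookURL = "http://127.0.0.1:8089"

	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			// Reused == false means a fresh TCP connection was dialed,
			// i.e. the previous one was not kept alive.
			fmt.Printf("connection reused: %v, was idle: %v\n", info.Reused, info.WasIdle)
		},
	}

	req, err := http.NewRequestWithContext(
		httptrace.WithClientTrace(context.Background(), trace),
		http.MethodGet, hookURL, nil,
	)
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	resp.Body.Close()
}
```

With sporadic traffic we would expect `reused: false` on most calls, matching the production overhead.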
Describe your ideal solution
Add optional configuration for the token hook HTTP client transport:
```yaml
oauth2:
  token_hook: https://my-hook
  token_hook_http_client:
    timeout: 5s
    keep_alive: true
    max_idle_conns: 100
    max_idle_conns_per_host: 10
    idle_conn_timeout: 90s
```
This would configure http.Transport with proper connection pooling:
```go
transport := &http.Transport{
    MaxIdleConns:        config.MaxIdleConns,
    MaxIdleConnsPerHost: config.MaxIdleConnsPerHost,
    IdleConnTimeout:     config.IdleConnTimeout,
    DisableKeepAlives:   !config.KeepAlive,
}
```
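For illustration, a minimal sketch of how such a client could be constructed; the `TokenHookClientConfig` struct and `NewTokenHookClient` constructor are hypothetical names for this proposal, not existing Hydra APIs, and the actual wiring would follow Hydra's existing configuration plumbing:

```go
package oauth2

import (
	"net/http"
	"time"
)

// TokenHookClientConfig mirrors the proposed oauth2.token_hook_http_client
// configuration block. Struct and field names are illustrative only.
type TokenHookClientConfig struct {
	Timeout             time.Duration
	KeepAlive           bool
	MaxIdleConns        int
	MaxIdleConnsPerHost int
	IdleConnTimeout     time.Duration
}

// NewTokenHookClient builds an *http.Client with connection pooling tuned
// for the token hook endpoint, so sporadic traffic can reuse warm
// connections instead of re-establishing one per request.
func NewTokenHookClient(cfg TokenHookClientConfig) *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        cfg.MaxIdleConns,
		MaxIdleConnsPerHost: cfg.MaxIdleConnsPerHost,
		IdleConnTimeout:     cfg.IdleConnTimeout,
		DisableKeepAlives:   !cfg.KeepAlive,
	}
	return &http.Client{
		Transport: transport,
		Timeout:   cfg.Timeout,
	}
}
```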
We're happy to submit a PR if the approach is approved.
Workarounds or alternatives
Currently there is no workaround within Hydra configuration. The only options are:
1 - Remove the token hook entirely — loses the custom claims functionality we need
2 - Increase traffic volume to keep connections warm — not feasible for B2B sporadic patterns
3 - Fork Hydra and patch the HTTP client — maintenance burden
Version
v2.3.0
Additional Context
Deployment: Kubernetes (AWS EKS), token_hook as sidecar container (http://127.0.0.1:8089)
Traffic pattern: Sporadic B2B bursts (~100 requests over 2-3 minutes)
Strategy: strategies.access_token: jwt with client_credentials grant
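For reference, the measurements were taken on client_credentials token requests like the following sketch using golang.org/x/oauth2/clientcredentials; the client ID, secret, and Hydra URL are placeholders, not our production values:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/oauth2/clientcredentials"
)

func main() {
	// Placeholder credentials and endpoint for illustration only.
	cfg := clientcredentials.Config{
		ClientID:     "example-client",
		ClientSecret: "example-secret",
		TokenURL:     "https://hydra.example.internal/oauth2/token",
	}

	start := time.Now()
	tok, err := cfg.Token(context.Background())
	if err != nil {
		fmt.Println("token request failed:", err)
		return
	}
	fmt.Printf("got %s token in %s\n", tok.TokenType, time.Since(start))
}
```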
Local CPU profiling (pprof) confirms:
- ~80% of Hydra CPU is in crypto/rsa.SignPKCS1v15 (JWT signing) — expected and normal
- ~10% in pbkdf2.Key (client_secret hash verification) — expected
- Token hook: virtually 0% CPU, 0 heap — the hook itself is not the bottleneck
- The production overhead is purely HTTP connection management