Description
Note: the debugging process was extensive and I'm limited by my inexperience with this repository, so this issue was written with the help of an LLM to organize everything clearly and concisely.
TL;DR
OIDC login with Microsoft Entra ID works perfectly when running from source locally, but breaks in production self-hosted deployment. After extensive debugging with HAR captures and Jaeger distributed tracing, I can confirm the issue is 100% frontend-side: the getLoginInfoByToken() RPC call is simply never made in production.
The account service does its job flawlessly—Microsoft authenticates, the callback is processed, a valid JWT is generated, and the redirect happens. But then... silence. The frontend loads, sees no token (or fails to parse it), and dumps the user back to /login/login.
Environment
| Component | Local (✅ Working) | Production (❌ Failing) |
|---|---|---|
| Host | localhost:8087 | huly.redacted.com |
| Account Service | Direct localhost:3000 | nginx /_accounts/ → 127.0.0.1:3000 |
| SSL/TLS | No | Yes (Cloudflare origin certs) |
| Reverse Proxy | None | nginx |
| CDN | None | Cloudflare |
| Huly Version | From source (dev/docker-compose.yaml) | hardcoreeng/*:${HULY_VERSION} (self-host images) |
The Smoking Gun: Jaeger Traces
I added Jaeger to production to capture distributed traces during the OIDC flow. The results are... illuminating.
| RPC Operation | Local | Production |
|---|---|---|
| getLoginInfoByToken | ✅ 2 calls (20ms, 33ms) | ❌ 0 calls |
| getUserWorkspaces | ✅ 1 call | ❌ 0 calls |
| PUT /cookie | ✅ 1 call | ❌ 0 calls |
| OPTIONS (preflight) | ✅ 2 calls | ❌ 0 calls |
| OIDC token exchange | ✅ Present (~800ms) | ✅ Present (~500ms) |
Translation: The OIDC dance completes successfully in both environments. But in production, the frontend simply... doesn't call the account service afterward. At all. Not even a preflight request.
Local Traces (What Should Happen)
account | getLoginInfoByToken | 20.4ms ← JWT validation
account | getUserWorkspaces | 5.9ms ← Get workspaces
account | PUT | 2.8ms ← Set cookie
account | OPTIONS | 0.5ms ← CORS preflight
account | getLoginInfoByToken | 33.4ms ← Second validation
Production Traces (What Actually Happens)
account | GET | 505.0ms ← OIDC token exchange (works!)
account | POST | 333.8ms ← Callback processing (works!)
account | GET | 130.1ms
...
(crickets - no getLoginInfoByToken, no cookie, nothing)
Expected vs Actual Flow
Expected (and what happens locally)
1. User clicks "Login with OpenID" → /_accounts/auth/openid
2. Account service → Microsoft Entra ID
3. Microsoft authenticates → /_accounts/auth/openid/callback?code=...
4. Account service generates JWT → redirects to /login/auth?token%3D<JWT>
5. Auth.svelte parses token, calls getLoginInfoByToken() ✅
6. Token validated, account-metadata-Token cookie set ✅
7. User sees workspace ✅
Actual (production)
1-4. Same as above ✅
5. Auth.svelte loads... ✅
6. getLoginInfoByToken() NEVER CALLED ❌ (0 calls in Jaeger)
7. result = null → goTo('login', true) → /login/login ❌
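The sketch below models steps 5 to 7 with a stubbed account client, to show why zero RPC calls in Jaeger is exactly what a missing token would produce. `resolveLogin`, `AccountClientLike`, `LoginInfoLike`, and `stubClient` are hypothetical names for illustration only; this is not the code in Auth.svelte.

```typescript
// Hypothetical model of steps 5-7; NOT the actual Auth.svelte implementation.
interface LoginInfoLike { account: string }

interface AccountClientLike {
  getLoginInfoByToken: () => Promise<LoginInfoLike | null>
}

interface Outcome {
  route: 'workspace' | 'login'
  rpcCalls: number // how many requests would show up in the account service traces
}

async function resolveLogin (
  token: string | undefined,
  makeClient: (token: string) => AccountClientLike
): Promise<Outcome> {
  let rpcCalls = 0
  let result: LoginInfoLike | null = null

  // Without a token there is nothing to validate, so no request is ever sent.
  if (token != null) {
    const client = makeClient(token)
    rpcCalls++
    result = await client.getLoginInfoByToken()
  }

  // A null result sends the user back to the login page.
  return { route: result == null ? 'login' : 'workspace', rpcCalls }
}

// Stub that accepts any token, standing in for the real account service.
const stubClient = (_token: string): AccountClientLike => ({
  getLoginInfoByToken: async () => ({ account: 'stub-account' })
})
```

In this model, calling `resolveLogin(undefined, stubClient)` yields the production symptoms (route `login`, zero calls), while any non-empty token yields one call. That would point at token extraction rather than at the RPC itself, assuming the real flow has the same shape.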
What I've Ruled Out
| Hypothesis | Status | Evidence |
|---|---|---|
| Account service failure | ❌ Ruled out | Jaeger shows OIDC flow completes successfully |
| X-Forwarded-Proto missing | ❌ Ruled out | nginx config explicitly sets $scheme |
| CORS issues | ❌ Ruled out | Same origin (/_accounts/ on same domain) |
| Cookie Secure flag mismatch | ❌ Ruled out | nginx forwards proto correctly |
| Token not in redirect URL | ❌ Ruled out | HAR shows token reaches frontend |
| Frontend not calling the account service | ✅ Confirmed | Jaeger shows 0 API calls from frontend |
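Jaeger only sees requests that reach the account service, so the HAR capture is the better source for what the browser actually sent. The sketch below summarizes a HAR file: it finds the /login/auth page load and lists any requests under the accounts path that came after it. It assumes the standard HAR 1.2 fields (`log.entries[].request.url`, `request.method`, `response.status`, `startedDateTime`); `summarizeAuthFlow` and the `/_accounts` prefix default are illustrative choices, not part of Huly.

```typescript
// Minimal subset of the HAR 1.2 structure that this sketch reads.
interface HarEntry {
  startedDateTime: string
  request: { method: string, url: string }
  response: { status: number }
}
interface Har { log: { entries: HarEntry[] } }

interface HarSummary {
  authPageUrl: string | undefined // full URL of the /login/auth load, if any
  accountRequestsAfterAuth: string[] // "METHOD path -> status" for each later accounts request
}

function summarizeAuthFlow (har: Har, accountsPrefix = '/_accounts'): HarSummary {
  // Sort a copy by start time so "after" is well defined.
  const entries = [...har.log.entries].sort(
    (a, b) => Date.parse(a.startedDateTime) - Date.parse(b.startedDateTime)
  )

  const authIndex = entries.findIndex(
    (e) => new URL(e.request.url).pathname === '/login/auth'
  )
  if (authIndex === -1) {
    return { authPageUrl: undefined, accountRequestsAfterAuth: [] }
  }

  const accountRequestsAfterAuth = entries
    .slice(authIndex + 1)
    .map((e) => ({ e, path: new URL(e.request.url).pathname }))
    .filter(({ path }) => path.startsWith(accountsPrefix))
    .map(({ e, path }) => `${e.request.method} ${path} -> ${e.response.status}`)

  return { authPageUrl: entries[authIndex].request.url, accountRequestsAfterAuth }
}
```

An empty `accountRequestsAfterAuth` would agree with the traces (the browser never sent the request). A non-empty list would instead mean requests left the browser but were not traced, for example because something between the browser and the account service answered them.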
Relevant Code Paths
The auth flow is handled in plugins/login-resources/src/components/Auth.svelte:
onMount(async () => {
  // ...
  try {
    result = await getLoginInfoFromQuery() // ← This should call account service
  } catch (err) {
    // handle error
  }
  await handleLoginInfo(result)
})

async function handleLoginInfo(result) {
  if (result == null) {
    goTo('login', true) // ← This is where we end up in production
  }
  // ...
}

And getLoginInfoFromQuery() in plugins/login-resources/src/utils.ts:

export async function getLoginInfoFromQuery(): Promise<LoginInfoByToken | null> {
  const token = getCurrentLocation().query?.token
  if (token == null) return null // ← Possible failure point?
  const client = getAccountClient(token)
  return await client.getLoginInfoByToken() // ← Never called in production
}

Configuration Reference
nginx configuration (huly.conf)
server {
  server_name huly.redacted.com;
  listen 443 ssl;
  ssl_certificate /etc/ssl/cloudflare/origin-cert.pem;
  ssl_certificate_key /etc/ssl/cloudflare/pk.key;

  location /_accounts/ {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://127.0.0.1:3000/;
  }

  location / {
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://127.0.0.1:8080;
  }
}

server {
  listen 80;
  server_name huly.redacted.com;
  return 301 https://$host$request_uri;
}

docker-compose.yml (production)
name: ${DOCKER_NAME}
services:
  account:
    image: hardcoreeng/account:${HULY_VERSION}
    ports:
      - 3000:3000
    environment:
      - SERVER_PORT=3000
      - SERVER_SECRET=${SECRET}
      - DB_URL=${CR_DB_URL}
      - TRANSACTOR_URL=ws://transactor:3333;ws${SECURE:+s}://${HOST_ADDRESS}/_transactor
      - STORAGE_CONFIG=minio|minio?accessKey=redacted&secretKey=redacted
      - FRONT_URL=http${SECURE:+s}://${HOST_ADDRESS}
      - STATS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_stats
      - MODEL_ENABLED=*
      - ACCOUNTS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_accounts
      - ACCOUNT_PORT=3000
      - QUEUE_CONFIG=redpanda:9092
      - OPENID_CLIENT_ID=redacted
      - OPENID_CLIENT_SECRET=redacted
      - OPENID_ISSUER=https://login.microsoftonline.com/redacted/v2.0
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318/v1/traces
    restart: unless-stopped
    networks:
      - huly_net
  front:
    image: hardcoreeng/front:${HULY_VERSION}
    environment:
      - SERVER_PORT=8080
      - SERVER_SECRET=${SECRET}
      - LOVE_ENDPOINT=http${SECURE:+s}://${HOST_ADDRESS}/_love
      - ACCOUNTS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_accounts
      - ACCOUNTS_URL_INTERNAL=http://account:3000
      - REKONI_URL=http${SECURE:+s}://${HOST_ADDRESS}/_rekoni
      - CALENDAR_URL=http${SECURE:+s}://${HOST_ADDRESS}/_calendar
      - GMAIL_URL=http${SECURE:+s}://${HOST_ADDRESS}/_gmail
      - TELEGRAM_URL=http${SECURE:+s}://${HOST_ADDRESS}/_telegram
      - STATS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_stats
      - UPLOAD_URL=/files
      - ELASTIC_URL=http://elastic:9200
      - COLLABORATOR_URL=ws${SECURE:+s}://${HOST_ADDRESS}/_collaborator
      - STORAGE_CONFIG=minio|minio?accessKey=redacted&secretKey=redacted
      - TITLE=${TITLE:-Huly Self Host}
      - DEFAULT_LANGUAGE=${DEFAULT_LANGUAGE:-en}
      - LAST_NAME_FIRST=${LAST_NAME_FIRST:-true}
      - DESKTOP_UPDATES_CHANNEL=${DESKTOP_CHANNEL}
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318/v1/traces
    restart: unless-stopped
    networks:
      - huly_net
  # Other services: nginx, cockroach, redpanda, minio, elastic, rekoni,
  # transactor, collaborator, workspace, fulltext, stats, jaeger, kvs
  # (using standard huly-selfhost configuration)
volumes:
  elastic:
  files:
  cr_data:
  cr_certs:
  redpanda:
  telemetry:
networks:
  huly_net:

Questions for the Team
- Is there something special about how the self-host frontend images handle URL parsing? The parseQuery() function in packages/ui/src/location.ts uses decodeURIComponent() before splitting; could this behave differently in the built images vs. source?
- Could the percent-encoded separator in the token parameter (token%3D → token=) be causing issues? The redirect URL is /login/auth?token%3D<JWT>. This is intentional, but maybe something in the build process or runtime environment handles it differently?
- Is getCurrentLocation().query?.token returning undefined in production? This would explain why getLoginInfoFromQuery() returns null immediately without making any API calls.
- Are there any differences between the source docker-compose and self-host images regarding how ACCOUNTS_URL is embedded?
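To make the questions about `token%3D` testable, here is a minimal sketch comparing two ways a query string like `?token%3D<JWT>` could be parsed. Both functions are hypothetical re-implementations written for this issue; neither is copied from packages/ui/src/location.ts, and whether the built images behave like either one is exactly what is unknown.

```typescript
type Query = Record<string, string | null>

// Split "key=value" on the first '=' only; a part without '=' gets a null value.
function splitPair (part: string): [string, string | null] {
  const eq = part.indexOf('=')
  return eq === -1 ? [part, null] : [part.slice(0, eq), part.slice(eq + 1)]
}

function stripQuestionMark (search: string): string {
  return search.startsWith('?') ? search.slice(1) : search
}

// Variant A (hypothetical): decode the whole query string, then split.
// "%3D" becomes "=" before splitting, so "token%3Dabc" yields { token: "abc" }.
function parseDecodeFirst (search: string): Query {
  const query = stripQuestionMark(search)
  const result: Query = {}
  if (query.length === 0) return result
  for (const part of decodeURIComponent(query).split('&')) {
    const [key, value] = splitPair(part)
    result[key] = value
  }
  return result
}

// Variant B (hypothetical): split first, then decode each side.
// "token%3Dabc" contains no literal '=', so the whole string becomes the key.
function parseSplitFirst (search: string): Query {
  const query = stripQuestionMark(search)
  const result: Query = {}
  if (query.length === 0) return result
  for (const part of query.split('&')) {
    const [key, value] = splitPair(part)
    result[decodeURIComponent(key)] = value === null ? null : decodeURIComponent(value)
  }
  return result
}
```

With the redirect URL from this issue, variant A finds a `token` key and variant B does not. The browser's built-in `URLSearchParams` behaves like variant B here: `new URLSearchParams('token%3Dabc').get('token')` returns `null`. If anything in the production path parses or normalizes the query split-first, the token would be undefined and no RPC would follow.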
How to Reproduce
- Deploy Huly self-host with OIDC configured (Microsoft Entra ID)
- Use nginx as reverse proxy with Cloudflare for SSL
- Click "Login with OpenID"
- Complete Microsoft authentication
- Observe redirect to /login/login instead of workspace
What I Need Help With
The Jaeger traces definitively prove the frontend isn't making the API call. But I can't figure out why. The browser console shows no JavaScript errors. The network tab shows no failed requests. It's like the code path that calls getLoginInfoByToken() simply isn't being reached.
Any insights would be greatly appreciated! 🙏