OIDC Authentication Fails in Self-Hosted Production - Frontend Never Calls getLoginInfoByToken #10290

@crafael23

NOTE: The debugging process was extensive, and I'm limited by my inexperience with this repository, so this issue was written with the help of an LLM to organize everything clearly and concisely.

TL;DR

OIDC login with Microsoft Entra ID works perfectly when running from source locally, but breaks in production self-hosted deployment. After extensive debugging with HAR captures and Jaeger distributed tracing, I can confirm the issue is 100% frontend-side: the getLoginInfoByToken() RPC call is simply never made in production.

The account service does its job flawlessly—Microsoft authenticates, the callback is processed, a valid JWT is generated, and the redirect happens. But then... silence. The frontend loads, sees no token (or fails to parse it), and dumps the user back to /login/login.


Environment

| Component | Local (✅ Working) | Production (❌ Failing) |
| --- | --- | --- |
| Host | localhost:8087 | huly.redacted.com |
| Account Service | Direct localhost:3000 | nginx /_accounts/ → 127.0.0.1:3000 |
| SSL/TLS | No | Yes (Cloudflare origin certs) |
| Reverse Proxy | None | nginx |
| CDN | None | Cloudflare |
| Huly Version | From source (dev/docker-compose.yaml) | hardcoreeng/*:${HULY_VERSION} (self-host images) |

The Smoking Gun: Jaeger Traces

I added Jaeger to production to capture distributed traces during the OIDC flow. The results are... illuminating.

| RPC Operation | Local | Production |
| --- | --- | --- |
| getLoginInfoByToken | 2 calls (20ms, 33ms) | 0 calls |
| getUserWorkspaces | ✅ 1 call | ❌ 0 calls |
| PUT /cookie | ✅ 1 call | ❌ 0 calls |
| OPTIONS (preflight) | ✅ 2 calls | ❌ 0 calls |
| OIDC token exchange | ✅ Present (~800ms) | ✅ Present (~500ms) |

Translation: The OIDC dance completes successfully in both environments. But in production, the frontend simply... doesn't call the account service afterward. At all. Not even a preflight request.

Local Traces (What Should Happen)

account | getLoginInfoByToken   | 20.4ms  ← JWT validation
account | getUserWorkspaces     |  5.9ms  ← Get workspaces
account | PUT                   |  2.8ms  ← Set cookie
account | OPTIONS               |  0.5ms  ← CORS preflight
account | getLoginInfoByToken   | 33.4ms  ← Second validation

Production Traces (What Actually Happens)

account | GET                   | 505.0ms ← OIDC token exchange (works!)
account | POST                  | 333.8ms ← Callback processing (works!)
account | GET                   | 130.1ms
...
(crickets - no getLoginInfoByToken, no cookie, nothing)

Expected vs Actual Flow

Expected (and what happens locally)

1. User clicks "Login with OpenID" → /_accounts/auth/openid
2. Account service → Microsoft Entra ID
3. Microsoft authenticates → /_accounts/auth/openid/callback?code=...
4. Account service generates JWT → redirects to /login/auth?token%3D<JWT>
5. Auth.svelte parses token, calls getLoginInfoByToken() ✅
6. Token validated, account-metadata-Token cookie set ✅
7. User sees workspace ✅

Actual (production)

1-4. Same as above ✅
5. Auth.svelte loads... ✅
6. getLoginInfoByToken() NEVER CALLED ❌ (0 calls in Jaeger)
7. result = null → goTo('login', true) → /login/login ❌
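
One way to narrow down where the token gets lost between steps 4 and 5 is to classify the query string at each hop in the HAR capture (the `Location` header from the account service, then the URL the browser actually requests through Cloudflare and nginx). The helper below is only a sketch for that comparison; `classifyTokenParam` and the `TokenForm` labels are hypothetical names, not part of Huly.

```typescript
// Hypothetical diagnostic helper, not part of the Huly codebase.
// Reports how the token parameter is spelled in a URL taken from a HAR entry.
type TokenForm = 'plain' | 'encoded' | 'double-encoded' | 'missing'

function classifyTokenParam (url: string): TokenForm {
  const q = url.indexOf('?')
  if (q === -1) return 'missing'
  // Drop any fragment so only the query string is inspected.
  const search = url.slice(q + 1).split('#')[0]
  if (/(^|&)token=/.test(search)) return 'plain' // token=<JWT>
  if (/(^|&)token%3D/i.test(search)) return 'encoded' // token%3D<JWT>
  if (/(^|&)token%253D/i.test(search)) return 'double-encoded' // token%253D<JWT>
  return 'missing'
}

// Example URLs (placeholders, not real captures).
const fromAccountService = classifyTokenParam('/login/auth?token%3Dheader.payload.signature')
const rewrittenPlain = classifyTokenParam('https://huly.example.com/login/auth?token=header.payload.signature')
const reEncoded = classifyTokenParam('/login/auth?token%253Dheader.payload.signature')
const noToken = classifyTokenParam('/login/login')
```

If the classification differs between the account service's redirect and the request that reaches the front service, something in between (proxy or CDN) is rewriting the query string.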

What I've Ruled Out

| Hypothesis | Status | Evidence |
| --- | --- | --- |
| Account service failure | ❌ Ruled out | Jaeger shows OIDC flow completes successfully |
| X-Forwarded-Proto missing | ❌ Ruled out | nginx config explicitly sets $scheme |
| CORS issues | ❌ Ruled out | Same origin (/_accounts/ on same domain) |
| Cookie Secure flag mismatch | ❌ Ruled out | nginx forwards proto correctly |
| Token not in redirect URL | ❌ Ruled out | HAR shows token reaches frontend |
| Frontend never calls the account service | ✅ CONFIRMED | Jaeger shows 0 API calls from frontend |

Relevant Code Paths

The auth flow is handled in plugins/login-resources/src/components/Auth.svelte:

onMount(async () => {
  // ...
  try {
    result = await getLoginInfoFromQuery() // ← This should call account service
  } catch (err) {
    // handle error
  }
  await handleLoginInfo(result)
})

async function handleLoginInfo(result) {
  if (result == null) {
    goTo('login', true) // ← This is where we end up in production
  }
  // ...
}

And getLoginInfoFromQuery() in plugins/login-resources/src/utils.ts:

export async function getLoginInfoFromQuery(): Promise<LoginInfoByToken | null> {
  const token = getCurrentLocation().query?.token
  if (token == null) return null // ← Possible failure point?

  const client = getAccountClient(token)
  return await client.getLoginInfoByToken() // ← Never called in production
}
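
To tell which branch is taken in production, the same logic can be traced with injected dependencies. This is a minimal sketch, assuming the function's behavior is as excerpted above; `getLoginInfoFromQueryTraced`, `Deps`, `LoginInfo`, and the fake client are hypothetical stand-ins for `getCurrentLocation()`, `getAccountClient()`, and `LoginInfoByToken`, not the real Huly types.

```typescript
// Hypothetical stand-ins for the real Huly types and helpers.
interface LoginInfo { account: string }
interface AccountClientLike { getLoginInfoByToken: () => Promise<LoginInfo | null> }
interface Deps {
  getQuery: () => Record<string, string | null> | undefined // plays the role of getCurrentLocation().query
  getClient: (token: string) => AccountClientLike // plays the role of getAccountClient(token)
  log: (msg: string) => void
}

async function getLoginInfoFromQueryTraced (deps: Deps): Promise<LoginInfo | null> {
  const token = deps.getQuery()?.token
  if (token == null) {
    // This is the branch that would explain "0 calls" in Jaeger.
    deps.log('no token in parsed query; returning null without any RPC')
    return null
  }
  deps.log(`token found (${token.length} chars); calling getLoginInfoByToken`)
  return await deps.getClient(token).getLoginInfoByToken()
}

// Usage with fakes: one run without a token, one run with a token.
const logs: string[] = []
let rpcCalls = 0
const fakeClient: AccountClientLike = {
  getLoginInfoByToken: async () => { rpcCalls++; return { account: 'demo' } }
}
const baseDeps = { getClient: () => fakeClient, log: (m: string) => { logs.push(m) } }
void getLoginInfoFromQueryTraced({ ...baseDeps, getQuery: () => ({}) })
void getLoginInfoFromQueryTraced({ ...baseDeps, getQuery: () => ({ token: 'abc' }) })
```

With the fakes, only the second run reaches the client. Seeing the first log message in a production build would point at query parsing rather than at the network path.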

Configuration Reference

nginx configuration (huly.conf)

server {
    server_name huly.redacted.com;
    listen 443 ssl;

    ssl_certificate     /etc/ssl/cloudflare/origin-cert.pem;
    ssl_certificate_key /etc/ssl/cloudflare/pk.key;

    location /_accounts/ {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:3000/;
    }

    location / {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:8080;
    }
}

server {
    listen 80;
    server_name huly.redacted.com;
    return 301 https://$host$request_uri;
}

docker-compose.yml (production)

name: ${DOCKER_NAME}

services:
  account:
    image: hardcoreeng/account:${HULY_VERSION}
    ports:
      - 3000:3000
    environment:
      - SERVER_PORT=3000
      - SERVER_SECRET=${SECRET}
      - DB_URL=${CR_DB_URL}
      - TRANSACTOR_URL=ws://transactor:3333;ws${SECURE:+s}://${HOST_ADDRESS}/_transactor
      - STORAGE_CONFIG=minio|minio?accessKey=redacted&secretKey=redacted
      - FRONT_URL=http${SECURE:+s}://${HOST_ADDRESS}
      - STATS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_stats
      - MODEL_ENABLED=*
      - ACCOUNTS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_accounts
      - ACCOUNT_PORT=3000
      - QUEUE_CONFIG=redpanda:9092
      - OPENID_CLIENT_ID=redacted
      - OPENID_CLIENT_SECRET=redacted
      - OPENID_ISSUER=https://login.microsoftonline.com/redacted/v2.0
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318/v1/traces
    restart: unless-stopped
    networks:
      - huly_net

  front:
    image: hardcoreeng/front:${HULY_VERSION}
    environment:
      - SERVER_PORT=8080
      - SERVER_SECRET=${SECRET}
      - LOVE_ENDPOINT=http${SECURE:+s}://${HOST_ADDRESS}/_love
      - ACCOUNTS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_accounts
      - ACCOUNTS_URL_INTERNAL=http://account:3000
      - REKONI_URL=http${SECURE:+s}://${HOST_ADDRESS}/_rekoni
      - CALENDAR_URL=http${SECURE:+s}://${HOST_ADDRESS}/_calendar
      - GMAIL_URL=http${SECURE:+s}://${HOST_ADDRESS}/_gmail
      - TELEGRAM_URL=http${SECURE:+s}://${HOST_ADDRESS}/_telegram
      - STATS_URL=http${SECURE:+s}://${HOST_ADDRESS}/_stats
      - UPLOAD_URL=/files
      - ELASTIC_URL=http://elastic:9200
      - COLLABORATOR_URL=ws${SECURE:+s}://${HOST_ADDRESS}/_collaborator
      - STORAGE_CONFIG=minio|minio?accessKey=redacted&secretKey=redacted
      - TITLE=${TITLE:-Huly Self Host}
      - DEFAULT_LANGUAGE=${DEFAULT_LANGUAGE:-en}
      - LAST_NAME_FIRST=${LAST_NAME_FIRST:-true}
      - DESKTOP_UPDATES_CHANNEL=${DESKTOP_CHANNEL}
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318/v1/traces
    restart: unless-stopped
    networks:
      - huly_net

  # Other services: nginx, cockroach, redpanda, minio, elastic, rekoni,
  # transactor, collaborator, workspace, fulltext, stats, jaeger, kvs
  # (using standard huly-selfhost configuration)

volumes:
  elastic:
  files:
  cr_data:
  cr_certs:
  redpanda:
  telemetry:

networks:
  huly_net:

Questions for the Team

  1. Is there something special about how the self-host frontend images handle URL parsing?
    The parseQuery() function in packages/ui/src/location.ts uses decodeURIComponent() before splitting—could this behave differently in the built images vs source?

  2. Could the percent-encoded separator in the token parameter (token%3D, which decodes to token=) be causing issues?
    The redirect URL is /login/auth?token%3D<JWT>. This is intentional, but maybe something in the build process or runtime environment handles it differently?

  3. Is getCurrentLocation().query?.token returning undefined in production?
    This would explain why getLoginInfoFromQuery() returns null immediately without making any API calls.

  4. Are there any differences between the source docker-compose and self-host images regarding how ACCOUNTS_URL is embedded?
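
To make questions 1 to 3 concrete, here is a small sketch of why the order of decoding and splitting matters for a URL like /login/auth?token%3D<JWT>. Both parsers are hypothetical illustrations written for this issue; neither is the actual parseQuery() from packages/ui/src/location.ts, whose exact behavior I have not verified.

```typescript
// Hypothetical parsers for illustration only; not the Huly implementation.
type Query = Record<string, string | null>

function splitPairs (parts: string[], decode: (s: string) => string): Query {
  const result: Query = {}
  for (const part of parts) {
    if (part === '') continue
    const idx = part.indexOf('=')
    if (idx === -1) result[decode(part)] = null
    else result[decode(part.slice(0, idx))] = decode(part.slice(idx + 1))
  }
  return result
}

// Decode the whole query string first, then split on & and =.
function parseDecodeFirst (search: string): Query {
  return splitPairs(decodeURIComponent(search.replace(/^\?/, '')).split('&'), (s) => s)
}

// Split on & and = first, then decode each key and value.
function parseSplitFirst (search: string): Query {
  return splitPairs(search.replace(/^\?/, '').split('&'), decodeURIComponent)
}

const encoded = '?token%3Dheader.payload.signature'
const doubleEncoded = '?token%253Dheader.payload.signature'

const a = parseDecodeFirst(encoded).token // 'header.payload.signature'
const b = parseSplitFirst(encoded).token // undefined: the key is 'token=header.payload.signature'
const c = parseDecodeFirst(doubleEncoded).token // undefined: the key is 'token%3Dheader.payload.signature'
```

Under these assumptions, the token is only found when the query is decoded exactly once before splitting. A parser that splits first, or an extra layer of encoding added by a proxy, would leave query.token undefined and match the observed behavior.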


How to Reproduce

  1. Deploy Huly self-host with OIDC configured (Microsoft Entra ID)
  2. Use nginx as reverse proxy with Cloudflare for SSL
  3. Click "Login with OpenID"
  4. Complete Microsoft authentication
  5. Observe redirect to /login/login instead of workspace

What I Need Help With

The Jaeger traces definitively prove the frontend isn't making the API call. But I can't figure out why. The browser console shows no JavaScript errors. The network tab shows no failed requests. It's like the code path that calls getLoginInfoByToken() simply isn't being reached.

Any insights would be greatly appreciated! 🙏
