
Remove route_info from Model Service JWT Token Payload #7704

@kyujin-cho

Description

Motivation

The Model Service endpoint generates JWT tokens for authenticating access to inference endpoints. Currently, these tokens include a route_info array in the payload that contains detailed routing information for each backend kernel/replica. As observed in the JWT payload structure, each route entry in the route_info array contains ten or more fields, including route_id, session_id, session_name, kernel_host, kernel_port, protocol, traffic_ratio, health_status, last_health_check, and consecutive_failures.

This routing information is primarily used for internal traffic distribution and health monitoring within the AppProxy coordinator, and embedding it directly into the JWT token creates several issues:

The route_info array can grow significantly as more routing entries (replicas) are added to a model service, resulting in unnecessarily large JWT tokens. For instance, a model service with multiple replicas would have multiple route entries, each containing internal IP addresses, port numbers, health check timestamps, and other operational metadata. This data is not required for client-side authentication or authorization purposes.

Including internal infrastructure details such as kernel host IPs and ports in client-facing tokens also poses a potential information disclosure concern, as clients can decode the JWT payload and observe internal network topology and backend instance details.
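To illustrate the disclosure concern, here is a minimal sketch of how any token holder can read the payload without knowing the signing key. The token, its values, and the helper names are hypothetical and built only for this example; the point is that a JWT payload is base64url-encoded JSON, not encrypted.

```python
import base64
import json


def decode_jwt_payload_unverified(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs strip base64url padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token with made-up values; the signature is never checked here.
header = _b64url({"alg": "HS256", "typ": "JWT"})
payload = _b64url(
    {
        "id": "example-endpoint",
        "route_info": [{"kernel_host": "10.0.0.5", "kernel_port": 8000}],
    }
)
token = f"{header}.{payload}.signature-not-checked"

leaked = decode_jwt_payload_unverified(token)
print(leaked["route_info"][0]["kernel_host"], leaked["route_info"][0]["kernel_port"])
```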

The AppProxy coordinator and workers already have access to routing information through Redis-based route management (as introduced in #5134), making the inclusion of route_info in the JWT token redundant. The token should only contain the minimal claims necessary for authentication and authorization validation.

Required Features

Remove the route_info field from the JWT token payload generated for Model Service endpoint authentication. The token payload should be simplified to contain only essential claims such as:

  • id: The endpoint identifier
  • user_id: The user identifier associated with the token
  • protocol: The protocol type (e.g., "http")
  • frontend_mode: The frontend mode (e.g., "port")
  • endpoint_id: The endpoint UUID
  • worker: The worker identifier
  • port: The designated port number
  • runtime_variant: The runtime variant type (e.g., "custom")
  • app_mode: The application mode (e.g., "inference")
  • Standard JWT claims (exp, iat, etc.)

The routing logic should continue to be handled internally by the AppProxy coordinator using the existing Redis-based route management system, without exposing this information in the client token.
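One possible way to enforce the reduced payload is an allow-list of claims, so that route_info (or any other internal field) cannot leak into the token by accident. This is only a sketch: `build_token_payload`, `ALLOWED_CLAIMS`, the `endpoint_info` dict, and the one-hour TTL are assumptions for illustration and do not reflect the actual code in endpoint.py.

```python
import time
from typing import Optional

# Claims listed in this issue as essential; everything else is dropped.
ALLOWED_CLAIMS = (
    "id",
    "user_id",
    "protocol",
    "frontend_mode",
    "endpoint_id",
    "worker",
    "port",
    "runtime_variant",
    "app_mode",
)


def build_token_payload(
    endpoint_info: dict, ttl_seconds: int = 3600, now: Optional[int] = None
) -> dict:
    """Build a JWT payload from allow-listed claims plus iat/exp (hypothetical helper)."""
    issued_at = int(time.time()) if now is None else now
    payload = {key: endpoint_info[key] for key in ALLOWED_CLAIMS if key in endpoint_info}
    payload["iat"] = issued_at
    payload["exp"] = issued_at + ttl_seconds
    return payload


# Hypothetical input that still carries route_info from the coordinator side.
endpoint_info = {
    "id": "example-endpoint",
    "user_id": "example-user",
    "protocol": "http",
    "frontend_mode": "port",
    "port": 10200,
    "runtime_variant": "custom",
    "app_mode": "inference",
    "route_info": [{"kernel_host": "10.0.0.5", "kernel_port": 8000}],
}
token_payload = build_token_payload(endpoint_info, now=1_700_000_000)
```

An allow-list is used here rather than deleting route_info explicitly, so that fields added to the endpoint data later stay out of the token unless someone opts them in.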

Impact

The following components will be affected by this change:

  • src/ai/backend/appproxy/coordinator/api/endpoint.py: The endpoint API handler that generates JWT tokens for model service authentication needs to be modified to exclude the route_info field from the token payload.
  • Token validation logic in the AppProxy worker may need to be reviewed to ensure it does not depend on the presence of route_info in the token payload for routing decisions (routing should be fetched from Redis instead).
  • Client SDK and documentation may need updates if any client-side code currently parses or relies on the route_info field from decoded tokens (though this is unlikely for properly designed clients).
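The worker-side review could aim for something like the sketch below, where the route is resolved from a route store keyed by the token's endpoint_id and the token itself carries no routing data. Everything here is an assumption for illustration: `RouteStore`, `InMemoryRouteStore` (a stand-in for the Redis-backed store, whose real schema is not described in this issue), `pick_route`, and the `"healthy"` status value.

```python
import random
from typing import Protocol


class RouteStore(Protocol):
    """Hypothetical interface over the Redis-based route management."""

    def get_routes(self, endpoint_id: str) -> list[dict]: ...


class InMemoryRouteStore:
    """In-memory stand-in so the sketch runs without Redis."""

    def __init__(self, routes_by_endpoint: dict[str, list[dict]]) -> None:
        self._routes = routes_by_endpoint

    def get_routes(self, endpoint_id: str) -> list[dict]:
        return list(self._routes.get(endpoint_id, []))


def pick_route(claims: dict, store: RouteStore, rng: random.Random) -> dict:
    """Choose a backend by traffic_ratio among healthy routes from the store."""
    # Assumption: health_status uses the string "healthy" for usable routes.
    routes = [
        r for r in store.get_routes(claims["endpoint_id"])
        if r["health_status"] == "healthy"
    ]
    if not routes:
        raise LookupError(f"no healthy route for endpoint {claims['endpoint_id']}")
    weights = [r["traffic_ratio"] for r in routes]
    return rng.choices(routes, weights=weights, k=1)[0]


store = InMemoryRouteStore(
    {
        "example-endpoint": [
            {"kernel_host": "10.0.0.5", "kernel_port": 8000,
             "traffic_ratio": 1.0, "health_status": "healthy"},
            {"kernel_host": "10.0.0.6", "kernel_port": 8000,
             "traffic_ratio": 1.0, "health_status": "unhealthy"},
        ]
    }
)
claims = {"endpoint_id": "example-endpoint"}  # decoded token without route_info
chosen = pick_route(claims, store, random.Random(0))
```

Because the lookup happens per request, a change in a backend's health status is picked up from the store and does not require reissuing the token.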

Testing Scenarios

  1. Generate a new Model Service endpoint token and verify that the JWT payload no longer contains the route_info field. Decode the token and confirm the payload size is significantly reduced.
  2. Verify that model service inference requests continue to work correctly with the new token format, ensuring that the AppProxy coordinator and workers properly route requests using Redis-based route information.
  3. Test model services with multiple replicas to confirm that load balancing and traffic distribution continue to function correctly without route_info in the token.
  4. Validate that health check status changes for backend kernels do not affect token validity or require token regeneration.
  5. Perform token validation tests to ensure authentication and authorization still work correctly with the reduced payload.
  6. Test edge cases such as expired tokens, revoked tokens, and tokens for deleted endpoints to ensure proper error handling remains intact.
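Scenario 1 could be checked with a comparison along these lines. This is a rough sketch with made-up values: `make_route` and `payload_size` are hypothetical helpers, the route fields follow the names listed in the Motivation section, and the size is measured on the JSON payload before base64url encoding and signing.

```python
import json


def make_route(index: int) -> dict:
    """Build one hypothetical route entry using the field names from this issue."""
    return {
        "route_id": f"route-{index}",
        "session_id": f"session-{index}",
        "session_name": f"model-replica-{index}",
        "kernel_host": f"10.0.0.{index + 1}",
        "kernel_port": 8000,
        "protocol": "http",
        "traffic_ratio": 1.0,
        "health_status": "healthy",
        "last_health_check": 1_700_000_000,
        "consecutive_failures": 0,
    }


def payload_size(payload: dict) -> int:
    """Size in bytes of the compact JSON form of a payload."""
    return len(json.dumps(payload, separators=(",", ":")).encode())


base_claims = {
    "id": "example-endpoint",
    "user_id": "example-user",
    "protocol": "http",
    "frontend_mode": "port",
    "port": 10200,
    "runtime_variant": "custom",
    "app_mode": "inference",
}
full = {**base_claims, "route_info": [make_route(i) for i in range(4)]}
slim = {key: value for key, value in full.items() if key != "route_info"}

print("with route_info:", payload_size(full), "bytes")
print("without route_info:", payload_size(slim), "bytes")
```

The same comparison can be repeated with different replica counts to show that the slim payload size stays constant while the full payload grows with each route entry.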

JIRA Issue: BA-3668
