Description
Motivation
The Model Service endpoint generates JWT tokens that clients use to authenticate when accessing inference endpoints. Currently, these tokens include a route_info array in the payload that contains detailed routing information for each backend kernel/replica. As observed in the JWT payload structure, each entry in the route_info array carries ten or more fields, including route_id, session_id, session_name, kernel_host, kernel_port, protocol, traffic_ratio, health_status, last_health_check, and consecutive_failures.
This routing information is primarily used for internal traffic distribution and health monitoring within the AppProxy coordinator, and embedding it directly into the JWT token creates several issues:
The route_info array can grow significantly as more routing entries (replicas) are added to a model service, resulting in unnecessarily large JWT tokens. For instance, a model service with multiple replicas would have multiple route entries, each containing internal IP addresses, port numbers, health check timestamps, and other operational metadata. This data is not required for client-side authentication or authorization purposes.
Including internal infrastructure details such as kernel host IPs and ports in client-facing tokens also poses a potential information disclosure concern, as clients can decode the JWT payload and observe internal network topology and backend instance details.
The AppProxy coordinator and workers already have access to routing information through Redis-based route management (as introduced in #5134), making the inclusion of route_info in the JWT token redundant. The token should only contain the minimal claims necessary for authentication and authorization validation.
Required Features
Remove the route_info field from the JWT token payload generated for Model Service endpoint authentication. The token payload should be simplified to contain only essential claims such as:
- id: The endpoint identifier
- user_id: The user identifier associated with the token
- protocol: The protocol type (e.g., "http")
- frontend_mode: The frontend mode (e.g., "port")
- endpoint_id: The endpoint UUID
- worker: The worker identifier
- port: The designated port number
- runtime_variant: The runtime variant type (e.g., "custom")
- app_mode: The application mode (e.g., "inference")
- Standard JWT claims (exp, iat, etc.)
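As a rough illustration of the reduced payload, the sketch below builds only the claims listed above and signs them with a hand-rolled HS256 encoder from the standard library. The function names (`build_minimal_claims`, `encode_hs256`), the one-hour lifetime, the signing algorithm, and the sample values are all assumptions made for this example; the actual token generation in the coordinator may use a different library, algorithm, and claim sources.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT segments use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_hs256(payload: dict, secret: bytes) -> str:
    """Hypothetical stand-in for a JWT library: sign a payload with HS256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def build_minimal_claims(
    id_: str,
    endpoint_id: str,
    user_id: str,
    worker: str,
    port: int,
    ttl_seconds: int = 3600,  # assumed lifetime, not taken from the issue
) -> dict:
    """Build the reduced payload: no route_info, only the claims listed above."""
    now = int(time.time())
    return {
        "id": id_,
        "user_id": user_id,
        "protocol": "http",
        "frontend_mode": "port",
        "endpoint_id": endpoint_id,
        "worker": worker,
        "port": port,
        "runtime_variant": "custom",
        "app_mode": "inference",
        "iat": now,
        "exp": now + ttl_seconds,
    }


# Example with made-up values.
claims = build_minimal_claims("ep-1", "ep-uuid-1", "user-1", "worker-1", 10200)
token = encode_hs256(claims, b"demo-secret")
print(len(token))
```

Because the payload no longer grows with the number of replicas, the token length stays constant regardless of how many routes the model service has.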
The routing logic should continue to be handled internally by the AppProxy coordinator using the existing Redis-based route management system, without exposing this information in the client token.
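A minimal sketch of what token-independent routing could look like is shown here. It is not the AppProxy implementation: a plain dictionary stands in for the Redis-backed route store, and the `Route` class, the `select_route` function, the `"healthy"` status value, and the weighted-random selection are all assumptions for illustration. The point is only that the token contributes the endpoint identifier, while route details come from the internal store.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    # Subset of the per-route fields named in the issue.
    route_id: str
    kernel_host: str
    kernel_port: int
    traffic_ratio: float
    health_status: str


def select_route(
    route_store: Dict[str, List[Route]],  # stand-in for the Redis route data
    claims: dict,
    rng: Optional[random.Random] = None,
) -> Route:
    """Pick a backend using only endpoint_id from the token claims."""
    rng = rng if rng is not None else random.Random()
    routes = route_store.get(claims["endpoint_id"], [])
    # Assumed convention: only routes marked "healthy" with a positive
    # traffic ratio are eligible.
    candidates = [
        r for r in routes
        if r.health_status == "healthy" and r.traffic_ratio > 0
    ]
    if not candidates:
        raise LookupError(f"no healthy route for endpoint {claims['endpoint_id']}")
    weights = [r.traffic_ratio for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example with made-up values: the token carries no route_info at all.
store = {
    "ep-uuid-1": [
        Route("r1", "10.0.0.1", 30001, 1.0, "healthy"),
        Route("r2", "10.0.0.2", 30002, 1.0, "unhealthy"),
    ]
}
chosen = select_route(store, {"endpoint_id": "ep-uuid-1"})
print(chosen.route_id)
```

With this split, a health status change only updates the store entry; a token issued earlier remains usable without regeneration.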
Impact
The following components will be affected by this change:
- src/ai/backend/appproxy/coordinator/api/endpoint.py: The endpoint API handler that generates JWT tokens for model service authentication needs to be modified to exclude the route_info field from the token payload.
- Token validation logic in the AppProxy worker may need to be reviewed to ensure it does not depend on the presence of route_info in the token payload for routing decisions (routing should be fetched from Redis instead).
- Client SDK and documentation may need updates if any client-side code currently parses or relies on the route_info field from decoded tokens (though this is unlikely for properly designed clients).
Testing Scenarios
- Generate a new Model Service endpoint token and verify that the JWT payload no longer contains the route_info field. Decode the token and confirm the payload size is significantly reduced.
- Verify that model service inference requests continue to work correctly with the new token format, ensuring that the AppProxy coordinator and workers properly route requests using Redis-based route information.
- Test model services with multiple replicas to confirm that load balancing and traffic distribution continue to function correctly without route_info in the token.
- Validate that health check status changes for backend kernels do not affect token validity or require token regeneration.
- Perform token validation tests to ensure authentication and authorization still work correctly with the reduced payload.
- Test edge cases such as expired tokens, revoked tokens, and tokens for deleted endpoints to ensure proper error handling remains intact.
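The first scenario above could be automated along the following lines. This is a sketch under assumptions: `decode_payload_unverified` and `check_token_payload` are hypothetical helper names, the expected claim set is copied from the list in Required Features, and the payload is decoded without signature verification because the check only inspects its contents.

```python
import base64
import json

# Claims the reduced payload is expected to carry (from Required Features).
EXPECTED_CLAIMS = {
    "id", "user_id", "protocol", "frontend_mode", "endpoint_id",
    "worker", "port", "runtime_variant", "app_mode", "exp", "iat",
}


def decode_payload_unverified(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-segment JWT")
    segment = parts[1]
    # Restore the base64 padding that JWT encoding strips.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_token_payload(token: str) -> list:
    """Return a list of problems found; an empty list means the check passed."""
    payload = decode_payload_unverified(token)
    problems = []
    if "route_info" in payload:
        problems.append("route_info is still present")
    missing = sorted(EXPECTED_CLAIMS - set(payload))
    if missing:
        problems.append("missing claims: " + ", ".join(missing))
    return problems
```

Since this helper skips signature verification, it is suitable only for inspecting payload contents in tests, not for authenticating requests.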
JIRA Issue: BA-3668