TheRealHaoLiu
AAP-50407 JWT Claims Gateway Integration - Complete Implementation

Description

This PR implements a complete migration from JWT token-embedded claims to gateway-sourced claims via the service-index/jwt_claims/<user_ansible_id> endpoint.

What is being changed?

  • JWT Authentication: Modified to fetch permission claims from gateway service-index API instead of using embedded JWT token fields
  • RBAC Processing: Updated to exclusively use gateway claims data with no fallback to deprecated JWT token fields
  • JWT Validation: Removed deprecated fields (objects, object_roles, global_roles) from required JWT token validation
  • Resource Creation: Updated to use gateway claims exclusively for building organization and team resources

Why is this change needed?

  • Deprecation: JWT token fields objects, object_roles, and global_roles are being deprecated across the platform
  • Centralization: Move permission data management to the gateway service for better consistency and real-time updates
  • Performance: Reduce JWT token size by removing large permission data structures
  • Architecture: Clean separation between authentication (JWT) and authorization (gateway)

How does this change address the issue?

  • Implements get_jwt_claims() method in ResourceAPIClient to call the gateway endpoint
  • Updates JWTCommonAuth to fetch and store gateway claims in self.gateway_claims
  • Modifies process_rbac_permissions() and get_or_create_resource() to use gateway data exclusively
  • Removes all fallback logic to deprecated JWT token fields
  • Provides clear error handling when gateway claims are unavailable
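
The fetch step described above could look roughly like the following. This is a minimal sketch, not the actual ResourceAPIClient.get_jwt_claims() implementation: the function name fetch_jwt_claims, the injected http_get callable, and the choice to return None on any failure are all assumptions made for illustration. Only the endpoint path comes from this description.

```python
import json
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical transport: takes a URL path, returns (status_code, body_text).
HttpGet = Callable[[str], Tuple[int, str]]


def fetch_jwt_claims(http_get: HttpGet, user_ansible_id: str) -> Optional[dict]:
    """Return the claims dict for a user, or None if the gateway call fails."""
    path = f"service-index/jwt_claims/{user_ansible_id}"
    try:
        status, body = http_get(path)
    except Exception:
        # Network-level failure: log it and let the caller decide what to do.
        logger.exception("Gateway request for %s failed", path)
        return None
    if status != 200:
        logger.error("Gateway returned HTTP %s for %s", status, path)
        return None
    try:
        claims = json.loads(body)
    except json.JSONDecodeError:
        logger.error("Gateway returned invalid JSON for %s", path)
        return None
    if not isinstance(claims, dict):
        logger.error("Gateway returned a non-object payload for %s", path)
        return None
    return claims
```

Injecting the transport keeps the parsing and error paths testable without a running gateway.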

Type of Change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (no functional changes)

Self-Review Checklist

  • I have performed a self-review of my code
  • I have added relevant comments to complex code sections
  • I have updated documentation where needed
  • I have considered the security impact of these changes
  • I have considered performance implications
  • I have thought about error handling and edge cases
  • I have tested the changes in my local environment

Testing Instructions

Prerequisites

  • Gateway service running with functional service-index/jwt_claims/<user_ansible_id> endpoint
  • Valid JWT tokens for authentication
  • Test users with various permission configurations

Steps to Test

  1. Basic Authentication Test:

    • Send JWT token via X-DAB-JW-TOKEN header
    • Verify user authentication succeeds
    • Check that self.gateway_claims is populated
  2. RBAC Processing Test:

    • Authenticate user with various role assignments
    • Call process_rbac_permissions()
    • Verify roles are correctly assigned from gateway claims data
    • Confirm no usage of deprecated JWT token fields
  3. Resource Creation Test:

    • Test with team and organization creation scenarios
    • Verify resources are created using gateway claims data
    • Test error handling when gateway claims are unavailable
  4. Error Handling Test:

    • Simulate gateway endpoint unavailability
    • Verify authentication succeeds but RBAC processing fails gracefully
    • Check appropriate error messages are logged

Expected Results

  • JWT authentication works without deprecated token fields
  • Permission processing exclusively uses gateway claims
  • Clear error messages when gateway is unavailable
  • No fallback to deprecated JWT token fields under any circumstance

Additional Context

Required Actions

  • Requires coordination with other teams
  • Requires downstream repository changes

Breaking Changes

This is a breaking change that affects systems relying on JWT token embedded claims:

  1. Gateway Dependency: RBAC functionality now requires the gateway service-index/jwt_claims/ endpoint
  2. No Fallback: Systems cannot process permissions without gateway connectivity
  3. JWT Token Changes: Deprecated fields are no longer required in JWT tokens

Migration Impact

Before: JWT tokens contained full permission data

{
  "sub": "user-id",
  "objects": {"organization": [...], "team": [...]},
  "object_roles": {"Org Admin": {...}},
  "global_roles": ["Platform Auditor"]
}

After: JWT tokens contain minimal auth data, permissions from gateway

{
  "sub": "user-id",
  "user_data": {...}
}
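
As a rough illustration of the validation change, the check below treats the three deprecated fields as optional and only flags them if present. The REQUIRED_FIELDS tuple is an assumption taken from the "After" example above, not from the real validator, and both helper names are hypothetical.

```python
# Fields named as deprecated in this PR description.
DEPRECATED_FIELDS = ("objects", "object_roles", "global_roles")

# Assumption: inferred from the "After" example, not the actual required list.
REQUIRED_FIELDS = ("sub", "user_data")


def missing_required_fields(token: dict) -> list:
    """Return the required fields that are absent from the decoded token."""
    return [name for name in REQUIRED_FIELDS if name not in token]


def deprecated_fields_present(token: dict) -> list:
    """Return any deprecated fields still carried by the decoded token."""
    return [name for name in DEPRECATED_FIELDS if name in token]
```

A token in the new shape passes with no missing fields, while an old-style token still validates but reports which deprecated fields it carries.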

Implementation Details

Files Modified

  • ansible_base/jwt_consumer/common/auth.py - Main JWT processing logic
  • ansible_base/jwt_consumer/common/cache.py - Cache methods (cleanup)
  • ansible_base/resource_registry/rest_client.py - Gateway API client
  • test_app/tests/conftest.py - Test fixtures updated

Key Methods Updated

  • JWTCommonAuth.parse_jwt_token() - Now fetches from gateway
  • JWTCommonAuth.process_rbac_permissions() - Uses gateway claims only
  • JWTCommonAuth.get_or_create_resource() - Uses gateway claims only
  • ResourceAPIClient.get_jwt_claims() - New gateway endpoint method

Error Handling Strategy

  • Authentication: Succeeds even if gateway unavailable (basic user creation)
  • Authorization: Fails gracefully with clear error messages
  • Logging: Detailed debug/error logs for troubleshooting
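
The strategy above (authentication succeeds, authorization degrades) might be sketched like this. AuthResult, authenticate, and the fetch_claims callable are hypothetical names for illustration only; the real JWTCommonAuth flow is not shown in this description.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class AuthResult:
    user_id: str
    rbac_applied: bool
    roles: list = field(default_factory=list)


def authenticate(token: dict, fetch_claims: Callable[[str], Optional[dict]]) -> AuthResult:
    """Authenticate from the token alone; apply roles only if claims are available."""
    user_id = token["sub"]
    claims = fetch_claims(user_id)
    if claims is None:
        # Gateway unavailable: the user is still authenticated, but no roles are applied.
        logger.error("No gateway claims for user %s; skipping RBAC processing", user_id)
        return AuthResult(user_id, rbac_applied=False)
    return AuthResult(user_id, rbac_applied=True, roles=list(claims.get("global_roles", [])))
```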

Performance Considerations

  • Network Calls: Now makes HTTP request to gateway for each JWT authentication
  • Caching: Removed complex claims-hash optimization (can be re-added later)
  • Token Size: JWT tokens will be significantly smaller without embedded claims

Security Impact

  • Separation of Concerns: Authentication (JWT) and authorization (gateway) are cleanly separated
  • Real-time Permissions: Gateway provides up-to-date permission data
  • Reduced Attack Surface: Smaller JWT tokens, less embedded data to validate

This implementation provides a clean, no-fallback solution that exclusively uses the gateway service for permission claims while maintaining robust error handling for operational scenarios.

Copilot AI review requested due to automatic review settings August 6, 2025 19:17

Copilot AI left a comment

Pull Request Overview

This PR refactors JWT claims handling to use a gateway endpoint instead of embedded JWT token fields. It removes the deprecated objects, object_roles, and global_roles fields from JWT tokens and fetches permission data from the gateway's service-index/jwt_claims/<user_ansible_id> endpoint.

Key changes:

  • Modified JWT authentication to fetch claims from gateway API instead of using embedded token fields
  • Updated RBAC processing to exclusively use gateway claims with no fallback to deprecated JWT fields
  • Added new get_jwt_claims() method to ResourceAPIClient for gateway communication

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File - Description

  • test_app/tests/conftest.py - Removed deprecated JWT token fields and added explanatory comments
  • ansible_base/resource_registry/rest_client.py - Added new get_jwt_claims() method and improved import formatting
  • ansible_base/jwt_consumer/common/cache.py - Added trailing newline
  • ansible_base/jwt_consumer/common/auth.py - Major refactoring to use gateway claims exclusively with comprehensive error handling

Comments suppressed due to low confidence (1)

ansible_base/jwt_consumer/common/auth.py:165

  • [nitpick] The empty line between method definition and the next method creates inconsistent spacing. Consider removing this extra blank line to maintain consistent code formatting.
    def _fetch_jwt_claims_from_gateway(self, user_ansible_id):

      return resource, resource.content_object
  else:
-     logger.error(f"build_resource_stub does not know how to build an object of type {type}")
+     logger.error(f"build_resource_stub does not know how to build an object of type {content_type}")

Copilot AI Aug 6, 2025

The error message references 'build_resource_stub' but this method is actually called 'get_or_create_resource'. The error message should be updated to reflect the correct method name for clarity.

Suggested change
- logger.error(f"build_resource_stub does not know how to build an object of type {content_type}")
+ logger.error(f"get_or_create_resource does not know how to build an object of type {content_type}")


TheRealHaoLiu force-pushed the AAP-47811-update-jwt_consumer-to-load-user-claims-from-new-endpoint branch from 30d4e41 to df0798d, August 6, 2025 19:28
JWT claims are now exclusively fetched from the gateway service-index API instead of being included in the JWT token. Deprecated fields (objects, object_roles, global_roles) are removed from token processing and all RBAC logic now relies on gateway claims. Added helper to ResourceAPIClient for fetching claims, and updated tests to reflect the new claims source.
TheRealHaoLiu force-pushed the AAP-47811-update-jwt_consumer-to-load-user-claims-from-new-endpoint branch from df0798d to 77db991, August 7, 2025 17:08
Switches to using the full service path from settings when retrieving JWT claims, adds robust JSON parsing with error handling, and enhances logging for invalid responses and failures. This improves reliability and debuggability when interacting with the gateway.
Replaces usage of the 'token' attribute with 'gateway_claims' in JWTCommonAuth tests to reflect recent code changes. Adds comprehensive tests for fetching JWT claims from the gateway, including success, invalid JSON, non-200 responses, and cache/hash logic.
Removed unused exception variable in JWTCommonAuth and improved formatting in related unit tests for clarity and consistency. No functional changes to logic.
Modified test_auth.py and hub/test_auth.py to reflect changes in JWT claims structure, replacing 'token' with 'gateway_claims' and updating role representations. Adjusted test parameters and assertions to align with the new claims format.
Update the condition to explicitly check for None when verifying the presence of gateway_claims. This prevents issues when gateway_claims is an empty value but not None.
Introduces tests to cover cache miss scenarios, exception handling during JWT claims fetch, and correct usage of RESOURCE_SERVICE_PATH. Also adds tests for caching behavior when claims hash or gateway claims are missing, improving coverage and reliability of JWTCommonAuth.
Cleaned up extra spaces and ensured consistent formatting in test_auth.py for better readability and code style adherence. No functional changes were made.
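
The commit above that switches to an explicit None check matters because an empty claims dict is falsy in Python. A small sketch of the difference, with hypothetical helper names:

```python
def has_claims_truthy(gateway_claims) -> bool:
    """Truthiness check: treats an empty dict the same as 'never fetched'."""
    return bool(gateway_claims)


def has_claims_explicit(gateway_claims) -> bool:
    """Explicit check: only None means the claims were not fetched."""
    return gateway_claims is not None


# An empty dict can be a valid gateway answer (a user with no permissions),
# which is different from the gateway never having been reached.
no_permissions = {}
never_fetched = None
```

With the truthiness check, a user whose claims are legitimately empty would be handled as if the gateway call had failed.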
TheRealHaoLiu
Member Author

No matter how many tests I add, Sonar keeps saying I have 0% coverage for the new code! LIES!

logger.debug(f"Claims hash changed for user {user_ansible_id}: cached={cached_hash}, current={current_claims_hash}")
return True

# Hash matches cached value, try to get cached claims
Member

No, don't do that.

If hashes match, do nothing. Don't do approximately nothing, do exactly nothing. Don't save the claims for later. If the hashes match that means the claims have already been saved.

Member Author

will update to do approximately nothing 🤣
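
The rule from the review comment above ("if hashes match, do exactly nothing") can be sketched as follows. This assumes the caller already has a current claims hash for the request; the thread does not say where it comes from. The function name, the plain-dict hash cache, and the injected fetch/apply callables are hypothetical.

```python
from typing import Callable, Dict, Optional


def sync_claims_if_changed(
    user_id: str,
    current_hash: str,
    cached_hashes: Dict[str, str],
    fetch_claims: Callable[[str], Optional[dict]],
    apply_claims: Callable[[str, dict], None],
) -> bool:
    """Return True only when claims were fetched and applied."""
    if cached_hashes.get(user_id) == current_hash:
        # Hashes match: the claims were already saved, so do exactly nothing.
        return False
    claims = fetch_claims(user_id)
    if claims is None:
        # Fetch failed: leave the cached hash alone so the next request retries.
        return False
    apply_claims(user_id, claims)
    cached_hashes[user_id] = current_hash
    return True
```

Note that nothing is stored besides the hash: the claims themselves are applied once and then dropped.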

self.cache = JWTCache()
self.user = None
self.token = None
self.gateway_claims = None # Store claims from gateway
Member

Very strongly convinced this shouldn't be on self. Let the dereferenced object be garbage collected.

- for system_role_name in self.token.get("global_roles", []):
+ # Process global roles from gateway claims
+ global_roles = self.gateway_claims.get("global_roles", [])
+ for system_role_name in global_roles:
Member

You're trying too hard to not change the existing code. And in this case it's not good code.

Make a new method, like save_claims(user, claims). See other stuff in ansible_base/rbac/claims.py, it should fit in with that crowd.

It should take the claims, and make them true for that user. You don't have to refactor the logic itself here (although you should, you don't have to), but you do really need to refactor the interface.

This method should be callable from unit tests, and it should be called from unit tests. We shouldn't need to see the JWT auth class within a mile of that logic.
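
One way to read the suggested interface is a plain function that takes a user and a claims dict and has no dependency on the JWT auth class. The sketch below is illustrative only: FakeUser is an in-memory stand-in rather than a real model, only global_roles is handled because the object role payload is elided in this PR description, and the returned (added, removed) tuple is an assumption.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class FakeUser:
    """In-memory stand-in for a user record; not a real ansible_base model."""
    username: str
    global_roles: Set[str] = field(default_factory=set)


def save_claims(user: FakeUser, claims: dict) -> Tuple[Set[str], Set[str]]:
    """Make the claims true for the user; return (added, removed) role names."""
    desired = set(claims.get("global_roles", []))
    added = desired - user.global_roles
    removed = user.global_roles - desired
    user.global_roles = desired
    return added, removed
```

Because the function only needs a user and a dict, a unit test can call it directly with hand-built claims.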

self.cache.set_claims_hash(user_ansible_id, claims_hash)
self.cache.set_cached_claims(user_ansible_id, self.gateway_claims)

def _fetch_jwt_claims_from_gateway(self, user_ansible_id):
Member

Suggested change
- def _fetch_jwt_claims_from_gateway(self, user_ansible_id):
+ def fetch_jwt_claims_from_gateway(self, user_ansible_id) -> dict[str,dict[dict[dict[str,list],list],dict]:

Member

Just do what #583 does. I'm telling you right now, the reason this is a duplicated mess is because:

TODO - galaxy does not have an org admin roledef yet

and that this should no longer apply. If galaxy_ng errors because it doesn't have an Organization Admin role, add it there.

Member

good idea, we'll do this in a separate PR

cached_hash = self.cache.get_claims_hash(user_ansible_id)
if cached_hash != current_claims_hash:
    logger.debug(f"Claims hash changed for user {user_ansible_id}: cached={cached_hash}, current={current_claims_hash}")
    return True
Member

As a modifier to this, from another tangent conversation: if the hashes mismatch, you can re-generate the local hash, and if that matches, replace the local cache value with it and return True. But if you re-generate the local hash and it still doesn't match, then the request has to be made.
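
One possible reading of this modifier is sketched below. It is an interpretation, not the agreed design: the hash function (SHA-256 over sorted JSON), the local_claims argument standing in for whatever local state the hash is regenerated from, and the return value (True meaning a gateway request is needed) are all assumptions.

```python
import hashlib
import json
from typing import Dict


def claims_hash(claims: dict) -> str:
    """Assumed hash: SHA-256 of the claims serialized with sorted keys."""
    payload = json.dumps(claims, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_gateway_request(
    user_id: str,
    current_hash: str,
    cached_hashes: Dict[str, str],
    local_claims: dict,
) -> bool:
    """Return True only if a gateway request is still required."""
    if cached_hashes.get(user_id) == current_hash:
        return False
    # Cached hash is stale or missing: recompute it from local state first.
    regenerated = claims_hash(local_claims)
    if regenerated == current_hash:
        # Local state already matches; repair the cache and skip the request.
        cached_hashes[user_id] = regenerated
        return False
    return True
```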

DVCS PR Check Results:

PR appears valid (JIRA key(s) found)

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
