internal API access for scoped k8s objects #3220
Merged
…nternal indexing/crawling access via backend container:
- use custom JWT tokens for internal access
- avoid creating large configmaps that may exceed the size limit
tw4l (Member) approved these changes on Mar 12, 2026 and left a comment:
Tested locally. Only one minor suggestion.
catch JWT exceptions
block `/internal/` access in frontend
emma-sg approved these changes on Mar 12, 2026
8ddcde7 to bcad9b0
Fixes #3219
Adds a way to expose certain API endpoints (here, the collection replay resources list) to internal k8s workloads via a new `/collections/.../internal/replay.json` endpoint. This is done with a custom JWT token that carries additional claims: the subject is set to the collection, and the subject type is set to `coll`. In addition, `scope` and `scope_type` tie the token to the existence of a particular k8s object, in this case the index import job. Thus, the token is only valid while that k8s object exists (the expiration date is set to one year so that the token does not expire on its own; the object's lifetime is the effective limit).

The API URL with the token is passed directly to the import job, which then loads the resources from that URL, while the configmap is left empty (it is still used to block the index job from starting until the index is ready). The API URL with the token must not be exposed outside k8s.
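A minimal sketch of minting such a token, assuming HS256 signing with a secret held by the backend and using only the Python standard library (the backend may well use a JWT library instead). The `sub`/`sub_type`/`scope`/`scope_type` claims follow this PR description; the helper names `mint_scoped_token` and `internal_url_for_job`, the secret, and the `token` query parameter are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional
from urllib.parse import urlencode

# One-year expiry as described in the PR; the k8s object's lifetime is the real limit.
ONE_YEAR_SECS = 365 * 24 * 60 * 60


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_scoped_token(secret: bytes, coll_id: str, job_id: str, now: Optional[float] = None) -> str:
    """Create an HS256 JWT granting access to coll_id while k8s job job_id exists."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "sub": coll_id,
        "sub_type": "coll",
        "scope": job_id,
        "scope_type": "job",
        "exp": issued + ONE_YEAR_SECS,
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def internal_url_for_job(endpoint_url: str, token: str) -> str:
    """Attach the token to the internal endpoint URL (hypothetical `token` query parameter)."""
    return f"{endpoint_url}?{urlencode({'token': token})}"
```

The resulting URL would be handed to the import job directly (for example as a job argument) instead of being written into the configmap, which keeps the configmap small regardless of collection size.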
For this use case, the custom JWT token is set to `{"sub": coll_id, "sub_type": "coll", "scope": job_id, "scope_type": "job"}`, which provides access to collection `coll_id` via the `/collections/.../internal/replay.json` endpoint while the k8s job `job_id` exists.

We could also expand this later to support QA for collections, for example by using `"scope": <qa_job_id>, "scope_type": "crawljob"`, or QA for a single crawl as well.

Note: the API could be made more secure by storing the API URL in a secret and mapping the secret instead of a configmap, but that requires changes in the crawler indexer.
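On the receiving side, the internal endpoint has to check both the signature and the scope. The sketch below assumes HS256 tokens and models the k8s existence lookups as plain callables keyed by `scope_type` (in a real backend these would query the k8s API). `verify_internal_access`, `encode_hs256` and the `crawljob` entry are hypothetical, with `crawljob` standing in for the possible QA extension. Malformed or unverifiable tokens raise `PermissionError` rather than an unhandled error, in the spirit of the "catch JWT exceptions" commit.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Callable, Dict, Optional

# Maps a scope_type (e.g. "job", or a future "crawljob") to a function that
# reports whether the k8s object with the given name still exists.
ScopeChecks = Dict[str, Callable[[str], bool]]


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_hs256(payload: dict, secret: bytes) -> str:
    """Sign a payload as an HS256 JWT (used here to build example tokens)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(p, separators=(",", ":")).encode()) for p in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(sig)}"


def verify_internal_access(
    token: str,
    secret: bytes,
    coll_id: str,
    scope_checks: ScopeChecks,
    now: Optional[float] = None,
) -> dict:
    """Return the claims if token grants access to coll_id, else raise PermissionError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if header.get("alg") != "HS256" or not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            raise PermissionError("invalid signature")
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, AttributeError) as exc:
        # Malformed tokens become a clean denial instead of a server error
        raise PermissionError("malformed token") from exc

    if not isinstance(claims, dict):
        raise PermissionError("malformed claims")
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise PermissionError("token expired")
    if claims.get("sub_type") != "coll" or claims.get("sub") != coll_id:
        raise PermissionError("token not valid for this collection")
    scope_type = claims.get("scope_type")
    check = scope_checks.get(scope_type) if isinstance(scope_type, str) else None
    # Unknown scope types are denied; known ones require the k8s object to exist
    if check is None or not check(str(claims.get("scope"))):
        raise PermissionError("scoping k8s object no longer exists")
    return claims
```

Denying unknown `scope_type` values by default means a new scope, such as QA crawl jobs, only becomes usable once a check for it is registered explicitly.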