internal API access for scoped k8s objects #3220

Merged
ikreymer merged 3 commits into main from issue-3219-internal-coll-replay-access
Mar 12, 2026

@ikreymer ikreymer commented Mar 12, 2026

Fixes #3219

Adds a way to expose certain API endpoints (currently the collection replay resources list) via a new /collections/.../internal/replay.json endpoint. Access is granted through a custom JWT token that carries additional data: the subject is set to the collection, and the type is set to coll. In addition, scope and scope_type tie the token to the existence of a particular k8s object, in this case the index import job. The token is therefore only valid while that k8s object exists (the expiration date is set to a year out to avoid expiry).
The API URL with the token is passed directly to the import job, which then loads from that URL, while the configmap is left empty (it is still used to block the index job from starting until the index is ready). The API URL with the token should not be exposed outside k8s.

For this use case, the custom JWT token is set to {"sub": coll_id, "sub_type": "coll", "scope": job_id, "scope_type": "job"}, which provides access to collection coll_id via the /collections/.../internal/replay.json endpoint for as long as k8s job job_id exists.
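
To make the scoping idea concrete, here is a minimal sketch of minting and checking such a token. It is not the code from this PR: it hand-rolls HS256 with the standard library to stay dependency-free, and `SECRET`, `create_scoped_token`, `check_scoped_token` and the `scope_exists` callback (a stand-in for a lookup of the k8s object) are all hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"example-signing-key"  # assumption: placeholder signing key
ONE_YEAR = 365 * 24 * 60 * 60


def _b64(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> bytes:
    return hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()


def create_scoped_token(coll_id, job_id, now=None):
    """Mint a token for collection coll_id, scoped to job job_id."""
    now = int(time.time()) if now is None else now
    claims = {
        "sub": coll_id,
        "sub_type": "coll",
        "scope": job_id,
        "scope_type": "job",
        "exp": now + ONE_YEAR,  # long-lived; the scope check is the real limit
    }
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(claims).encode())
    return f"{header}.{body}.{_b64(_sign(f'{header}.{body}'))}"


def check_scoped_token(token, coll_id, scope_exists, now=None):
    """True only if signature, expiry, subject and scope all check out."""
    now = int(time.time()) if now is None else now
    try:
        header, body, sig = token.split(".")
        if not hmac.compare_digest(_sign(f"{header}.{body}"), _unb64(sig)):
            return False
        claims = json.loads(_unb64(body))
    except ValueError:  # wrong segment count, bad base64, bad JSON
        return False
    if not isinstance(claims, dict):
        return False
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False
    if claims.get("sub") != coll_id or claims.get("sub_type") != "coll":
        return False
    # The token is honored only while the scoped object still exists.
    return bool(scope_exists(claims.get("scope_type"), claims.get("scope")))
```

In this sketch, deleting the job makes `scope_exists` return false, so the token stops working even though its `exp` is still far in the future.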

We could also expand this later to support QA for collections, for example by using "scope": <qa_job_id>, "scope_type": "crawljob", or to support QA for a single crawl as well.

Note: The API could be made more secure by storing the API URL in a secret and mapping the secret instead of a configmap, but that requires changes in the crawler indexer.

…nternal indexing/crawling access via backend container

- use custom jwt tokens for internal access
- avoids creating large configmaps that may exceed size limit
@ikreymer ikreymer requested review from emma-sg and tw4l March 12, 2026 17:32

@tw4l tw4l left a comment

Tested locally. Only one minor suggestion.

catch jwt exceptions
block /internal/ access in frontend

@emma-sg emma-sg left a comment

Looks good as far as I can tell! Seems to work. Left a suggestion for ensuring that audience and expiry are included in jwt verification.
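
As a rough illustration of what requiring audience and expiry during verification could look like, the sketch below validates an already-decoded claims dict. It is an assumption-laden example, not the change made in this PR: `TokenError`, `validate_claims`, `claims_or_none` and the `"internal-replay"` audience value are hypothetical.

```python
import time


class TokenError(Exception):
    """Raised when decoded claims fail validation (hypothetical name)."""


def validate_claims(claims, expected_aud, now=None):
    """Require both exp and aud, then check their values."""
    now = time.time() if now is None else now
    for name in ("exp", "aud"):
        if name not in claims:
            raise TokenError(f"missing required claim: {name}")
    if claims["aud"] != expected_aud:
        raise TokenError("unexpected audience")
    exp = claims["exp"]
    if not isinstance(exp, (int, float)) or exp <= now:
        raise TokenError("token expired")
    return claims


def claims_or_none(claims, expected_aud, now=None):
    """Catch validation errors so a caller can turn them into a denial."""
    try:
        return validate_claims(claims, expected_aud, now)
    except TokenError:
        return None
```

The point of listing `exp` and `aud` as required is that a token missing either claim is rejected outright, rather than silently skipping that check.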

@ikreymer ikreymer force-pushed the issue-3219-internal-coll-replay-access branch from 8ddcde7 to bcad9b0 Compare March 12, 2026 23:18
@ikreymer ikreymer merged commit b30e069 into main Mar 12, 2026
46 checks passed
@ikreymer ikreymer deleted the issue-3219-internal-coll-replay-access branch March 12, 2026 23:21

Development

Successfully merging this pull request may close these issues.

[Task]: Support internal replay access for collections to avoid large configmaps