Skip to content

feat(sdk): add job queue pattern for distributed work claiming#16518

Open
shirshanka wants to merge 2 commits intomasterfrom
sdk/job-queue
Open

feat(sdk): add job queue pattern for distributed work claiming#16518
shirshanka wants to merge 2 commits intomasterfrom
sdk/job-queue

Conversation

@shirshanka
Copy link
Copy Markdown
Contributor

Summary

  • Adds a new datahub.sdk.patterns.job_queue module implementing distributed work claiming using CAS (Compare-And-Swap) via GMS's ConditionalWriteValidator
  • ConditionalWriter (Layer 1): Generic CAS mechanics over any aspect — read versioned aspects, write-if-version-matches
  • Claim (Layer 2a): Atomic claim/release of entities with version tracking, is_claimed safety check, and from_fields shortcut for simple aspects
  • Discovery (Layer 2b): Pluggable work discovery — MCLDiscovery (event-driven via MCL stream) and SearchDiscovery (search-based catch-up/fallback)
  • Sweeper (Layer 2c): Opt-in stale claim cleanup via CAS force-release with configurable timeout
  • JobQueue (Layer 3): Composes Discovery + Claim into a single poll → claim → release API
  • Includes comprehensive unit tests (5 test files) and smoke tests (claiming races, sweeper cleanup, performance)
  • Adds tutorial documentation and design doc

Design

See design doc for full architecture, layer breakdown, sequence diagrams, and design rationale.

Key design decisions:

  • Lambda-based configuration — no subclassing or per-domain wrapper classes; callers supply make_claimed/make_released/is_claimed as constructor args
  • No new PDL aspects or GMS endpoints — works with whatever aspects already exist on entities
  • Discovery and Claim are separated — orthogonal concerns that compose independently
  • Sweeper is opt-in — caller runs it on a dedicated process; distributed leader election is out of scope

Test plan

  • Unit tests for all layers: ConditionalWriter, Claim, Discovery, Sweeper, JobQueue
  • Smoke tests: multi-worker claim races (exactly-one-winner), sweeper stale cleanup, performance benchmarks
  • Verify ./gradlew :metadata-ingestion:testQuick passes
  • Verify smoke tests pass against running GMS

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata docs Issues and Improvements to docs smoke_test Contains changes related to smoke tests labels Mar 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Linear: ING-1874

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 94.25000% with 23 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...on/src/datahub/sdk/patterns/job_queue/discovery.py 87.69% 16 Missing ⚠️
...estion/src/datahub/sdk/patterns/job_queue/claim.py 91.86% 7 Missing ⚠️

📢 Thoughts on this report? Let us know!

@maggiehays maggiehays added the needs-review Label for PRs that need review from a maintainer. label Mar 10, 2026
@rajatoss
Copy link
Copy Markdown
Member

rajatoss commented Mar 10, 2026

Connector Tests Results

All connector tests passed for commit 4f97d4d

View full test logs →

To skip connector tests, add the skip-connector-tests label (org members only).

Autogenerated by the connector-tests CI pipeline.

@maggiehays maggiehays added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Mar 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Your PR has been assigned to @skrydal (piotr.skrydalewicz) for review (ING-1874).

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 15, 2026

Bundle Report

Bundle size has no change ✅

@maggiehays maggiehays added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Mar 15, 2026
…smoke tests

Introduces the job queue SDK for scalable execution request processing:
- ConditionalWriter: CAS (Compare-And-Swap) layer using If-Version-Match headers
- Claim: atomic claim/release of entities with is_claimed predicate to prevent
  sequential overwrites
- Discovery and JobQueue orchestration layers
- Fix OpenAPI request_helper to forward per-MCP headers (If-Version-Match)
- Add executorInstanceId field to ExecutionRequestResult PDL
- Smoke tests validating claim lifecycle, concurrent claim races, and sequential
  conflict detection against live GMS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Issues and Improvements to docs ingestion PR or Issue related to the ingestion of metadata needs-review Label for PRs that need review from a maintainer. smoke_test Contains changes related to smoke tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants