Add bulk apis for pipeline status #25731
Merged
18 commits:

- `5ee37f1` Add bulk apis for pipeline status (harshach)
- `6036cee` Update generated TypeScript types (github-actions[bot])
- `b1b7c68` Fix gitar comments (harshach)
- `18dac89` Merge remote-tracking branch 'origin/batch_pipeline_status' into batc… (harshach)
- `64ca150` Update generated TypeScript types (github-actions[bot])
- `da96314` Merge branch 'main' into batch_pipeline_status (harshach)
- `d15675f` Fix pycheck (harshach)
- `193b202` Merge remote-tracking branch 'origin/batch_pipeline_status' into batc… (harshach)
- `42b2f8c` Merge branch 'main' into batch_pipeline_status (harshach)
- `afa2bc7` Address comments (harshach)
- `a9fd2d7` Merge branch 'main' into batch_pipeline_status (harshach)
- `4aaf8dc` Merge branch 'main' into batch_pipeline_status (harshach)
- `db760cc` Merge branch 'main' into batch_pipeline_status (harshach)
- `5ce7457` Merge branch 'main' into batch_pipeline_status (harshach)
- `5a5afb1` Merge branch 'main' into batch_pipeline_status (harshsoni2024)
- `e65d10f` Merge remote-tracking branch 'origin/main' into batch_pipeline_status (harshach)
- `1415ff3` Fix databricks test (harshach)
- `4a073fa` Move schema changes to 1.11.9 (harshach)
11 changes: 11 additions & 0 deletions
bootstrap/sql/migrations/native/1.11.9/mysql/schemaChanges.sql
```sql
-- Change entity_extension_time_series.timestamp from VIRTUAL to STORED for performance.
-- STORED columns are materialized on disk, making unique constraint checks and range
-- queries on timestamp significantly faster (especially for bulk pipeline status upserts).
-- MySQL does not allow ALTER from VIRTUAL to STORED directly, so we drop and re-add.
-- NOTE: This will lock the table for a full rebuild. On large deployments with millions
-- of rows in entity_extension_time_series, plan for downtime accordingly.
ALTER TABLE entity_extension_time_series
  DROP INDEX entity_extension_time_series_constraint,
  DROP COLUMN `timestamp`,
  ADD COLUMN `timestamp` bigint unsigned GENERATED ALWAYS AS (json_unquote(json_extract(`json`, _utf8mb4'$.timestamp'))) STORED NOT NULL,
  ADD UNIQUE KEY `entity_extension_time_series_constraint` (`entityFQNHash`, `extension`, `timestamp`);
```
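To illustrate what the unique key on the generated `timestamp` column does for upserts, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. Assumptions: a simplified three-column schema, SQLite 3.31+ with JSON functions available, a hypothetical `bulk_upsert_statuses` helper, and illustrative hash, extension, and status values. SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` plays the role that `INSERT ... ON DUPLICATE KEY UPDATE` plays on MySQL; the sketch shows constraint semantics only and says nothing about VIRTUAL vs STORED performance.

```python
import json
import sqlite3

# SQLite stand-in for the MySQL table (assumption: simplified schema).
# `timestamp` is a STORED generated column derived from the JSON payload,
# and it participates in the (entityFQNHash, extension, timestamp) unique key.
SCHEMA = """
CREATE TABLE entity_extension_time_series (
    entityFQNHash TEXT NOT NULL,
    extension     TEXT NOT NULL,
    json          TEXT NOT NULL,
    timestamp     INTEGER GENERATED ALWAYS AS (json_extract(json, '$.timestamp')) STORED NOT NULL,
    UNIQUE (entityFQNHash, extension, timestamp)
)
"""

# Upsert keyed on the unique constraint: a status with an existing
# (entityFQNHash, extension, timestamp) replaces the stored JSON instead of adding a row.
UPSERT = """
INSERT INTO entity_extension_time_series (entityFQNHash, extension, json)
VALUES (?, ?, ?)
ON CONFLICT (entityFQNHash, extension, timestamp) DO UPDATE SET json = excluded.json
"""


def bulk_upsert_statuses(conn, fqn_hash, extension, statuses):
    """Hypothetical helper: write many status dicts with one executemany call."""
    rows = [(fqn_hash, extension, json.dumps(s)) for s in statuses]
    with conn:  # one transaction for the whole batch; commits on success
        conn.executemany(UPSERT, rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
bulk_upsert_statuses(
    conn,
    "abc123",                  # illustrative FQN hash
    "pipeline.status",         # illustrative extension name
    [
        {"timestamp": 1000, "status": "Failed"},
        {"timestamp": 2000, "status": "Successful"},
        {"timestamp": 1000, "status": "Successful"},  # same key as the first: update
    ],
)
for ts, payload in conn.execute(
    "SELECT timestamp, json FROM entity_extension_time_series ORDER BY timestamp"
):
    print(ts, json.loads(payload)["status"])
```

Running it prints `1000 Successful` and `2000 Successful`: the third status shares a key with the first, so it overwrites that row instead of creating a duplicate.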
3 changes: 3 additions & 0 deletions
bootstrap/sql/migrations/native/1.11.9/postgres/schemaChanges.sql
```sql
-- No changes needed for entity_extension_time_series.timestamp on PostgreSQL.
-- PostgreSQL already uses STORED for the generated timestamp column (since table creation).
-- MySQL migration changes it from VIRTUAL to STORED for consistency and performance.
```
The `yield_pipeline_status` method uses `break` when it encounters a run with `start_time < cutoff_ts`. This assumes that `self.client.get_job_runs()` returns runs in strictly descending chronological order (newest first).

If the Databricks API returns runs in a different order (e.g., ascending, or unordered), this `break` will cause the method to skip recent runs that appear after an older one in the list, so incomplete pipeline status data would be ingested.

Suggestion: do one of the following:

- Confirm that `get_job_runs()` guarantees descending order (the Databricks Jobs API does return runs in descending order by default when using `list_runs`).
- Use `continue` instead of `break` to handle all runs safely regardless of ordering, at the cost of the early-termination optimization.
- Sort the runs explicitly before iterating: `sorted(runs, key=lambda r: r.get('start_time', 0), reverse=True)`.