-
-
Notifications
You must be signed in to change notification settings - Fork 24
Dataset cache purge cronjob #1211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1211 +/- ##
===========================================
+ Coverage 86.02% 86.20% +0.18%
===========================================
Files 94 95 +1
Lines 3256 3284 +28
Branches 373 377 +4
===========================================
+ Hits 2801 2831 +30
+ Misses 380 378 -2
Partials 75 75 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a dataset cache purge cronjob that automatically marks datasets as stale after they've exceeded a configurable age threshold (default 24 hours). The implementation adds a new internal API endpoint for dataset lifecycle management and a Kubernetes CronJob to periodically trigger cache cleanup.
Key changes:
- Added new internal API endpoint
/servicex/internal/dataset-lifecyclefor dataset cache purging - Introduced Kubernetes CronJob to automatically purge old cached datasets every hour
- Refactored dataset deletion logic to enable reuse between manual and automated operations
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| servicex_app/routes.py | Registers the new DatasetLifecycleOps resource endpoint |
| servicex_app/resources/internal/dataset_lifecycle_ops.py | New endpoint implementation that purges datasets older than specified age |
| servicex_app/resources/datasets/get_all.py | Extracts dataset fetching logic into reusable function |
| servicex_app/resources/datasets/delete_dataset.py | Extracts deletion logic into reusable function |
| helm/servicex/values.yaml | Adds configuration parameters for dataset lifecycle cronjob |
| helm/servicex/templates/dataset-lifecycle/cronjob.yaml | Defines Kubernetes CronJob for periodic dataset cache cleanup |
| docs/deployment/reference.md | Documents new configuration parameters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| env: | ||
| - name: LIFETIME | ||
| value: {{ .Values.datasetLifecycle.cacheLifetime }} | ||
| command: |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command should be formatted as an array with shell invocation, similar to the data-lifecycle cronjob. Change to use '/bin/sh' with '-c' flag for better error handling and consistency: command: ['/bin/sh', '-c', 'curl --request POST \"http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME\"']
| command: | |
| command: | |
| - /bin/sh | |
| - -c |
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
| - name: LIFETIME | ||
| value: {{ .Values.datasetLifecycle.cacheLifetime }} | ||
| command: | ||
| - curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME" |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The curl command lacks error handling flags. Add '-f' (fail on HTTP errors) and '--silent' or '--show-error' for better error reporting: curl -f --silent --show-error --request POST ... This ensures the job fails appropriately when the API returns an error.
| - curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME" | |
| - curl -f --silent --show-error --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
for more information, see https://pre-commit.ci
Co-authored-by: ponyisi <4177101+ponyisi@users.noreply.github.com>
BenGalewsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - glad you were able to tackle this.
Just a couple of issues:
- Don't call methods in one resource from another. Migrate common database operations to
models.py - Make Transform cleanup and dataset cleanup siblings in the helm values
dataLifecycleproperties
servicex_app/servicex_app/resources/internal/dataset_lifecycle_ops.py
Outdated
Show resolved
Hide resolved
BenGalewsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks great!
Address #1167 , except the cache lifetime is reduced to one day.