@ponyisi ponyisi commented Nov 17, 2025

Addresses #1167, except that the cache lifetime is reduced to one day.

codecov bot commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.20%. Comparing base (f343c26) to head (c069ca4).
⚠️ Report is 10 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1211      +/-   ##
===========================================
+ Coverage    86.02%   86.20%   +0.18%     
===========================================
  Files           94       95       +1     
  Lines         3256     3284      +28     
  Branches       373      377       +4     
===========================================
+ Hits          2801     2831      +30     
+ Misses         380      378       -2     
  Partials        75       75              

Copilot AI left a comment

Pull Request Overview

This PR implements a dataset cache purge cronjob that automatically marks datasets as stale after they've exceeded a configurable age threshold (default 24 hours). The implementation adds a new internal API endpoint for dataset lifecycle management and a Kubernetes CronJob to periodically trigger cache cleanup.

Key changes:

  • Added new internal API endpoint /servicex/internal/dataset-lifecycle for dataset cache purging
  • Introduced Kubernetes CronJob to automatically purge old cached datasets every hour
  • Refactored dataset deletion logic to enable reuse between manual and automated operations
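The age-threshold behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `CachedDataset` class, its fields, and the `mark_stale` function are hypothetical names, and the sketch assumes the age is given as a number of hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class CachedDataset:
    """Hypothetical stand-in for a cached dataset record (not the ServiceX model)."""
    name: str
    last_updated: datetime
    stale: bool = False


def mark_stale(datasets: List[CachedDataset], max_age_hours: float,
               now: Optional[datetime] = None) -> List[CachedDataset]:
    """Flag every non-stale dataset older than max_age_hours; return those flagged."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    flagged = []
    for dataset in datasets:
        if not dataset.stale and dataset.last_updated < cutoff:
            dataset.stale = True
            flagged.append(dataset)
    return flagged
```

Passing `now` explicitly keeps the cutoff deterministic in tests; an hourly job with a 24-hour threshold would flag a dataset at most one hour after it crosses the threshold.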

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| servicex_app/routes.py | Registers the new DatasetLifecycleOps resource endpoint |
| servicex_app/resources/internal/dataset_lifecycle_ops.py | New endpoint implementation that purges datasets older than specified age |
| servicex_app/resources/datasets/get_all.py | Extracts dataset fetching logic into reusable function |
| servicex_app/resources/datasets/delete_dataset.py | Extracts deletion logic into reusable function |
| helm/servicex/values.yaml | Adds configuration parameters for dataset lifecycle cronjob |
| helm/servicex/templates/dataset-lifecycle/cronjob.yaml | Defines Kubernetes CronJob for periodic dataset cache cleanup |
| docs/deployment/reference.md | Documents new configuration parameters |

env:
  - name: LIFETIME
    value: {{ .Values.datasetLifecycle.cacheLifetime }}
command:

Copilot AI Nov 17, 2025

The command should be formatted as an array with shell invocation, similar to the data-lifecycle cronjob. Use '/bin/sh' with the '-c' flag for better error handling and consistency: command: ['/bin/sh', '-c', 'curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME"']

Suggested change
command:
command:
- /bin/sh
- -c
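
One practical reason the shell invocation matters: a container `command` is executed as an argument vector without a shell, so `$LIFETIME` is only substituted if a shell sees it. The sketch below illustrates this with Python's `subprocess` on a system that has `/bin/sh`; the value `24` is an arbitrary assumption for the demonstration. Kubernetes also documents its own `$(LIFETIME)` substitution syntax for `command` and `args`, which would be another option.

```python
import os
import subprocess
import sys

# Environment with an illustrative LIFETIME value (assumed, not from the chart).
env = dict(os.environ, LIFETIME="24")

# Exec form: the argument reaches the program untouched, so "$LIFETIME"
# stays a literal string.
direct = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "age=$LIFETIME"],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()

# Shell form: /bin/sh expands the variable before running the command.
via_shell = subprocess.run(
    ["/bin/sh", "-c", 'echo "age=$LIFETIME"'],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()

print(direct)
print(via_shell)
```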

  - name: LIFETIME
    value: {{ .Values.datasetLifecycle.cacheLifetime }}
command:
  - curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME"

Copilot AI Nov 17, 2025

The curl command lacks error-handling flags. Add '-f' (fail on HTTP errors) together with '--silent' and '--show-error' (suppress progress output but still print errors): curl -f --silent --show-error --request POST ... This ensures the job fails when the API returns an error.

Suggested change
- curl --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME"
- curl -f --silent --show-error --request POST "http://{{ .Release.Name }}-servicex-app:8000/servicex/internal/dataset-lifecycle?age=$LIFETIME"
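
For comparison, the effect of `-f` can be sketched in Python with the standard library. This is an illustration of the behaviour, not part of the PR: `post_or_fail` and `exit_code_for_status` are hypothetical names, and the exit status 22 mirrors what curl returns for HTTP responses of 400 or above when `--fail` is set.

```python
import sys
import urllib.error
import urllib.request

HTTP_ERROR_EXIT = 22  # status curl exits with under --fail for HTTP >= 400


def exit_code_for_status(status: int) -> int:
    """Map an HTTP status code to a process exit code, as `curl --fail` does."""
    return HTTP_ERROR_EXIT if status >= 400 else 0


def post_or_fail(url: str, timeout: float = 30.0) -> int:
    """POST an empty body to url and return an exit code for the calling job."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return exit_code_for_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; report the error, like --show-error.
        print(f"request failed: HTTP {err.code}", file=sys.stderr)
        return exit_code_for_status(err.code)
    except urllib.error.URLError as err:
        # Connection-level failure (DNS, refused connection, timeout).
        print(f"request failed: {err.reason}", file=sys.stderr)
        return 1
```

A job entry point would call `sys.exit(post_or_fail(url))`, so a non-2xx response marks the run as failed instead of silently succeeding.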

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 9 comments.



ponyisi and others added 2 commits November 17, 2025 13:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI commented Nov 17, 2025

@ponyisi I've opened a new pull request, #1214, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits November 17, 2025 14:10
Co-authored-by: ponyisi <4177101+ponyisi@users.noreply.github.com>
@ponyisi ponyisi requested a review from MattShirley November 24, 2025 03:38
@BenGalewsky BenGalewsky left a comment

Ok - glad you were able to tackle this.
Just a couple of issues:

  1. Don't call methods in one resource from another. Migrate common database operations to models.py
  2. Make Transform cleanup and dataset cleanup siblings in the helm values dataLifecycle properties
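
The first point could look something like the sketch below, in which the shared operation lives on the model and both handlers call it directly instead of one resource calling the other. Everything here is hypothetical: the in-memory `Dataset` class stands in for the real database model in models.py, and the handler names and response shapes are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


class Dataset:
    """Hypothetical stand-in for the model in models.py (in-memory, no database)."""

    _rows: Dict[int, "Dataset"] = {}

    def __init__(self, dataset_id: int, name: str) -> None:
        self.id = dataset_id
        self.name = name
        self.stale = False
        Dataset._rows[dataset_id] = self

    @classmethod
    def find_by_id(cls, dataset_id: int) -> Optional["Dataset"]:
        return cls._rows.get(dataset_id)

    def mark_stale(self) -> None:
        """The shared operation lives on the model, not in a resource."""
        self.stale = True


def delete_dataset(dataset_id: int) -> Tuple[dict, int]:
    """Manual deletion handler: talks to the model directly."""
    dataset = Dataset.find_by_id(dataset_id)
    if dataset is None:
        return {"message": f"Dataset {dataset_id} not found"}, 404
    dataset.mark_stale()
    return {"dataset-id": dataset_id, "stale": True}, 200


def purge_datasets(dataset_ids: List[int]) -> Tuple[dict, int]:
    """Automated purge handler: also uses the model, never delete_dataset."""
    purged = []
    for dataset_id in dataset_ids:
        dataset = Dataset.find_by_id(dataset_id)
        if dataset is not None and not dataset.stale:
            dataset.mark_stale()
            purged.append(dataset_id)
    return {"purged": purged}, 200
```

With this layering, neither handler depends on the other's request parsing or response format, and the database logic can be tested without going through an endpoint.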

@ponyisi ponyisi added this to the 1.7.6 milestone Dec 11, 2025
@ponyisi ponyisi requested a review from BenGalewsky December 22, 2025 22:07
@BenGalewsky BenGalewsky left a comment

Thanks, this looks great!

@ponyisi ponyisi merged commit 1bb2c44 into develop Jan 9, 2026
78 of 82 checks passed
@ponyisi ponyisi deleted the dataset-cache-cleanup branch January 9, 2026 07:26