-
Notifications
You must be signed in to change notification settings - Fork 83
chore: Enforce strict backend version compatibility checks #223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a strict version compatibility check between the Kubeflow Trainer SDK and the Kubernetes control plane. The check runs automatically when initializing the KubernetesBackend and compares the local kubeflow-trainer-api package version against the version published in a ConfigMap by the control plane. Users can opt out via the KUBEFLOW_TRAINER_SKIP_VERSION_CHECK environment variable.
Changes:
- Added
verify_backend()method to KubernetesBackend that performs version compatibility checks - Added abstract
verify_backend()hook to the base RuntimeBackend class - Comprehensive test coverage for success, mismatch, missing ConfigMap, and skip scenarios
- Updated existing test fixtures to mock the version check behavior
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| kubeflow/trainer/backends/kubernetes/backend.py | Implements version verification logic that reads control-plane version from ConfigMap and compares with local SDK version |
| kubeflow/trainer/backends/base.py | Adds optional verify_backend() hook to the RuntimeBackend abstract class with default no-op implementation |
| kubeflow/trainer/backends/kubernetes/backend_test.py | Adds comprehensive test coverage for version check scenarios and updates existing fixtures to mock version check dependencies |
| kubeflow/trainer/api/trainer_client_test.py | Updates test mocks to include ConfigMap and version metadata mocking for backend initialization |
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
9e3c3e5 to
bc0979f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Surya Sameer Datta Vaddadi <137607947+sameerdattav@users.noreply.github.com>
| version published by the control plane in a public ConfigMap. | ||
|
|
||
| The verification can be skipped entirely by setting the | ||
| ``KUBEFLOW_TRAINER_SKIP_VERSION_CHECK`` environment variable to any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason to introduce this env variable?
I would suggest that we always check the version for now.
| if local_version != server_version: | ||
| raise RuntimeError( | ||
| "Kubeflow Trainer SDK version mismatch detected: local " | ||
| f"kubeflow-trainer-api=={local_version}, control plane " | ||
| f"kubeflow-trainer-api=={server_version}. The control plane " | ||
| "may be outdated or mismatched with this SDK. Please upgrade " | ||
| "either the SDK or the control plane so the versions match, " | ||
| "or set KUBEFLOW_TRAINER_SKIP_VERSION_CHECK=1 to bypass " | ||
| "this verification (not recommended)." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, let's do logger.warning() if version of control plane is incompatible or config map is missing.
We need to release the Trainer v2.2 where we add this ConfigMap.
| try: | ||
| local_version = metadata.version("kubeflow-trainer-api") | ||
| except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this check, since this package is always installed.
| "KUBEFLOW_TRAINER_SKIP_VERSION_CHECK=1 to skip this check." | ||
| ) from e | ||
|
|
||
| if local_version != server_version: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, we should do this check:
- For the minor version it should:
server_version == local_version - For the patch version it should:
server_version >= local_version
Since we don't introduce new features for the patch versions.
We might also want to introduce strict dependency for minor version here
(e.g. kubeflow-trainer-api~=2.1.0): https://github.com/kubeflow/sdk/blob/7797f35dfe8dce1993e968364c17c59a6a422697/pyproject.toml#L31C4-L31C24
What do you think @kubeflow/kubeflow-sdk-team @kubeflow/kubeflow-trainer-team ?
Let's talk about it more at the Kubeflow SDK call: https://docs.google.com/document/d/1jH2WAX2ePxOfI4JuiVK9nPlesDMiyg67xzLwhpR7wTQ/edit?tab=t.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sameerdattav We briefly talked about it at the latest call: https://youtu.be/zwPVfB8tTuY?t=2254
Initially, let's just verify that ConfigMap with the appropriate info (e.g. kubeflow_trainer_version) exists.
In the future, we can discuss how we can verify the appropriate version in TrainerClient.
| if data and isinstance(data, dict): | ||
| server_version = data.get("kubeflow_trainer_api_version") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the server_version == dev, you should skip the check.
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
Signed-off-by: Surya Sameer Datta Vaddadi <137607947+sameerdattav@users.noreply.github.com>
|
@andreyvelich, there was some issue with git and line endings, so i went ahead and opened up a new pr with a clean slate. |
Fixes #221.
Summary
Adds a strict control-plane version compatibility check for the Trainer
Kubernetes backend. The check is fail-fast, user-friendly, opt-out, and
covered by unit tests.
Approach
KubernetesBackend.__init__afterCoreV1ApicreationKUBEFLOW_SYSTEM_NAMESPACE(default:kubeflow)kubeflow-trainer-publicConfigMap (kubeflow_trainer_api_version)importlib.metadataRuntimeErroron mismatch or missing control planeKUBEFLOW_TRAINER_SKIP_VERSION_CHECKTests
pytestpasses locallyThanks @RavinduWeerakoon for the initial approach and discussion.