-
Notifications
You must be signed in to change notification settings - Fork 80
Add data plane fields w/ renamed Worker Versioning concepts #514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General concerns about deprecating so many things a few weeks after merging it. I just want to make sure we understand the consequences of ever removing API.
But only change requested is to add deprecated options.
| // The deployment that completed this task. May or may not be set for unversioned workers, | ||
| // depending on whether a value is sent by the SDK. This value updates workflow execution's | ||
| // `versioning_info.deployment`. | ||
| // Deprecated. Replaced with `deployment_name` and `deployment_version`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit I am a bit concerned about the idea of removing this later. This was just merged a couple months ago and now it's being deprecated.
It is very important we try our best not to make backwards incompatible changes to released API, even if we think it is unused. Customers build proxies and have code generators and in general have stability expectations on our API. We have broken users in the past by making incompatible changes to API (again, even if unused by our SDKs as our SDKs are not the only ones that expect API stability). This strive for API stability includes pre-release/public-preview API because unlike SDKs, we cannot make assumptions about downstream consumption.
Also, if this field is ever set, removing this will make history JSON incompatible and we strive to make history always backwards (and forwards) compatible even pre-release/public-preview histories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised by this take because our Temporal product release stages guide guidance on API stability is "Experimental; API is subject to change" for Pre-release and "API breaking changes are kept to a minimum" for Public Preview (and then "API is stable" for GA). It seems appropriate for all parties to refer to that as their source of truth for how to plan around API changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference between SDK API contract stability, API behavior stability, and proto API contract stability. They are all different forms of API stability. Regardless of what we're technically allowed to do, we should always try for stability at every level. And we should be aware of the consequences of breaking compatibility to our protos, even if we technically say we're allowed to do so. You can hurt proxy users and general API proto consumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're intentionally taking an iterative approach to worker versioning. Insisting that there are no breaking changes for pre-releases ever is a recipe for stasis and overthinking because every decision will seem to carry years of weight, even at the pre-release stage.
I'm okay with very cheap mechanisms to ensure backward compatibility here but overall it's a non-goal. Happy to chat offline to sort through any strategic misalignments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just bringing up that there are consequences for removing something in the public API and we should strive to avoid this, especially in such short turnarounds. Sometimes for others making API changes to these protos, the consequences are non-obvious until complaints come about. We have had issues in the past where people treat master as their implementation playground to make changes a couple months apart on the same project. Granted this PR doesn't remove anything anyways, so just making sure the consequences are understood (I was waiting for the other comment to be addressed before approving).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not a world where it churn never happens (but there is a never-remove-only-deprecate world, but even that is too strict).
Let's just make sure we call it out clearly in the API release notes if/when we remove the fields, and let's also make sure that we have a Go SDK release at about the same time as that API release so at least our latest Go SDK works with the latest Go API (which is released at the same time as this repo). We have had issues in the past where compat-breaking changes to this repo were released in api-go with no associated Go SDK release causing those doing a go get -u to have compile breaks.
I am not sure we can ever remove history fields, we should have an internal discussion on that. Some people get workflow JSON including unset fields (unfortunately) and expect that JSON to work with replayers in perpetuity. But it may be safe in these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also make sure that we have a Go SDK release at about the same time as that API release so at least our latest Go SDK works with the latest Go API
@cretz to clarify, you mean the release in which the deprecated fields are removed, not this one because it is not breaking anything, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just make sure we call it out clearly in the API release notes if/when we remove the fields
For if/when I'd like to wait until we're close to GA. We can then add a heads up to one of the release notes and specify a plan, and when the specified time comes after that, delete the fields with proper release note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-sdk users that may use automated tools to generate proxies, UI interfaces that interact directly with the API
Doing this on pre-release APIs in such a way that they cannot change should not be a supported scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify, you mean the release in which the deprecated fields are removed, not this one because it is not breaking anything, right?
Correct
Doing this on pre-release APIs in such a way that they cannot change should not be a supported scenario.
Automated protoc does not differentiate unstable API and stable API right next to each other in a stable version release. But the other issue is that we are using these to-be-deleted APIs in our SDKs. So even for people that don't care about versioning, once we remove the fields, if they upgrade go.temporal.io/api to latest but choose not to upgrade their Go SDK (which they can totally do with our version constraints), all of their compilations start breaking. So we just tell them "sorry, even though that SDK says it's compatible with API >= some-version, it's not" and ask them to upgrade both SDK and API.
| // The deployment that completed this task. May or may not be set for unversioned workers, | ||
| // depending on whether a value is sent by the SDK. This value updates workflow execution's | ||
| // `versioning_info.deployment`. | ||
| // Deprecated. Replaced with `deployment_name` and `deployment_version`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add [deprecated = true]; to all deprecated fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
carlydf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved with a few wording suggestions
| // execution. Must be present if `behavior` is set. Absent value means no workflow task is | ||
| // completed, or the last workflow task was completed by a legacy worker who does not pass | ||
| // deployment_version. | ||
| // Note that if `versioning_override.behavior`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to write more information here?
Co-authored-by: Carly de Frondeville <[email protected]>
Co-authored-by: Carly de Frondeville <[email protected]>
Co-authored-by: Carly de Frondeville <[email protected]>
antlai-temporal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before we had a temporal.api.deployment.v1.Deployment struct that encapsulated the name and the "version".
Now we are splitting in two separate fields deployment_name/deployment_version and we are inconsistent when we pass one or both, i.e., making assumptions that version is all it matters in some places.
I think we should use a DeploymentVersion object (or something like that) that always passes both, because in the future we may decide that the deployment_name is also important for routing too...
temporal/api/enums/v1/workflow.proto
Outdated
| // Version, in which case the activity will be sent to a different Deployment Version according | ||
| // to the activity's Task Queue situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too vague, the previous comment talked about the "current deployment", giving a hint how the Deployment version is chosen in that case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
| // Note that if `versioning_override.behavior`. | ||
| string deployment_version = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is incomplete
| // The target version of the transition. Absent means a so-far-versioned workflow is | ||
| // transitioning to unversioned workers. | ||
| string target_version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the deployment_name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we allow transitions between deployments, which it sounds like we do:
// When present, indicates the workflow is transitioning to a different deployment version
// (which may belong to the same deployment name or another).
Then it seems like we should have deployment_name there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since versions are unique within a namespace, I don't think specifying the deployment_name becomes necessary - I also am not sure if we are allowing deployment transitions to happen across namespaces but if we are not, I think having just the target_version should suffice
happy to be pointed in the right direction if my understanding is wrong!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops I forgot that version is unique within a namespace, nevermind what I said
| string pinned_version = 3; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the deployment_name here, and not just the version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only thought of use cases where someone would want to override the version, not the deployment, but I think if we only allow overriding the version within the same deployment (as we do here), then the comment above VersioningOverride needs to reflect that, so I suggested a change there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only allow overriding the version within the same deployment
@carlydf Is this what we want though? I was thinking we'd allow to pin to any version in case users want to move pinned workflows to other deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that user only passes the version and that's enough because server can lookup deployment name based on the version. IMHO, It'd be additional friction and more room for error if user has to pass both values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think moving to other deployments is good! I forgot that version was unique within a namespace and was just trying to think flexibly
| // `WorkerDeploymentOptions`. | ||
| string deployment_version = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the deployment_name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't strictly need it but now I'm passing a struct containing both version and name.
| // Deprecated. | ||
| temporal.api.deployment.v1.Deployment deployment = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if for future-proofing we should add deployment_options on the activity task responses/failures even if
the current server does not use it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had this since V1 always thinking "it might become useful" but it only added churn to the APIs during the iterations. I think if want to add them in future for a known purpose we'd then add them to the next SDK release. Better not to be speculative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added depoyment_version now.
| temporal.api.enums.v1.VersioningBehavior behavior = 1; | ||
|
|
||
| // Required if behavior is `PINNED`. Must be null if behavior is `AUTO_UPGRADE`. | ||
| // Identifies the worker deployment to pin the workflow to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Identifies the worker deployment to pin the workflow to. | |
| // Identifies the worker deployment version to pin the workflow to. |
|
Can we wait on the implementation being basically complete and nearer to a server release (even RC/cloud release) before merging this to |
Shivs11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no blockers - just some nit comments
| // Worker Deployment options set in SDK that need to be sent to server in every poll. | ||
| message WorkerDeploymentOptions { | ||
| // Required. Worker deployment name. | ||
| string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we call this deployment_name? Might be too pedantic, I fear, but I like the extra clarity it brings - just my two cents!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the host message name makes it clear: WorkerDeploymentOptions.
| // Deployment Versions with this mode can be set as the Current or Ramping | ||
| // Version of their Deployment and hence are the best option for Blue/Green | ||
| // and Rainbow strategies (but typically not suitable for Rolling upgrades.) | ||
| WORKFLOW_VERSIONING_MODE_VERSIONING_BEHAVIORS = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of WORKFLOW_VERSIONING_MODE_UNVERSIONED, the comments speak about the use-case of users requiring to use patching to keep things non-breaking. For long running unpinned workflows going through changes, users still are required to use patching to keep things non-breaking. The lack of a mention of patching requirements for WORKFLOW_VERSIONING_MODE_VERSIONING_BEHAVIORS almost made me think if I don't need to patch my workflows if I use this behaviour - Should we mention the fact that patching is still required for AutoUpgrade long running wfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point I clarified
temporal/api/enums/v1/workflow.proto
Outdated
| // execution to another Deployment Version. | ||
| // Activities of `PINNED` workflows are sent to the same Deployment Version. Exception to this | ||
| // would be when the activity Task Queue workers are not present in the workflow's Deployment | ||
| // Version, in which case the activity will be sent to a different Deployment Version according |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in which case the activity will be sent to the Deployment Version containing the activity's task-queues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
temporal/api/enums/v1/workflow.proto
Outdated
| // present in the workflow's Deployment Version, in which case, the activity will be sent to a | ||
| // different Deployment Version according to the activity's Task Queue situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in which case the activity will be sent to the Deployment Version containing the activity's task-queues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
| // The target version of the transition. Absent means a so-far-versioned workflow is | ||
| // transitioning to unversioned workers. | ||
| string target_version = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since versions are unique within a namespace, I don't think specifying the deployment_name becomes necessary - I also am not sure if we are allowing deployment transitions to happen across namespaces but if we are not, I think having just the target_version should suffice
happy to be pointed in the right direction if my understanding is wrong!
|
@cretz OK, I changed the base to a feature branch for now. |
|
@antlai-temporal I added the WorkerDeploymentVersion struct and addressed you comments. |
…PI's (#515) _**READ BEFORE MERGING:** All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted._ <!-- Describe what has changed in this PR --> **What changed?** - Updated protos to reflect changes based on the recent conclusion regarding naming conventions for versioning-3 - `Deployment` -> `WorkerDeploymentVersion` - Added comments to remove pre-release versioning-3 code. Note: This API does not update pre-release versioning-3 changes related to dataplane. That currently is being worked in separate [PR](#514). The aim of this PR is to update those protos which are required by the deployment entity workflows so that work on those can commence there ASAP. Additionally, the updates to the proto are not finalized and will bring in more additions. Right now, they contain the bare minimum fields/structs required to make the deployment entity workflows run. With that being said, care has been taken to not add fields here and delete them in the near future. In other words, future changes will only bring in additions to these new protos. <!-- Tell your future self why have you made these changes --> **Why?** - Versioning-3 <!-- Are there any breaking changes on binary or code level? --> **Breaking changes** <!-- If this breaks the Server, please provide the Server PR to merge right after this PR was merged. --> **Server PR** - The Server PR is currently being built. Note, the server PR will only contain renamings (to agree with the namings of this PR) so I don't think that should serve as a blocker for reviewing this. --------- Co-authored-by: Shahab Tajik <[email protected]>
…oyment-version # Conflicts: # openapi/openapiv2.json # openapi/openapiv3.yaml
READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.
What changed?
The following concepts in Worker Versioning V3 is being renamed. This PR adds the data plane fields with new names so SDK can populate them like the old ones.
DeploymentSeries->WorkerDeploymentDeployment->WorkerDeploymentVersionAlso
WorkerVersionCapabilitiesand itsuse_versioningboolean are replaced withWorkerDeploymentOptionsandWorkflowVersioningModeenum to be more compatible with V3 changes and also extensible in the future.RespondActivityTask*Requestused to senddeploymentbut that is now deprecated. Since Server does not have any anticipated use for these values, I did not add a replacement for them at the moment.Why?
See ^^
Breaking changes
Not breaking, some things are deprecated that will be cleaned up at a later time with other old Versioning stuff.
Server PR
Server PR will follow soon. But these changes should not break server compile because methods are not touched.