WhitWaldo

In light of the discussion around variations of workflow re-runs and the community ask for versioning, I wanted to introduce a new proposal that marries @JoshVanL's Rerun from Activity proposal to a versioning migration scheme. It addresses the issue I raised in one of the comments about base type inconsistency when running ad-hoc workflows against versioned underlying types.

I've gone through a couple of iterations of this with others and this is the most recent version of the idea. Because the runtime has no knowledge of the workflow type implementations inside the application, this proposal relies on minor state changes to the workflow actor and on changes to workflow resolution in the SDKs rather than on changes to the runtime. At some point, following some feedback on this, I'd like to move ahead with a POC to prove out the concept, but I wanted to toss it out there first to see if I'm missing anything really obvious.

I've written this proposal assuming that we want to move forward with both the Multi-App Workflows and the Rerun from Activity proposals, and it accommodates both in a way that enables all three concepts.

Thank you for the consideration!

@WhitWaldo WhitWaldo self-assigned this Apr 25, 2025
@olitomlinson

Generally support this proposal as it seems to offer some basic but useful versioning capabilities that people can easily grok.

Great to see the actual end-user Developer experience thoughtfully considered.

@acroca
Member

acroca commented Sep 17, 2025

Thanks for the proposal! A few thoughts and questions:

  1. The draft shows versioning being switched on via options.WithVersioning() and relying on numeric suffixes in type names. Would you consider making the capability available by default (no explicit enable step)? Also we could be explicit with the workflow name and version number in the register call in case a user needs to use different function naming conventions. Something like this:
options.RegisterWorkflow<MyWorkflow>("MyWorkflow", 1);
options.RegisterWorkflow<MyWorkflow_Final>("MyWorkflow", 2);
  2. If the initial type is just TestWorkflow (no numeric suffix), what’s the expected way to introduce a new version? Is the intended path to add a new implementation TestWorkflow2 (keeping the original available for in-flight runs) and let base-name calls resolve to the highest version? In this case, TestWorkflow would be the old version and TestWorkflow2 would be the new version. Any new invocation would always use the latest version, so invocations to TestWorkflow would invoke TestWorkflow2.
  3. Given Multi-App Workflows is already merged, I assume remote apps don't need to know about the versioned workflows, so what is the recommended invocation shape from another app for a versioned workflow? Base name (e.g., "TestWorkflow") + appId, with the host app resolving to the highest version? If so, is MapMultiAppTypes required?
  4. I don't understand the need for ExcludeTypes. Could you elaborate more on that?
  5. We’ll need a way to tell when a given version is safe to delete. Maybe some tooling? Or an API endpoint?

@WhitWaldo
Author

Thank you for reviewing my proposal and giving it some consideration. I certainly appreciate your time!

A key differentiator of this proposal from most others in dapr/proposals is that this is a feature I propose adding to the SDKs, and I provide a sample approach that I would follow in the .NET SDK (and eventually in the JS SDK). There are no changes needed in the runtime, except to make it clear to the runtime maintainers that caution should be used when making arbitrary changes to that API (which should be limited anyway now that it's stable) because of the assumption being made by the SDKs.

To that end, the precise particulars of how this works would be entirely up to the SDK maintainers and contributors. That said, in an effort to bring some clarity to the proposal, I've answered your questions in light of how I would build this in the .NET SDK.

  1. The draft shows versioning being switched on via options.WithVersioning() and relying on numeric suffixes in type names. Would you consider making the capability available by default (no explicit enable step)? Also we could be explicit with the workflow name and version number in the register call in case a user needs to use different function naming conventions. Something like this:
options.RegisterWorkflow<MyWorkflow>("MyWorkflow", 1);
options.RegisterWorkflow<MyWorkflow_Final>("MyWorkflow", 2);

This is entirely up to each SDK to handle as its maintainers see fit, though my guidance would be no: this should never be a default. I would consider it a potential breaking change, as I have no idea what anyone has named their workflows to date or whether they already have some sort of versioning implementation in place.

I would, however, feel fine adding a new method, options.RegisterVersionedWorkflow<MyWorkflow>(new VersioningNumericSuffixOptions()), for type-specific version registration, for cases where the user wants to opt in per type rather than opting in entirely via WithVersioning(). It would feature different overloads to support the various versioning schemes anyone might propose using. To be clear, this happens entirely at the SDK level and is simply used in the workflow processor to understand which type to route the inbound type-only requests from the runtime to.

I've modified the example in the "SDK Example" section to reflect this.

  2. If the initial type is just TestWorkflow (no numeric suffix), what’s the expected way to introduce a new version? Is the intended path to add a new implementation TestWorkflow2 (keeping the original available for in-flight runs) and let base-name calls resolve to the highest version? In this case, TestWorkflow would be the old version and TestWorkflow2 would be the new version.

Because versioning requires an explicit opt-in (either at the application level or on a per-type basis), the developer would need to add the appropriate annotation to the workflow registration. At that point, it depends on the specified strategy. If the developer has opted into the NumericalSuffix versioning strategy and has defined a TestWorkflow2, then at build time the source generator should identify all types starting with the registered TestWorkflow and register each, specifying the highest number as the version that requests should be routed to.

Any new invocation would always use the latest version, so invocations to TestWorkflow would invoke TestWorkflow2.

Exactly.
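
As a rough illustration of the numeric-suffix resolution described above, here is a minimal sketch written in Python for brevity rather than as a .NET source generator. The function name `resolve_latest` and the choice to rank an unsuffixed type below every numbered one are assumptions for illustration, not part of the proposal.

```python
import re


def resolve_latest(base_name: str, registered_types: list[str]) -> str:
    """Pick the registered type with the highest numeric suffix for base_name.

    Assumption: a type with no suffix (e.g. "TestWorkflow") ranks as version 0,
    so any numbered type (e.g. "TestWorkflow2") outranks it.
    """
    pattern = re.compile(rf"^{re.escape(base_name)}(\d*)$")
    candidates = []
    for type_name in registered_types:
        match = pattern.match(type_name)
        if match:
            suffix = match.group(1)
            candidates.append((int(suffix) if suffix else 0, type_name))
    if not candidates:
        raise KeyError(f"no workflow registered for {base_name!r}")
    # Tuples compare by version first, so max() returns the highest version.
    return max(candidates)[1]


types = ["TestWorkflow", "TestWorkflow2", "OtherWorkflow", "TestWorkflow10"]
print(resolve_latest("TestWorkflow", types))
```

Comparing the suffix as an integer rather than as text matters here: a plain string sort would rank TestWorkflow2 above TestWorkflow10.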

  3. Given Multi-App Workflows is already merged, I assume remote apps don't need to know about the versioned workflows, so what is the recommended invocation shape from another app for a versioned workflow? Base name (e.g., "TestWorkflow") + appId, with the host app resolving to the highest version? If so, is MapMultiAppTypes required?

There's no shape necessary (or possible) as that call is made to the runtime to invoke WorkflowType on AppId. When that request is received by the app SDK, it handles versioning precisely as it was configured to do at the app startup in a way entirely invisible to the caller since none of this information is made available to the runtime and only exists in the SDK implementation.

In other words, another app would never call into TestWorkflow4 as they don't need to know about it. They would simply call into TestWorkflow on app Abc123 as they might today and if that app has opted into versioning for that type, it'll route the inbound request to the appropriate type, here, TestWorkflow4.

I've updated the whole proposal to be a little clearer on this point.

  4. I don't understand the need for ExcludeTypes. Could you elaborate more on that?

This is a helper mechanism that allows a developer to opt into versioning for all types (e.g., options.WithVersioning()), but also opt out on a per-type basis. Even if an explicit options.RegisterVersionedType<MyWorkflow>() is specified for the type, ExcludeTypes should override it and always treat the type as though it were registered only with options.RegisterWorkflow<MyWorkflow>().
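
The precedence described here can be sketched as a small predicate. This is an illustrative Python sketch, not the .NET SDK's API: `is_versioned` and its parameter names are hypothetical stand-ins for the app-wide opt-in, the per-type opt-in, and ExcludeTypes.

```python
def is_versioned(
    type_name: str,
    *,
    versioning_enabled: bool,
    versioned_types: set[str],
    exclude_types: set[str],
) -> bool:
    """Decide whether version routing applies to a workflow type.

    Precedence (as described in the thread): ExcludeTypes always wins,
    then either the app-wide opt-in or a per-type opt-in enables versioning.
    """
    if type_name in exclude_types:
        return False
    return versioning_enabled or type_name in versioned_types


# App-wide opt-in, with one type explicitly excluded.
print(is_versioned("MyWorkflow", versioning_enabled=True,
                   versioned_types=set(), exclude_types={"MyWorkflow"}))
```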

  5. We’ll need a way to tell when a given version is safe to delete. Maybe some tooling? Or an API endpoint?

This is not possible in today's Dapr Workflow. There's no mechanism in the runtime by which we can query registered but dormant workflows (e.g., a workflow that was started in the past and is waiting, perhaps indefinitely, for either user input or a timer), so the SDK would have no visibility into this. Today's guidance would be that all types should be persisted indefinitely and that changes to the workflows should be made on the latest types, so that whenever these dormant (or new) workflows do activate, the registration is live to accommodate them.

Deletion safety aside, I do think there are many excellent opportunities to use source generators and Roslyn Analyzers throughout this in the .NET implementation, but again, the availability of such tooling would vary by SDK:

  • Identify intermediate type versions that have been rendered obsolete (e.g., if the developer specifies MyWorkflow with a NumericalSuffix strategy, but has MyWorkflow4 and MyWorkflow8 defined, there's no need to retain MyWorkflow4 as it'll never be invoked; all requests would route to MyWorkflow8, so this would be an opportunity to delete some types).
  • Call out a build warning if a developer has multiple available versions for a type, but has added it to ExcludeTypes, just so it's obvious why versioning isn't happening.
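
The two checks in the list above could look roughly like the following. This is a Python sketch of the logic only; a real implementation in .NET would be a Roslyn analyzer, and the `analyze` function and its message text are hypothetical. It only flags candidates: whether a flagged type is actually safe to delete still depends on in-flight instances, which the SDK cannot see.

```python
import re
from collections import defaultdict


def analyze(base_names, defined_types, exclude_types):
    """Return (obsolete_candidates, warnings) for numeric-suffix versioned types."""
    versions = defaultdict(list)
    for base in base_names:
        pattern = re.compile(rf"^{re.escape(base)}(\d+)$")
        for type_name in defined_types:
            match = pattern.match(type_name)
            if match:
                versions[base].append((int(match.group(1)), type_name))

    obsolete, warnings = [], []
    for base in base_names:
        found = sorted(versions[base])
        if len(found) < 2:
            continue  # nothing to flag with zero or one version
        if base in exclude_types:
            warnings.append(
                f"{base}: {len(found)} versions are defined, but the type is "
                "listed in ExcludeTypes, so versioning is disabled"
            )
        else:
            # Everything below the highest version is never routed to.
            obsolete.extend(name for _, name in found[:-1])
    return obsolete, warnings


obsolete, warnings = analyze(
    ["MyWorkflow", "Other"],
    ["MyWorkflow4", "MyWorkflow8", "Other1", "Other2"],
    exclude_types={"Other"},
)
print(obsolete, warnings)
```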

@acroca
Member

acroca commented Sep 23, 2025

Thank you for the responses. I would like to hear your thoughts about an alternative approach to workflow registration, with the goal of simplifying the transition of a workflow to another version and giving SDK users freedom in their naming conventions.

I was thinking we could expose something along these lines:

// Existing registration function.
options.RegisterWorkflow<MyWorkflow>();

// Explicit registration function. Registers the `MyWorkflow` function as workflow with name "MyWorkflow" and version 1.
options.RegisterWorkflow<MyWorkflow>("MyWorkflow", 1);

// Registration of a workflow with two versions.
options.RegisterWorkflow<MyWorkflow>("MyWorkflow", 1);
options.RegisterWorkflow<MyWorkflow_v1>("MyWorkflow", 2);

Both the workflow name and the version would be optional. The name would be taken from the function name, and the version would default to 1.

This would allow the user to use any naming they prefer, and to have multiple versions of the same workflow.

They could even rename the old workflow to something like MyWorkflowOld, and then register the new workflow as MyWorkflow.

In terms of implementation, I can think of three situations we need to handle:

  • A workflow is called and has no events history. We need to find the latest version of the workflow and run it. Also store the version in the events history.
  • A workflow is called and has events history, but doesn't have the version in the events history. Run the latest version of the workflow and don't change the history. We'll need to keep running this version until the workflow finishes.
  • A workflow is called and has events history, and the version is in the events history. We use the version specified in the events history.

@WhitWaldo
Author

WhitWaldo commented Sep 23, 2025

@acroca

My key objection to your proposal is the requirement that all versioned types be registered directly with the runtime and the implication that it's strictly up to the runtime to decide which type to invoke and when.

My approach centers on the following high-level observations:

  1. I think that one of the weakest parts of how Dapr Workflows operates is the requirement that all workflows and activities are known at app startup as this precludes any opportunity for dynamically created workflows.
  2. We have a small number of dedicated maintainers/contributors working on the runtime and the amount of time they can dedicate to any given feature initially and over time is limited due to organizational priorities.
  3. Without the replay capability in Dapr Workflows (i.e., not Josh's re-run feature), I'd argue there would be no need for the runtime to be involved in this whatsoever. All the types involved are strictly invoked within the context of a single app, putting it in the purview of the SDK it's connecting through.
  4. Not everyone knows Go or has the expertise to contribute to the runtime directly. By making this a feature largely implemented at the SDK level, it opens the door for meaningful community contributions in the language most familiar to the people using it.

As such, my approach goes out of its way to avoid making this require any substantial changes to the runtime. Your approach offloads versioning to the runtime.

There are precious few benefits to having the runtime manage versioning logic. The only one that comes to mind is the centralization (and thus standardization) of the capability, but building on my observations, I see several meaningful cons to this approach:

  • The runtime's memory usage has only grown with every release. Is storing and processing versioning in daprd (extrapolating workflow placement to every possible version) the best use of its resource budget, as opposed to putting that responsibility on the app/SDK as requests come in?
  • New versioning strategies would require explicit runtime support. Given the limited number of active and knowledgeable runtime contributors, strategies of minority interest would be deferred indefinitely (if not effectively dead on arrival). For example, look at the parity between supported pluggable components and the current Dapr API surface. This isn't intended as a slight against the runtime team; I'm just pointing out that they're a small group with limited time to keep going back and iterating on already-released capabilities. By instead putting this responsibility on the SDK maintainers/contributors, we encourage wider ideation and contribution of useful strategies, and get support for those that might be better suited to one language or another without being at the mercy of runtime support.
  • Every language is different. I think this is something that can be better solved in .NET using source generators and those aren't available to other languages, which may need to explicitly invoke a method instead or tag something with metadata. While they could all implement a uniform runtime-based approach, I think Dapr should aim to be a first-class platform for developers in their language, not be polyglot-first and experience-second.

All that to say that, with regard to the situations you cited, here would be my preferred SDK-first approach:

  • A workflow is called and has no events history. Dapr calls into the app/SDK precisely as it does today and the completion event includes the name (with the version) of the workflow type used.
  • A workflow is called and has events history and hasn't been completed. Dapr calls into the app/SDK with the name/version of the workflow type assigned in the last step so it can be run on the same type (e.g., whether it's just another step of the same workflow or the app crashed and we're in a replay scenario).

But that's it. In any other situation that doesn't involve replay, Dapr simply calls the registered type as it does today, and it's up to the SDK to route accordingly and respond with the name of the type it selected. This way daprd needn't know:

  • Any of the versioning strategies available on each SDK
  • What versions are available of what types

... and perhaps more importantly, the runtime changes are fully compatible with existing SDKs without requiring any other work. If an SDK lacks a maintainer to build this functionality, the protos needn't even change because the versioned type would never be reported (and thus never sent back in for in-process workflows).

@acroca
Member

acroca commented Sep 24, 2025

Thanks for your response.

My approach is also SDK-only. When I talk about workflow registration, that would be at the SDK level. All versions of a workflow would be registered with the SDK, but the SDK would register the workflow with the runtime only once.

The runtime would not need to change at all. The version would be added to the durabletask-protos and, using the rules I mentioned previously, the SDK would know which implementation function to invoke (depending on the version and the history).

So, if we have the following:

options.RegisterWorkflow<MyWorkflow_v1>("MyWorkflow", 1);
options.RegisterWorkflow<MyWorkflow_v2>("MyWorkflow", 2);
options.RegisterWorkflow<MyWorkflow_v3>("MyWorkflow", 3);

Only MyWorkflow will be registered with the runtime, and the SDK will take care of the rest.
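
To make the "registered with the runtime once" idea concrete, here is a minimal sketch of an SDK-side registry, written in Python for brevity. `WorkflowRegistry`, `runtime_names`, and `latest` are hypothetical names for illustration and are not part of any Dapr SDK.

```python
class WorkflowRegistry:
    """Holds every version of each workflow; exposes one name per workflow."""

    def __init__(self):
        self._versions = {}  # workflow name -> {version number: implementation}

    def register(self, implementation, name=None, version=1):
        # Name defaults to the function name and version to 1, as suggested.
        name = name or implementation.__name__
        versions = self._versions.setdefault(name, {})
        if version in versions:
            raise ValueError(f"{name} version {version} is already registered")
        versions[version] = implementation

    def runtime_names(self):
        """Names the SDK would report to the runtime: one per workflow."""
        return sorted(self._versions)

    def latest(self, name):
        versions = self._versions[name]
        return versions[max(versions)]


def my_workflow_v1(ctx): return "v1"
def my_workflow_v2(ctx): return "v2"
def my_workflow_v3(ctx): return "v3"


registry = WorkflowRegistry()
registry.register(my_workflow_v1, "MyWorkflow", 1)
registry.register(my_workflow_v2, "MyWorkflow", 2)
registry.register(my_workflow_v3, "MyWorkflow", 3)
print(registry.runtime_names())
```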

Please re-read my previous comment with this in mind :)

@WhitWaldo
Author

Following our sync on Discord, I think you might better understand why I'm pushing for the version I am. By using a string value when starting a new workflow (and thus receiving it from the runtime for any in-progress workflow, so the app runs the same type every time until the workflow reaches a terminal state), the logic becomes simple:

  1. If the workflow version type is not specified when a workflow is started, defer to versioning as defined in the SDK and pass that resolved name to the runtime when the workflow is started (primarily for replays).
  2. If the workflow version type is specified when a workflow is started, the SDK should use that specified type directly. Either it's the latest type because the workflow is just resuming following an activity, or the app crashed, a new latest type was added (or not), and we're replaying the workflow from the event source log, which requires that we run the same type it started with lest we enter a corrupted state.
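
The two rules above reduce to a short lookup. This Python sketch assumes the resolved type name travels as a plain string; `resolve_type`, `latest_for`, and `known_types` are hypothetical names, not SDK or runtime APIs.

```python
from typing import Optional


def resolve_type(
    base_name: str,
    recorded_type: Optional[str],
    latest_for: dict[str, str],
    known_types: set[str],
) -> str:
    """Return the concrete workflow type the SDK should run.

    recorded_type is the string the runtime hands back for an in-progress
    workflow, or None when a workflow is started without one.
    """
    if recorded_type is None:
        # Rule 1: defer to the SDK's versioning strategy; the caller would
        # pass this resolved name to the runtime when starting the workflow.
        return latest_for[base_name]
    # Rule 2: run exactly the type the workflow started with.
    if recorded_type not in known_types:
        raise LookupError(f"{recorded_type} is not registered in this app")
    return recorded_type


known = {"TestWorkflow", "TestWorkflow2", "TestWorkflow3"}
latest = {"TestWorkflow": "TestWorkflow3"}
# An instance started on TestWorkflow2 keeps running on it after a new deploy.
print(resolve_type("TestWorkflow", "TestWorkflow2", latest, known))
```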

Please let me know if you have any other questions in light of our conversation this morning!
