diff --git a/proposals/README.md b/proposals/README.md index 9ab09c21dd6a3..02d036ebe5d2d 100644 --- a/proposals/README.md +++ b/proposals/README.md @@ -66,5 +66,6 @@ request: - [0253 - 2021 Roadmap](p0253.md) - [0253 - Decision](p0253_decision.md) - [0285 - if/else](p0285.md) +- [0423 - Evolution strategies](p0423.md) diff --git a/proposals/p0423.md b/proposals/p0423.md new file mode 100644 index 0000000000000..cfd6a112a5156 --- /dev/null +++ b/proposals/p0423.md @@ -0,0 +1,361 @@ +# Evolution strategies + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/423) + + + +## Table of contents + +- [Problem](#problem) +- [Background](#background) + - [Example: lexical structure](#example-lexical-structure) + - [Example: interfaces](#example-interfaces) +- [Proposal](#proposal) +- [Details](#details) + - [Strategy: point change with transparent migration](#strategy-point-change-with-transparent-migration) + - [Summary](#summary) + - [Details](#details-1) + - [Timeline](#timeline) + - [Applicability](#applicability) + - [Advantages](#advantages) + - [Disadvantages](#disadvantages) + - [Example](#example) + - [Strategy: incremental change](#strategy-incremental-change) + - [Summary](#summary-1) + - [Details](#details-2) + - [Timeline](#timeline-1) + - [Applicability](#applicability-1) + - [Advantages](#advantages-1) + - [Disadvantages](#disadvantages-1) + - [Example](#example-1) + - [Guidance](#guidance) + - [Consequences](#consequences) +- [Alternatives considered](#alternatives-considered) + - [Non-strategy: simultaneous migration](#non-strategy-simultaneous-migration) + + + +## Problem + +Carbon aims to support language evolution. From the language goals: + +> _Support maintaining and evolving the language itself for decades._ We will +> not get the design of most language features correct on our first, second, or +> 73rd try. 
As a consequence, there must be a built-in plan and ability to move +> Carbon forward at a reasonable pace and with a reasonable cost. +> Simultaneously, an evolving language must not leave software behind to +> languish, but bring software forward. This requirement should not imply +> compatibility, but instead some migratability, likely tool-assisted. + +However, the specifics of how this migration will work have not been +established, and having an idea of how evolutionary changes will be made is +necessary in order to design the language to accommodate such changes. + +## Background + +### Example: lexical structure + +We expect the lexical structure of the Carbon language to change over time, in +various ways. For example: + +- New kinds of tokens might be added, such as regular expression literals. +- New tokens of existing kinds might be added, such as new keywords or new + operators. +- Existing character sequences might be split into tokens differently. For + example, if a `<-` token is added, the expression `x<-3` might form a + different token sequence. + +The Carbon philosophy is to evolve towards the best language Carbon can be, +rather than compromising for compatibility, so we should assume that we will +sometimes want to make lexical changes that affect a large amount of existing +code. + +There are choices we could make now that would make anticipated lexical +extensions easier. For example, we could require that all sequences of +operator-like characters are always lexed as a single operator token, even if +that token is meaningless, and that would allow us to add operators in the +future as a non-breaking change. + +### Example: interfaces + +The set of methods on an interface should be expected to change over time. 
If a +method were to be added with no evolution strategy in mind, existing +implementations would initially not implement it, meaning they no longer conform +to the interface; if we permit such types to conform to the interface +regardless, then users of the interface risk calling a method that is not +actually implemented. + +In order to allow for Carbon code to evolve, we need to provide a path by which +such evolution can occur. + +## Proposal + +This proposal presents a collection of concrete strategies for making changes to +the language and to libraries, along with basic guidance for when to use which +strategy, and how to design language features to minimize evolutionary problems. +The list in this proposal is not intended to be exhaustive, but is instead +intended to provide a baseline set of approved strategies. + +I propose the creation of a new Principle document based on the contents of this +document. In addition, some further minor changes to course-correct prior +proposals are given in the [Consequences](#consequences) section below. + +## Details + +### Strategy: point change with transparent migration + +#### Summary + +- Simultaneously make a change and provide a correct and fast migration tool. +- Builds of an un-migrated package perform a migration to temporary files and + then build the resulting migrated package. +- Package maintainers run the migration tool and check in the result, + including a marker to say the package has been migrated, when they're ready. + +#### Details + +This strategy allows Carbon sources to adopt changes at their own pace, within +reason, by permitting un-migrated and migrated source files to coexist in the +same build. Some state would be tracked in the package configuration file(s) to +indicate which migrations have already been performed. 
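As a rough illustration of the state tracking described above, the following Python sketch models a package configuration that records which migrations have been applied and computes which ones a build would still need to perform transparently. The migration identifiers, the `PackageConfig` class, and both helper functions are hypothetical; this proposal does not specify a configuration format.

```python
from dataclasses import dataclass, field

# Hypothetical, ordered list of migrations known to the toolchain.
KNOWN_MIGRATIONS = ["0001-var-syntax", "0002-new-keyword", "0003-arrow-token"]


@dataclass
class PackageConfig:
    """Assumed stand-in for the state kept in a package configuration file."""

    name: str
    applied: set = field(default_factory=set)


def pending_migrations(config, known=KNOWN_MIGRATIONS):
    """Return the migrations a build must still apply, in toolchain order."""
    return [m for m in known if m not in config.applied]


def mark_migrated(config, migration):
    """Record that the maintainers ran a migration and checked in the result."""
    config.applied.add(migration)
```

Under these assumptions, a build of an un-migrated package would apply `pending_migrations(config)` to temporary copies of the sources, while a maintainer who checks in migrated sources would call `mark_migrated`.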
+ +Because migration is performed transparently as part of a build, the toolchain +never sees unmigrated source code; as far as it is concerned, all input source +code is written in the latest language using the latest interfaces. + +As with regular build actions, migration of dependency packages can be cached, +so the cost of performing the migration is only paid when updating the +dependency, not on every build. + +#### Timeline + +A language change would progress as follows: + +- At time T-1, the Carbon toolchain does not support the new language feature, + and Carbon packages do not indicate they have been migrated to use it. +- At time T, the Carbon toolchain introduces support for the new feature. All + existing code continues to build by way of an implicit auto-upgrade. +- At time T+X, a package migrates to the new version and performs a release. + Dependent packages continue to build with Carbon toolchains from time T + onwards, but earlier toolchains no longer work. +- At time T+Y, once the Carbon ecosystem has largely migrated, the Carbon + toolchain removes the automigration support. This may be months or years + later. + +Note that under this model, new features can be used as soon as they are +implemented, but doing so imposes downstream constraints on acceptable toolchain +versions. + +#### Applicability + +This approach is only applicable if a migration tool can be built that is both +correct in all cases and acceptably fast. It is unlikely to be acceptable for +the sequence of migration steps performed on a package to substantially slow +down the build of that package. However, this will likely cover all lexical +changes, most syntactic changes, and also many semantic changes where the old +semantics can be recovered by different syntax. + +We should make the facilities of this approach available to user code, by +allowing a package to expose automigration tools that will be transparently +applied to its dependents. 
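The toolchain constraints implied by the timeline for this strategy can be sketched as a small predicate. This is one reading of the timeline, not a specified rule: toolchain versions are modeled as plain integers, `introduced_at` stands for time T, `removed_at` stands for time T+Y, and the function name is hypothetical.

```python
def can_build(toolchain: int, migrated: bool,
              introduced_at: int, removed_at: int) -> bool:
    """Return whether a toolchain version can build a package.

    Assumed model: a migrated package needs a toolchain that understands the
    new feature (version >= T). An un-migrated package builds natively before
    T and through automigration until support is removed at T+Y.
    """
    if migrated:
        return toolchain >= introduced_at
    return toolchain < removed_at
```

For example, with T = 10 and T+Y = 20, a migrated package builds with toolchain 15 but not with toolchain 9, while an un-migrated package builds with either but not with toolchain 20.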
+ +#### Advantages + +- New functionality can be provided and adopted with no delay. +- The timeframe for adopting a change is very loose. +- There is no required ordering between a package adopting a change and its + dependents adopting the change. +- There is no need to make language changes to prepare for this strategy, + beyond ensuring that all existing code can be automatically migrated. + +#### Disadvantages + +- Build-time diagnostics and runtime semantics will reflect the result of the + migration tool, which may be surprising when relating diagnostics or + behavior back to the original source of an un-migrated package. For example, + source snippets in diagnostics may refer to code that doesn't match the + original source, and debug information may refer to generated files instead + of originals. +- Migration tools may not work correctly on invalid code, such as code under + active development, potentially resulting in build errors that are unrelated + to any source errors, and potentially surprising output from tooling. For + example, after a language syntax change, an autocomplete tool may suggest + completions using the new language syntax even when editing an unmigrated + source file. +- If a change is released with an incorrect migration tool, builds may break. + This is somewhat different from the expected fragility of new compiler + features, because unchanged code is expected to be affected more frequently. + +Most of the disadvantages can be mitigated by ensuring that packages under +maintenance are migrated early. + +#### Example + +We decide that we want to replace `var type name` with `var name : type`. A +migration tool is built to perform the refactoring, and the toolchain is updated +to parse the new syntax instead of the old syntax. The updated toolchain and +migration tool are released together. 
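A toy version of such a migration tool is sketched below in Python. It assumes that both the type and the name are single identifiers and rewrites text with a regular expression, so it would also rewrite matching text inside comments or string literals; a real tool would presumably operate on parsed source. The function name is hypothetical.

```python
import re

# Matches the old form `var <type> <name>` where both parts are single
# identifiers. The new form `var <name> : <type>` does not match, because a
# colon follows the first identifier, so the rewrite is idempotent.
_OLD_VAR = re.compile(r"\bvar\s+(?P<type>\w+)\s+(?P<name>\w+)\b")


def migrate_var_declarations(source: str) -> str:
    """Rewrite `var type name` declarations to `var name : type`."""
    return _OLD_VAR.sub(r"var \g<name> : \g<type>", source)
```

Idempotence matters here because un-migrated and migrated files can coexist in one build, so running the tool on an already-migrated file should leave it unchanged.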
+ +All subsequent builds using the new toolchain first migrate the source code to +the new syntax, and then pass it to the new toolchain, which only understands +the new syntax. + +### Strategy: incremental change + +#### Summary + +- Make step-by-step progress, alternating between making a change that is + compatible with current usage and updating current usage to avoid removed + functionality and adopt added functionality. +- Changes that modify the meaning of existing code may result in several such + steps. + +#### Details + +In this approach, we avoid making backwards-incompatible changes immediately. +Instead, every backwards-incompatible change has a transition period in which we +expect Carbon source to be migrated. The backwards-incompatible change is then +only made once the transition period has elapsed. + +We divide the change up into a sequence of steps, where each step is one of the +following: + +- An _addition_, that strictly increases the set of valid input programs, + without changing the meaning of any program already in the set. For example, + this might include recognizing a new token that was previously invalid. +- A _removal_, that strictly decreases the set of valid input programs, + without changing the meaning of any program in the set. + +Additions are performed directly, with no transition period required. Removals +are performed by announcing the intent to remove, introducing diagnostic +messages for uses of functionality that is pending removal, producing tools to +transition uses of the removed functionality, and then after a suitable +transition time, performing the removal. + +In order to navigate from the current state to the desired end state by a +sequence of additions and removals, intermediate scaffolding functionality that +is present in neither state may be necessary. 
For example, when changing the +meaning of a function parameter, it may be necessary to temporarily add a +scaffolding function with a new name, migrate some or all existing callers to +the new function, change the original function, and then migrate back. + +#### Timeline + +A language addition would progress as follows: + +- At time T-1, the Carbon toolchain does not support the new language feature. +- At time T, the Carbon toolchain supports the new feature, and source code + can start to use it. + +Library additions would follow a similar path, with the change being made in the +library rather than in the toolchain. + +Use of an added feature imposes a version constraint: once a package uses a +feature, anyone compiling it or its dependents would need a suitably recent +version of the toolchain or the package introducing the change. + +A language removal would progress as follows: + +- At time T, the intent to remove the feature is announced, and the Carbon + toolchain starts producing warnings when encountering uses of the feature. + Over subsequent releases, the severity of these warnings increases. +- At time T+K, the feature is removed from the Carbon toolchain. + +Library removals would follow a similar path, with the change being made in the +library rather than in the toolchain; this necessitates there being a mechanism +by which library authors can request diagnostics for use of certain +functionality. + +Note that removals take time before they become active under this model. If we +can anticipate such changes and prepare for them, we can in some cases avoid the +need for the first step and the scaffolding feature. + +#### Applicability + +This approach is applicable to most -- or perhaps all -- changes, but may +require multiple steps for certain kinds of change, requiring a long time for a +migration to complete. 
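The library removal path described above depends on a mechanism by which library authors can request diagnostics for uses of functionality that is pending removal. As an analogy only, and not a proposed Carbon design, the Python sketch below uses the standard `warnings` module to flag calls to an old function while a scaffolding replacement exists alongside it. The decorator and the two example functions are hypothetical.

```python
import functools
import warnings


def pending_removal(replacement: str):
    """Mark a function as pending removal and name its replacement."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(
                f"{func.__name__} is pending removal; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorate


def scaled_v2(value, factor):
    """Scaffolding function with the new parameter meaning."""
    return value * factor


@pending_removal("scaled_v2")
def scaled(value):
    """Original function, kept working during the transition period."""
    return scaled_v2(value, 2)
```

Callers would migrate from `scaled` to `scaled_v2` during the transition period, after which the original function could be changed or removed.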
+ +#### Advantages + +- At every stage, all source code across all packages is written using the + same language rules and the same library interfaces. +- The code being built and run is exactly the code in the source files. +- This strategy has wide applicability. + +#### Disadvantages + +- Changes in which a removal must complete before some addition is performed + can potentially take a long time when following this strategy. +- Introducing and removing scaffolding requires additional work that is not + fundamental to the change being made. +- Manual migration will be required in some cases. + +#### Example + +In order to support changes to an interface, we allow newly-added methods to be +marked as `upcoming`. This indicates that the method is not required, and indeed +cannot be called (except by other `upcoming` functionality), but can be +implemented. Then the addition of an interface method can be staged as follows: + +- A method is introduced, declared `upcoming`. This is an addition, as + strictly more programs become valid. +- The intent to remove the `upcoming` marker is announced -- in this case, + implicitly, as all `upcoming` markers indicate an intent to remove the + marker. The removal period for this `upcoming` marker begins. +- Over time, the method is implemented by all implementers of the interface. +- The `upcoming` marker is removed. This is a removal, as it results in + strictly fewer programs being valid. +- Once the removal is complete, the new method can be used. This is an + addition, that in this instance occurs concurrently with the completion of + the removal phase and the removal of the `upcoming` marker. + +### Guidance + +The primary driver for any change should be the intended end state. While the +migration path to a goal should be a consideration, and may sway our decision +between options that otherwise provide similar value, we should prefer using +more expensive migration strategies over selecting an inferior end state. 
+ +When a choice of strategies is available, purely additive changes should be +preferred over point changes, and point changes should be preferred over +incremental changes. + +Language features should, where possible, be designed to reduce the necessity of +incremental changes for anticipated future evolution. For example, if the +spelling of an identifier is visible through reflection, then adding a keyword +may require use of the incremental strategy to rename existing uses, as a +fully correct migration tool can't be built in general. However, if a raw +identifier syntax is introduced, then the same change can be a point change, +where the migration tool replaces all existing uses with semantically identical +raw identifiers. + +### Consequences + +We anticipate that all lexical changes can be accommodated by the point change +strategy. Therefore, there is no requirement to reserve any lexical space to +prepare for future changes. + +As a result, we will no longer require whitespace after the `//` introducing a +comment, nor will we disallow decimal digits after a `\0` escape sequence. + +This strategy also subsumes the approach described in +[proposal 93](https://github.com/carbon-language/carbon-lang/pull/93), with +package-wide migration instead of file-at-a-time migration, leaving only the +addition of raw identifier syntax, which is still justified as a vehicle for +ensuring both that correct migration is always possible and that identifiers +that are keywords in Carbon but not keywords in C++ can be expressed. + +## Alternatives considered + +### Non-strategy: simultaneous migration + +A number of strategies that require making simultaneous changes to multiple +packages, or to the toolchain and third-party packages, are possible. We +consider such strategies to be untenable.