Evolution strategies #423
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,361 @@ | ||||||
# Evolution strategies | ||||||
|
||||||
<!-- | ||||||
Part of the Carbon Language project, under the Apache License v2.0 with LLVM | ||||||
Exceptions. See /LICENSE for license information. | ||||||
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||||||
--> | ||||||
|
||||||
[Pull request](https://github.com/carbon-language/carbon-lang/pull/423) | ||||||
|
||||||
<!-- toc --> | ||||||
|
||||||
## Table of contents | ||||||
|
||||||
- [Problem](#problem) | ||||||
- [Background](#background) | ||||||
- [Example: lexical structure](#example-lexical-structure) | ||||||
- [Example: interfaces](#example-interfaces) | ||||||
- [Proposal](#proposal) | ||||||
- [Details](#details) | ||||||
- [Strategy: point change with transparent migration](#strategy-point-change-with-transparent-migration) | ||||||
- [Summary](#summary) | ||||||
- [Details](#details-1) | ||||||
- [Timeline](#timeline) | ||||||
- [Applicability](#applicability) | ||||||
- [Advantages](#advantages) | ||||||
- [Disadvantages](#disadvantages) | ||||||
- [Example](#example) | ||||||
- [Strategy: incremental change](#strategy-incremental-change) | ||||||
- [Summary](#summary-1) | ||||||
- [Details](#details-2) | ||||||
- [Timeline](#timeline-1) | ||||||
- [Applicability](#applicability-1) | ||||||
- [Advantages](#advantages-1) | ||||||
- [Disadvantages](#disadvantages-1) | ||||||
- [Example](#example-1) | ||||||
- [Guidance](#guidance) | ||||||
- [Consequences](#consequences) | ||||||
- [Alternatives considered](#alternatives-considered) | ||||||
- [Non-strategy: simultaneous migration](#non-strategy-simultaneous-migration) | ||||||
|
||||||
<!-- tocstop --> | ||||||
|
||||||
## Problem | ||||||
|
||||||
Carbon aims to support language evolution. From the language goals: | ||||||
|
||||||
> _Support maintaining and evolving the language itself for decades._ We will | ||||||
> not get the design of most language features correct on our first, second, or | ||||||
> 73rd try. As a consequence, there must be a built-in plan and ability to move | ||||||
> Carbon forward at a reasonable pace and with a reasonable cost. | ||||||
> Simultaneously, an evolving language must not leave software behind to | ||||||
> languish, but bring software forward. This requirement should not imply | ||||||
> compatibility, but instead some migratability, likely tool-assisted. | ||||||
|
||||||
However, the specifics of how this migration will work have not been | ||||||
established, and having an idea of how evolutionary changes will be made is | ||||||
necessary in order to design the language to accommodate such changes. | ||||||
|
||||||
## Background | ||||||
|
||||||
### Example: lexical structure | ||||||
|
||||||
We expect the lexical structure of the Carbon language to change over time, in | ||||||
various ways. For example: | ||||||
|
||||||
- New kinds of tokens might be added, such as regular expression literals. | ||||||
- New tokens of existing kinds might be added, such as new keywords or new | ||||||
operators. | ||||||
- Existing character sequences might be split into tokens differently. For | ||||||
example, if a `<-` token is added, the expression `x<-3` might form a | ||||||
different token sequence. | ||||||
|
||||||
The Carbon philosophy is to evolve towards the best language Carbon can be, | ||||||
rather than compromising for compatibility, so we should assume that we will | ||||||
sometimes want to make lexical changes that affect a large amount of existing | ||||||
code. | ||||||
|
||||||
There are choices we could make now that would make anticipated lexical | ||||||
extensions easier. For example, we could require that all sequences of | ||||||
operator-like characters are always lexed as a single operator token, even if | ||||||
that token is meaningless, and that would allow us to add operators in the | ||||||
future as a non-breaking change. | ||||||
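
A minimal sketch of that lexing rule, written in Python for illustration only. The operator alphabet, the token kinds, and the `lex` function are assumptions made for this example, not part of any Carbon design. Because a run of operator characters is always a single token, `x<-3` is rejected while `<-` is unknown, so later adding `<-` only makes more programs valid.

```python
import re

# Hypothetical operator alphabet, chosen only for this sketch.
OPERATOR_CHARS = "+-*/<>=!&|^~%"

TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<number>\d+)"
    r"|(?P<op>[" + re.escape(OPERATOR_CHARS) + r"]+))"
)


def lex(source, known_operators):
    """Lex identifiers, integers, and maximal runs of operator characters."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, text = match.lastgroup, match.group(match.lastgroup)
        # The whole run is one token, even if no such operator exists yet.
        if kind == "op" and text not in known_operators:
            raise ValueError(f"unknown operator {text!r}")
        tokens.append((kind, text))
        pos = match.end()
    return tokens
```

With `known_operators = {"<", "-"}`, the input `x < -3` lexes as four tokens and `x<-3` is an error; after adding `"<-"` to the set, `x<-3` becomes valid and `x < -3` keeps its meaning.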
|
||||||
### Example: interfaces | ||||||
|
||||||
The set of methods on an interface should be expected to change over time. If a | ||||||
method were added with no evolution strategy in mind, existing | ||||||
implementations would initially not implement it, meaning they would no longer | ||||||
conform to the interface; if we permitted such types to conform to the interface | ||||||
regardless, then users of the interface would risk calling a method that is not | ||||||
actually implemented. | ||||||
|
||||||
In order to allow for Carbon code to evolve, we need to provide a path by which | ||||||
such evolution can occur. | ||||||
|
||||||
## Proposal | ||||||
|
||||||
This proposal presents a collection of concrete strategies for making changes to | ||||||
the language and to libraries, along with basic guidance for when to use which | ||||||
strategy, and how to design language features to minimize evolutionary problems. | ||||||
The list in this proposal is not intended to be exhaustive, but is instead | ||||||
intended to provide a baseline set of approved strategies. | ||||||
|
||||||
I propose the creation of a new Principle document based on the contents of this | ||||||
document. In addition, some further minor changes to course-correct prior | ||||||
proposals are given in the [Consequences](#consequences) section below. | ||||||
|
||||||
## Details | ||||||
|
||||||
### Strategy: point change with transparent migration | ||||||
|
||||||
#### Summary | ||||||
|
||||||
- Simultaneously make a change and provide a correct and fast migration tool. | ||||||
- Builds of an un-migrated package perform a migration to temporary files and | ||||||
then build the resulting migrated package. | ||||||
- Package maintainers run the migration tool and check in the result, | ||||||
including a marker to say the package has been migrated, when they're ready. | ||||||
|
||||||
#### Details | ||||||
|
||||||
This strategy allows Carbon sources to adopt changes at their own pace, within | ||||||
reason, by permitting un-migrated and migrated source files to coexist in the | ||||||
same build. Some state would be tracked in the package configuration file(s) to | ||||||
indicate which migrations have already been performed. | ||||||
|
||||||
Because migration is performed transparently as part of a build, the toolchain | ||||||
never sees unmigrated source code; as far as it is concerned, all input source | ||||||
code is written in the latest language using the latest interfaces. | ||||||
|
||||||
As with regular build actions, migration of dependency packages can be cached, | ||||||
so the cost of performing the migration is only paid when updating the | ||||||
dependency, not on every build. | ||||||
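
The build-time flow described above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the `migrate_for_build` name, the idea that a package's configuration yields a set of already-applied migration names, and the in-memory dictionary standing in for a build cache.

```python
import hashlib


def migrate_for_build(source, applied, migrations, cache):
    """Return the source text the toolchain should see.

    `applied` is the set of migration names recorded in the package
    configuration; `migrations` is the ordered list of (name, function)
    pairs known to the toolchain. The original source is never modified.
    """
    pending = [(name, fn) for name, fn in migrations if name not in applied]
    if not pending:
        return source  # Already written in the latest language.
    # Cache on the input text plus the migrations that still need to run.
    key_parts = [source] + [name for name, _ in pending]
    key = hashlib.sha256("\0".join(key_parts).encode()).hexdigest()
    if key not in cache:
        migrated = source
        for _, fn in pending:
            migrated = fn(migrated)
        cache[key] = migrated
    return cache[key]
```

A second build of an unchanged dependency hits the cache, so the migration cost is paid only when the dependency's source or its set of pending migrations changes.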
|
||||||
#### Timeline | ||||||
|
||||||
A language change would progress as follows: | ||||||
|
||||||
- At time T-1, the Carbon toolchain does not support the new language feature, | ||||||
and Carbon packages do not indicate they have been migrated to use it. | ||||||
- At time T, the Carbon toolchain introduces support for the new feature. All | ||||||
existing code continues to build by way of an implicit auto-upgrade. | ||||||
- At time T+X, a package migrates to the new version and performs a release. | ||||||
Dependent packages continue to build with Carbon toolchains from time T | ||||||
onwards, but earlier toolchains no longer work. | ||||||
- At time T+Y, once the Carbon ecosystem has largely migrated, the Carbon | ||||||
toolchain removes the automigration support. This may be months or years | ||||||
later. | ||||||
|
||||||
Note that under this model, new features can be used as soon as they are | ||||||
implemented, but doing so imposes downstream constraints on acceptable toolchain | ||||||
versions. | ||||||
|
||||||
#### Applicability | ||||||
|
||||||
This approach is only applicable if a migration tool can be built that is both | ||||||
correct in all cases and acceptably fast. It is unlikely to be acceptable for | ||||||
the sequence of migration steps performed on a package to substantially slow | ||||||
down the build of that package. However, this will likely cover all lexical | ||||||
changes, most syntactic changes, and also many semantic changes where the old | ||||||
semantics can be recovered by different syntax. | ||||||
|
||||||
We should make the facilities of this approach available to user code, by | ||||||
allowing a package to expose automigration tools that will be transparently | ||||||
applied to its dependents. | ||||||
|
||||||
#### Advantages | ||||||
|
||||||
- New functionality can be provided and adopted with no delay. | ||||||
- The timeframe for adopting a change is very loose. | ||||||
- There is no required ordering between a package adopting a change and its | ||||||
dependents adopting the change. | ||||||
- There is no need to make language changes to prepare for this strategy, | ||||||
beyond ensuring that all existing code can be automatically migrated. | ||||||
|
||||||
#### Disadvantages | ||||||
|
||||||
- Build-time diagnostics and runtime semantics will reflect the result of the | ||||||
migration tool, which may be surprising when relating diagnostics or | ||||||
behavior back to the original source of an un-migrated package. For example, | ||||||
source snippets in diagnostics may refer to code that doesn't match the | ||||||
original source, and debug information may refer to generated files instead | ||||||
of originals. | ||||||
- Migration tools may not work correctly on invalid code, such as code under | ||||||
active development, potentially resulting in build errors that are unrelated | ||||||
to any source errors, and potentially surprising output from tooling. For | ||||||
example, after a language syntax change, an autocomplete tool may suggest | ||||||
completions using the new language syntax even when editing an unmigrated | ||||||
source file. | ||||||
- If a change is released with an incorrect migration tool, builds may break. | ||||||
This is somewhat different from the expected fragility of new compiler | ||||||
features, because unchanged code is expected to be affected more frequently. | ||||||
|
||||||
Most of the disadvantages can be mitigated by ensuring that packages under | ||||||
maintenance are migrated early. | ||||||
|
||||||
#### Example | ||||||
|
||||||
We decide that we want to replace `var type name` with `var name : type`. A | ||||||
migration tool is built to perform the refactoring, and the toolchain is updated | ||||||
to parse the new syntax instead of the old syntax. The updated toolchain and | ||||||
migration tool are released together. | ||||||
|
||||||
All subsequent builds using the new toolchain first migrate the source code to | ||||||
the new syntax, and then pass it to the new toolchain, which only understands | ||||||
the new syntax. | ||||||
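
As a toy illustration of this rewrite, the Python sketch below uses a regular expression. This is an assumption made for brevity: a real migration tool would need to work from a parse of the old syntax, since a pattern like this would also rewrite matching text inside comments and string literals. The function name is hypothetical.

```python
import re

# Matches the old form: `var <type> <name>`, with simple identifiers only.
OLD_VAR = re.compile(
    r"\bvar\s+(?P<type>[A-Za-z_]\w*)\s+(?P<name>[A-Za-z_]\w*)"
)


def migrate_var_declarations(source):
    """Rewrite `var type name` as `var name : type`."""
    return OLD_VAR.sub(r"var \g<name> : \g<type>", source)
```

Running the function on already-migrated text leaves it unchanged, because the new form no longer has two adjacent identifiers after `var`.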
Review comment: How do you see this migration tool to be implemented and released in practice? It would need to be built on T-1 grammar and semantics.

IIUC, the migration tool is separated from the compiler so that the compiler (and hence the grammar at time T) only needs to handle the new syntax. However, the compiler T needs to ship with a migration tool that understands T-1 syntax and semantics, and such handling is no longer available in current compiler libraries. So the migration tool can't use the T compiler as a library, it needs T-1.

It seems to me that either the migration tool will need to be built from a different branch than the compiler, or they can be built from the same source code, but with different feature flags enabled. I think branch-based development of the migrator will be a non-starter for the Carbon toolchain development process.

If we use feature flags, we could "as well" for many migrations allow the new compiler to understand both kinds of syntax (package migration flags will determine whether the old or new syntax is actually accepted). While this distinction might look like an implementation detail, I think it is user-visible, as it mitigates a number of disadvantages described above.

Rust's editions are very similar to this flag-based model, for example, the RFC for Rust 2021 says:

Note that editions allow for removals following a deprecation cycle (see RFC 2052):

In a different place Nico clarifies:

I think it is very similar to our goals and our migration strategy. The only real differences I'd propose:

For migrating users of libraries over API changes we have the same issue. If libUiFramework releases v2 that requires a migration from v1, then to migrate a libCustomWidget we need libUiFramework v1 just to do semantic analysis of libCustomWidget before the migration, and v2 immediately after the migration to actually compile it. I think this is going to be difficult without a widely adopted package manager and build system; it might be more practical to see if all the necessary migration information can be included into just the libUiFramework v2. |
||||||
|
||||||
### Strategy: incremental change | ||||||
|
||||||
#### Summary | ||||||
|
||||||
- Make step-by-step progress, alternating between making a change that is | ||||||
compatible with current usage and updating current usage to avoid removed | ||||||
functionality and adopt added functionality. | ||||||
- Changes that modify the meaning of existing code may result in several such | ||||||
steps. | ||||||
|
||||||
#### Details | ||||||
|
||||||
In this approach, we avoid making backwards-incompatible changes immediately. | ||||||
Instead, every backwards-incompatible change has a transition period in which we | ||||||
expect Carbon source to be migrated. The backwards-incompatible change is then | ||||||
only made once the transition period has elapsed. | ||||||
|
||||||
We divide the change up into a sequence of steps, where each step is one of the | ||||||
following: | ||||||
|
||||||
- An _addition_, that strictly increases the set of valid input programs, | ||||||
without changing the meaning of any program already in the set. For example, | ||||||
this might include recognizing a new token that was previously invalid. | ||||||
- A _removal_, that strictly decreases the set of valid input programs, | ||||||
without changing the meaning of any program in the set. | ||||||
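
One way to make these two definitions concrete is to model a language version as a mapping from each valid program to its meaning. The Python sketch below does this; the `classify_step` function and the string "meanings" are hypothetical stand-ins used only to show the set relationships.

```python
def classify_step(before, after):
    """Classify a change between two language versions.

    Each version is a dict mapping a valid program to its meaning.
    """
    shared = before.keys() & after.keys()
    if any(before[program] != after[program] for program in shared):
        # Neither an addition nor a removal: an existing program changed.
        return "meaning change"
    if before.keys() == after.keys():
        return "no change"
    if before.keys() < after.keys():
        return "addition"
    if after.keys() < before.keys():
        return "removal"
    return "mixed"
```

A step classified as "meaning change" or "mixed" is the kind that has to be split into several additions and removals under this strategy.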
|
||||||
Additions are performed directly, with no transition period required. Removals | ||||||
are performed by announcing the intent to remove, introducing diagnostic | ||||||
messages for uses of functionality that is pending removal, producing tools to | ||||||
transition uses of the removed functionality, and then after a suitable | ||||||
transition time, performing the removal. | ||||||
|
||||||
In order to navigate from the current state to the desired end state by a | ||||||
sequence of additions and removals, intermediate scaffolding functionality that | ||||||
is present in neither state may be necessary. For example, when changing the | ||||||
meaning of a function parameter, it may be necessary to temporarily add a | ||||||
scaffolding function with a new name, migrate some or all existing callers to | ||||||
the new function, change the original function, and then migrate back. | ||||||
|
||||||
#### Timeline | ||||||
|
||||||
A language addition would progress as follows: | ||||||
|
||||||
- At time T-1, the Carbon toolchain does not support the new language feature. | ||||||
- At time T, the Carbon toolchain supports the new feature, and source code | ||||||
can start to use it. | ||||||
|
||||||
Library additions would follow a similar path, with the change being made in the | ||||||
library rather than in the toolchain. | ||||||
|
||||||
Use of an added feature imposes a version constraint: once a package uses a | ||||||
feature, anyone compiling it or its dependents would need a suitably recent | ||||||
version of the toolchain or the package introducing the change. | ||||||
|
||||||
A language removal would progress as follows: | ||||||
|
||||||
- At time T, the intent to remove the feature is announced, and the Carbon | ||||||
toolchain starts producing warnings when encountering uses of the feature. | ||||||
Over subsequent releases, the severity of these warnings increases. | ||||||
- At time T+K, the feature is removed from the Carbon toolchain. | ||||||
|
||||||
Library removals would follow a similar path, with the change being made in the | ||||||
library rather than in the toolchain; this necessitates there being a mechanism | ||||||
by which library authors can request diagnostics for use of certain | ||||||
functionality. | ||||||
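
For comparison, Python's standard `warnings` module gives library authors this kind of mechanism. The sketch below is an analogy only, not a proposed Carbon design; the `pending_removal` decorator and the `old_area` function are hypothetical names.

```python
import functools
import warnings


def pending_removal(message):
    """Mark a function so that each call reports a deprecation diagnostic."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorate


@pending_removal("old_area is pending removal; use area instead")
def old_area(width, height):
    return width * height
```

Callers keep working during the transition period, and the severity can be raised over time, for example by having users or test suites turn the warning into an error.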
|
||||||
Note that removals take time before they become active under this model. If we | ||||||
can anticipate such changes and prepare for them, we can in some cases avoid the | ||||||
need for the first step and the scaffolding feature. | ||||||
|
||||||
#### Applicability | ||||||
|
||||||
This approach is applicable to most -- or perhaps all -- changes, but may | ||||||
require multiple steps for certain kinds of change, requiring a long time for a | ||||||
migration to complete. | ||||||
|
||||||
#### Advantages | ||||||
|
||||||
- At every stage, all source code across all packages is written using the | ||||||
same language rules and the same library interfaces. | ||||||
- The code being built and run is exactly the code in the source files. | ||||||
- This strategy has wide applicability. | ||||||
|
||||||
#### Disadvantages | ||||||
|
||||||
- Changes in which a removal must complete before some addition is performed | ||||||
can potentially take a long time when following this strategy. | ||||||
- Introducing and removing scaffolding requires additional work that is not | ||||||
fundamental to the change being made. | ||||||
- Manual migration will be required in some cases. | ||||||
|
||||||
#### Example | ||||||
|
||||||
In order to support changes to an interface, we allow newly-added methods to be | ||||||
marked as `upcoming`. This indicates that the method is not required, and indeed | ||||||
cannot be called (except by other `upcoming` functionality), but can be | ||||||
implemented. Then the addition of an interface method can be staged as follows: | ||||||
Review comment (on lines +302 to +305): Is this about a change to the language, or a feature to allow evolution of user-defined interfaces? It feels like the document is mostly talking about the former, but this seems to be about the latter.

Reply: My primary focus when writing this document was about evolving the language and its standard library, but my intent was to cover both that and the needs of people evolving non-leaf packages implemented in Carbon. That said, I'd expect that things that people evolving Carbon software need are also things that we need to evolve the standard library.
|
||||||
|
||||||
- A method is introduced, declared `upcoming`. This is an addition, as | ||||||
strictly more programs become valid. | ||||||
- The intent to remove the `upcoming` marker is announced -- in this case, | ||||||
implicitly, as all `upcoming` markers indicate an intent to remove the | ||||||
marker. The removal period for this `upcoming` marker begins. | ||||||
- Over time, the method is implemented by all implementers of the interface. | ||||||
- The `upcoming` marker is removed. This is a removal, as it results in | ||||||
strictly fewer programs being valid. | ||||||
- Once the removal is complete, the new method can be used. This is an | ||||||
addition, that in this instance occurs concurrently with the completion of | ||||||
the removal phase and the removal of the `upcoming` marker. | ||||||
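
The staging above can be modeled with a small Python sketch. The representation is an assumption for illustration: an interface is a dict from method name to either `"required"` or `"upcoming"`, and the two helper functions are hypothetical. The exception that lets other `upcoming` functionality call an `upcoming` method is not modeled.

```python
def missing_methods(interface, implemented):
    """Return the required methods that an implementation lacks."""
    return sorted(
        name
        for name, status in interface.items()
        if status == "required" and name not in implemented
    )


def callable_methods(interface):
    """Return the methods that users of the interface may call."""
    return sorted(
        name for name, status in interface.items() if status == "required"
    )


# Stage 0: the original interface.
shape_v1 = {"Draw": "required"}
# Stage 1: `Resize` is added as upcoming (an addition).
shape_v2 = {"Draw": "required", "Resize": "upcoming"}
# Final stage: the marker is removed, so `Resize` is required and callable.
shape_v3 = {"Draw": "required", "Resize": "required"}
```

An implementation that provides only `Draw` conforms to `shape_v1` and `shape_v2` but not to `shape_v3`, which is why all implementers need to add `Resize` before the marker is removed.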
Review comment (on lines +307 to +317): This example doesn't use a default implementation of the

Reply: If there is a correct and generally-applicable default, the transition can be done as a point change, or perhaps even as a pure addition. I'm happy to switch to a different example; this one might be unhelpful by being similar to something we've been considering but with somewhat different details. |
||||||
|
||||||
### Guidance | ||||||
Review comment: Structural comment, not necessary to address in this proposal, but from a BLUF writing perspective the guidance feels like the bottom-line of this proposal, and thus how it should begin, rather than at the tail end. |
||||||
|
||||||
The primary driver for any change should be the intended end state. While the | ||||||
migration path to a goal should be a consideration, and may sway our decision | ||||||
between options that otherwise provide similar value, we should prefer using | ||||||
more expensive migration strategies over selecting an inferior end state. | ||||||
|
||||||
When a choice of strategies is available, purely additive changes should be | ||||||
preferred over point changes, and point changes should be preferred over | ||||||
incremental changes. | ||||||

Review comment: I don't know that I believe that purely additive changes should be preferred. There is value in having a small core

Reply: My intent is that this only applies subject to the "primary driver is the intended end state" above: only when the choices are largely equal on other merits should we consider this factor. I think I can express that more clearly. |
|
||||||
Language features should, where possible, be designed to reduce the necessity of | ||||||
incremental changes for anticipated future evolution. For example, if the | ||||||
spelling of an identifier is visible through reflection, then adding a keyword | ||||||
may require use of the incremental strategy to rename existing uses, as a | ||||||
fully-correct migration tool can't be built in general. However, if a raw | ||||||
identifier syntax is introduced, then the same change can be a point change, | ||||||
where the migration tool replaces all existing uses with semantically-identical | ||||||
raw identifiers. | ||||||
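
A rough Python sketch of such a migration step follows. The `r#` spelling for raw identifiers is borrowed from Rust purely as a placeholder, and the function name is hypothetical. The pattern-based rewrite is also an assumption: a real tool would operate on tokens so that comments and string literals are left alone.

```python
import re


def escape_new_keyword(source, keyword):
    """Replace uses of `keyword` as an identifier with a raw identifier."""
    # The lookbehind skips occurrences that are already raw identifiers.
    pattern = re.compile(r"(?<!r#)\b" + re.escape(keyword) + r"\b")
    return pattern.sub("r#" + keyword, source)
```

Because the rewritten identifier names the same entity as before, the migrated program keeps its meaning, and the word can then be claimed as a keyword.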
|
||||||
### Consequences | ||||||
|
||||||
We anticipate that all lexical changes can be accommodated by the point change | ||||||
strategy. Therefore there is no requirement to reserve any lexical space to | ||||||
prepare for future changes. | ||||||
|
||||||
Therefore, we will no longer require whitespace after the `//` introducing a | ||||||
comment, nor will we disallow decimal digits to follow a `\0` escape sequence. | ||||||

Review comment: what about

Reply: If we were to add such an operator, we could migrate all existing uses of

I think, broadly, if we can model an anticipated direction of evolution as point changes, we shouldn't try to guess what changes we'll want to make, because the cost of making those changes is sufficiently small. (For example, let's not proactively reserve a bunch of words that we think might be keywords, if we think the cost of reclaiming an identifier as a keyword is small.) If, on the other hand, an anticipated direction of evolution would require an incremental migration in response to changes, then we should be thinking about how to make such future changes easier.

Reply: FWIW, I think I may believe a little more than you that we should encourage reserving lexical space so that more future changes can be purely additive changes, even if we choose not to pursue them. Right now it may not be worth guessing what changes we'll want to make -- Carbon is small, and if we added a token

So yeah, right now, don't reserve

Reply: Hmm. I think I'd got too anchored to point changes being substantially cheaper than incremental changes and I'd lost sight of additive changes being substantially cheaper than point changes (a point change still churns the entire Carbon ecosystem as the migration tool is applied, in addition to the disadvantages listed in this proposal, whereas an additive change does not). Reserving lexical space to turn point changes into additive changes makes a lot of sense to me, but I agree that we don't need to do so now. |
|
||||||
This strategy also subsumes the approach described in | ||||||
[proposal 93](https://github.com/carbon-language/carbon-lang/pull/93), with | ||||||
package-wide migration instead of file-at-a-time migration, leaving only the | ||||||
addition of raw identifier syntax, which is still justified both as a vehicle | ||||||
for ensuring that correct migration is always possible and as a way to express | ||||||
identifiers that are keywords in Carbon but not keywords in C++. | ||||||
|
||||||
## Alternatives considered | ||||||
|
||||||
### Non-strategy: simultaneous migration | ||||||
|
||||||
A number of strategies that require making simultaneous changes to multiple | ||||||
packages, or to the toolchain and third-party packages, are possible. We | ||||||
consider such strategies to be untenable. |
Review comment: Allowing users access to writing these tools runs the significant risk of them producing versions that are not sufficiently correct.

Reply: Yes. How much should we be worried about that? In some sense, this is "your dependencies can release a new version that breaks you", which I think will be the case regardless, but it does seem like the problem has a different character given that they can break anything in the dependent projects by performing completely arbitrary rewrites.

There's also the aspect that people will be building, executing, and deploying code that literally no-one has ever code-reviewed. I don't think that's abnormal, either -- there are lots of systems that generate code that (most of the time) no-one and nothing looks at the output of other than a compiler -- but again this would be happening at a larger scale than is common.