Pull request status: Closed
`proposals/README.md`: 1 change (1 addition, 0 deletions)

- [0253 - 2021 Roadmap](p0253.md)
- [0253 - Decision](p0253_decision.md)
- [0285 - if/else](p0285.md)
- [0423 - Evolution strategies](p0423.md)

<!-- endproposals -->
`proposals/p0423.md`: 361 changes (361 additions, 0 deletions; new file)

# Evolution strategies

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/423)

<!-- toc -->

## Table of contents

- [Problem](#problem)
- [Background](#background)
  - [Example: lexical structure](#example-lexical-structure)
  - [Example: interfaces](#example-interfaces)
- [Proposal](#proposal)
- [Details](#details)
  - [Strategy: point change with transparent migration](#strategy-point-change-with-transparent-migration)
    - [Summary](#summary)
    - [Details](#details-1)
    - [Timeline](#timeline)
    - [Applicability](#applicability)
    - [Advantages](#advantages)
    - [Disadvantages](#disadvantages)
    - [Example](#example)
  - [Strategy: incremental change](#strategy-incremental-change)
    - [Summary](#summary-1)
    - [Details](#details-2)
    - [Timeline](#timeline-1)
    - [Applicability](#applicability-1)
    - [Advantages](#advantages-1)
    - [Disadvantages](#disadvantages-1)
    - [Example](#example-1)
  - [Guidance](#guidance)
  - [Consequences](#consequences)
- [Alternatives considered](#alternatives-considered)
  - [Non-strategy: simultaneous migration](#non-strategy-simultaneous-migration)

<!-- tocstop -->

## Problem

Carbon aims to support language evolution. From the language goals:

> _Support maintaining and evolving the language itself for decades._ We will
> not get the design of most language features correct on our first, second, or
> 73rd try. As a consequence, there must be a built-in plan and ability to move
> Carbon forward at a reasonable pace and with a reasonable cost.
> Simultaneously, an evolving language must not leave software behind to
> languish, but bring software forward. This requirement should not imply
> compatibility, but instead some migratability, likely tool-assisted.

However, the specifics of how this migration will work have not been
established, and having an idea of how evolutionary changes will be made is
necessary in order to design the language to accommodate such changes.

## Background

### Example: lexical structure

We expect the lexical structure of the Carbon language to change over time, in
various ways. For example:

- New kinds of tokens might be added, such as regular expression literals.
- New tokens of existing kinds might be added, such as new keywords or new
operators.
- Existing character sequences might be split into tokens differently. For
example, if a `<-` token is added, the expression `x<-3` might form a
different token sequence.
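To make the third bullet concrete, here is a minimal Python sketch of a longest-match ("maximal munch") lexer. The operator sets and the `tokenize` helper are hypothetical and are not Carbon's lexer; the sketch only shows that adding `<-` to the operator table changes how the existing text `x<-3` is split.

```python
import re


def tokenize(source: str, operators: set[str]) -> list[str]:
    """Split `source` into tokens, using longest match for operators."""
    # Try longer operators first so that, for example, "<-" wins over "<".
    ops = sorted(operators, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        # Identifiers and decimal integer literals.
        m = re.match(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+", source[i:])
        if m:
            tokens.append(m.group())
            i += m.end()
            continue
        for op in ops:
            if source.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"unexpected character {source[i]!r} at offset {i}")
    return tokens


OLD_OPERATORS = {"<", "-", "=", "+"}
NEW_OPERATORS = OLD_OPERATORS | {"<-"}

print(tokenize("x<-3", OLD_OPERATORS))  # ['x', '<', '-', '3']
print(tokenize("x<-3", NEW_OPERATORS))  # ['x', '<-', '3']
```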

The Carbon philosophy is to evolve towards the best language Carbon can be,
rather than compromising for compatibility, so we should assume that we will
sometimes want to make lexical changes that affect a large amount of existing
code.

There are choices we could make now that would make anticipated lexical
extensions easier. For example, we could require that all sequences of
operator-like characters are always lexed as a single operator token, even if
that token is meaningless, and that would allow us to add operators in the
future as a non-breaking change.
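The alternative lexing rule can be sketched the same way. Assuming a hypothetical set of operator-like characters, every run of those characters becomes a single token, and a run that is not a known operator is diagnosed rather than split. Under this rule `x<-3` was never a valid program (it had to be written `x < -3`), so adding `<-` later does not change the meaning of any valid program.

```python
import re

# Hypothetical choice of "operator-like" characters for this sketch.
OPERATOR_CHARS = "<>=!+-*/%&|^~"

TOKEN_RE = re.compile(
    r"\s+"  # whitespace, skipped
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<op>[" + re.escape(OPERATOR_CHARS) + r"]+)"  # a whole run of operator chars
)


def tokenize(source: str, known_operators: set[str]) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind is None:  # whitespace
            continue
        text = m.group(kind)
        if kind == "op" and text not in known_operators:
            # The run is still one token; it is just not (yet) a valid operator.
            kind = "invalid_op"
        tokens.append((kind, text))
    return tokens
```

For example, `tokenize("x<-3", {"<", "-"})` yields an `invalid_op` token for `<-`, and the same input lexes to a valid `op` token once `<-` is added to the known operators.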

### Example: interfaces

The set of methods on an interface should be expected to change over time. If a
method is added with no evolution strategy in mind, existing implementations
will initially not implement it, meaning they no longer conform
to the interface; if we permit such types to conform to the interface
regardless, then users of the interface risk calling a method that is not
actually implemented.
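Python's abstract base classes give a rough analogy for both failure modes (this is not a model of Carbon's interface semantics). In the sketch below, which uses hypothetical `Shape` interfaces, adding a required `perimeter` method makes an existing implementation stop conforming, while adding it without enforcement lets users call a method that was never implemented.

```python
from abc import ABC, abstractmethod


class ShapeV1(ABC):
    @abstractmethod
    def area(self) -> float: ...


class ShapeV2(ABC):
    @abstractmethod
    def area(self) -> float: ...

    @abstractmethod
    def perimeter(self) -> float: ...  # newly added, required


class ShapeV2Lenient(ABC):
    @abstractmethod
    def area(self) -> float: ...

    def perimeter(self) -> float:  # newly added, but not enforced
        raise NotImplementedError("perimeter is not implemented")


def make_square(interface: type) -> type:
    """An existing implementation, written before `perimeter` existed."""

    class Square(interface):
        def __init__(self, side: float) -> None:
            self.side = side

        def area(self) -> float:
            return self.side * self.side

    return Square


def conforms(interface: type) -> bool:
    try:
        make_square(interface)(2.0)  # ABCs reject instantiation if methods are missing
        return True
    except TypeError:
        return False


print(conforms(ShapeV1))  # True
print(conforms(ShapeV2))  # False: Square no longer conforms

# With the lenient interface, Square still "conforms", but users can
# call a method that was never actually implemented.
square = make_square(ShapeV2Lenient)(2.0)
try:
    square.perimeter()
except NotImplementedError as err:
    print("runtime failure:", err)
```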

In order to allow for Carbon code to evolve, we need to provide a path by which
such evolution can occur.

## Proposal

This proposal presents a collection of concrete strategies for making changes to
the language and to libraries, along with basic guidance for when to use which
strategy, and how to design language features to minimize evolutionary problems.
The list in this proposal is not intended to be exhaustive, but is instead
intended to provide a baseline set of approved strategies.

I propose the creation of a new Principle document based on the contents of this
document. In addition, some further minor changes to course-correct prior
proposals are given in the [Consequences](#consequences) section below.

## Details

### Strategy: point change with transparent migration

#### Summary

- Simultaneously make a change and provide a correct and fast migration tool.
- Builds of an un-migrated package perform a migration to temporary files and
then build the resulting migrated package.
- Package maintainers run the migration tool and check in the result,
including a marker to say the package has been migrated, when they're ready.

#### Details

This strategy allows Carbon sources to adopt changes at their own pace, within
reason, by permitting un-migrated and migrated source files to coexist in the
same build. Some state would be tracked in the package configuration file(s) to
indicate which migrations have already been performed.

Because migration is performed transparently as part of a build, the toolchain
never sees unmigrated source code; as far as it is concerned, all input source
code is written in the latest language using the latest interfaces.

As with regular build actions, migration of dependency packages can be cached,
so the cost of performing the migration is only paid when updating the
dependency, not on every build.
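The build-time behavior described above can be sketched as a small Python model. This is a sketch under stated assumptions, not toolchain code: migrations are pure text-to-text functions identified by name, the package configuration is reduced to a set of completed-migration markers, and an in-memory dictionary stands in for both the temporary files and the build cache. `Package`, `MigratingBuilder`, and the migration name are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model: a migration is a named, pure source-to-source rewrite.
Migration = tuple[str, Callable[[str], str]]


@dataclass
class Package:
    name: str
    sources: dict[str, str]  # file name -> source text
    # Markers recorded in the package configuration for completed migrations.
    migrated: set[str] = field(default_factory=set)


class MigratingBuilder:
    def __init__(self, migrations: list[Migration]) -> None:
        self.migrations = migrations  # in release order
        self.cache: dict[str, str] = {}  # cache key -> migrated source text
        self.migration_runs = 0  # files actually rewritten (cache misses)

    def migrated_sources(self, pkg: Package) -> dict[str, str]:
        """Return the sources as the compiler sees them: fully migrated."""
        pending = [m for m in self.migrations if m[0] not in pkg.migrated]
        if not pending:
            return dict(pkg.sources)
        names = ",".join(name for name, _ in pending)
        result = {}
        for path, text in pkg.sources.items():
            # Key on the input text and the pending migrations, so the cost is
            # only paid again when one of them changes.
            key = hashlib.sha256(f"{names}\0{text}".encode()).hexdigest()
            if key not in self.cache:
                for _, rewrite in pending:
                    text = rewrite(text)
                self.cache[key] = text
                self.migration_runs += 1
            result[path] = self.cache[key]
        return result

    def migrate_and_mark(self, pkg: Package) -> Package:
        """What a maintainer runs before checking in: rewrite and record markers."""
        return Package(
            pkg.name,
            self.migrated_sources(pkg),
            pkg.migrated | {name for name, _ in self.migrations},
        )


# Usage with a toy migration (a real one would operate on a syntax tree).
var_syntax = ("var-name-colon-type", lambda s: s.replace("var Int x", "var x : Int"))
builder = MigratingBuilder([var_syntax])
dep = Package("dep", {"a.carbon": "var Int x = 1;"})
builder.migrated_sources(dep)  # migrates and caches
builder.migrated_sources(dep)  # cache hit: no rewrite this time
```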

#### Timeline

A language change would progress as follows:

- At time T-1, the Carbon toolchain does not support the new language feature,
and Carbon packages do not indicate they have been migrated to use it.
- At time T, the Carbon toolchain introduces support for the new feature. All
existing code continues to build by way of an implicit auto-upgrade.
- At time T+X, a package migrates to the new version and performs a release.
Dependent packages continue to build with Carbon toolchains from time T
onwards, but earlier toolchains no longer work.
- At time T+Y, once the Carbon ecosystem has largely migrated, the Carbon
toolchain removes the automigration support. This may be months or years
later.

Note that under this model, new features can be used as soon as they are
implemented, but doing so imposes downstream constraints on acceptable toolchain
versions.
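The version constraints in this timeline can be modeled directly. In this sketch, toolchain versions are hypothetical integers: `introduced` plays the role of time T and `automigration_removed` the role of time T+Y.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageChange:
    name: str
    introduced: int  # toolchain version T that adds the new form
    automigration_removed: Optional[int] = None  # version T+Y, once known


def can_build(toolchain: int, marked: set[str], changes: list[LanguageChange]) -> bool:
    """Can a package carrying `marked` migration markers build with `toolchain`?"""
    for change in changes:
        if change.name in marked:
            # Migrated package (time T+X): it uses the new form, so the
            # toolchain must be at least version T.
            if toolchain < change.introduced:
                return False
        elif toolchain >= change.introduced:
            # Un-migrated package on a toolchain that only parses the new form:
            # it builds only while the transparent auto-upgrade still exists.
            removed = change.automigration_removed
            if removed is not None and toolchain >= removed:
                return False
    return True
```

For example, with a change introduced at version 10 whose automigration is removed at version 20, an un-migrated package builds with versions 9 through 19, while a migrated package needs version 10 or later.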

#### Applicability

This approach is only applicable if a migration tool can be built that is both
correct in all cases and acceptably fast: it is unlikely to be acceptable for
the sequence of migration steps performed on a package to substantially slow
down the build of that package. Even with these constraints, the approach will
likely cover all lexical changes, most syntactic changes, and also many semantic
changes where the old semantics can be recovered by different syntax.

We should make the facilities of this approach available to user code, by
allowing a package to expose automigration tools that will be transparently
applied to its dependents.

> _Review comment:_ Allowing users access to writing these tools runs the
> significant risk of them producing versions that are not sufficiently correct.
>
> _Reply from @zygoloid (proposal author), Apr 1, 2021:_ Yes. How much should we be worried about that? In some sense, this is "your dependencies can release a new version that breaks you", which I think will be the case regardless, but it does seem like the problem has a different character given that they can break anything in the dependent projects by performing completely arbitrary rewrites.
>
> There's also the aspect that people will be building, executing, and deploying code that literally no-one has ever code-reviewed. I don't think that's abnormal, either -- there are lots of systems that generate code that (most of the time) no-one and nothing looks at the output of other than a compiler -- but again this would be happening at a larger scale than is common.


#### Advantages

- New functionality can be provided and adopted with no delay.
- The timeframe for adopting a change is very loose.
- There is no required ordering between a package adopting a change and its
dependents adopting the change.
- There is no need to make language changes to prepare for this strategy,
beyond ensuring that all existing code can be automatically migrated.

#### Disadvantages

- Build-time diagnostics and runtime semantics will reflect the result of the
migration tool, which may be surprising when relating diagnostics or
behavior back to the original source of an un-migrated package. For example,
source snippets in diagnostics may refer to code that doesn't match the
original source, and debug information may refer to generated files instead
of originals.
- Migration tools may not work correctly on invalid code, such as code under
active development, potentially resulting in build errors that are unrelated
to any source errors, and potentially surprising output from tooling. For
example, after a language syntax change, an autocomplete tool may suggest
completions using the new language syntax even when editing an unmigrated
source file.
- If a change is released with an incorrect migration tool, builds may break.
This is somewhat different from the expected fragility of new compiler
features, because unchanged code is expected to be affected more frequently.

Most of the disadvantages can be mitigated by ensuring that packages under
maintenance are migrated early.

#### Example

Suppose we decide to replace `var type name` with `var name : type`. A
migration tool is built to perform the refactoring, and the toolchain is updated
to parse the new syntax instead of the old syntax. The updated toolchain and
migration tool are released together.

All subsequent builds using the new toolchain first migrate the source code to
the new syntax, and then pass it to the new toolchain, which only understands
the new syntax.
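A toy version of such a migration tool is sketched below. It is deliberately simplified and is not correct in all cases: it assumes a type is a single identifier optionally followed by `*`, and that declarations do not span lines. A real tool would operate on a parsed syntax tree. The function name is hypothetical.

```python
import re

# Old form:  var Int x = 5;    New form:  var x : Int = 5;
OLD_VAR = re.compile(r"\bvar\s+([A-Za-z_]\w*\**)\s+([A-Za-z_]\w*)\b")


def migrate_var_syntax(source: str) -> str:
    """Rewrite `var TYPE NAME` declarations as `var NAME : TYPE`."""
    return OLD_VAR.sub(r"var \2 : \1", source)
```

On inputs like these the rewrite is also idempotent: already-migrated code no longer matches the old pattern, so running the tool on it again leaves it unchanged.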

> _Review comment from @gribozavr (contributor), Apr 6, 2021:_ How do you see this migration tool being implemented and released in practice? It would need to be built on T-1 grammar and semantics. IIUC, the migration tool is separated from the compiler so that the compiler (and hence the grammar at time T) only needs to handle the new syntax. However, the compiler T needs to ship with a migration tool that understands T-1 syntax and semantics, and such handling is no longer available in current compiler libraries. So the migration tool can't use the T compiler as a library; it needs T-1.
>
> It seems to me that either the migration tool will need to be built from a different branch than the compiler, or they can be built from the same source code, but with different feature flags enabled. I think branch-based development of the migrator will be a non-starter for the Carbon toolchain development process. If we use feature flags, we might as well, for many migrations, allow the new compiler to understand both kinds of syntax (package migration flags will determine whether the old or new syntax is actually accepted).
>
> While this distinction might look like an implementation detail, I think it is user-visible, as it mitigates a number of disadvantages described above. Rust's editions are very similar to this flag-based model. For example, the RFC for Rust 2021 says:
>
> > - Editions are used to introduce changes into the language that would otherwise have the potential to break existing code, such as the introduction of a new keyword.
> > - Editions are never allowed to split the ecosystem. We only permit changes that still allow crates in different editions to interoperate.
> > - Editions are named after the year in which they occur (e.g., Rust 2015, Rust 2018, Rust 2021).
> > - When we release a new edition, we also release tooling to automate the migration of crates. Some manual work may be required but that should be uncommon.
> > - The nightly toolchain offers "preview" access to upcoming editions, so that we can land work that targets future editions at any time.
> > - We maintain an Edition Migration Guide that offers guidance on how to migrate to the next edition.
> > - Whenever possible, new features should be made to work across all editions.
>
> Note that editions allow for removals following a deprecation cycle (see RFC 2052):
>
> > When opting in to a new edition, existing deprecations may turn into hard errors, and the compiler may take advantage of that fact to repurpose existing usage, e.g. by introducing a new keyword. This is the only kind of breaking change a edition opt-in can make.
>
> In a different place Nico clarifies:
>
> > The language of the RFC was very clear that you should get warnings in the latest compiler release. This basically means that so long as we have the migration lints, we're ok. It's not require that the warnings are there for the entire edition or anything.
>
> I think it is very similar to our goals and our migration strategy. The only real differences I'd propose:
>
> - Carbon should name editions after the month of the release date (e.g., 2021.4) to allow for faster evolution.
> - The Carbon toolchain will support the latest edition, and non-latest editions for at least a certain amount of time (e.g., 6 months). Support for older editions will be dropped depending on the maintenance cost and user demand.
>
> For migrating users of libraries over API changes we have the same issue. If libUiFramework releases v2 that requires a migration from v1, then to migrate a libCustomWidget we need libUiFramework v1 just to do semantic analysis of libCustomWidget before the migration, and v2 immediately after the migration to actually compile it. I think this is going to be difficult without a widely adopted package manager and build system; it might be more practical to see if all the necessary migration information can be included into just the libUiFramework v2.


### Strategy: incremental change

#### Summary

- Make step-by-step progress, alternating between making a change that is
compatible with current usage and updating current usage to avoid removed
functionality and adopt added functionality.
- Changes that modify the meaning of existing code may result in several such
steps.

#### Details

In this approach, we avoid making backwards-incompatible changes immediately.
Instead, every backwards-incompatible change has a transition period in which we
expect Carbon source to be migrated. The backwards-incompatible change is then
only made once the transition period has elapsed.

We divide the change up into a sequence of steps, where each step is one of the
following:

- An _addition_, that strictly increases the set of valid input programs,
without changing the meaning of any program already in the set. For example,
this might include recognizing a new token that was previously invalid.
- A _removal_, that strictly decreases the set of valid input programs,
without changing the meaning of any program in the set.
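The two kinds of step can be stated as a check over a sample of programs. In this hypothetical model a language is a function from program text to its meaning, or `None` if the program is invalid. A real language has infinitely many programs, so checking a finite sample is only illustrative.

```python
from typing import Callable, Iterable, Optional

# Hypothetical model: a language maps program text to its meaning,
# or to None if the program is invalid.
Language = Callable[[str], Optional[str]]


def classify_change(old: Language, new: Language, programs: Iterable[str]) -> str:
    """Classify old -> new as an addition, a removal, or neither,
    judged only over the given sample of programs."""
    added = removed = changed = False
    for program in programs:
        before, after = old(program), new(program)
        if before is None and after is not None:
            added = True
        elif before is not None and after is None:
            removed = True
        elif before != after:
            changed = True  # valid before and after, but the meaning differs
    if changed or (added and removed):
        return "neither"
    if added:
        return "addition"
    if removed:
        return "removal"
    return "no change"


# Toy languages built from dictionaries; `.get` returns None for invalid programs.
v1 = {"f(1)": "one"}.get
v2 = {"f(1)": "one", "g(1)": "one"}.get
v3 = {"g(1)": "one"}.get
programs = ["f(1)", "g(1)", "h(1)"]
```

Here `v1 -> v2` is an addition and `v2 -> v3` is a removal, but going directly from `v1` to `v3` is neither, which is why such a change has to be split into steps.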

Additions are performed directly, with no transition period required. Removals
are performed by announcing the intent to remove, introducing diagnostic
messages for uses of functionality that is pending removal, producing tools to
transition uses of the removed functionality, and then after a suitable
transition time, performing the removal.

In order to navigate from the current state to the desired end state by a
sequence of additions and removals, intermediate scaffolding functionality that
is present in neither state may be necessary. For example, when changing the
meaning of a function parameter, it may be necessary to temporarily add a
scaffolding function with a new name, migrate some or all existing callers to
the new function, change the original function, and then migrate back.
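The scaffolding sequence for changing the meaning of a function parameter can be sketched with a hypothetical `delay` function whose argument changes from milliseconds to seconds, using a temporary `delay_seconds` function as scaffolding. Each library version maps function names to behavior. The sketch checks, over a small sample of calls, that every individual step is an addition or a removal, even though the end-to-end change is neither.

```python
from typing import Optional


def delay_ms(t: float) -> float:
    return t / 1000  # old meaning: t is in milliseconds; returns seconds


def delay_s(t: float) -> float:
    return t  # new meaning: t is in seconds


VERSIONS = [
    {"delay": delay_ms},                            # starting point
    {"delay": delay_ms, "delay_seconds": delay_s},  # add scaffolding
    # ...callers migrate: delay(x) -> delay_seconds(x / 1000)
    {"delay_seconds": delay_s},                     # remove the old `delay`
    {"delay_seconds": delay_s, "delay": delay_s},   # re-add `delay` with new meaning
    # ...callers migrate back: delay_seconds(x) -> delay(x)
    {"delay": delay_s},                             # remove the scaffolding
]

CALLS = [("delay", 1500.0), ("delay_seconds", 2.0)]


def meaning(library: dict, call: tuple) -> Optional[float]:
    name, arg = call
    fn = library.get(name)
    return None if fn is None else fn(arg)  # None: the call is invalid


def step_kind(old: dict, new: dict) -> str:
    pairs = [(meaning(old, c), meaning(new, c)) for c in CALLS]
    if all(b == a or b is None for b, a in pairs):
        return "addition"  # only previously-invalid calls changed
    if all(b == a or a is None for b, a in pairs):
        return "removal"  # only calls that became invalid changed
    return "neither"
```

Comparing each version with the next gives addition, removal, addition, removal; comparing the first version directly with the last gives neither.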

#### Timeline

A language addition would progress as follows:

- At time T-1, the Carbon toolchain does not support the new language feature.
- At time T, the Carbon toolchain supports the new feature, and source code
can start to use it.

Library additions would follow a similar path, with the change being made in the
library rather than in the toolchain.

Use of an added feature imposes a version constraint: once a package uses a
feature, anyone compiling it or its dependents would need a suitably recent
version of the toolchain or the package introducing the change.

A language removal would progress as follows:

- At time T, the intent to remove the feature is announced, and the Carbon
toolchain starts producing warnings when encountering uses of the feature.
Over subsequent releases, the severity of these warnings increases.
- At time T+K, the feature is removed from the Carbon toolchain.

Library removals would follow a similar path, with the change being made in the
library rather than in the toolchain; this necessitates there being a mechanism
by which library authors can request diagnostics for use of certain
functionality.
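Python's standard `warnings` module is one existing example of a mechanism by which library code can request diagnostics for uses of its functionality. The sketch below uses it to model a removal whose diagnostics escalate over time. The release numbers and the escalation policy (halfway through the transition period, switch from `DeprecationWarning`, which is ignored by default unless triggered from `__main__`, to `FutureWarning`, which is shown by default) are assumptions for illustration.

```python
import warnings


def diagnose_pending_removal(feature: str, release: int,
                             announced: int, removed: int) -> None:
    """Diagnose a use of `feature` in `release` (hypothetical policy)."""
    if release < announced:
        return  # removal not yet announced: no diagnostic
    if release >= removed:
        # The feature no longer exists: uses are hard errors.
        raise RuntimeError(f"{feature} was removed in release {removed}")
    # Escalate severity as the removal approaches.
    midpoint = (announced + removed) // 2
    category = DeprecationWarning if release < midpoint else FutureWarning
    warnings.warn(f"{feature} will be removed in release {removed}",
                  category, stacklevel=2)
```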

Note that removals take time before they become active under this model. If we
can anticipate such changes and prepare for them, we can in some cases avoid the
need for the first step and the scaffolding feature.

#### Applicability

This approach is applicable to most -- or perhaps all -- changes, but may
require multiple steps for certain kinds of change, requiring a long time for a
migration to complete.

#### Advantages

- At every stage, all source code across all packages is written using the
same language rules and the same library interfaces.
- The code being built and run is exactly the code in the source files.
- This strategy has wide applicability.

#### Disadvantages

- Changes in which a removal must complete before some addition is performed
can potentially take a long time when following this strategy.
- Introducing and removing scaffolding requires additional work that is not
fundamental to the change being made.
- Manual migration will be required in some cases.

#### Example

In order to support changes to an interface, we allow newly-added methods to be
marked as `upcoming`. This indicates that the method is not required, and indeed
cannot be called (except by other `upcoming` functionality), but can be
implemented. Then the addition of an interface method can be staged as follows:

> _Review comment on lines +302 to +305 (contributor):_ Is this about a change to the language, or a feature to allow evolution of user-defined interfaces? It feels like the document is mostly talking about the former, but this seems to be about the latter.
>
> _Reply (proposal author):_ My primary focus when writing this document was about evolving the language and its standard library, but my intent was to cover both that and the needs of people evolving non-leaf packages implemented in Carbon. That said, I'd expect that things that people evolving Carbon software need are also things that we need to evolve the standard library.
>
> _Suggested change (contributor):_ replace
>
> > implemented. Then the addition of an interface method can be staged as follows:
>
> with
>
> > implemented. Then the addition of an interface method for which no default implementation is possible can be staged as follows:


- A method is introduced, declared `upcoming`. This is an addition, as
strictly more programs become valid.
- The intent to remove the `upcoming` marker is announced -- in this case,
implicitly, as all `upcoming` markers indicate an intent to remove the
marker. The removal period for this `upcoming` marker begins.
- Over time, the method is implemented by all implementers of the interface.
- The `upcoming` marker is removed. This is a removal, as it results in
strictly fewer programs being valid.
- Once the removal is complete, the new method can be used. This is an
addition, that in this instance occurs concurrently with the completion of
the removal phase and the removal of the `upcoming` marker.
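The staging above can be modeled in Python. `Interface`, `conforms`, `call`, and `remove_upcoming` are hypothetical names for this sketch, and the rule that other `upcoming` functionality may call an `upcoming` method is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Hypothetical model of an interface with `upcoming` methods.

    `required` methods must be implemented and may be called through the
    interface; `upcoming` methods may be implemented, but are neither
    required nor callable through the interface.
    """
    required: frozenset
    upcoming: frozenset = frozenset()

    def conforms(self, impl: type) -> bool:
        return all(callable(getattr(impl, m, None)) for m in self.required)

    def call(self, obj: object, method: str):
        if method not in self.required:
            raise TypeError(f"{method!r} cannot be called through this interface")
        return getattr(obj, method)()

    def remove_upcoming(self, method: str) -> "Interface":
        """Drop the `upcoming` marker, making the method required."""
        return Interface(self.required | {method}, self.upcoming - {method})


class Square:  # an implementation written before `perimeter` existed
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


class SquareWithPerimeter(Square):  # the same type after its author catches up
    def perimeter(self) -> float:
        return 4 * self.side


# Addition: declare `perimeter` as upcoming; Square still conforms.
shape_v2 = Interface(frozenset({"area"}), frozenset({"perimeter"}))
# Removal of the marker, once implementers have caught up; the method
# then becomes required and callable.
shape_v3 = shape_v2.remove_upcoming("perimeter")
```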

> _Review comment on lines +307 to +317 (contributor):_ This example doesn't use a default implementation of the upcoming method. With a default, the new function can be used with much less latency. This may be painting incremental changes in an unfair light.
>
> _Reply (proposal author):_ If there is a correct and generally-applicable default, the transition can be done as a point change, or perhaps even as a pure addition. I'm happy to switch to a different example; this one might be unhelpful by being similar to something we've been considering but with somewhat different details.


### Guidance

> _Review comment (contributor):_ Structural comment, not necessary to address in this proposal, but from a BLUF writing perspective the guidance feels like the bottom line of this proposal, and thus how it should begin, rather than at the tail end.

The primary driver for any change should be the intended end state. While the
migration path to a goal should be a consideration, and may sway our decision
between options that otherwise provide similar value, we should prefer using
more expensive migration strategies over selecting an inferior end state.

When a choice of strategies is available, purely additive changes should be
preferred over point changes, and point changes should be preferred over
incremental changes.

> _Review comment:_ I don't know that I believe that purely additive changes should be preferred. There is value in having a small core.
>
> _Reply (proposal author):_ My intent is that this only applies subject to the "primary driver is the intended end state" above: only when the choices are largely equal on other merits should we consider this factor. I think I can express that more clearly.

Language features should, where possible, be designed to reduce the necessity of
incremental changes for anticipated future evolution. For example, if the
spelling of an identifier is visible through reflection, then adding a keyword
may require use of the incremental strategy to rename existing uses, as a
fully-correct migration tool can't be built in general. However, if a raw
identifier syntax is introduced, then the same change can be a point change,
where the migration tool replaces all existing uses with semantically-identical
raw identifiers.
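A migration of this kind can be sketched as a token-level rewrite. The assumptions here are loud: string literals are `"..."` with backslash escapes, line comments start with `//`, the new keyword is a hypothetical `yield`, and raw identifiers are spelled `r#name`, a hypothetical spelling borrowed from Rust rather than one this proposal specifies.

```python
import re

# Strings and comments are matched whole so that they are left untouched;
# existing raw identifiers are matched before plain identifiers.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|//[^\n]*|r#\w+|[A-Za-z_]\w*')


def escape_new_keyword(source: str, keyword: str) -> str:
    """Rewrite every identifier spelled `keyword` as a raw identifier."""

    def rewrite(match: re.Match) -> str:
        text = match.group(0)
        # String literals, comments, and existing raw identifiers never equal
        # `keyword`, so they are returned unchanged.
        return "r#" + text if text == keyword else text

    return TOKEN.sub(rewrite, source)
```

Because every use is replaced by a raw identifier that names the same entity, the program's meaning is preserved, and running the migration a second time changes nothing.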

### Consequences

We anticipate that all lexical changes can be accommodated by the point change
strategy. Therefore there is no requirement to reserve any lexical space to
prepare for future changes.
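As an illustration of a lexical point change, suppose a hypothetical `//*` token were added later. Existing comments that begin with `//*` could then be migrated by inserting whitespace after the `//`. A minimal sketch, assuming string literals are `"..."` with backslash escapes and line comments are introduced by `//`:

```python
import re

# String literals are matched whole so that `//*` inside them is left alone.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|//[^\n]*')


def separate_comment_introducer(source: str) -> str:
    """Insert a space after `//` in comments that begin with `//*`."""

    def rewrite(match: re.Match) -> str:
        text = match.group(0)
        return "// " + text[2:] if text.startswith("//*") else text

    return TOKEN.sub(rewrite, source)
```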

Accordingly, we will no longer require whitespace after the `//` introducing a
comment, nor will we disallow decimal digits to follow a `\0` escape sequence.

> _Review comment:_ What about `//*` or `//-`? Do we not need to leave that open as a potential new operator?
>
> _Reply (proposal author):_ If we were to add such an operator, we could migrate all existing uses of `//*` or `//-` that introduce a comment to add whitespace after the `//`, as a point change. (I think we'd probably want to do something smarter, like looking for the enclosing sequence of consecutive comment lines and adding whitespace after the comment introducer across all of them. But in any case I think this can be handled as a point change.)
>
> I think, broadly, if we can model an anticipated direction of evolution as point changes, we shouldn't try to guess what changes we'll want to make, because the cost of making those changes is sufficiently small. (For example, let's not proactively reserve a bunch of words that we think might be keywords, if we think the cost of reclaiming an identifier as a keyword is small.) If, on the other hand, an anticipated direction of evolution would require an incremental migration in response to changes, then we should be thinking about how to make such future changes easier.
>
> _Review comment (contributor):_ FWIW, I think I may believe a little more than you that we should encourage reserving lexical space so that more future changes can be purely additive changes, even if we choose not to pursue them.
>
> Right now it may not be worth guessing what changes we'll want to make -- Carbon is small, and if we added a token `//*` probably nothing would be broken, so we wouldn't really make a tool. However, as Carbon grows, those costs shift -- I think reserving lexical space is going to be cheaper than writing and running migrations (note this is also a burden to users who need to update their code). Thus point changes actually have a more significant cost long-term, pushing more for reserving lexical space.
>
> So yeah, right now, don't reserve `//*`. If Carbon goes public and we still haven't really decided, reserve `//*`.
>
> _Reply (proposal author):_ Hmm. I think I'd got too anchored to point changes being substantially cheaper than incremental changes and I'd lost sight of additive changes being substantially cheaper than point changes (a point change still churns the entire Carbon ecosystem as the migration tool is applied, in addition to the disadvantages listed in this proposal, whereas an additive change does not). Reserving lexical space to turn point changes into additive changes makes a lot of sense to me, but I agree that we don't need to do so now.

This strategy also subsumes the approach described in
[proposal 93](https://github.com/carbon-language/carbon-lang/pull/93), with
package-wide migration instead of file-at-a-time migration. The only part of that
proposal that remains is the addition of raw identifier syntax, which is still
justified both as a vehicle for ensuring that correct migration is always
possible and as a way to express identifiers that are keywords in Carbon but not
in C++.

## Alternatives considered

### Non-strategy: simultaneous migration

A number of strategies that require making simultaneous changes to multiple
packages, or to the toolchain and third-party packages, are possible. We
consider such strategies to be untenable.