diff --git a/docs/design/expressions/indexing.md b/docs/design/expressions/indexing.md index c0327ecceed5d..1fb95b7c1ef2b 100644 --- a/docs/design/expressions/indexing.md +++ b/docs/design/expressions/indexing.md @@ -40,18 +40,8 @@ implement that interface, or to method calls on `IndexWith` otherwise. `IndirectIndexWith` provides a final blanket `impl` of `IndexWith`, so a type can implement at most one of those two interfaces. -The `Addr` methods of these interfaces, which are used to form durable reference -expressions on indexing, must return a pointer and work similarly to the -[pointer dereference customization interface](/docs/design/values.md#dereferencing-customization). -The returned pointer is then dereferenced by the language to form the reference -expression referring to the pointed-to object. These methods must return a raw -pointer, and do not automatically chain with customized dereference interfaces. - -**Open question:** It's not clear that the lack of chaining is necessary, and it -might be more expressive for the pointer type returned by the `Addr` methods to -be an associated facet with a default to allow types to produce custom -pointer-like types on their indexing boundary and have them still be -automatically dereferenced. +The `Ref` methods of these interfaces, which are used to form durable reference +expressions on indexing, must return by `ref`. ## Details @@ -64,13 +54,13 @@ Its semantics are defined in terms of the following interfaces: ``` interface IndexWith(SubscriptType:! type) { let ElementType:! type; - fn At[self: Self](subscript: SubscriptType) -> ElementType; - fn Addr[addr self: Self*](subscript: SubscriptType) -> ElementType*; + fn At[bound self: Self](subscript: SubscriptType) -> val ElementType; + fn Ref[bound ref self: Self](subscript: SubscriptType) -> ref ElementType; } interface IndirectIndexWith(SubscriptType:! type) { require Self impls IndexWith(SubscriptType); - fn Addr[self: Self](subscript: SubscriptType) -> ElementType*; + fn Ref[bound self: Self](subscript: SubscriptType) -> ref ElementType; } ``` @@ -79,11 +69,11 @@ rewritten based on the expression category of _lhs_ and whether `T` is known to implement `IndirectIndexWith(I)`: - If `T` implements `IndirectIndexWith(I)`, the expression is rewritten to - "`*((` _lhs_ `).(IndirectIndexWith(I).Addr)(` _index_ `))`". + "`(` _lhs_ `).(IndirectIndexWith(I).Ref)(` _index_ `)`". - Otherwise, if _lhs_ is a [_durable reference expression_](/docs/design/values.md#durable-reference-expressions), - the expression is rewritten to "`*((` _lhs_ `).(IndexWith(I).Addr)(` _index_ - `))`". + the expression is rewritten to "`(` _lhs_ `).(IndexWith(I).Ref)(` _index_ + `)`". - Otherwise, the expression is rewritten to "`(` _lhs_ `).(IndexWith(I).At)(` _index_ `)`". @@ -93,18 +83,18 @@ implement `IndirectIndexWith(I)`: final impl forall [SubscriptType:! type, T:! IndirectIndexWith(SubscriptType)] T as IndexWith(SubscriptType) { - let ElementType:! type = T.(IndirectIndexWith(SubscriptType)).ElementType; - fn At[self: Self](subscript: SubscriptType) -> ElementType { - return *(self.(IndirectIndexWith(SubscriptType).Addr)(index)); + where ElementType = T.(IndirectIndexWith(SubscriptType).ElementType); + fn At[bound self: Self](subscript: SubscriptType) -> val ElementType { + return self.(IndirectIndexWith(SubscriptType).Ref)(index); } - fn Addr[addr self: Self*](subscript: SubscriptType) -> ElementType* { - return self->(IndirectIndexWith(SubscriptType).Addr)(index); + fn Ref[bound ref self: Self](subscript: SubscriptType) -> ref ElementType { + return self.(IndirectIndexWith(SubscriptType).Ref)(index); } } ``` Thus, a type that implements `IndirectIndexWith` need not, and cannot, provide -its own definitions of `IndexWith.At` and `IndexWith.Addr`. +its own definitions of `IndexWith.At` and `IndexWith.Ref`. ### Examples @@ -114,8 +104,8 @@ An array type could implement subscripting like so: class Array(template T:! type) { impl as IndexWith(like i64) { let ElementType:! type = T; - fn At[self: Self](subscript: i64) -> T; - fn Addr[addr self: Self*](subscript: i64) -> T*; + fn At[bound self: Self](subscript: i64) -> val T; + fn Ref[bound ref self: Self](subscript: i64) -> ref T; } } ``` @@ -126,7 +116,7 @@ And a type such as `std::span` could look like this: class Span(T:! type) { impl as IndirectIndexWith(like i64) { let ElementType:! type = T; - fn Addr[self: Self](subscript: i64) -> T*; + fn Ref[bound ref self: Self](subscript: i64) -> ref T; } } ``` diff --git a/docs/design/values.md b/docs/design/values.md index b1a148913c1d7..5704149b4130c 100644 --- a/docs/design/values.md +++ b/docs/design/values.md @@ -414,9 +414,9 @@ the available implementation strategies. > **Future work:** The interaction between a > [custom value representation](#value-representation-and-customization) and a > value expression used with a polymorphic type needs to be fully captured. -> Either it needs to restrict to a `const Self*` style representation (to -> prevent slicing) or it needs to have a model for the semantics when a -> different value representation is used. +> Either it needs to restrict to a `const ref` style representation (to prevent +> slicing) or it needs to have a model for the semantics when a different value +> representation is used. ### Interop with C++ `const &` and `const` methods @@ -555,6 +555,12 @@ functions with a `()` return type for the purpose of expression categories. #### Deferred initialization from values and references +TODO: This section needs to be updated to reflect the addition of `-> val` +returns in [proposal #5434](/proposals/p5434.md). This section could be replaced +by a statement that initializing returns may be replaced by value returns when +that is safe and correct, moving much of this content into a description of how +value returns works. + Carbon also makes the evaluation of function calls and return statements tightly linked in order to enable more efficiency improvements. It allows the actual initialization performed by the `return` statement with its expression to be @@ -640,6 +646,9 @@ specialized constructs given the specialized nature of these operations. ### Reference types +TODO: This section needs to be updated to reflect +[proposal #5434](/proposals/p5434.md). + Unlike C++, Carbon does not currently have reference types. The only form of indirect access are pointers. There are a few aspects to this decision that need to be separated carefully from each other as the motivations and considerations @@ -883,12 +892,12 @@ keyword. It isn't final at all and likely will need to change to read well. The provided representation type must be one of the following: - `const Self` -- this forces the use of a _copy_ of the object. -- `const Self *` -- this forces the use of a [_pointer_](#pointers) to the - original object. +- `const ref` -- this forces the use of a [_pointer_](#pointers) to the + original object, but with the `const` API subset. - A custom type that is not `Self`, `const Self`, or a pointer to either. -If the representation is `const Self` or `const Self *`, then the type fields -will be accessible as [_value expressions_](#value-expressions) using the normal +If the representation is `const Self` or `const ref`, then the type fields will +be accessible as [_value expressions_](#value-expressions) using the normal member access syntax for value expressions of a type. These will be implemented by either accessing a copy of the object in the non-pointer case or a pointer to the original object in the pointer case. A representation of `const Self` @@ -899,7 +908,7 @@ used. If no customization is provided, the implementation will select one based on a set of heuristics. Some examples: -- Non-copyable types and polymorphic types would use a `const Self*`. +- Non-copyable types and polymorphic types would use a `const ref`. - Small objects that are trivially copied in a machine register would use `const Self`. diff --git a/proposals/p5434.md b/proposals/p5434.md new file mode 100644 index 0000000000000..d2d7cbec91fef --- /dev/null +++ b/proposals/p5434.md @@ -0,0 +1,1667 @@ +# `ref` parameters, arguments, returns and `val` returns + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/5434) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) + - [`ref` bindings](#ref-bindings) + - [`ref` and `val` returns](#ref-and-val-returns) +- [Details](#details) + - [Compound return forms and patterns](#compound-return-forms-and-patterns) + - [Nested binding patterns](#nested-binding-patterns) + - [Mututation restriction on objects bound to a value](#mututation-restriction-on-objects-bound-to-a-value) + - [No optimization on erroneous behavior](#no-optimization-on-erroneous-behavior) + - [`bound` parameters](#bound-parameters) + - [How addresses interact with `ref`](#how-addresses-interact-with-ref) + - [Improved interop and migration with C++ references](#improved-interop-and-migration-with-c-references) + - [Part of the expression type system, not object types](#part-of-the-expression-type-system-not-object-types) + - [Interaction with `returned var`](#interaction-with-returned-var) + - [Use case: `Deref` interface](#use-case-deref-interface) + - [Use case: indexing interfaces](#use-case-indexing-interfaces) + - [Use case: member binding interfaces](#use-case-member-binding-interfaces) + - [Use case: class accessors](#use-case-class-accessors) + - [Type completeness](#type-completeness) + - [Pointer value representation](#pointer-value-representation) +- [Future work](#future-work) + - [Temporary lifetimes](#temporary-lifetimes) + - [`ref` bindings in lambdas](#ref-bindings-in-lambdas) + - [Interaction with effects](#interaction-with-effects) + - [More precise lifetimes](#more-precise-lifetimes) + - [Combining with compile-time](#combining-with-compile-time) + - [Interaction with `Call` or other interfaces](#interaction-with-call-or-other-interfaces) + - [Destructuring assignment](#destructuring-assignment) + - [Restore `addr`](#restore-addr) +- [Rationale](#rationale) +- [Alternatives considered](#alternatives-considered) + - [No `ref`, only pointers](#no-ref-only-pointers) + - [Remove pointers after adding references](#remove-pointers-after-adding-references) + - [Allow `ref` bindings in the fields of classes](#allow-ref-bindings-in-the-fields-of-classes) + - [No call-site annotation](#no-call-site-annotation) + - [Top-level `ref` introducer](#top-level-ref-introducer) + - [Allow immutable value semantic bindings nested within variable patterns](#allow-immutable-value-semantic-bindings-nested-within-variable-patterns) + - [Remove `var` as a top-level statement introducer](#remove-var-as-a-top-level-statement-introducer) + - [`ref` as a type qualifier](#ref-as-a-type-qualifier) + - [`bound` would change the default return to `val`](#bound-would-change-the-default-return-to-val) + - [Other return conventions](#other-return-conventions) + - [`return var` with compound return forms](#return-var-with-compound-return-forms) + - [Other syntax for compound return forms](#other-syntax-for-compound-return-forms) + - [`ref` parameters allow aliasing](#ref-parameters-allow-aliasing) + - [`let` to mark value returns instead of `val`](#let-to-mark-value-returns-instead-of-val) + - [`=>` infers form, not just type](#-infers-form-not-just-type) + + + +## Abstract + +- A parameter binding can be marked `ref` instead of `var` or the default. It + will bind to reference argument expressions in the caller and produces a + reference expression in the callee. + - Unlike pointers, a `ref` binding can't be rebound to a different object. + - This replaces `addr`, and is not restricted to the `self` parameter. + - A `ref` binding, like a value binding, can't be used in fields of + classes or structs. + - When calling functions, arguments to non-`self` `ref` parameters are + also marked with `ref`. +- The return of a function can optionally be marked `ref`, `val`, or `var`. + These control the category of the call expression invoking the function, and + how the return expression is returned. + - These may be mixed for functions returning tuple or struct forms. +- Any parameters whose lifetime needs to contain the lifetime of the return + must be marked `bound`. +- The address of a `ref` binding is `nocapture` and `noalias`. +- We mark parameters of a function that may be referenced by the return value + with `bound`. + +## Problem + +Reference bindings have come up multiple times: + +- as a better alternative to `addr self: Self*`; +- for use in [lambda captures](/docs/design/lambdas.md); +- to model different kinds of C++ references for interop and migration; +- to support nested bindings within a destructured `var`, see + [issue #5250](https://github.com/carbon-language/carbon-lang/issues/5250) + and + [proposal #5164](https://github.com/carbon-language/carbon-lang/pull/5164); +- for forwarding arguments while preserving + [expression category](/docs/design/README.md#expression-categories); +- to add a feature to pattern matching to modify things after they have been + matched; +- to support refactoring code without changing all the uses of a name, a + problem we are already seeing with `self` and `addr self`, and would be a + point of friction in local pattern matching in the future; and +- to support breaking up an expression into pieces without altering the + expression category of individual pieces. + +Reference returns have also come up before, particularly to support operators +such as indexing `[`...`]` and other functions that should produce a reference +expression. It is desirable, though, that this not introduce new memory unsafety +concerns, due to returning a reference to something with insufficient lifetime. + +In addition, we have been interested in adding other return mechanisms that +support returning values in registers in cases that our current convention +won't. + +## Background + +- Carbon has + [reference expressions](/docs/design/values.md#reference-expressions). +- Using + [the `addr` keyword on mutating methods to get a `self` with a pointer type](/docs/design/classes.md#methods) + was introduced in + [proposal #722: "Nominal classes and methods"](/proposals/p0722.md#keyword-to-indicate-pass-by-address). +- [Leads issue #5261: "We should add `ref` bindings to Carbon, paralleling reference expressions"](https://github.com/carbon-language/carbon-lang/issues/5261) + supports adding `ref` bindings to Carbon. +- [LLVM's `noalias` attribute](https://llvm.org/docs/LangRef.html#function-attributes) + is used to mark a pointer as being aliased in only limited ways to enable + optimization. Also see + [LLVM's pointer aliasing rules](https://llvm.org/docs/LangRef.html#pointer-aliasing-rules). +- Marking a pointer as not captured, to allow optimizations, was originally + done with + [LLVM's `nocapture` attribute](https://releases.llvm.org/11.0.0/docs/LangRef.html#parameter-attributes), + which has become + [`captures(none)` and `captures(ret: address, provenance)`](https://llvm.org/docs/LangRef.html#function-attributes), + which is governed by + [pointer capture rules](https://llvm.org/docs/LangRef.html#pointer-capture). +- Clang allows C++ code to use the + [`clang::lifetimebound` attribute](https://clang.llvm.org/docs/AttributeReference.html#lifetimebound) + to mark parameters that may be referenced by the return value, in order to + detect some classes of use-after-free memory-safety bugs. +- [C++ has reference types](https://en.cppreference.com/w/cpp/language/reference). + +## Proposal + +### `ref` bindings + +We introduce a new keyword `ref`. This may be added to a `:` binding to mark it +as binding to a reference expression, as in: + +```carbon +fn F(ptr: i32*) { + // A reference binding `x`. + let ref x: i32 = *ptr; + + // Use of `x` is a reference expression that + // refers to the same object as `*ptr`. + Assert(&x == ptr); + + // Equivalent to `*ptr += 1;`. + x += 1; +} + +fn G() { + var y: i32 = 2; + F(&y); + Assert(y == 3); +} +``` + +The use of the name (`x` in the example) of a `ref` binding forms a durable +reference expression. We ensure that reference expressions formed by way of +reference bindings _do not dangle_. A `ref` binding may only bind to a durable +reference expression or an expression that can be converted to one. The bound +durable reference expression must outlive the `ref` binding. + +The address of a `ref` bound name gives the address of the bound object, so +`&x == ptr` above. The reference itself does not have an address, and unlike a +pointer can't be rebound to reference a different object. + +We remove `addr`, and use instead use `ref` for the `self` parameter when an +object is required. Note that the type will change from `Self*` to `Self` in +this case. In the future, we might +[re-add `addr` back if needed](#restore-addr). + +```carbon +class C { + // ❌ No longer valid. + fn OldMethod[addr self: Self*]() { + // Previously would dereference `self` in + // the body of the method. + self->x += 3; + } + + // ✅ Now valid. + fn NewMethod[ref self: Self]() { + // Now `self` is a reference expression, + // and is not dereferenced. + self.x += 3; + } + + // ✅ Other uses are unchanged. + fn Get[self: Self]() -> i32 { + return self.x; + } + + var x: i32; +} +``` + +Potentially abbreviating the syntax further (to allow `ref self` as a short form +of `ref self: Self`) is left as future work. + +The `ref` modifier is allowed on `:` bindings that are not: + +- inside a `var` pattern, +- a field of a `class` type, or +- a field of a struct type. + +```carbon +fn AddTwoToRef(ref x: i32) { + x += 1; + let ref y: i32 = x; + y += 1; +} + +// Equivalent to: +fn AddTwoToRef(ref x: i32) { + x += 1; + let y_ptr: i32* = &x; + *y_ptr += 1; +} +``` + +We add support for `ref` and `var` in a +[struct pattern](/docs/design/pattern_matching.md#struct-patterns) when using +the shorthand `a: T` syntax for `.a = a: T`: + +```carbon +let {var a: i32, ref b: i32} = ...; + +// Now equivalent to: +let {.a = var a: i32, .b = ref b: i32} = ...; +``` + +> Note: This takes us one step closer to `{` ambiguity. Previously we could +> distinguish between a struct literal/pattern and a non-empty block with only +> up to two tokens of lookahead (the struct cases start with `.` or `_` or +> identifier followed by `:`, and the block cases don't). Now we have things +> like: +> +> ```carbon +> fn F() -> X { var a: i32 = 0; } +> ``` +> +> ... where we're getting incrementally closer to ambiguity. We've got a few +> more steps before we get there, though, since we don't have an `X{...}` +> expression yet, and `var ...` is only allowed in struct patterns rather than +> struct expressions. So we're still fine, but this is cutting down our options +> for future syntactic expansion a little. + +The `ref` modifier is forbidden on the bindings in `class` or struct type +fields. + +``` +var outer_size: i32 = 123; + +class Invalid { + // ❌ Invalid. We don't currently have runtime `let` bindings in classes, + // or `ref` on `var`s, but the intent is to not have `ref` bindings as fields. + let ref invalid_ref_field: i32 = outer_size; +} + +// ❌ Invalid. +var invalid_struct_type_field: + {ref .invalid: i32} = {.invalid = outer_size}; +``` + +In a function argument list, arguments to non-`self` `ref` parameters are also +marked with `ref`. Continuing the example: + +```carbon +var z: i32 = 3; +AddTwoToRef(ref z); +Assert(z == 5); + +// No `ref` though on the `self` argument. +var c: C = {.x = 4}; +c.NewMethod(); +Assert(c.Get() == 7); +``` + +> **Note:** It is important that this restriction is syntactic, not just +> semantic, because it means that `ref` is never the first token of a full +> expression, and so we know without lookahead that a `ref` in a pattern context +> must be the start of a binding pattern, not the start of an expression +> pattern. + +Normally an argument to a non-`ref` parameter should not be marked `ref`, but it +is allowed in a generic context where the parameter may sometimes be `ref`. + +Expression operators will mostly not take `ref` parameters, with these +exceptions: + +- [the address-of operator](/docs/design/expressions/pointer_operators.md) + `&`; +- the first operand of + [the indexing operator](/docs/design/expressions/indexing.md) `[`...`]`; and +- [the member access operator](#use-case-member-binding-interfaces) introduced + in + [proposal #3720: "Member binding operators"](https://github.com/carbon-language/carbon-lang/pull/3720) + `.`. + +The statement operators now use `ref` instead of pointers: + +- the left-hand operand of [assignment operators](/docs/design/assignment.md) + such as `=` and `+=`. +- [the `++` and `--` operators](/docs/design/assignment.md). + +Even in these cases, the arguments will not be marked with `ref` at the call +site. (Generally the `ref` parameter is the `self` parameter, and so wouldn't be +marked. The exception is `BindToRef`, but we don't want to mark its argument +with `ref`.) + +As an _experiment_, we are saying a pointer formed by taking the address of a +`ref` bound name is LLVM-`captures(none)` and LLVM-`noalias`. This means that +while a `ref` parameter could be passed into a function by address, the +restrictions also allow a "move-in-move-out" approach (once we define the move +operation), assuming it is not [marked `bound`](#bound-parameters). The intent +here is to leave the door open to a calling convention using registers and less +indirection for small-enough objects. + +This means that the following code is invalid: + +```carbon +fn F(ref a: i32, ref b: i32) -> bool; + +fn G() -> bool { + var v: i32 = 1; + return F(ref v, ref v); +} +``` + +Enforcing this restriction will be part of the memory safety story. Until then, +doing this is erroneous behavior. This +[means](#no-optimization-on-erroneous-behavior) that the compiler won't use +those LLVM attributes unless the compiler can itself prove that the restrictions +hold. + +### `ref` and `val` returns + +The return of a function can optionally be marked `ref` or `val`. These control +the category of the call expression invoking the function, and how the return +expression is returned. + +```carbon +var global: i32 = 2; +fn ReturnRef() -> ref i32 { + // ❌ Invalid: return 2; + + // ✅ Valid: return a reference expression with + // sufficient lifetime. + return global; +} +// Call `ReturnRef` and use the resulting reference. +ReturnRef() += 3; +Assert(global == 5); + +// Result of `ReturnRef` can be bound using a `ref` +// binding. +fn AddFive() { + let ref r: i32 = ReturnRef(); + r += 5; +} +AddFive(); +Assert(global == 10); + +fn ReturnVal() -> val i32 { + return 2; +} +// ReturnVal() is a value expression. +let l: i32 = ReturnVal(); + +// Returning an initializing expression is the +// default. +fn ReturnDefault() -> i32 { + return 2; +} +// ReturnDefault() is an initializing expression. +var j: i32 = ReturnDefault(); + +// Use `var` to explicitly specify returning an +// initializing expression. +fn ReturnVar() -> var i32 { + return 2; +} +// `ReturnVar()` is the same as `ReturnDefault()`. +``` + +- A call to a function declared `-> ref T` is a durable reference expression. + The generated code for that function will return the address of a `T` + object. +- A call to a function declared `-> val T` is a value expression. The function + will return the value representation of `T`. Since values have no address, + the value representation may be returned in registers. +- The behavior of a call to a function declared `-> T` is unchanged. It is an + initializing expression, returning in place or by copy depending on the + initializing representation of `T`. This is the same behavior as `-> var T`. +- The behavior of `auto` as the return type is unchanged, but now supports an + optional `ref`, `val`, or `var` between the `->` and `auto`. `-> auto` + continues to return an initializing expression, as does `-> var auto`. + `-> val auto` returns a value expression, and `-> ref auto` returns a + durable reference expression. +- Using `=>` to specify a return continues to return an initializing + expression, as before. See + [this relevant alternative considered](#-infers-form-not-just-type). + +A function may have multiple returns, each with their own marker, by using a +tuple or struct compound return form. + +```carbon +fn TupleReturn() -> (val bool, ref i32, C) { + return (true, global, {.x = 3}); +} + +fn StructReturn() + -> {.a: val bool, + .b: ref i32, + .c: C} { + return {.a = true, + .b = global, + .c = {.x = 3}}; +} +``` + +If the return of a function may reference the storage of one or more parameters +to the function, those parameters must be marked `bound`. This allows the +compiler to diagnose if the function's return is used after the lifetime of the +`bound` parameter ends. The semantics of `bound` are intended to match the +[`clang::lifetimebound` attribute](https://clang.llvm.org/docs/AttributeReference.html#id8). + +```carbon +fn Member(bound ref c: C) -> ref i32 { + return c.x; +} + +// Lifetime of a pointer includes the lifetime +// of what it points to. +fn Deref(bound p: i32*) -> ref i32 { + return *p; +} + +fn Both(bound pc: C*) -> ref i32 { + return p->x; +} + +fn Invalid1() -> ref i32 { + var x: i32 = 4; + // ❌ Error: returning reference to `x` + // whose lifetime ends when this function + // returns. + return x; +} + +fn Invalid2() -> ref i32 { + var c: C = {.x = 1}; + // ❌ Error: returning reference bound to `c` + // whose lifetime ends when this function + // returns. + return Member(c) +} +``` + +The address of a `bound ref` parameter is the +[LLVM attribute `captures(ret: address, provenance)`](#background) instead of +[`captures(none)`](#background). + +## Details + +The intent is that we would encourage using references instead of pointers when +possible. Their benefits are related to their limitations, so to get those +benefits we should use them when a use is restricted enough to be within those +limitations. + +### Compound return forms and patterns + +Mirroring the [tuple](/docs/design/pattern_matching.md#tuple-patterns) and +[struct](/docs/design/pattern_matching.md#struct-patterns) pattern forms, we +also support tuple and struct return forms. + +```carbon +// `->` begins a "return form" +fn F() -> ; + +// Within any return form, if the first token is +// `val`, `ref`, `var`, `(`, or `{`, it is not +// treated as type expression: + +// Value return, with type as specified +fn Val() -> val + +// Reference return, with type as specified +fn Ref() -> ref + +// Initializing return, with type as specified +fn Var() -> var + +// Tuple compound return, with a list of +// return forms. +fn TupleCompound() -> ( , ... ) +// Tuple return, with a list of type +// expressions. Used if all members of the +// list are type expressions. +fn Tuple() -> ( , ... ) + +// Struct compound return, with a mapping from +// designators to return forms. +fn StructCompound() + -> { .: , ... } +// Struct return, used if all of the members +// are type expressions. +fn Struct() -> { .: , ... } + +// Otherwise, implicit `var` means returns +// an initializing expression. +fn Other() -> +``` + +Note that in the absence of `val`, `ref`, and `var` keywords, the implicit `var` +is placed in the outermost position, minimizing the number of primitive forms +returned. So `fn F() -> (i32, i32)` means `fn F() -> var (i32, i32)` not +`fn F() -> (var i32, var i32)`. Generally the `var` is left off if not required, +and so will be rare in return forms, to minimize confusion with `val`. + +```carbon +fn TupleReturn(...) + -> (val bool, ref i32, C); +// Equivalent to: +// -> (val bool, ref i32, var C); + +let (a: bool, ref b: i32, var c: C) + = TupleReturn(...); + +fn StructReturn(...) + -> {.a: val bool, + .b: ref i32, + .c: C}; +// Equivalent to: +// -> {.a: val bool, +// .b: ref i32, +// .c: var C}; + +// Binds to the names `x`, `y`, `z`: +let {.a = x: bool, + .b = ref y: i32, + .c = var z: C} = StructReturn(...); + +// Binds to the names `a`, `b`, `c`: +let {a: bool, + ref b: i32, + var c: C} = StructReturn(...); + +// Above two can be mixed, binding to +// names `a`, `y`, `z`. +let {a: bool, + .b = ref y: i32 + .c = var z: C} = StructReturn(...); +``` + +Only types are allowed after a `-> val`, `-> ref`, or `-> var`, not a compound +return form. Examples: + +```carbon +// Returns a tuple of type +// `(bool, f32, C, i32)`. +fn OneTupleReturn(...) + -> (bool, f32, C, i32); + +// Returns a compound tuple form +fn CompoundReturn(...) + -> (bool, val f32); +// Equivalent to: +// -> (var bool, val f32); + +// ❌ Invalid, can't specify `ref` inside +// of `val`. +fn Invalid(...) -> val (bool, ref f32); +``` + +The compound return forms may be nested, as in: + +```carbon +fn CompoundInParens(...) + -> ({.a: bool, .b: val f32}, C, ref i32); +// Equivalent to: +// -> ({.a: var bool, .b: val f32}, var C, ref i32); + +let ({.a = var x: bool, .b = val y: f32}, + var c: C, ref d: i32) = CompoundInParens(...); +// or without renaming: +let ({var a: bool, b: f32}, + var c: C, ref d: i32) = CompoundInParens(...); + +// Contrast with a compound tuple form containing +// a struct type (not compound): +fn StructInParens(...) + -> ({.a: bool, .b: f32}, C, ref i32); +// Equivalent to: +// -> (var {.a: bool, .b: f32}, var C, ref i32); + +fn CompoundInBraces(...) + -> {.a: bool, .b: (val f32, C), .c: ref i32}; +// Equivalent to: + -> {.a: var bool, .b: (val f32, var C), .c: ref i32}; + +let {a: bool, + .b = (x: f32, var y: C), + ref c: i32} = ParensInBraces(...); + +// Contrast with a compound struct form containing +// a tuple type (not compound): +fn TupleInBraces(...) + -> {.a: bool, .b: (f32, C), .c: ref i32}; +// Equivalent to: + -> {.a: var bool, .b: var (f32, C), .c: ref i32}; +``` + +This feature is intended to support cases like `enumerate` that will want to +return a value for the index but a reference to the element of the sequence +being enumerated. + +### Nested binding patterns + +Since a `ref` binding may only bind to a durable reference expression, it can't +be used to bind the result of a function returning an initializing expression. +However, if the initializing expression is bound to a `var`, any nested patterns +are reference binding patterns bound to the subobject, following +[proposal #5164: "Updates to pattern matching for objects"](https://github.com/carbon-language/carbon-lang/pull/5164). + +For example: + +```carbon +fn F() -> (bool, (C, i32)); +let (b: bool, var (c: C, i: i32)) = F(); +``` + +is equivalent to: + +```carbon +fn F() -> (bool, (C, i32)); +let (b: bool, var v: (C, i32)) = F(); +let ref c: C = v.0; +let ref i: i32 = v.1; +``` + +Note that `ref` is disallowed inside `var` since that would be redundant. + +### Mututation restriction on objects bound to a value + +Mutation of objects with a non-copy-value representation in an active value +binding ("borrowed objects") is erroneous behavior. + +- Our plan is to prevent mutation of borrowed objects in Carbon's strict safe + dialect. + - We should only relax our stance here and consider making such mutation + _allowed_ if we discover difficulty with this that we cannot overcome. + - But we _should_ revisit the underlying idea of mutation being erroneous + if enforcing it in the strict mode proves fundamentally untenable due to + ergonomic costs. +- There will always be the potential for unchecked code, either unsafe Carbon + code or C++ code by way of interop, to mutate a borrowed object, hence the + need to define it as erroneous behavior. +- There is no need to ever make it anything more than erroneous behavior, see + below. +- If we can prove the mutation doesn't occur, then we can use that to optimize + under "as-if", and we don't need anything else. +- We are deferring the decision of whether strict enforcement is enabled in + Carbon's current C++-friendly mode, when not explicitly marking the code as + "unsafe." + +This was +[discussed 2025-05-22](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.uot97ukynlsi) +and then made the subject of +[leads issue #5524](https://github.com/carbon-language/carbon-lang/issues/5524). + +### No optimization on erroneous behavior + +The Carbon compiler should not optimize on erroneous behavior, ever, unless the +compiler literally proves that it does not occur, ever. In which case, we don't +need any license to start optimizing on this, as it falls under "as-if". + +The fact that undefined behavior ("UB", +[cppreference](https://en.cppreference.com/w/cpp/language/ub.html), +[wikipedia](https://en.wikipedia.org/wiki/Undefined_behavior)) provides "as-if" +_without_ a proof is precisely the risk of using UB for any semantics, and why +we don't use it here and elsewhere we use erroneous behavior. + +### `bound` parameters + +It is erroneous behavior to return something that references a local object that +won't live once the function returns, even if it is a parameter marked `bound`. +Local objects include local variables, temporary objects, and `var` parameters. + +```carbon +fn Invalid1() -> i32* { + var x: i32 = 4; + // ❌ Invalid. + return &x; +} + +fn Invalid2(bound x: i32) -> ref i32 { + var y: i32 = x; + // ❌ Invalid. + return y; +} + +// ✅ Valid +fn Valid1(bound p: i32*) -> ref i32 { + return *p; +} + +fn Invalid3(bound var x: i32) -> ref i32 { + // ❌ Invalid: lifetime of `var` parameter + // ends when function returns. + return x; +} + +class ReturnMember { + // ✅ Valid + fn ValidRef[bound ref self: Self]() -> ref i32 { + return self.m; + } + + // ❌ Invalid: can't return reference to value. + fn InvalidVal[bound self: Self]() -> ref i32 { + return self.m; + } + + // ❌ Invalid: `var self` lifetime ends. + fn InvalidVar[bound var self: Self]() + -> ref i32 { return self.m; } + + var m: i32; +} + +class DerefPointerMember { + // ✅ Valid + fn ValidRef[bound ref self: Self]() -> ref i32 { + return *self.pm; + } + + // ✅ Valid + fn ValidVal[bound self: Self]() -> ref i32 { + return *self.pm; + } + + // ✅ Valid + fn ValidVar[bound var self: Self]() + -> ref i32 { return *self.pm; } + + var pm: i32*; +} +``` + +Otherwise, `bound` parameters and global variables are the sources of storage +that can be referenced by a return, but need not be referenced, particularly not +on every code path. + +```carbon +// Result references `r` if `b` is true, and `p` +// otherwise. Valid as long as both are marked `bound`. +fn Conditional(b: bool, bound ref r: C, bound p: C*) + -> ref C { + if (b) { + return r; + } else { + return *p; + } +} +``` + +The parameters of functions defined in an interface may also be marked as +`bound`. The `impl` of that interface for a type can omit occurrences of `bound` +from the interface, but cannot add new ones. + +```carbon +interface I { + fn F[bound ref self: Self] + (a: Self, bound b: Self, bound c: Self*) + -> ref Self; +} + +impl C1 as I { + // ✅ Valid: matches interface + fn F[bound ref self: Self] + (a: Self, bound b: Self, bound c: Self*) + -> ref Self; +} + +impl C2 as I { + // ✅ Valid: proper subset of `bound` params + fn F[ref self: Self] + (a: Self, bound b: Self, c: Self*) + -> ref Self; +} + +impl C2 as I { + // ❌ Invalid: `a` is not bound in `I.F`. + fn F[ref self: Self] + (bound a: Self, b: Self, c: Self*) + -> ref Self; +} +``` + +Like `[[clang::lifetimebound]]` in C++, `bound` does not affect semantics or +calling conventions, just what code is legal. This helps avoid mismatches +between typechecking against the signatures in an interface when the `impl` +functions are different. Exception: the question of whether `bound` affects the +lifetime of temporaries is [future work](#temporary-lifetimes). + +Note that all combinations of a `val`/`ref`/default return can be bound to a +value/`ref`/`var` parameter. Examples: + +```carbon +fn RefToVal(bound ref x: C) -> val D { return x.d; } +fn ValToRef(bound y: C) -> ref D { return *y.ptr; } +fn VarToRef(bound var p: i32*) -> ref i32 { return *p; } +fn VarToDefault(bound var p: i32*) -> i32* { return p; } +``` + +For full safety, we need each bound parameter to be immutable for the duration +of the lifetime of the returned result. However, the objective for now is only +matching `[[clang::lifetimebound]]`, which has the goal of preventing some +classes of bugs, not full memory safety. We will reconsider this with the memory +safety design. + +Clang's `lifetimebound` attribute also only applies to the immediately pointed +to objects (by pointers or reference parameters, or pointers or reference +subobjects of an aggregate parameter). We suggest a simpler, transitive model +here that is more restrictive but should be compatible. That said, pinning down +the exact and firm semantics of `bound`, especially in these complex cases, is +deferred to the full memory safety design as well. + +### How addresses interact with `ref` + +The address of a `ref` binding is `noalias` and either `captures(none)` or +`captures(ret: address, provenance)`, depending on whether the binding is marked +`bound`. + +- `noalias` means like C `restrict`; you can't observe mutations through + aliases; mutation through a restricted pointer is not observable through + another pointer +- `captures(none)` means there is no transitive escape: you can pass a + nocapture pointer to another nocapture function, but you can't store to + memory or return +- `captures(ret: address, provenance)`: is like `captures(none)` but may be + referenced by a return. + +The combination of `noalias` and `captures(none)` semantics are the minimum for +the "move-in-move-out" optimization. But this condition is hard to check, so +safe code will use a stricter criteria. Unsafe code will be required to adhere +to just the `noalias` restrictions, but will not be checked (except possibly by +a sanitizer at runtime). The details here will be tackled as part of the memory +safety design. + +Optimizations will only be performed based on information that is enforced or +checked by the compiler, so these attributes won't be passed to LLVM unless +their requirements can be established. This avoids introducing undefined +behavior, which we particularly don't want to do in situations where C++ +doesn't. + +The goal of these rules it to nudge us towards function boundaries that don't +constructively create aliasing in their API boundary and don't capture pointers +unnecessarily. + +These restrictions are experimental, and we should keep track of everything we +end up needing to do to work around these restrictions so any reconsideration +can be properly informed. + +### Improved interop and migration with C++ references + +We expect this to improve interop and migration by allowing significantly more +interface similarity between Carbon and C++. Previously, many things in C++ that +used references on interface boundaries would be forced to switch to pointers. +This adds ergonomic friction both at a basic level because of the forced change +but also a deeper level because it will make it significantly harder to see the +parallel usage across the boundary between C++ and Carbon. With reference +bindings, the vast majority of this dissonance will be removed. + +This does create a migration concern, raised in +[open discussion on 2025-05-01](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.dffumsu6wzlc), +that the `nocapture` and `noalias` modifiers don't match C++ restrictions, +particularly on the `this` parameter that we are going to require migrate to +`ref self`. We may have to add back in `addr` to allow a different pointer type +for those cases. + +> **Future work:** Addressing how we model the various kinds of C++ references +> that Carbon code may need to interact with is +> [something we are actively considering](https://docs.google.com/document/d/1l5TbNuwZEcwm96ejGPLn9GdoQO1fByUW0tFRLU9BqXE/edit?tab=t.0) +> and will be tackled in a future proposal. + +### Part of the expression type system, not object types + +Much like value/`val` and `var` bindings, `ref` binding and the new return forms +are are part of the type system, but only through expression categories, +patterns (function parameters and so on), and returns. Specifically, we don't +expect them to be part of the _object types_ in Carbon. Like value bindings, we +retain a great deal of implementation flexibility around layout, and the +specifics of how they are lowered. + +This specifically means we will need to incorporate `ref` bindings into the +`Call` interface and we will be adding complexity there that will need to be +handled by overloading. The changes to the `Call` interface is future work, and +overloading, once we add support, will need to carry additional complexity to +handle `ref`. + +### Interaction with `returned var` + +The rule is: `returned var` may only be used when there is a single atomic +return form, and it is the default `var` category. + +```carbon +// ✅ Allowed +fn F(...) -> V { + returned var v: V = ...; + // ... + return var; +} + +fn F(...) -> {var .a : T} { + // ❌ Invalid: composite form + returned var ret: T = ...; + // ... +} + +fn F(...) -> val T { + // ❌ Invalid: value return + returned var ret: T = ...; + // ... +} +``` + +We can revisit and expand this later if this does not handle use cases we would +like to support. + +### Use case: `Deref` interface + +To support customization of the prefix-`*` dereferencing operator, we introduce +the `Deref` interface. + +```carbon +interface Deref { + let Result:! type; + fn Op[bound ref self: Self]() -> ref Result; +} + +final impl forall [T:! type] T* as Deref { + where Result = T; + fn Op[bound self: Self]() -> ref T + = "builtin.deref"; +} +``` + +Then `*p` is rewritten to `p.(Deref.Op)()`, and `p->m` is rewritten to +`p.(Deref.Op)().m`. For example, this might be used by a smart pointer: + +``` +class SmartPtr(T:! type) { + fn Make(p: T*) -> Self { return {.ptr = p}; } + impl as Deref { + where Result = T; + fn Op[bound ref self: Self]() -> ref Result { + return *self.ptr; + } + } + private var ptr: T*; +} +``` + +### Use case: indexing interfaces + +[Proposal #2274: "Subscript syntax and semantics"](https://github.com/carbon-language/carbon-lang/pull/2274) +added the interfaces used to support indexing with the subscripting operator +`[`...`]`. We change these in the following ways: + +- The `addr self` parameters are changed to `bound ref self`, to allow the + result to reference the `self` object. +- The `At` method returns by `val`. +- The `Addr` methods are renamed `Ref` and return a reference instead of a + pointer that is automatically dereferenced. + +[This proposal's PR](https://github.com/carbon-language/carbon-lang/pull/5434) +makes those changes to the +[indexing design](/docs/design/expressions/indexing.md). + +### Use case: member binding interfaces + +The member binding interface used for reference expressions from +[proposal #3720](https://github.com/carbon-language/carbon-lang/pull/3720) can +now be changed to use references instead of pointers. + +Before: + +```carbon +// For a reference expression `x` with type `T` +// and an expression `y` of type `U`, `x.(y)` is +// `*y.((U as BindToRef(T)).Op)(&x)` +interface BindToRef(T:! type) { + extend impl as Bind(T); + fn Op[self: Self](p: T*) -> Result*; +} +``` + +After: + +```carbon +// For a reference expression `x` with type `T` +// and an expression `y` of type `U`, `x.(y)` is +// `y.((U as BindToRef(T)).Op)(x)` +interface BindToRef(T:! type) { + extend impl as Bind(T); + fn Op[self: Self](bound ref p: T) -> ref Result; +} +``` + +Similarly, the `BindToValue` interface is changed to use a `val`/value return, +potentially avoiding a copy of large objects. + +Before: + +```carbon +interface BindToValue(T:! type) { + extend Bind(T); + fn Op[self: Self](x: T) -> Result; +} +``` + +After: + +```carbon +interface BindToValue(T:! type) { + extend Bind(T); + fn Op[self: Self](bound x: T) -> val Result; +} +``` + +### Use case: class accessors + +A `ref` return can be used to expose the state of an object in a way that can be +mutated: + +```carbon +class Four { + fn Get[self: Self](i: i32) -> i32 { + Assert(i >= 0 and i < 4); + return self.m[i]; + } + fn GetMut[bound ref self: Self](i: i32) -> ref i32 { + Assert(i >= 0 and i < 4); + return self.m[i]; + } + private var m: array(i32, 4); +} + +var x: HasMember = {.m = (0, 2, 4, 6)}; +x.GetMut(2) += 1; +fn Check(y: Four) { + Assert(y.Get(2) == 5); +} +Check(x); +``` + +**Future work**: this will in the future often be done with an overloaded +method, as in: + +```carbon +class Four { + overload Access { + fn [bound ref self: Self](i: i32) -> ref i32 { + Assert(i >= 0 and i < 4); + return self.m[i]; + } + fn [self: Self](i: i32) -> i32 { + Assert(i >= 0 and i < 4); + return self.m[i]; + } + } + private var m: array(i32, 4); +} + +var x: HasMember = {.m = (0, 2, 4, 6)}; +x.Access(2) += 1; +fn Check(y: Four) { + Assert(y.Access(2) == 5); +} +Check(x); +``` + +This may be a common enough use case that we will want to introduce a dedicated +syntax: + +```carbon +class HasMember { + fn Access[bound ref? self: Self](i: i32) -> ref? i32 { + Assert(i >= 0 and i < 4); + return self.m[i]; + } + private var m: array(i32, 4); +} +``` + +### Type completeness + +Not a change by this proposal, but note that our existing rules will require the +type in a `ref` binding to be complete in situations where it would not need to +be if you were using a value binding with a pointer type instead. We may need to +change this in the future to match C++ which treats reference types more like +pointer types for completeness purposes. + +After this change, a `ref` binding to type `T` will require `T` to be complete +in the same situations that other bindings to type `T` require `T` to be +complete. + +### Pointer value representation + +Purely as a change in syntax, the way to specify that +[the value representation of a class](/docs/design/values.md#value-representation-and-customization) +uses a pointer is changed from writing `const Self *` to `const ref`. + +## Future work + +### Temporary lifetimes + +For safety, we need bindings and returns that reference storage to only be used +while that storage remains valid. When the referenced storage is owned by a +temporary, we have a choice to either control the lifetime of the temporary or +diagnose when the lifetime of the temporary is insufficient. Deciding on our +policy is future work. + +Note that in many cases we can explicitly provide storage in a variable instead +of referencing a temporary. For example, using +`var x: ... = ReturnsATemporary();` instead of +`let ref x: ... = ReturnsATemporary();`. This won't apply in all situations, +though, such as temporaries that are reachable transitively through pointers. + +### `ref` bindings in lambdas + +We have already identified +[future work to support reference captures in lambdas as part of proposal #3848](/proposals/p3848.md#future-work-reference-captures). +This might be a reason to support `ref` bindings as fields of objects, with all +the restrictions that comes with that. + +### Interaction with effects + +We still need to determine how references and the other return types interact +with effects, like `Optional`, errors, co-routines, and so on. For example, we +don't want to give up the benefits of being able to directly return a reference +when a function has an error path. + +It is unclear if this will mean putting references into the object type system, +but we may be able to handle this with additional types or the ability to +customize return representations. For example, we might have an alternate +version of the `Optional` type that holds a reference: + +```carbon +class OptionalRef(T:! type) { + fn Make(bound ref r: T) -> Self { + return {.p = &r}; + } + fn MakeEmpty() -> Self { + return {.p = Optional(T*).MakeEmpty()}; + } + fn HasValue[self: Self]() -> bool { + return p.HasValue(); + } + fn Get[bound ref self: Self]() -> ref Result { + Assert(self.HasValue()); + return *self.p.Get(); + } + + private var p: Optional(T*); +} +``` + +### More precise lifetimes + +More precise lifetime tracking will be considered with the memory safety design. +For example, the `bound` approach does not distinguish different components of a +compound return, or different parts of a parameter object that might have +different lifetimes. + +### Combining with compile-time + +We plan to support references to compile-time state when executing a function at +compile time. That will be part of a future proposal. + +### Interaction with `Call` or other interfaces + +For now, `ref` is not represented in the `Call` interface introduced in +[proposal #2875: Functions, function types, and function calls](https://github.com/carbon-language/carbon-lang/pull/2875). +This will be tackled together in a future proposal with other aspects of +bindings not represented by the type, such as `var` and compile-time, along with +being generic across these aspects of bindings. + +### Destructuring assignment + +Having more support for multiple returns from a function opens the question of +how to do different things with the different returns. We may want a syntax for +saying some of the returns are bound to new names, and some are used in +assignments to existing variables. One possibility would be to have some pattern +syntax for re-initializing an existing object, as in: + +```carbon +fn F() -> (-> bool, ->T); +fn G() { + var x: T = ...; + Consume(~x); + let (b: bool, init x) = F(); + // Continue to use `x`... +} +``` + +This was discussed in +[the #syntax channel on Discord on 05-19-2025](https://discord.com/channels/655572317891461132/709488742942900284/1374126123595727000). + +### Restore `addr` + +There are two reasons we might restore `addr` as an alternative to `ref` for the +implicit `self` parameter: + +- As a way to express uses of `self` that we want to disallow for `ref` + bindings generally but need an escape hatch for migrated C++ code that + relies on these patterns. +- To allow the `self` parameter to use one of the pointer semantics we create + as part of the upcoming memory safety design, that can't be achieved with + `ref`. + +## Rationale + +This proposal tries to advance these [Carbon goals](/docs/project/goals.md): + +- [Performance-critical software](/docs/project/goals.md#performance-critical-software) + - Having a "move-in-move-out" option as a calling convention is a + potential performance improvement for using `ref` parameters instead of + pointers. + - Giving additional options for the return convention gives opportunities + for improved performance. Having this set by explicit return markings is + about giving control and predictability to the code author. +- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) + - `ref` bindings and returns avoid the ceremony of round-tripping through + pointers. +- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) + - Checking that reference bindings are not dangling is important for + avoiding use-after-free bugs. + - `bound` markings on parameters to allow safety equivalent to Clang's + `[[clang::lifetimebound]]` when returning a reference. +- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + - Including references in Carbon allows for less mismatch for C++ code + using references. + +## Alternatives considered + +These ideas were discussed in open discussion on: + +- [2025-05-01](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.dffumsu6wzlc) +- [2025-05-06](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.s42g5iv67d3c) +- 2025-05-07 + [a](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.sfx9d7ltud5) + [b](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.4zbo49wg5rmk) +- [2025-05-08](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.vdognq1upsf5) +- [2025-05-12](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.1mjh6unumnwu) +- [2025-05-13](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.bdznj2d0by2g) +- [2025-05-14](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.52tb7l2he343) + +They were also discussed in the +[#pointers-and-references channel in Discord starting 2025-05-05](https://discord.com/channels/655572317891461132/753021843459538996/1369085231038074901), +and +[#syntax on 2025-05-14](https://discord.com/channels/655572317891461132/709488742942900284/1372285365162872943). + +### No `ref`, only pointers + +The rationale to add `ref` instead of staying with pointers was discussed on +[2025-05-01](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.dffumsu6wzlc). +In addition to the motivating problems given in +[the "Problem" section](#problem), that discussion included some additional +depth to the reasons to add reference bindings. + +There is a tension between wanting to have mutating expressions and only having +pointers. You need some concept like a reference in order to mutate an object +with an expression. The question is how small a box the references are +restricted to, and where the line is drawn. C has lvalues, which contain +references but are restricted to a quite small box. Reference bindings +specifically are about keeping a small box around references while still adding +enough expressivity to support our use cases. We have started with a model +similar to C, but it fell down when it comes down to composition. Decomposing an +expression into pieces loses the tools the expression provided to you. The +missing tool for that was reference bindings. + +We saw how much we were leaning on value bindings. The asymmetry between having +value binding but not reference bindings when have value expressions and +reference expressions was creating pressure. For example, when accessing members +of an object, we had to escape to pointers in that operator. + +One downside of this change is that before indirections were more visible in the +code. + +Also, this does fundamentally mean that we now have another kind of "pointer", +potentially adding complexity to any memory-safety story. However, this ship +already sailed to some extent with value bindings. Fundamentally, bindings are +allowed to have pointer-like semantics from a lifetime perspective, and so will +need to be considered as a pointer-like thing as we build out lifetime safety. + +### Remove pointers after adding references + +If we removed pointers after adding references, we would need something +rebindable for assignable objects. The viable path forward without separate +pointers and references is to have something rebindable like pointers but +automatically dereferenced like references, which is the approach Rust takes. +See +[this comment on issue #5261](https://github.com/carbon-language/carbon-lang/issues/5261#issuecomment-2786462775). + +One of the features of a reference is what it cannot do, and we would have to +remove those restrictions to be able to satisfy the pointer use cases with +references. + +### Allow `ref` bindings in the fields of classes + +A type with reference binding fields would need a lot of restrictions since +reference bindings are not assignable. We did not see enough motivation to put +references into objects, given the complexity that it would introduce, so we are +keeping references out of types for now. This could change to support lambda +reference captures. + +### No call-site annotation + +This question was discussed on +[2025-05-07](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.sfx9d7ltud5). + +We decided that the marking is not about lifetime, but ability to mutate. A +`val` may reference an object in a similar way to a `ref`, restricting +operations on the original object, but we are not going to mark `val`s since +those restrictions are enforced by the compiler. We thought the ability to +mutate, though, was something important enough to highlight to readers of the +code, even at the expense of extra work for the writer. + +Swift `inout` parameters are +[marked at caller with an `&` before the argument](https://docs.swift.org/swift-book/documentation/the-swift-programming-language/functions/#In-Out-Parameters). +Jordon Rose has published +[a regret](https://belkadan.com/blog/2021/12/Swift-Regret-inout-Syntax/) that +they didn't use `inout` to mark the argument instead of `&`. + +On the other hand, not marking is not known to be a source of bugs. + +This is a "try it and see how well it works" sort of decision. + +### Top-level `ref` introducer + +For now, we don't believe `let ref` to be so common as to need a shorter way to +write, unlike what we do for `var`. This was considered in +[leads issue #5523](https://github.com/carbon-language/carbon-lang/issues/5523), +which provided this rationale: + +> I feel like this would often be used for non-local mutation due to it +> fundamentally deporting mutable value semantics and instead having reference +> semantics. Unlike the local mutation, that seems more worthwhile to have +> incentives around minimizing. +> +> However, this seems the easiest of all to revisit later if we discover that +> the added verbosity in practice is costing more than is worth any improvements +> from explicitly flagging mutable reference semantics, or if we find code is +> reaching for antipatterns due to the incentive. + +In addition, +[this comment on leads issue #5522](https://github.com/carbon-language/carbon-lang/issues/5522#issuecomment-2972029100) +argued that it would be more consistent for `ref` and `val` to only apply to +bindings, and not introduce patterns, like `let` and `var`. + +### Allow immutable value semantic bindings nested within variable patterns + +This was considered in +[leads issue #5523](https://github.com/carbon-language/carbon-lang/issues/5523), +which provided this rationale: + +> While it may be an obvious point of orthogonality, I think it adds choice +> without sufficient motivation, and even _having_ that choice does add some +> complexity to the language. +> +> It also seems like we could add this later if there is sufficient demand when +> we have larger usage experience body to pull from with the rest of the Carbon +> language. Currently, the affordance that feels more natural to me is what we +> have. + +> I think we're happy to see motivating use cases and revisit this. At the +> moment, we've just not seen motivating use cases -- everything has seemed a +> bit too contrived. + +### Remove `var` as a top-level statement introducer + +This was considered in +[leads issue #5523](https://github.com/carbon-language/carbon-lang/issues/5523), +which provided this rationale: + +> Locals are important, frequent, and frequently mutable. I don't think forcing +> varying locals to go through `let var` for orthogonality aids readability +> enough to offset the verbosity cost of added keywords on a reasonably common +> pattern. +> +> I'm still currently in the position that locally owned objects being mutable +> should not be "discouraged" or "disincentivized" by the language. And I think +> adding artificial incentives to try and avoid needing a mutable local variable +> would either have no effect beyond verbosity, or if it _did_ have effect, it +> wouldn't be a net positive effect due to code being written in a less +> straightforward manner in order to avoid mutation. +> +> To be clear, this is based on intuition and judgement based on my experience, +> not in any way based on data or specific motivating examples. I can imagine +> data or evidence or even a new perspective changing my position here, but so +> far the discussion we've had haven't done that. + +### `ref` as a type qualifier + +The big concern is that any effect that is represented by a type, like +`Optional` or `Result`, will want to compose with reference returns. This could +be done by allowing `ref` to create an object type that could be used as a +parameter to those, as in `Optional(ref T)`, but +[we are trying to avoid going down that path](https://github.com/carbon-language/carbon-lang/issues/5261#issuecomment-2790421894). +We have [future work](#interaction-with-effects) to tackle this problem +specifically. + +There was also +[a concern that we might need `ref` types to represent argument lists with tuples](https://github.com/carbon-language/carbon-lang/issues/5261#issuecomment-2790506515), +but tuples already can't represent `var` or compile-time parameters. We have +other plans for this, instead of trying to stretch tuples to encompass these use +cases. + +We also noted that including references in the type system led to a number of +inconsistencies in C++, such as no there not being references of references. + +### `bound` would change the default return to `val` + +We considered saying that `bound` would change the default return to use the +`->val` return convention. This was discussed on +[2025-05-01](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.dffumsu6wzlc) +and +[2025-05-08](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.vdognq1upsf5). +The idea is that `val` is expected to be efficient, so we should encourage using +it, but we can't always use `val`, since some types have a reference value +representation, but `bound` alleviates that concern. + +Once we realized that `bound` is relevant for all return conventions, we +reconsidered that approach, since has a number of concerns: + +- Changing defaults is action at a distance, changing the behavior without + changing the code in the relevant location. +- We don't want to have to make changes to the return category of copy-paste + of a function from an interface to an `impl` of it when removing `bound`. +- Lifetimes in Rust and Clang's `[[clang::lifetimebound]]` don't change + calling conventions, only what code is valid. + +Going with an approach where less depends on `bound` makes sense for now, since +we are going to reconsider these issues as part of our upcoming memory safety +work. + +### Other return conventions + +We also considered other conventions for returning from functions, in +[a comment on #5434](https://github.com/carbon-language/carbon-lang/pull/5434/files#r2099145225), +on +[2025-05-08](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.vdognq1upsf5) +and on +[2025-05-12](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.1mjh6unumnwu), +most notably: + +- **in place**: This convention was like `->`, but always using the "in place" + convention where the caller allocates storage and provides the callee with + its address, the callee initializes the storage at that address, and the + caller is responsible for destroying after the return. +- **var without storage**: The callee returns a pointer to the storage of a + subobject of a `bound var` parameter, that caller is then responsible for + destroying. A call to this function is reference expression, but with + additional responsibility to destroy. +- **hybrid**: If the type has a copy value representation or trivial + destructive move then return the object representation directly; otherwise + caller passes a pointer and callee initializes it. + +There were also some variations on what the conditions for returning in +registers using the default return convention. + +We seriously considered "var without storage", but the fact that it couldn't +reliably be used to initialize a variable, particularly in the middle of an +object, meant it did not seem valuable enough to include. + +It seemed more valuable to support the "in place" return convention. That return +form allows you to guarantee knowing the address of the object being +constructed, and was a good match for `returned var`. However, we realized that +`var` declarations shouldn't always be associated with in-memory storage, in +particular for types that may be trivially moved. For example, a `var` parameter +with the C++ type `std::unique_ptr` should be passed in registers. A function +returning a `std::unique_ptr` in place would not be as efficient as returning +it by moving it into registers. + +### `return var` with compound return forms + +We considered various syntax options on +[2025-05-12](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.1mjh6unumnwu), +but none of them seemed good enough to justify inclusion at this time: + +```carbon +fn F(...) -> (ref R, val L, V) { + // No longer a `var` being returned. Ideally these + // shouldn't have to be initialized together. + returned ??? (ref r: R, val l: L, var v: V) = ...? + return var; +} + +// We could restrict to one `var` return component, +// but this is a lot of machinery for a small increase +// in expressiveness and applicability. +fn F(...) -> (ref R, val L, V) { + returned var v: V = ...; + let l: L = ...; + return (*r, l, var); +} + +fn F(...) -> (ref R, val L, V) { + // These don't have the right category, and ideally + // shouldn't have to be initialized together. + returned var ret: (R, L, V) = ...? + return var; +} + +fn F(...) -> (ref R, val L, V) { + returned var (_, _, var v: V) = ; +} +``` + +There was another approach we considered for `returned var` originally: + +```carbon +fn F(...) -> (ref R, val L, v1: V1, var v2: V2) { + // ... + // Must use the same names for the `var` (implicit or explicit) returns + // with bound names. + return (r, l, v1, v2); +} +``` + +But this had downsides that still apply: + +- Requires `V1` and `V2` to have unformed states. Otherwise, `v1` and `v2` + would need be initialized when they are declared. +- This does not support only having some branches use `return var`. + +Our current approach handles our main use case for `returned var`: factory +functions. + +We could support an "only `var`s" approach in the future if we want: + +```carbon +fn F(...) -> (var V1, var V2, var V3) { + returned (var v1: V1, var v2: V2, var v3: V3) = ...; + // ... + return var; +} +``` + +### Other syntax for compound return forms + +We considered other options for the syntax of compound return forms on +[2025-05-13](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.bdznj2d0by2g), +[#syntax in Discord on 2025-05-14](https://discord.com/channels/655572317891461132/709488742942900284/1372285365162872943), +and +[2025-05-14](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.52tb7l2he343). +The option of omitting the `->` in each component did not distinguish tuples +from tuple return forms sufficiently: + +```carbon +-> (ref i32, var i32) +-> (bool, ref i32) + +// Is this a single return of a tuple, or a triple +// return using the default return convention? +-> (i32, i32, i32) +``` + +We also considered an approach where compound return forms would start with +`->?`, but this raised concerns about what the meaning of that syntax would be +and whether we want to expose users to that in cases we might be able to avoid +it. + +The original proposed syntax used an arrow `->` in each component of a compound +form. + +```carbon +fn TupleReturn(...) + -> (->val bool, ->ref i32, -> C); + +fn StructReturn(...) + -> {->val .a: bool, + ->ref .b: i32, + -> .c: C}; +``` + +This avoided ambiguity, but was verbose and visually noisy. An alternative was +suggested in +[discussion on 2025-06-13](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.52ru7ner80b4). +This alternative used a default of compound returns for paren `(`...`)` and +brace `{`...`}` expressions, which you could opt out of by using one of the +three category keywords such as `var` to introduce a type expression that would +not be considered a compound return. + +This had the downside that `-> T` would be interpreted as `-> var T` even if `T` +was a tuple type like `(i32, i64)`. However, textually substituting in +`(i32, i64)` in for `T` to get `-> (i32, i64)` would instead be interpreted as +`-> (var i32, var i64)`. + +To overcome this problem, in +[discussion on 2025-06-16](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.rdbzk5jnin3x), +we switched to the approach from this proposal. + +### `ref` parameters allow aliasing + +If requiring `ref` parameters to be `noalias` ends up being too restrictive, we +could instead have the "move-in-move-out" optimization be done only when the +compiler can prove it safe. One strategy would be to generate an alternate +version of the function that is only used in the cases where the `noalias` +conditions can be shown to hold statically. + +### `let` to mark value returns instead of `val` + +This proposal initially used `let` instead of `val` to mark +[immutable value returns](#ref-and-val-returns). However, `let` is used, in +Carbon and other languages, primarily to bind names. In Carbon, the default +binding is a value binding, but that was not considered a close enough to +connect the `let` keyword to value semantics. + +There was also a concern about reusing `let` in multiple contexts to mean +different things, and having separate keywords that were only used to mark the +category of the binding was deemed better separation of concerns. + +We considered making a parallel change to use `init` instead of `var`, but this +had some problems: + +- By making initializing returns the default, there is little expected usage, + so perhaps not worth spending another keyword on. +- The `init` keyword would be particularly expensive, because C++ code + commonly use that word in APIs. + +This question was considered in +[leads issue #5522](https://github.com/carbon-language/carbon-lang/issues/5522). + +### `=>` infers form, not just type + +There was support for the idea that the `=>` return syntax introduced in +[proposal #3848](https://github.com/carbon-language/carbon-lang/pull/3848) +should deduce the form of the return, not just its type. This was discussed in +[#lambdas on 2025-05-20](https://discord.com/channels/655572317891461132/999638000126394370/1374462658723450981). + +However, trying to infer the expression category from the category of the +expression after the `=>` runs into the problem of this often requiring the +parameters to be marked `bound`. A more complicated rule, for example using +whether any parameter is marked `bound`, could be adopted in the future, if the +simple rule proves inadequate. + +Alternatively, the compiler could infer which parameters should be marked +`bound` in this case. That is something to consider with the memory safety +design.