Add "data objects" proposal

nikitabobko · strangepleasures · vanniktech · nikitabobko · commit 6a89b343e6d3 · 2024-01-02T15:50:26.000+01:00
closes #351 Co-authored-by: Pavel Mikhailovskii <pavel.mikhailovskii@gmail.com> Co-authored-by: Niklas Baudy <niklas.baudy@vanniktech.de> Co-authored-by: Marat Akhin <marat.akhin@jetbrains.com> Co-authored-by: Roman Elizarov <elizarov@gmail.com>
diff --git a/data-objects.md b/data-objects.md
@@ -0,0 +1,207 @@
+# Data objects
+
+* **Type**: Design proposal
+* **Authors**: Alexander Udalov, Roman Elizarov, Pavel Mikhailovskii, Marat Akhin
+* **Status**: Preview in 1.8.20, Release in 1.9.0
+* **Discussion and feedback**: [#317](https://github.com/Kotlin/KEEP/issues/317)
+
+# Summary
+
+This KEEP introduces ***data objects*** which fix the inconsistencies in the current Kotlin design related to how one works with immutable data and algebraic data types (ADTs) via data classes, and how to avoid the boilerplate of implementing default `toString` for objects. As one of the effects, it fixes the [KT-4107](https://youtrack.jetbrains.com/issue/KT-4107) feature request.
+
+# Motivation
+
+## Current state
+
+Currently, when working with regular data entities which follow the standard rules, one can use [data classes](https://kotlinlang.org/docs/data-classes.html) to avoid manually writing a number of utility functions with known standard behavior, as the compiler can generate them automatically. For a data class with data properties `{pi}`, the following functions are [generated](https://kotlinlang.org/spec/declarations.html#data-class-declaration):
+
+* `equals`/`hashCode` which follow the structural equality rules and consider the pairwise equality of `{pi}`
+* `toString` containing the data class name together with its data properties’ string representations
+* `copy` to support immutability by facilitating easy data class copying while changing one or more data properties
+* `componentN` for destructuring declarations
+
+This enables the first way of using data classes in Kotlin: as data holders with a number of convenience features. They are most useful for representing immutable value-like data, but this is not a hard constraint. If needed, you can use data classes with mutable data, or even mark your class as a data class for the sole purpose of getting a generated `toString` implementation.
+
+The second way one may use data classes is an extension of their “immutable data” nature. Together with [sealed types](https://kotlinlang.org/docs/sealed-classes.html), data classes allow one to describe ADTs in a convenient way, where the sum type is represented as a sealed class/interface, and the product type is represented as a data class.
+
+```
+// An ADT representing simple arithmetic expressions
+sealed interface Expr
+
+data class Add(val lhv: Expr, val rhv: Expr) : Expr
+data class Sub(val lhv: Expr, val rhv: Expr) : Expr
+data class Mul(val lhv: Expr, val rhv: Expr) : Expr
+data class Div(val lhv: Expr, val rhv: Expr) : Expr
+data class Const(val value: Int) : Expr
+
+fun eval(e: Expr): Int = when (e) {
+    is Add -> eval(e.lhv) + eval(e.rhv)
+    is Sub -> eval(e.lhv) - eval(e.rhv)
+    is Mul -> eval(e.lhv) * eval(e.rhv)
+    is Div -> eval(e.lhv) / eval(e.rhv)
+    is Const -> e.value
+}
+```
+
+## Current problem
+
+When describing ADTs, you often have one or more of its variants as unit types.
+
+```
+// An ADT representing a functional-style singly-linked list
+sealed interface FList<out T>
+
+data class Node<T>(val value: T, val next: FList<T>) : FList<T>
+object Nil : FList<Nothing>
+```
+
+For example, if you implement a functional-style singly-linked list (SLL) as an ADT, you need a `Nil` singleton value to represent the empty list. In Kotlin, such values and their unit types are usually represented as [objects](https://kotlinlang.org/docs/object-declarations.html#object-declarations-overview). And here comes the inconsistency.
+
+```
+infix fun <T> FList<T>.append(v: T): FList<T> = Node(v, this)
+
+fun main() {
+    val example = Nil append 0 append 1 append 42
+    println(example)
+    // Node(value=42, next=Node(value=1, next=Node(value=0, next=Nil@2752f6e2)))
+    //                                                           ^^^^^^^^^^^^
+    //                                                           (╯°□°)╯︵ ┻━┻
+}
+```
+
+The convenience features of data classes that you might want to (re)use for your ADTs are not available for regular objects; the most noticeable gap is the lack of a generated `toString` implementation. To fix this inconsistency, you have to implement the missing features manually, which is unneeded boilerplate.
+
+To address this boilerplate problem, this KEEP proposes to introduce **data objects** which bring regular objects and data classes together. They can be considered a new flavor of objects which are even more similar to immutable value-like singleton values than regular objects. Alternatively, you can view data objects as a new flavor of data classes with no data properties.
+
+# Design
+
+A data object is a special kind of [object](https://kotlinlang.org/spec/declarations.html#object-declaration), which generalizes the [data class](https://kotlinlang.org/spec/declarations.html#data-class-declaration) abstraction (product type of one or more data properties) to a case of unit type (product type of zero data properties).
+
+
+>Note: as a unit type has only one possible value, it is also known as a singleton type.
+
+
+Similarly to data classes, there are a number of functions with predefined behavior generated for data objects.
+
+* `equals()` / `hashCode()` functions compliant with their [contracts](https://kotlinlang.org/spec/built-in-types-and-their-semantics.html#kotlin.any-builtins).
+    * `equals(that)` returns true if and only if `that` has the same runtime type as `this`;
+    * `hashCode()` returns the same integer for values `A` and `B` if they are equal w.r.t. the generated `equals`;
+* `toString()` function which returns the data object name.
+
+The `copy()` and `componentN()` functions are not generated, as they are not relevant for a unit type: `copy()` is not needed because a unit type has a single possible value, and `componentN()` functions are not needed because a unit type has no data properties.
+
+Additionally, we disallow providing a custom `equals` / `hashCode` implementation, whether by inheriting it from a superclass or by overriding it in the data object itself, meaning that for a data object these functions always work as described above. This ensures that a data object always behaves as an immutable value-like type and is inhabited by only one value from the equality point of view.
+
+As an additional effect, the introduction of data objects, similarly to data classes, allows one to get the convenience features (mostly the `toString` implementation) for their objects by marking them as data objects. This fixes the [KT-4107](https://youtrack.jetbrains.com/issue/KT-4107) feature request.
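+
+As a minimal sketch of the generated functions, the `FList` example from the motivation can be rewritten with `Nil` declared as a data object. The `buildExample` helper is a hypothetical name introduced only for this illustration, and the snippet assumes a compiler version in which data objects are available.
+
+```kotlin
+sealed interface FList<out T>
+
+data class Node<T>(val value: T, val next: FList<T>) : FList<T>
+data object Nil : FList<Nothing>
+
+infix fun <T> FList<T>.append(v: T): FList<T> = Node(v, this)
+
+// Hypothetical helper: builds the same list as in the motivation example
+fun buildExample(): FList<Int> = Nil append 0 append 1 append 42
+
+fun main() {
+    // The generated toString returns the data object name, without a hash suffix
+    println(buildExample())
+    // Node(value=42, next=Node(value=1, next=Node(value=0, next=Nil)))
+
+    // The generated equals / hashCode are consistent with each other
+    println(Nil == Nil) // true
+    println(Nil.hashCode() == Nil.hashCode()) // true
+}
+```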
+
+## Data (ir)regular objects
+
+Besides regular objects, Kotlin supports [companion objects](https://kotlinlang.org/docs/object-declarations.html#companion-objects) and [object literals](https://kotlinlang.org/docs/object-declarations.html#object-expressions) (expressions). However, these two entities have a different meaning from regular objects and are not used as immutable values.
+
+* A companion object is used to associate data (properties) and behavior (functions) with a class itself, and not with its instances
+* An object literal is used to declare an anonymous class together with its singleton instance, which is used in a limited scope
+
+For this reason, marking companion objects and object literals as data is prohibited.
+
+## Kotlin stdlib specific design
+
+The Kotlin standard library and various kotlinx libraries have different objects which could become data objects. However, to avoid any possible problems with third-party code which might rely on the current reference equality for objects, we conservatively decided not to change any objects to data objects with the feature release. In the future, this decision may be reconsidered separately.
+
+## (De)serialization specific design
+
+With the generated `equals` implementation, two or more instances of a data object will still be considered equal by the `==` value equality operator, even after they have been (de)serialized or created via reflection. This means that if one respects the value-like nature of data objects and does not compare them using the `===` reference equality operator, their correct (de)serialization does not require any special support.
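+
+The following JVM-only sketch illustrates this behavior. It assumes that the object is compiled to a class with a private no-argument constructor, which Java reflection can be forced to call; `Marker` and `forceSecondInstance` are hypothetical names used only for this illustration, and real code should not create instances this way.
+
+```kotlin
+data object Marker
+
+// Illustration only: forces a second instance into existence via Java reflection,
+// similar to what a (de)serialization library might do
+fun forceSecondInstance(): Marker {
+    val constructor = Marker::class.java.getDeclaredConstructor()
+    constructor.isAccessible = true
+    return constructor.newInstance()
+}
+
+fun main() {
+    val twin = forceSecondInstance()
+    println(Marker == twin)  // true: generated equals checks the runtime type
+    println(Marker === twin) // false: these are two distinct references
+    println(Marker.hashCode() == twin.hashCode()) // true
+    println(twin) // Marker
+}
+```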
+
+## Kotlin reflection specific design
+
+With the introduction of data objects, Kotlin reflection starts returning `true` for the `DataObject::class.isData` property.
+
+## Kotlin Multiplatform specific design
+
+At the current stage of Kotlin Multiplatform design, `expect` and `data` modifiers cannot be used together, as it is unclear how the “data-ness” requirement of such expected declarations should be fulfilled. Therefore, at the moment `expect` data objects are forbidden.
+
+## Design questions and answers
+
+### Why not use the default `equals` of regular objects for data objects?
+
+Using structural `equals` instead of referential `equals` helps to additionally ensure the immutable value-like behavior of data objects, even in cases when two or more instances of a data object are created at runtime, e.g., after (de)serialization or reflection.
+
+### Why allow custom `equals` / `hashCode` for data classes, but not for data objects?
+
+In many cases the generated `equals` / `hashCode` for the data class is sufficient and is not overridden. However, in some cases you need to refine the implementation, e.g., when your data class contains an `Array<T>` data property and you want structural equality for it, whereas the default `equals` implementation for arrays is referential. In such cases you have to provide a custom `equals` / `hashCode` implementation for your data class.
+
+As we do not have this problem with data objects (because they do not have any data properties), we decided to disallow providing such custom implementations for them.
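+
+The array case can be sketched as follows. `RawSamples` and `Samples` are hypothetical names used only for this illustration: the first relies on the generated functions, while the second refines them with the standard library's `contentEquals` / `contentHashCode`.
+
+```kotlin
+// Relies on the generated equals, which compares the arrays referentially
+data class RawSamples(val values: Array<Int>)
+
+// Refines equals / hashCode to compare the array contents structurally
+data class Samples(val values: Array<Int>) {
+    override fun equals(other: Any?): Boolean =
+        other is Samples && values.contentEquals(other.values)
+
+    override fun hashCode(): Int = values.contentHashCode()
+}
+
+fun main() {
+    println(RawSamples(arrayOf(1, 2)) == RawSamples(arrayOf(1, 2))) // false
+    println(Samples(arrayOf(1, 2)) == Samples(arrayOf(1, 2)))       // true
+}
+```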
+
+### Why no `copy` function for consistency with data classes?
+
+The `dataObject.copy()` expression can be easily misinterpreted, if one were to consider it the same way as the `dataClassInstance.copy()` expression, which creates a new data class instance structurally (but not referentially) equal to `dataClassInstance`. We have the following cases of data object’s `copy` behavior and one’s expectations of how it works.
+
+* If you actually need a new instance from `copy` (i.e., you will be comparing references to data objects somewhere in your code), you are abusing the data object abstraction, as data objects should be compared structurally.
+* If you do not need a new instance from `copy`, you do not need a call to `copy`.
+
+To avoid creating the impression we might create a new instance on `dataObject.copy()`, we decided to not support `copy` for data objects.
+
+### When to use data objects and when to use regular objects?
+
+When should you make your objects into data objects? The general recommendations are as follows.
+
+* If your object is one of the variants in an ADT, it should probably be a data object.
+* If your object needs structural equality (e.g., because of serialization) and/or generated `toString`, it should probably be a data object.
+* If your object needs referential equality, you should probably keep it as a regular object and implement `toString` if needed.
+* In other cases (i.e., when you do not need anything special from your object), you should keep it as a regular object.
+
+Of course, these are only recommendations and one can deviate from them if they feel it to be correct for their specific cases.
+
+# Related features in other languages
+
+In most non-functional-programming-based languages, ADTs are supported as some combination of features on the following two axes.
+
+* How one can describe the ADT as a combination or a hierarchy of its sum/product types
+* How one can conveniently work with individual ADT components and/or avoid boilerplate
+
+## Scala
+
+In Scala 2, ADTs are supported via two features. First is the ability to describe a closed type hierarchy via [sealed types](https://docs.scala-lang.org/tour/pattern-matching.html#sealed-types) (sum type), which gives you exhaustiveness checks in pattern matching, to ensure you handled all variants of your sealed type. Second is the convenient way to declare an ADT component as a [case class](https://docs.scala-lang.org/tour/case-classes.html) (product type) or a case object (unit type). Case classes remove boilerplate associated with immutable data (which ADTs most often are): they provide structural (not referential) equality, out-of-the-box pattern matching, easy mutation via `copy`, etc.
+
+As a result, your ADT is represented as a base sealed type which is inherited by a number of case classes and/or case objects. However, nothing forbids you from using only one of the features, e.g., you can use case classes just for their convenient immutable data representation, but not within an ADT.
+
+
+>Note: this design is an almost one-to-one match to the design of data classes and sealed types in Kotlin.
+
+
+In Scala 3, the ADT “recipe” got simplified to a separate language feature called [enumerations](https://docs.scala-lang.org/scala3/book/types-adts-gadts.html). To quote the [original design proposal](https://github.com/lampepfl/dotty/issues/1970), “`enum` class [...] is essentially a `sealed` class whose instances are given by *cases* defined in its companion object”, i.e., a syntactic sugar for the ADT declaration “boilerplate” of Scala 2. The addition of enumerations does not prevent one from continuing to use sealed types and case classes/objects if needed, but it offers a more convenient way to declare ADTs.
+
+## Swift / Rust
+
+[Swift](https://docs.swift.org/swift-book/LanguageGuide/Enumerations.html) and [Rust](https://doc.rust-lang.org/reference/items/enumerations.html) use the same approach as Scala 3 and support ADTs via enumerations. These provide some number of convenience features (like pattern matching), but other features (like simple copying or automatic conversion to string) in these languages are implemented independently of enumerations, as their own standalone language features, and are added to enumerations in case you need them. For example, if you want to have structural equality generated for your enumerations, you can `#[derive(PartialEq)] enum Foo` (for Rust) or `enum Foo : Equatable` (for Swift).
+
+## Java
+
+Java has a long history of feature evolution, including features related to ADTs. As of Java 17, a Long-Term-Support release, the ADT support is similar to that of Scala 2 and Kotlin. First, one can use [sealed classes and interfaces](https://openjdk.org/jeps/409) to create a closed type hierarchy. Second, [records](https://openjdk.org/jeps/395) allow one to declare ADT variants in a compact fashion, while also ensuring structural equality and a convenient string representation with the generated `equals`/`hashCode`/`toString`.
+
+
+>Note: a Java record type with zero declared components (`record EmptyMessage() {}`) works very similarly to a data object type, but it does not define the associated singleton type instance, i.e., to create such instances one needs to use `new EmptyMessage()`.
+
+
+For convenience features which are not (yet) supported by Java, e.g., easy mutation of records via copying or “withers”, you can use a third-party code-generation tool like [Lombok](https://projectlombok.org/features/With).
+
+## TypeScript
+
+TypeScript has a more advanced type system, so its ADT support looks a little different. To represent the ADT sum type, one can use [union types](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#union-types), with union members representing the ADT components. Additionally, you can use [discriminated unions](https://www.typescriptlang.org/docs/handbook/2/narrowing.html#discriminated-unions) to make working with the ADT easier.
+
+TypeScript has powerful runtime introspection abilities, and they allow you to support convenience features via libraries such as [lodash](https://lodash.com/) (e.g., [structural equality](https://lodash.com/docs/4.17.15#isEqual) or [copying](https://lodash.com/docs/4.17.15#clone)), instead of having to implement them via code generation or as a language feature.
+
+# Alternatives
+
+To solve the ADT inconsistency problem between data classes and objects, we could use one of the following alternatives.
+
+* Change the way some combination of `equals` / `hashCode` / `toString` works for regular objects, e.g., make the default `toString` implementation return the object name. Such changes would introduce a major breaking change to the language and would lead to “reverse” boilerplate of needing, for example, to override `equals` if your objects require reference identity.
+* Decouple the convenience features from data classes and make them available as separate feature(s) in Kotlin (somewhat similarly to what Swift and Rust have). Such a change would require introducing other “prerequisite” features, e.g., a mirror of Rust’s `derive`, once again leading to significant breaking changes to the language.
+* Borrow a page from Java’s design of records and implement immutable value-like unit types as data classes with zero data properties (`data class NoData`). This is a fine solution in and of itself, but it has the same problem (as Java records have) of needing to create an instance explicitly or implement a boilerplate singleton pattern for such data classes. For a language which has built-in support for such types (in the form of objects), this design seems inefficient. Additionally, this also complicates the migration path for existing ADT hierarchies (which are already using objects), whereas for a `data object` design it is much more streamlined.
+* Implement convenience features via build-time code generation / IDE support / compiler plugin / etc. While in some rare cases these tools are OK to use in Kotlin, e.g., when the added feature has a complex and/or intrusive implementation such as Jetpack Compose, in general, they are “not the Kotlin way” of adding language features.
+
+# IDE support
+
+In scope of this feature, we also propose to add or extend the following IDE inspections.
+
+* The existing inspection which suggests adding the `data` modifier to a sealed subclass should also do this for objects.
+* A new inspection which recommends adding the `data` modifier to a serializable object.