On the syntax of data types

**Note**: in this issue, I'm using the functional programming terms product and sum types, which roughly correspond to `struct` and `enum` in Rust, respectively. I'm using those terms because they don't imply any syntax or choice of keywords.

Roto currently only allows for product types using the `type` keyword. Sum types are supported in the language, but you can't define them until https://github.com/NLnetLabs/roto/pull/293 lands. That PR uses (something resembling) the `enum` syntax of Rust and that's why there is a companion PR to rename `type` to `struct`.

The issue is that `struct` and `enum` are unfamiliar terms for many people who are not accustomed to Rust or C. So, in this issue I take a step back to see if we can find a syntax that is more intuitive.

Some things to note up front:
- Keyword choice mostly matters for familiarity and for how scary Roto looks at first glance; it hardly matters to anybody already committed to learning Roto.
- Keyword choice also becomes less important with good error messages.
- Punctuation is at least as important as keyword choice.
- Using more keywords requires people to learn more keywords, but also helps to distinguish concepts.
- We want to support algebraic data types (that is, both product and sum types), since we want proper pattern matching.
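
As a reference point for what "proper pattern matching" buys us, here is a minimal Rust sketch. The `Shape` type and the `area` function are made up for illustration and have nothing to do with Roto's actual types.

```rust
// Hypothetical example: a sum type whose variants are product types.
#[derive(Debug)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// The match has to cover every variant. Adding a variant to `Shape`
// makes this function fail to compile until the new case is handled.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

fn main() {
    let shapes = [
        Shape::Circle { radius: 1.0 },
        Shape::Rect { width: 2.0, height: 3.0 },
    ];
    for shape in &shapes {
        println!("{:?} has area {}", shape, area(shape));
    }
}
```

The exhaustiveness check is the part that depends on the compiler knowing the full set of variants, which is why sum types need to be definable in the language.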

## Functional languages (OCaml, F#)

Many functional languages are built around ADTs and therefore have a succinct syntax for them. So let's start there.

```ocaml
type foo = Bar | Baz of string;;
type foo = { age: i32; };;
```

Translating to Roto would look like:

```roto
type Foo = Bar | Baz(string);
type Foo = { age: i32 };
```

It's nice that both constructs use the same keyword here. However, it also requires some more punctuation: the `=` and the `;`.

An important thing to note, though, is that in many functional languages the constructors of these types are not scoped under the type. That means you can write this:

```
type Foo = Bar;
let foo = Bar; # instead of Foo.Bar
```

which is convenient sometimes, but you also cannot do this:

```
type Foo = Bar;
type Boo = Bar; # would be an error!
```

Coming from Rust, I don't like that property, but maybe I'm wrong!
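
For comparison, this is roughly how the scoped approach plays out in Rust. It is only a sketch: `Foo`, `Boo` and `unscoped_foo` are illustrative names. Two enums can share a variant name, and the unscoped style is still available as an opt-in through a glob import.

```rust
// Two hypothetical enums that both have a variant called `Bar`.
// This is allowed because each variant lives under its own enum.
#[derive(Debug, PartialEq)]
enum Foo {
    Bar,
}

#[derive(Debug, PartialEq)]
enum Boo {
    Bar,
}

// Opt in to the unscoped style for this one function only.
fn unscoped_foo() -> Foo {
    use Foo::*;
    Bar // resolves to `Foo::Bar` because of the glob import above
}

fn main() {
    let foo = Foo::Bar;
    let boo = Boo::Bar;
    assert_eq!(unscoped_foo(), foo);
    println!("{:?} {:?}", foo, boo);
}
```

So scoping by default does not rule out the short form; it just makes it something the user asks for.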

## Koka

Ref: https://koka-lang.github.io/koka/doc/book.html#sec-data-types

Koka is an interesting spin on FP languages, because it doesn't have the record types from OCaml:

```koka
type Foo
    Bar
    Baz(i32)
```

But it has `struct` as syntactic sugar:

```koka
struct tp { <fields> }

// desugars to:

type tp {
  Tp { <fields> }
}
```

That's interesting! I would maybe use the `record` keyword there instead. Note that this relies on the constructors not being scoped under the type, so we can write:
```
struct Person { age: i32 }

Person(age: 42)
```
If the constructors were scoped under the type, it would have to be `Person.Person`. This also relies on having multiple namespaces: one for type and one for constructors. The name `Person` in the example above refers to both a type and a constructor.
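
Rust has a similar split between a type namespace and a value namespace, and a tuple struct makes it visible. A minimal sketch, where `Person` is redefined as a hypothetical tuple struct and `make_people` is an illustrative helper:

```rust
// Hypothetical tuple struct: `Person` names a type and also a constructor function.
#[derive(Debug, PartialEq)]
struct Person(i32);

// `Person` appears here once as a type (in `Vec<Person>`)
// and once as a plain function value (passed to `map`).
fn make_people(ages: Vec<i32>) -> Vec<Person> {
    ages.into_iter().map(Person).collect()
}

fn main() {
    // The constructor can be stored like any other function.
    let ctor: fn(i32) -> Person = Person;
    let people = make_people(vec![41, 42]);
    assert_eq!(ctor(42), people[1]);
    println!("{:?}", people);
}
```

Note that this only applies to tuple structs in Rust. A struct with named fields, like `struct Person { age: i32 }`, does not get a constructor function.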

## Rust-style

```rust
enum Foo {
    Bar,
    Baz(i32),
}

struct Person {
    age: i32,
}
```

In Rust, we have to write `Foo::Bar` (which would be `Foo.Bar` in Roto) instead of just `Bar`. There is also no way to desugar a `struct` to an `enum`.
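
To see why such a desugaring is awkward with scoped constructors, here is what Koka's expansion looks like when written by hand in Rust. This is a sketch only; `age_of` is an illustrative helper.

```rust
// A record written as a single-variant enum, mirroring Koka's `struct` sugar.
#[derive(Debug, PartialEq)]
enum Person {
    Person { age: i32 },
}

fn age_of(person: &Person) -> i32 {
    // With a single variant the pattern cannot fail, so a plain `let` works.
    let Person::Person { age } = person;
    *age
}

fn main() {
    // The constructor is scoped under the type, hence the doubled name.
    let p = Person::Person { age: 42 };
    println!("{}", age_of(&p));
}
```

The doubled `Person::Person` is the cost of combining scoped constructors with this desugaring.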

We could also adopt this exact syntax with some other choice of keywords:
- `struct`: `record`, `compound`, `object`, `class`, `data`
- `enum`: `choice`, `case`, `cases`, `variant`, `union`

### Somewhere between Rust and Koka

Say we want Koka's desugaring, but Rust's scoping of constructors. We could introduce a "default" constructor:

```roto
type Foo {
    default { bar: i32 },
    Baz(i32),
}

# can now be constructed as:

Foo { bar: 10 }
Foo.Baz(10)
```

And then we desugar `record` as follows:

```roto
record Foo { bar: i32 }

# desugars to:

type Foo { default { bar: i32 } }
```

That ain't too bad? The obvious downside is that it introduces an entirely new construct.

## Type expressions and naming them

So far, we have mostly considered a purely nominal typing approach: identical types with different names are not considered interchangeable. But we can go an entirely different route.
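
Rust happens to contain both flavours, which makes the difference easy to show: structs are nominal, tuples are structural. A small sketch with made-up types (`Point`, `Size`, `Pair`):

```rust
// Two hypothetical record types with identical fields.
struct Point { x: i32, y: i32 }
struct Size { x: i32, y: i32 }

// Nominal: only a `Size` is accepted, even though `Point` has the same shape.
fn area(size: Size) -> i32 {
    size.x * size.y
}

// Structural: any `(i32, i32)` is accepted, whatever alias it was declared with.
type Pair = (i32, i32);
fn sum(pair: (i32, i32)) -> i32 {
    pair.0 + pair.1
}

fn main() {
    let point = Point { x: 2, y: 3 };
    // area(point); // would not compile: expected `Size`, found `Point`
    let size = Size { x: point.x, y: point.y };
    let pair: Pair = (2, 3);
    println!("{} {}", area(size), sum(pair));
}
```

The `Pair` alias is just another name for the same type, which is the behaviour the rest of this section explores for records and unions.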

Imagine that we can write the following things anywhere we can write a type:
```
{ x: i32, y: i32 }
i32 | u32
```
The first is simply a product type, the second is a union of 2 types.

For example:
```rust
fn add_one(x: i32 | f32) -> i32 | f32 {
    match x {
        x is i32 -> x + 1
        x is f32 -> x + 1
    }
}
```

Note that this is not a generic function. The Rust version of this would look like this:

```rust
enum IntOrFloat {
    I32(i32),
    F32(f32),
}

fn add_one(x: IntOrFloat) -> IntOrFloat {
    match x {
        IntOrFloat::I32(x) => IntOrFloat::I32(x + 1),
        IntOrFloat::F32(x) => IntOrFloat::F32(x + 1.0),
    }
}
```

Now that we have those type level operators, we can extend the union operator with named tags:
```
A(i32) | B(f32)
```
And then we can finally introduce a `type` construct to name these types:
```
type Foo = { age: i32 };
type Foo = i32 | u32;
type Foo = A(i32) | B(i32);
```

This might be overly complex both in concepts and in implementation, but I wanted to write it out anyway. The result does feel kind of script-y in some ways, since it sometimes requires fewer type definitions.

## @jdonszelmann's suggestion

```
record Foo {
    a: int,
    b: string,
    c: vec<u32>
}

record Maybe[T] {
    variant Some {
        value: T
    },
    variant None,
}
```

There's a lot to like about that suggestion! I'm especially partial to the `record` and `variant` names. Those seem very intuitive!
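
For readers who know Rust, the `Maybe[T]` example maps fairly directly onto an enum with a struct-like variant. This is my own rendering under that assumption, not part of the suggestion, and `unwrap_or` is an illustrative helper:

```rust
// Hypothetical Rust rendering of the `record Maybe[T]` suggestion.
#[derive(Debug, PartialEq)]
enum Maybe<T> {
    Some { value: T },
    None,
}

// Returns the contained value, or `fallback` for the empty variant.
fn unwrap_or<T>(maybe: Maybe<T>, fallback: T) -> T {
    match maybe {
        Maybe::Some { value } => value,
        Maybe::None => fallback,
    }
}

fn main() {
    println!("{}", unwrap_or(Maybe::Some { value: 1 }, 0));
    println!("{}", unwrap_or(Maybe::None, 0));
}
```

The main difference is in the keywords: the suggestion nests `variant` inside `record`, where Rust uses a separate `enum` keyword for the outer type.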

## Conclusions

The more I think about this, the more convinced I become that a clear distinction between concepts is really important. If we have clearly different syntax for product and sum types, then each becomes easier to understand. Using the same keyword `type` for both is difficult because it becomes harder to document and explain, and it also introduces more syntax.

I like `record` and `variant` as keywords. They seem relatively intuitive and we can adopt the terminology "record type" and "variant type".


