On the syntax of data types #296

@tertsdiepraam

Note: in this issue, I'm using the functional programming terms product and sum types, which roughly correspond to struct and enum in Rust, respectively. I'm using those terms because they don't imply any syntax or choice of keywords.

Roto currently only allows for product types using the type keyword. Sum types are supported in the language, but you can't define them until #293 lands. That PR uses (something resembling) the enum syntax of Rust and that's why there is a companion PR to rename type to struct.

The concern is that this terminology is unfamiliar to many people who are not accustomed to Rust or C. So, in this issue, I take a step back to see if we can find a more intuitive syntax.

Some things to note up front:

  • Keyword choice mostly matters for familiarity and for how scary Roto looks at first glance; it matters little to anybody already committed to learning Roto.
  • Keyword choice also becomes less important with good error messages.
  • Punctuation is at least as important as keyword choice.
  • Using more keywords requires people to learn more keywords, but also helps distinguish concepts.
  • We want to support algebraic data types (that is, both product and sum types), since we want proper pattern matching.

Functional languages (OCaml, F#)

Many functional languages are built around ADTs and therefore have a succinct syntax for them. So let's start there.

type foo = Bar | Baz of string;;
type foo = { age: i32; };;

Translating to Roto would look like:

type Foo = Bar | Baz(string);
type Foo = { age: i32 };

It's nice that both constructs use the same keyword here. However, it also requires more punctuation: the = and the ;.

An important thing to note, though, is that in many functional languages the constructors of these types are not scoped under the type. That means you can write this:

type Foo = Bar;
let foo = Bar; # instead of Foo.Bar

which is sometimes convenient, but it also means you cannot do this:

type Foo = Bar;
type Boo = Bar; # would be an error!

Coming from Rust, I don't like that property, but maybe I'm wrong!
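
For comparison, Rust scopes constructors by default but lets a module opt in to unscoped names with a use declaration. The sketch below is Rust, not Roto; it reuses the Foo and Boo names from above, and the describe function is a hypothetical helper added only for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Foo {
    Bar,
    Baz(String),
}

// A second type may reuse the constructor name, because each
// constructor is scoped under its own type.
#[derive(Debug, PartialEq)]
enum Boo {
    Bar,
}

// Opt in to unscoped constructors for Foo only.
use Foo::{Bar, Baz};

// Hypothetical helper: matches on the unscoped constructor names.
fn describe(foo: &Foo) -> String {
    match foo {
        Bar => String::from("bar"),
        Baz(s) => format!("baz: {s}"),
    }
}
```

Here Bar on its own means Foo::Bar, while Boo::Bar still has to be written out in full. Scoping by default with an explicit opt-out is one possible middle ground between the two behaviours.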

Koka

Ref: https://koka-lang.github.io/koka/doc/book.html#sec-data-types

Koka is an interesting spin on FP languages, because it doesn't have the record types from OCaml:

type Foo
    Bar
    Baz(i32)

But it has struct as syntactic sugar:

struct tp { <fields> }

// desugars to:

type tp {
  Tp { <fields> }
}

That's interesting! I would maybe use the record keyword there instead. Note that this relies on the constructors not being scoped under the type, so we can write:

struct Person { age: i32 }

Person(age: 42)

If the constructors were scoped under the type, it would have to be Person.Person. This also relies on having multiple namespaces: one for types and one for constructors. The name Person in the example above refers to both a type and a constructor.
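
Rust has a similar split between a type namespace and a value namespace, which can serve as a point of comparison. In the sketch below, which is Rust rather than Koka or Roto, the tuple struct Person names both a type and a constructor function; the people helper is hypothetical.

```rust
// `Person` in the type namespace is the struct; `Person` in the
// value namespace is its constructor function, fn(i32) -> Person.
#[derive(Debug, PartialEq)]
struct Person(i32);

// Hypothetical helper: passes the constructor around as a plain function.
fn people(ages: Vec<i32>) -> Vec<Person> {
    ages.into_iter().map(Person).collect()
}
```

In the signature, Person is the type; inside map, the same name is the constructor. The compiler decides which one is meant from the position in which the name appears.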

Rust-style

enum Foo {
    Bar,
    Baz(i32),
}

struct Person {
    age: i32,
}

In Rust, we have to write Foo::Bar (Foo.Bar in Roto's syntax) instead of just Bar. There is also no way to desugar a struct to an enum.

We could also adopt this exact syntax with some other choice of keywords:

  • struct: record, compound, object, class, data
  • enum: choice, case, cases, variant, union

Somewhere between Rust and Koka

Say we want Koka's desugaring, but Rust's scoping of constructors. We could introduce a "default" constructor:

type Foo {
    default { bar: i32 },
    Bar(i32),
}

# can now be constructed as:

Foo { bar: 10 }
Foo.Bar(10)

And then we desugar record as follows:

record Foo { bar: i32 }

# desugars to:

type Foo { default { bar: i32 } }

That ain't too bad? The obvious downside is that it introduces an entirely new construct.

Type expressions and naming them

So far, we have mostly considered a purely nominal typing approach: identical types with different names are not considered interchangeable. But we can go an entirely different route.

Imagine that we can write the following things anywhere we can write a type:

{ x: i32, y: i32 }
i32 | u32

The first is simply a product type; the second is a union of two types.

For example:

fn add_one(x: i32 | f32) -> i32 | f32 {
    match x {
        x is i32 -> x + 1
        x is f32 -> x + 1
    }
}

Note that this is not a generic function. The Rust version of this would look like this:

enum IntOrFloat {
    I32(i32),
    F32(f32),
}

fn add_one(x: IntOrFloat) -> IntOrFloat {
    match x {
        IntOrFloat::I32(x) => IntOrFloat::I32(x + 1),
        IntOrFloat::F32(x) => IntOrFloat::F32(x + 1.0),
    }
}

Now that we have those type level operators, we can extend the union operator with named tags:

A(i32) | B(f32)

And then we can finally introduce a type construct to name these types:

type Foo = { age: i32 };
type Foo = i32 | u32;
type Foo = A(i32) | B(i32);

This might be overly complex both in concepts and in implementation, but I wanted to write it out anyway. The result does feel kind of script-y in some ways, since it sometimes requires fewer type definitions.
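
The difference between the nominal approach and named type expressions can be seen in Rust itself, which has both: structs are nominal, while a type alias only names an existing type expression. A minimal sketch, with all names (Meters, Seconds, Point, Pair, double, sum) invented for illustration:

```rust
// Nominal: two structs with identical fields are distinct types.
#[derive(Debug, PartialEq)]
struct Meters {
    value: i32,
}

#[derive(Debug, PartialEq)]
struct Seconds {
    value: i32,
}

// Structural: an alias only names an existing type expression,
// so Point and Pair are the same type.
type Point = (i32, i32);
type Pair = (i32, i32);

fn double(m: Meters) -> Meters {
    Meters { value: m.value * 2 }
}

fn sum(p: Point) -> i32 {
    p.0 + p.1
}
```

Passing a Seconds to double is rejected by the compiler even though the fields match, whereas any Pair is accepted by sum. Whether type Foo = ... in Roto would behave like the struct case or like the alias case is a design choice this sketch does not settle.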

@jdonszelmann's suggestion

record Foo {
    a: int,
    b: string,
    c: vec<u32>
}

record Maybe[T] {
    variant Some {
        value: T
    },
    variant None,
}

There's a lot to like about that suggestion! I'm especially partial to the record and variant names. Those seem very intuitive!

Conclusions

The more I think about this, the more convinced I become that a clear distinction between concepts is really important. If we have clearly different syntax for product and sum types, then each becomes easier to understand. Using the same keyword type for both is difficult because it becomes harder to document and explain. It also introduces more syntax.

I like record and variant as keywords. They seem relatively intuitive and we can adopt the terminology "record type" and "variant type".
