This document is manually generated and may be a couple commits behind.
This is not a Glint tutorial. This is documentation of how Glint, as a language, works (mainly for the purpose of LCC developers being able to understand what the Glint frontend is supposed to be doing).
For Glint learning resources, read the Glint book, or check Lens_r’s blog for tutorial posts (he is planning to release a ten-part series there). Of course, there are always YouTube videos as well.
The Glint programming language is a high-level programming language with low-level capabilities. Focused on the programmer, Glint prioritizes ease of use while also letting people who know what they are doing do it better.
A literal is a fixed value embedded in the source code, also known as a self-evaluating expression.
69 420 ;; regular decimal literals
;; 042 ;; ERROR! Leading zeroes not allowed
0b0 0b01000101 ;; regular binary literals
0o0 0o377 ;; regular octal literals
0x69 0xbeef ;; regular hexadecimal literals
1'004 ;; decimal literal with separator
0b0010'0100 ;; binary literal with separator
0xb0'0b ;; hexadecimal literal with separator
A number literal is of integer type.
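As a rough illustration of the literal forms above, here is a minimal Python sketch of a parser for them. It is not the Glint lexer: the name `parse_glint_int` is hypothetical, and treating a lone `0` as valid and accepting only lowercase prefixes are assumptions that this document does not confirm.

```python
def parse_glint_int(text: str) -> int:
    """Parse a Glint-style integer literal (sketch only, not the real lexer)."""
    # The ' separator carries no meaning, so drop it before converting.
    digits = text.replace("'", "")
    # Prefixed forms: binary, octal, hexadecimal.
    bases = {"0b": 2, "0o": 8, "0x": 16}
    base = bases.get(digits[:2])
    if base is not None:
        return int(digits[2:], base)
    # Decimal: a leading zero is an error (assumption: a lone "0" is fine).
    if len(digits) > 1 and digits[0] == "0":
        raise ValueError(f"leading zeroes not allowed: {text!r}")
    return int(digits, 10)
```

Under these assumptions, `parse_glint_int("1'004")` and `parse_glint_int("1004")` give the same value, while `parse_glint_int("042")` is rejected.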
A string is a contained sequence of bytes in the source code that is converted into data that is embedded in the final program.
'' 'i am data?' ;; regular string literals
"" "i\'m data?" ;; escapable string literals
A string literal is of type array of byte.
Sometimes, it’s convenient to get a number representation of a byte in the source code (e.g. 0 or a). Mostly for ASCII math stuff, to be honest, but still needed often enough to be necessary.
`6` `9` `A` ;; regular byte literals
;; `` ;; ERROR! Empty byte literal
Byte literals accept basic escaped values like \n for newline, \0 for NUL, and \\ for \.
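A small Python sketch of how the text between the backticks could be turned into a number. The function name `byte_literal_value` is hypothetical, and the escape table only contains the three escapes named above; the real lexer may accept more.

```python
# Escapes named in the text; the real lexer may accept more (assumption).
ESCAPES = {"n": 0x0A, "0": 0x00, "\\": 0x5C}

def byte_literal_value(body: str) -> int:
    """Return the numeric value of the text between the backticks."""
    if body == "":
        raise ValueError("empty byte literal")
    if body[0] == "\\":
        # An escape is a backslash followed by exactly one known character.
        if len(body) != 2 or body[1] not in ESCAPES:
            raise ValueError(f"unknown escape in byte literal: {body!r}")
        return ESCAPES[body[1]]
    if len(body) != 1 or ord(body) > 0xFF:
        raise ValueError(f"not a single byte: {body!r}")
    return ord(body)
```

This is what makes the "ASCII math" use case work: `byte_literal_value("9") - byte_literal_value("0")` is the digit value 9.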
There are two boolean literals.
true
false
A compound literal is a way to write a literal of a more complex type (like a struct).
It is just a list of expressions separated by soft separators, with the extra ability to name expressions.
;; regular compound literals
!{}; !{420}; !{420, 69};
;; compound literal with named expressions
!{.value 420}; !{.x 420, .y 69};
A declaration is a way for the programmer to name something; usually in a way that restricts what something is, or how it can be used.
In Glint, declarations let the programmer name locations in memory, known as variables, as well as language constructs like types, or templates.
A variable is a location in the program’s memory, at runtime (when it is being executed). You may assign to a variable, setting the value stored at its location in memory. A variable is a language construct, but a language construct isn’t necessarily a variable.
foo : int;
bar : int = 69;
baz : int 69;
The size of that location in memory is determined by the variable’s type. A type is a way of separating values into different kinds. Different kinds of values may be treated differently semantically. For example, two values of number type may be added together. For a number and a string, that operation doesn’t quite make sense (or, it’s up to interpretation what it means; either way, it’s unclear). The programmer declares a variable as having a specific type so that the compiler knows (a) how much memory is at the location this name refers to, and (b) what operations are semantically valid on this location in memory.
Type declarations let the programmer use a name to refer to a (usually) complex type; this way, they don’t have to re-enter the entire type every time they need to use it. (More on type expressions later.)
foo ::: struct { x : int; };
Template declarations let the programmer use a name to refer to a template, allowing easy and efficient re-use. (More on template expressions later.)
dereference :: template(x : expr) @x;
There are two forms of declaration in Glint:
- Typed
name ":" type [ [ "=" ] value ]
- Type-inferred
name "::" value
If you see a typed declaration, you know that referencing name gives you a value of type type.
If you see a type-inferred declaration, you know that referencing name gives you value (or a value of the same type, if assignment occurs).
You may often see :::; this is simply a combination of a type-inferred declaration and the : prefix for an explicit type expression, resulting in a name with a value of a type expression (a named type).
A typed declaration does not require an initializer. Accessing an uninitialized variable either provides a default-initialized value, or zero. A type-inferred declaration requires an initializer.
A module may export a declaration, such that it is made available in any Glint program that imports it.
;; foo_mod.g
module foo_mod;
export foo : int 69;
;; foo_exe.g
import foo_mod;
foo; ;; <- accessed value: 69
In order for type information to be relayed across module borders, the Glint compiler produces metadata that describes the module, all exports, etc. This metadata is either implanted in the built file (in a separate section of an object or assembly file), or emitted as a standalone file with the .gmeta extension. Ensuring the built artifacts or the metadata files of a module are findable within an include directory is required to build a Glint program that imports said module.
If you declare a variable, but do not define an initializing expression for that variable, the compiler will insert default initialization expressions.
Arbitrary Integer, Array, Built-in, Enum, FFI, Struct, Sum, Union, View, and Pointer types are all zero-initialized.
Dynamic arrays have special initialization, involving setting size, capacity, and allocating memory.
Struct members are default initialized after the struct is initialized, with the exception that zero-initialization will not be performed (since the struct zero-initialization will already cover that).
A reference type is not valid for default initialization: it would require keeping a temporary around for the reference to bind to before it is explicitly bound, which would likely cause as much trouble as leaving it uninitialized (if not more), so what would be the point?
A dynamic array is a vector.
The dynamic array type attempts to make a generic container type that may be used for all or most non-performance applications.
A dynamic array is basically a fancy way to make a struct, and a fancy way to operate on structs created in that way…
[byte] is equivalent to struct { data: byte.ptr; size: uint; capacity: uint; }. You can access these members on any instance of a dynamic array.
In C++, std::vector.
In Rust, Vec.
- Binary += - append rhs to dynamic array on lhs (i.e. foo:[byte]; foo+=`0`;).
- Binary ~= - prepend rhs to dynamic array on lhs (i.e. foo:[byte]; foo~=`0`;).
- Binary [ (Subscript) - rewritten to subscript of data member of dynamic array.
- Trinary [= - insert rhs at index given at mhs into dynamic array on lhs (i.e. foo:[byte]; foo[=0, `0`;).
NOTE: If index is not within the inclusive range of 0 to size, the program will crash. If it didn’t, iteration may expose values that have not been initialized properly.
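To make the operators above concrete, here is a toy Python model of a dynamic array with data, size, and capacity members. The class name `DynArray`, the initial capacity, and the doubling growth policy are all assumptions made for illustration; the document does not say how Glint grows its buffer. The crash on an out-of-range insert is modeled as an `IndexError`.

```python
class DynArray:
    """Toy model of a Glint dynamic array: data, size, capacity."""

    def __init__(self, capacity: int = 8):
        # Assumption: the real initial capacity and growth policy may differ.
        self.capacity = capacity
        self.size = 0
        self.data = [None] * capacity  # stands in for the allocated buffer

    def _grow_if_full(self) -> None:
        if self.size == self.capacity:
            new_capacity = max(1, self.capacity * 2)
            self.data.extend([None] * (new_capacity - self.capacity))
            self.capacity = new_capacity

    def insert(self, index: int, value) -> None:
        """Model of `[=`: index must be within 0..size inclusive."""
        if not 0 <= index <= self.size:
            raise IndexError(f"insert index {index} outside 0..{self.size}")
        self._grow_if_full()
        # Shift the tail one slot to the right, then store the new value.
        for i in range(self.size, index, -1):
            self.data[i] = self.data[i - 1]
        self.data[index] = value
        self.size += 1

    def append(self, value) -> None:
        """Model of `+=`: insert at the end."""
        self.insert(self.size, value)

    def prepend(self, value) -> None:
        """Model of `~=`: insert at the front."""
        self.insert(0, value)
```

Expressing append and prepend through insert is a choice made here to keep the sketch short; it also shows why the valid index range is inclusive of size, since inserting at size is exactly an append.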
Subscript of the dynamic array itself acts as a bounds-checked subscript of the data member of the dynamic array.
out : [byte];
out += 69;
out[0] ;; out.data[0]
@out[0]; ;; 69
Dereference the data pointer, basically.
out : [byte];
out += 69;
@out.data[0]; ;; 69
Glint’s for keyword works on any type with data and size members, which includes dynamic arrays. You can iterate every element of a dynamic array using for.
As you may already know, a dynamic array/vector allocates memory at runtime to store values in. These allocations are implicit in Glint. That is, the act of creating a dynamic array has the side effect of allocating memory.
So, uh, when does it get freed?
The programmer is responsible for freeing the allocated memory using the unary minus operator -. It is an error in a Glint program for a dynamic array to be created and never be freed. This means, for the most part, that Glint programs are statically checked to be memory safe regarding use-after-free errors.
The only time the programmer is not responsible for freeing the allocated memory of a dynamic array is when that dynamic array is automatically inserted by the compiler. In that case, the compiler is also required to insert its deallocation.
Let’s get loopy!
for <identifier> in <container> <body>;
This form of the for loop mimics C++’s range-based for loops. Its purpose is to easily loop over built-in containers like dynamic arrays and views.
In effect, it binds <identifier> to a reference to each element within <container>.
Emits code in the form of:
NOTE: Should we cache <container> expression, or evaluate it multiple times?
cfor
{
iter :: <container>.data;
end :: <container>.data[<container>.size];
};
iter != end;
iter [= 1;
{
<identifier> :: @iter;
<body>;
};
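The lowering above can be mirrored in Python as a rough sketch. The helper name `for_in` is hypothetical, an index stands in for the `iter` pointer, and the container is evaluated once up front, which is only one possible answer to the open question in the NOTE. Unlike Glint, which binds the identifier to a reference, this model passes each element by value.

```python
from types import SimpleNamespace

def for_in(container, body):
    """Call body(element) for each of the first `size` slots of `data`."""
    # Assumption: <container> is evaluated once, up front.
    data, size = container.data, container.size
    index = 0               # plays the role of `iter`
    while index != size:    # `iter != end`
        body(data[index])   # `<identifier> :: @iter; <body>`
        index += 1          # advance `iter` by one element

# Any object with `data` and `size` members works, not only dynamic arrays.
seen = []
for_in(SimpleNamespace(data=[10, 20, 30, None], size=3), seen.append)
```

Note that the unused slot beyond size is never visited, which matches the `end` pointer being computed from size rather than capacity.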
for <initialiser>; <condition>; <increment>; <body>;
This form of the for loop mimics a C for loop exactly.
Emits code in the form of:
{
<initialiser>;
while <condition> {
<body>;
<increment>;
};
};
It’s a sum type.
You can store multiple variables in the space that a single variable takes up, and keep it type-safe the whole time.
To define a sum type:
sum {
x : int;
y : [byte];
};
In C++, std::variant.
In Rust, enum.
Let’s call this sum type foo.
foo : sum {
x : int;
y : [byte];
};
Now, any instance of this type may have EITHER the x member, OR the y member. Only one member is valid at one time. To check if a given member is valid, use the unary prefix operator has.
foo : sum {
x : int;
y : [byte];
};
bar : foo;
if has bar.x, ...
As you can imagine, this could get quite cumbersome quite quickly; adding a member to the sum type may mean tracking down long if has chains all over the codebase. If you forget one, it could be catastrophic for your program! This is where match comes in. match lets you pick a different control flow based on the member held in any given instance of a sum type. match requires that all sum type members are handled.
foo : sum {
x : int;
y : [byte];
};
bar : foo;
out : int;
match bar {
.x: out := bar.x;
.y: out := 69;
};
out;
As you can see, sum types allow you to define generic types while still retaining type safety. You could say that sum types allow you to define a variable that is one of a group of other types.
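For readers who think better in code, here is a toy Python model of the has and match behaviour described above. The class `SumValue` and its methods are hypothetical, and the exhaustiveness check happens at runtime here, whereas Glint is described as requiring all members to be handled by match.

```python
class SumValue:
    """Toy model of a sum type instance: exactly one member is valid."""

    def __init__(self, members, active, value):
        if active not in members:
            raise ValueError(f"unknown member: {active}")
        self._members = tuple(members)
        self._active = active
        self._value = value

    def has(self, member) -> bool:
        """Model of the unary prefix operator `has`."""
        return self._active == member

    def get(self, member):
        """Access a member; only the currently valid one may be read."""
        if not self.has(member):
            raise ValueError(f"member {member} is not currently valid")
        return self._value

    def match(self, **arms):
        """Model of `match`: every member must have an arm."""
        missing = set(self._members) - set(arms)
        if missing:
            raise TypeError(f"unhandled members: {sorted(missing)}")
        return arms[self._active](self._value)
```

A usage sketch: `SumValue(("x", "y"), "x", 42).match(x=lambda v: v, y=lambda v: 69)` picks the x arm, and leaving out the y arm is rejected before any arm runs.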
Function overloading is the ability to define multiple functions with the same name. Now, you might be wondering, how is this useful? Well, the functions that share the same name differ in an alternate way: their type signature. The compiler delegates to the different versions of the function based on the types of the passed arguments at the call-site.
foo : bool(a : int) a < 1000;
foo : bool(a : byte) a < 100;
As you can see, the main ability function overloading allows for is executing different functionality for different parameter types.
blah : struct { x : int; };
foo : bool(a : int) a < 10;
foo : bool(a : blah) foo a.x;
You can also use function overloading to perform type conversions, that way you don’t have to repeatedly type them at each call-site.
In the final emitted code, the name of an overloaded function has to be altered. This is because object files are not allowed to have duplicate symbols, and because the linker needs to know which overload of the function it needs to call. So, to differentiate the functions in lieu of a type system, we alter the name of the function to include the type signature of the function itself. To do this, we textually encode types into an identifier-valid format and append the result to the function name. However, if the user chooses a very weird identifier in their program, they may accidentally use the one we generate for overloading. Because of this, we then prepend _XGlint to the name, in order to prevent symbols from clashing.
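The shape of that scheme can be sketched in Python. Only the `_XGlint` prefix and the idea of appending an identifier-safe encoding of the signature come from the text; the encoding itself (hex-escaping non-identifier characters, underscore separators) and the names `encode_type` and `mangle` are invented for illustration and are not Glint's actual mangling.

```python
def encode_type(type_name: str) -> str:
    """Hypothetical encoding: keep identifier characters, hex-escape the rest."""
    out = []
    for ch in type_name:
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)
        else:
            out.append(f"_{ord(ch):02x}_")
    return "".join(out)

def mangle(name: str, return_type: str, param_types: list) -> str:
    """Prefix + function name + encoded type signature."""
    parts = [encode_type(return_type)] + [encode_type(p) for p in param_types]
    return "_XGlint" + name + "_" + "_".join(parts)
```

With this sketch, the two overloads of foo from the example become `_XGlintfoo_bool_int` and `_XGlintfoo_bool_byte`: distinct symbols that are still valid identifiers. A real scheme would also need to guarantee that two different signatures can never encode to the same text, which this sketch does not attempt.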
When all you want to do is see something…
Use the print keyword to begin a list of expressions whose results will be printed.
print 42; will print 42 to standard out (or whatever you’ve defined the int formatter to do).
print is sort of like a fancy macro. It just gets converted into other code. Specifically, it is as if each argument is applied to one of the following templates. What template the argument is applied to depends on the type of the argument.
NOTE: Expansions may not match exactly; see sema_templates.g (embedded in Glint’s sema.cc).
For a byte argument:
putchar arg;
For a [byte] (dynamic array of byte), [byte view] (array view of byte), or [byte 4] (fixed array of byte) argument:
cfor
i :: 0;
i < arg.size;
i += 1;
putchar @arg[i];
NOTE: Still unsure if fixed array of byte should call putchar on each element or just call puts…
For a byte.ptr argument:
puts arg;
For a void argument: a void argument to print is an error.
For every other type argument, insert a call to format (defined by user), print the returned dynamic byte array, and then free the dynamic array.
{tmp :: format(arg); print tmp; -tmp;};
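The per-type dispatch above can be summarized as a small Python function that returns the expansion text for one argument. This is a sketch only: the name `expand_print_arg` is hypothetical, types are passed as plain strings, and the expansions are copied from this section, which itself notes they may not match the compiler exactly.

```python
import re

def expand_print_arg(arg: str, type_name: str) -> str:
    """Return the Glint text one `print` argument is rewritten to (sketch)."""
    if type_name == "void":
        raise TypeError("a void argument to print is an error")
    if type_name == "byte":
        return f"putchar {arg};"
    if type_name == "byte.ptr":
        return f"puts {arg};"
    # Dynamic array, array view, or fixed array of byte: print each element.
    is_fixed_array = re.fullmatch(r"\[byte \d+\]", type_name) is not None
    if type_name in ("[byte]", "[byte view]") or is_fixed_array:
        return f"cfor i :: 0; i < {arg}.size; i += 1; putchar @{arg}[i];"
    # Everything else goes through the user-defined format, then frees the result.
    return f"{{tmp :: format({arg}); print tmp; -tmp;}};"
```

The last branch is recursive in spirit: the inner `print tmp` is itself a print of a `[byte]`, so it would be expanded again by the byte-array rule.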
Let’s say you want to call a variable module. Er, that’s sort of tough, because, when you write module in the source code, it thinks it’s a ModuleExpr or something and there are errors when trying to compile.
To fix this, you can write a backslash before the token you would like to treat as an identifier. So, by writing \module in the source code, you could feasibly call a variable module.
You could also use this to call a variable any other keyword in the language, like \if for if, or \return for return. This means that, ideally, the language itself should never get in the way of the programmer.
In Glint, templates look a lot like functions, BUT THEY ARE NOT FUNCTIONS. If you only remember one thing from this file, let it be that warning. Please heed; so on–so forth.
The simplest valid template is the identity template. It effectively just applies a type constraint at compile time to whatever argument you pass to it.
template(x : int) x;
;; ends up as
;; TemplateExpr
;; |-- Body: NameRefExpr
;; `-- Parameters...: VarDecl
Now, the above source code represents a Template Expression. It is an expression that may be invoked to generate an actual, “concrete” expression. That is, the above template does not end up in the final code of the program.
You may also assign a template expression to a name.
my_template :: template(x : int) x;
my_template 69;
;; ends up as
;; IntegerLiteral
It should be known that, since template invocations are expanded at compile-time, the type of a template parameter may be a type itself. That is, a template argument may be a type expression.
foo :: template(x : type) x;
bar : foo int; ;; expands to 'bar : int'
This is useful for Glint module authors to export templates instead of concrete types.
export vector :: template(elem_t : type)
struct { data:elem_t.ptr; size:uint; capacity:uint; };
foo : vector int;
In order to actually use a template to “stamp out” code, we must invoke it (by calling it). The arguments we pass to it will be checked against the parameters declared within it.
(template(x : int) x) 69;
;; ends up as
;; IntegerLiteral
As you can see, invoking a template expression removes the template expression from the program, and leaves just the body of the template with template parameters replaced with their argument counterparts.
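A toy Python model of that substitution step, under loud assumptions: the template body is a flat list of tokens rather than a real expression tree, Python types stand in for Glint types, and the name `invoke_template` is hypothetical.

```python
def invoke_template(params, body, args):
    """Check args against params, then substitute them into the body.

    params: list of (name, python_type) pairs
    body:   list of tokens; strings naming a parameter are replaced
    args:   list of argument values
    """
    if len(args) != len(params):
        raise TypeError("argument count does not match parameter count")
    bindings = {}
    for (name, expected), arg in zip(params, args):
        # The parameter's declared type acts as a compile-time constraint.
        if not isinstance(arg, expected):
            raise TypeError(f"argument for {name} is not a {expected.__name__}")
        bindings[name] = arg
    # Only the body survives; the template itself is gone from the result.
    return [bindings.get(tok, tok) if isinstance(tok, str) else tok for tok in body]
```

Invoking the identity template from above, `invoke_template([("x", int)], ["x"], [69])`, leaves just the argument, which mirrors the IntegerLiteral result shown in the example.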
First, let’s look at the first couple “steps” of compilation of a Glint program.
The Glint source code is read, and separated into logical units known as tokens. The tokens are used by the parser (or, more tersely, the syntactic analyser) to form a tree structure that represents the meaning of the Glint program, or what it is meant to be doing.
SOURCE CODE
|
V
LEXICAL ANALYSIS
|
V
SYNTACTIC ANALYSIS
|
V
SEMANTIC ANALYSIS
|
V
...
Okay, cool, why did we have to learn all that just to learn about lexer macros? Well, lexer macros are a way to “reach into” the Glint compiler from the source code.
SOURCE CODE<-----.
| V
| LEXER MACROS
V ^
LEXICAL ANALYSIS-°
|
V
...
And, truthfully, once a macro has been lexed, its application (or expansion) is more like this (where the lexer is operating on itself).
SOURCE CODE
|
| ,-----LEXER MACROS
V V ^
LEXICAL ANALYSIS-°
|
V
...
So, why would we want to reach into the inner workings of the language? Most of the time, to do weird or stupid stuff, or to make life easier (and sometimes both!). Also, why not.
To begin a macro, we use the macro keyword.
To end a macro, we use the endmacro keyword.
The following shows lexer macros in their simplest form.
macro <name> emits <output> endmacro
Note that lexer macros do not require expression separators, as expressions have not yet been formed at the time of lexical analysis. There are only tokens. So, it could be said that the macro is “eaten” by the lexer (more accurately, the tokens that make up the macro’s definition).
macro empty_macro emits endmacro
macro simple_macro emits 69 endmacro
;; macro emits endmacro; ;; invalid! no name :(
Writing simple_macro anywhere in the program following its definition above will macro-expand into the number literal 69.
A macro parameter is a token that must be present at each invocation of the macro, but is discarded upon expansion.
;; empty macro with '!' macro parameter
macro foo ! emits endmacro;
foo ! ;; expands to nothing
foo ;; ¡ERROR! Ill-formed macro invocation: got '', expected '!'
This doesn’t appear that useful in our little example, but it can be very powerful to enforce a syntax for something that is not supported in the language (e.g. braces wrapped around something means it is dereferenced, or something). It can also be useful when used in conjunction with macro arguments.
A macro may be given named parameters such that they may be duplicated in its output.
macro foo $x emits $x $x endmacro;
foo 20 ;; expands to "20 20"
The idea is that, sometimes, you want to be able to take input into your macro to expand into different code based on what the user passes to it, not just a hard-coded sequence of tokens. This does that.
macro foo + $x emits $x endmacro;
foo + 20 ;; expands to "20"
foo 20 ;; ¡ERROR! Ill-formed macro invocation: got '20', expected '+'
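The matching and substitution shown in these examples can be modeled with a short Python sketch. The function `expand_macro` is hypothetical, tokens are plain strings, and only the default token capture is modeled. The error text follows the messages in the examples above.

```python
def expand_macro(pattern, output, tokens):
    """Match `pattern` against the front of `tokens`, then build the expansion.

    pattern: tokens between the macro name and `emits`
    output:  tokens between `emits` and `endmacro`
    tokens:  the tokens following the macro name at the invocation site
    Returns (expanded_tokens, remaining_tokens).
    """
    bindings = {}
    for position, expected in enumerate(pattern):
        got = tokens[position] if position < len(tokens) else ""
        if expected.startswith("$"):
            # Named argument: capture whatever token is there.
            if got == "":
                raise SyntaxError(f"Ill-formed macro invocation: missing {expected}")
            bindings[expected] = got
        elif got != expected:
            # Plain parameter: must be present, and is then discarded.
            raise SyntaxError(
                f"Ill-formed macro invocation: got '{got}', expected '{expected}'"
            )
    expanded = [bindings.get(tok, tok) for tok in output]
    return expanded, tokens[len(pattern):]
```

For example, `expand_macro(["$x"], ["$x", "$x"], ["20"])` yields the tokens `20 20`, and `expand_macro(["+", "$x"], ["$x"], ["20"])` fails because the required `+` is missing.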
Macro arguments may be given a single selector following the name identifier.
$<name><selector>
- :token - Captures a token. (default)
- :expr - Captures a parsed expression rather than a lexed token.
- :expr_once - Captures a parsed expression rather than a lexed token, and ensures that the expression is only ever evaluated once, no matter how many times it appears in the macro’s output during expansion.
This becomes very powerful, as macros may operate on parsed expressions rather than lexed tokens. This reaches another layer further into the inner workings of the language, interacting with syntactic analysis.
macro <name> defines <identifiers> emits <output> endmacro
defines allows the macro author to declare that the macro defines a variable. The compiler will give (or generate) that variable a unique name (or symbol) upon each invocation of the macro, such that weird shadowing errors do not occur. For example, if the macro user defines a variable named the same thing that the macro author uses, then the macro expansion would cause a redefinition error. Since nobody wants programs with errors, Glint provides the defines list so that any use of that defined identifier in the macro expansion will be given a unique name within that expansion.
The TL;DR is that defines allows you to create a definitely-unused name within a macro’s output to avoid redefinition errors, and things like that.
macro foo defines x emits x endmacro
foo
This would emit an error: something like Unknown symbol '__L0'. The compiler generates a unique name, __L0 in this case, to replace x with for each invocation. If we called foo again, we’d probably get __L1 for that invocation, and so on and so forth.
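A minimal Python sketch of the renaming described above. The class `MacroExpander` is hypothetical, and the `__L<n>` naming simply follows the example names in the text; the compiler's real scheme may differ.

```python
import itertools

class MacroExpander:
    """Gives each identifier in a `defines` list a fresh name per invocation."""

    def __init__(self):
        self._counter = itertools.count()

    def expand(self, defines, output):
        # One fresh name per defined identifier, chosen anew for every invocation.
        renames = {name: f"__L{next(self._counter)}" for name in defines}
        # Every use of a defined identifier in the output gets its fresh name.
        return [renames.get(tok, tok) for tok in output]
```

Within a single invocation every use of the defined identifier maps to the same fresh name, so the expansion stays internally consistent, while two invocations never share a name.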
apply <function>, <arguments>, read as “map the function <function> to arguments <arguments>”, applies the given function to each expression in the list of expressions given. If multiple lists are given, multiple arguments are passed to the function. The arity of the function should match the number of lists given to apply.
apply foo, ("bar", "baz", 69);
;; EQUIVALENT
foo "bar";
foo "baz";
foo 69;
apply foo, ("bar", "baz", 69), ("goo", `0`, 0x420);
;; EQUIVALENT
foo "bar", "goo";
foo "baz", `0`;
foo 69, 0x420;
This allows you to write simpler functions and still use them as if they take variadic arguments.
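The same idea in Python, as a sketch: `glint_apply` is a hypothetical name, and requiring all lists to have the same length is an assumption, since the text does not say what happens when they differ.

```python
def glint_apply(function, *argument_lists):
    """Call `function` once per position, taking one argument from each list."""
    # Assumption: all argument lists have the same length.
    lengths = {len(items) for items in argument_lists}
    if len(lengths) > 1:
        raise ValueError("argument lists differ in length")
    return [function(*args) for args in zip(*argument_lists)]
```

With one list the function is called with one argument per element; with two lists it is called with pairs, so `glint_apply(f, ["bar", "baz"], ["goo", "0"])` calls `f("bar", "goo")` and then `f("baz", "0")`.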
If you are new to Glint, keep in mind that subscript doesn’t dereference. x[0] is of pointer type; to get the value at that pointer, you need to use @x[0]. The only way to achieve pointer arithmetic in Glint is through the subscript operator.
The subscript operator is also the only way to get a pointer from an array type.
On grouping expressions: A group containing one expression should be represented in a parenthesized expression (using ( and )). A group containing multiple expressions should be represented in a block expression (using { and }). A block expression also opens a new scope, whereas a parenthesized expression does not. The idea is, you won’t need a new scope if you only have a single expression, and a parenthesized expression should only ever have a single expression within it. Do note, however, that multiple expressions are allowed within a parenthesized expression; it’s just an exception-proves-the-rule sort of deal where you have multiple expressions that you need to treat as a single expression.
Congrats if you made it this far, you get a gold star.