A minimal C compiler that targets LLVM IR, built as a learning project.
- GCC
- LLVM 14 (
llvm-14,llvm-14-dev)
On Ubuntu/Debian:
sudo apt install gcc llvm-14 llvm-14-devFrom the project root:
make # build → build/cc
make -j$(nproc) # parallel build (faster)
make clean # remove build/The compiler binary lives at build/cc.
The compiler takes a .c file and produces a .o object file.
Then you link it with your system compiler to get an executable.
# compile
./build/cc input.c output.o
# link
cc output.o -o output
# run
./output
echo $? # check exit codeOr use the run script to compile, link, and run in one shot:
./run input.cWorks from any directory. Auto-builds the compiler if needed.
make testRuns all tests in tests/phase*/. Each test has:
<name>.c— source file<name>.expected— expected stdout (empty if no output)<name>.exitcode— expected exit code
src/
main.c — CLI: read file → parse → codegen
lexer.h/c — Tokenizer (hand-written)
parser.h/c — Recursive-descent parser → AST
ast.h/c — AST node types (tagged union)
codegen.h/c — AST → LLVM IR → object file
tests/
run_tests.sh — Test harness
phase1/ — Integer literals + return
phase2/ — Expressions + operators
phase3/ — Variables + statements
phase4/ — Control flow (if/else, while, for)
phase5/ — Functions, calls, recursion
build/
cc — Compiler binary (after make)
*.o — Object files
source.c → Lexer → Tokens → Parser → AST → Codegen → LLVM IR → .o → cc (linker) → executable
~~~~~~ ~~~~~~~~ ~~~~~~~~
lexer.c parser.c codegen.c
- Lexer turns source text into tokens
- Parser turns tokens into an AST (abstract syntax tree)
- Codegen walks the AST, calls LLVM-C API to build IR, emits a
.ofile - System
cclinks the.owith C runtime startup code to produce an executable
inttype (32-bit signed)- Integer literals, arithmetic (
+,-,*,/,%) - Comparisons (
<,>,<=,>=,==,!=) - Logical operators (
&&,||,!), bitwise not (~) - Local variables with assignment and block scoping
if/else(including chained else-if)whileloopsforloops (with optional init/cond/post, includingintdeclarations in init)- Nested blocks and nested control flow
- Multiple functions with
intparameters andintreturn - Function calls, nested calls, recursion
- Forward references (call a function defined later)
- Forward declarations (
int foo(int x);)
- Preprocessor (
#include,#define) - Pointers, arrays, strings
- Global variables
- Multiple types (
char,float,void, structs, enums, unions) switch,do-while,break,continue,goto- Type casts,
sizeof