Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
392 changes: 392 additions & 0 deletions C4/c4_diagrams.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,392 @@
# CPython C4 Model Architecture Diagrams

This document contains comprehensive C4 model diagrams for the CPython codebase, from high-level system context down to detailed code structure.

## Level 1: System Context Diagram

```mermaid
graph TB
subgraph "External Users"
DEV[Python Developers]
SYS[System Administrators]
APP[Application Users]
end

subgraph "External Systems"
OS[Operating System<br/>Windows/macOS/Linux]
FS[File System]
NET[Network Services]
LIB[Third-party Libraries]
EXT[C Extensions]
end

subgraph "CPython System"
CPY[CPython Interpreter<br/>Python 3.15]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't future proof.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted

end

DEV -->|"Writes Python Code"| CPY
SYS -->|"Configures & Deploys"| CPY
APP -->|"Runs Applications"| CPY

CPY -->|"System Calls"| OS
CPY -->|"File I/O"| FS
CPY -->|"Network Operations"| NET
CPY -->|"Imports Modules"| LIB
CPY -->|"Loads Extensions"| EXT

OS -->|"Process Management"| CPY
FS -->|"File Access"| CPY
NET -->|"Network Data"| CPY
LIB -->|"Standard Library"| CPY
EXT -->|"Native Code"| CPY
Comment on lines +31 to +41
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see what this adds, this is just general information?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C4 model proposes starting from the top and working its way down to specific implementations Context, Component, Container, Code.

```

## Level 2: Container Diagram

```mermaid
graph TB
subgraph "CPython Runtime System"
subgraph "Core Interpreter"
PARSER[Parser<br/>PEG Grammar<br/>AST Generation]
COMPILER[Compiler<br/>AST to Bytecode<br/>Optimization]
VM[Virtual Machine<br/>Bytecode Execution<br/>Tier 1 & Tier 2]
end

subgraph "Runtime Services"
GC[Garbage Collector<br/>Reference Counting<br/>Cycle Detection]
MEM[Memory Manager<br/>Object Allocation<br/>Free Lists]
THREAD[Threading System<br/>GIL Management<br/>Thread States]
end

subgraph "Object System"
OBJ[Object Model<br/>Type System<br/>Method Resolution]
BUILTIN[Built-in Types<br/>int, str, list, dict<br/>etc.]
end

subgraph "Module System"
IMPORT[Import System<br/>Module Loading<br/>Path Resolution]
STDLIB[Standard Library<br/>Built-in Modules<br/>Extension Modules]
end

subgraph "C API"
CAPI[C API<br/>Extension Interface<br/>Embedding Support]
end
end

subgraph "External Dependencies"
OS2[Operating System]
LIBS[System Libraries]
end

PARSER -->|"AST"| COMPILER
COMPILER -->|"Bytecode"| VM
VM -->|"Object Operations"| OBJ
VM -->|"Memory Requests"| MEM
VM -->|"GC Triggers"| GC
VM -->|"Thread Management"| THREAD

OBJ -->|"Type Operations"| BUILTIN
IMPORT -->|"Module Loading"| STDLIB
IMPORT -->|"Extension Loading"| CAPI

MEM -->|"System Calls"| OS2
THREAD -->|"Thread APIs"| OS2
STDLIB -->|"System Libraries"| LIBS
CAPI -->|"Extension Interface"| LIBS
```

## Level 3: Component Diagrams

### Parser Container Components

```mermaid
graph TB
subgraph "Parser Container"
TOKENIZER[Tokenizer<br/>Lexical Analysis<br/>Token Generation]
PEG[PEG Parser<br/>Grammar Rules<br/>Syntax Analysis]
AST[AST Builder<br/>Abstract Syntax Tree<br/>Validation]
ERROR[Error Handler<br/>Syntax Errors<br/>Diagnostics]
end

TOKENIZER -->|"Tokens"| PEG
PEG -->|"Parse Tree"| AST
PEG -->|"Error Info"| ERROR
AST -->|"Validated AST"| COMPILER
```

### Virtual Machine Container Components

```mermaid
graph TB
subgraph "Virtual Machine Container"
subgraph "Tier 1 Interpreter"
EVAL[Bytecode Evaluator<br/>Main Execution Loop<br/>Opcode Dispatch]
FRAME[Frame Management<br/>Call Stack<br/>Local Variables]
STACK[Evaluation Stack<br/>Value Storage<br/>Stack Operations]
end

subgraph "Tier 2 Interpreter"
UOP[Micro-op Interpreter<br/>Optimized Execution<br/>Superblocks]
OPT[Optimizer<br/>Bytecode Analysis<br/>Hot Path Detection]
end

subgraph "Execution Support"
BREAK[Eval Breaker<br/>Signal Handling<br/>Interruption]
TRACE[Tracing System<br/>Profiling<br/>Debugging]
end
end

EVAL -->|"Frame Operations"| FRAME
EVAL -->|"Stack Operations"| STACK
EVAL -->|"Hot Code Detection"| OPT
OPT -->|"Optimized Code"| UOP
EVAL -->|"Interruption Checks"| BREAK
EVAL -->|"Trace Events"| TRACE
```

### Object System Container Components

```mermaid
graph TB
subgraph "Object System Container"
subgraph "Core Object Model"
OBJHDR[Object Header<br/>Reference Count<br/>Type Pointer]
TYPE[Type System<br/>Metaclass<br/>Method Resolution]
DESC[Descriptor Protocol<br/>Property Access<br/>Method Binding]
end

subgraph "Built-in Types"
NUMERIC[Numeric Types<br/>int, float, complex]
SEQUENCE[Sequence Types<br/>str, list, tuple]
MAPPING[Mapping Types<br/>dict, set]
CALLABLE[Callable Types<br/>function, method]
end

subgraph "Special Objects"
MODULE[Module Objects<br/>Namespace<br/>Import State]
CLASS[Class Objects<br/>Inheritance<br/>Instance Creation]
EXCEPTION[Exception Objects<br/>Error Handling<br/>Stack Traces]
end
end

OBJHDR -->|"Type Info"| TYPE
TYPE -->|"Method Lookup"| DESC
TYPE -->|"Instance Creation"| NUMERIC
TYPE -->|"Instance Creation"| SEQUENCE
TYPE -->|"Instance Creation"| MAPPING
TYPE -->|"Instance Creation"| CALLABLE
TYPE -->|"Module Creation"| MODULE
TYPE -->|"Class Creation"| CLASS
TYPE -->|"Exception Creation"| EXCEPTION
```

### Memory Management Container Components

```mermaid
graph TB
subgraph "Memory Management Container"
subgraph "Allocation"
ALLOC[Object Allocator<br/>Memory Pools<br/>Arena Management]
FREELIST[Free Lists<br/>Object Reuse<br/>Size Classes]
ARENA[Arena Manager<br/>Memory Blocks<br/>Fragmentation Control]
end

subgraph "Garbage Collection"
REFCOUNT[Reference Counting<br/>Immediate Deallocation<br/>Cycle Detection]
GENERATIONAL[Generational GC<br/>Young/Old Generations<br/>Collection Cycles]
WEAKREF[Weak References<br/>Non-owning References<br/>Callback System]
end

subgraph "Memory Tracking"
TRACEMALLOC[Tracemalloc<br/>Memory Profiling<br/>Allocation Tracking]
DEBUG[Debug Allocator<br/>Memory Validation<br/>Leak Detection]
end
end

ALLOC -->|"Memory Requests"| ARENA
ALLOC -->|"Object Reuse"| FREELIST
REFCOUNT -->|"Cycle Detection"| GENERATIONAL
GENERATIONAL -->|"Weak References"| WEAKREF
ALLOC -->|"Allocation Events"| TRACEMALLOC
ALLOC -->|"Validation"| DEBUG
```

## Level 4: Code Diagrams

### Bytecode Evaluator Component Code Structure

```mermaid
graph TB
subgraph "Bytecode Evaluator Code"
subgraph "Core Files"
CEVAL["ceval.c<br/>Main evaluation loop<br/>Opcode dispatch"]
BYTECODES["bytecodes.c<br/>Opcode definitions<br/>Instruction semantics"]
MACROS["ceval_macros.h<br/>Evaluation macros<br/>Stack operations"]
end

subgraph "Optimization"
SPECIALIZE["specialize.c<br/>Adaptive specialization<br/>Type specialization"]
OPTIMIZER["optimizer.c<br/>Bytecode optimization<br/>Hot path analysis"]
CASES["generated_cases.c.h<br/>Generated opcode cases<br/>Fast paths"]
end

subgraph "Tier 2"
TIER2["tier2_engine.md<br/>Micro-op interpreter<br/>Superblock execution"]
UOP["optimizer_bytecodes.c<br/>Micro-op definitions<br/>Tier 2 IR"]
end
end

CEVAL -->|"Opcode Definitions"| BYTECODES
CEVAL -->|"Macro Usage"| MACROS
CEVAL -->|"Specialization"| SPECIALIZE
SPECIALIZE -->|"Optimized Code"| OPTIMIZER
OPTIMIZER -->|"Generated Cases"| CASES
OPTIMIZER -->|"Tier 2 Code"| TIER2
TIER2 -->|"Micro-ops"| UOP
```

### Parser Component Code Structure

```mermaid
graph TB
subgraph "Parser Component Code"
subgraph "Core Parser"
PEGEN["pegen.c<br/>PEG parser implementation<br/>Grammar execution"]
PEGAPI["peg_api.c<br/>Parser API<br/>AST generation"]
GRAMMAR["python.gram<br/>Grammar definition<br/>PEG rules"]
end

subgraph "Tokenization"
TOKEN["token.c<br/>Token definitions<br/>Token types"]
LEXER["lexer/<br/>Lexical analysis<br/>Token generation"]
end

subgraph "AST"
ASDL["Python.asdl<br/>AST definition<br/>Node types"]
ASTC["asdl_c.py<br/>AST code generation<br/>C structures"]
end
end

PEGEN -->|"Grammar Rules"| GRAMMAR
PEGAPI -->|"Parser Calls"| PEGEN
PEGEN -->|"Token Stream"| TOKEN
TOKEN -->|"Lexical Analysis"| LEXER
PEGEN -->|"AST Nodes"| ASDL
ASDL -->|"Code Generation"| ASTC
```

### Object System Component Code Structure

```mermaid
graph TB
subgraph "Object System Component Code"
subgraph "Core Objects"
OBJECT["object.c<br/>Base object implementation<br/>Reference counting"]
TYPEOBJ["typeobject.c<br/>Type system<br/>Metaclass implementation"]
DESCR["descrobject.c<br/>Descriptor protocol<br/>Property access"]
end

subgraph "Built-in Types"
LONG["longobject.c<br/>Arbitrary precision integers"]
UNICODE["unicodeobject.c<br/>String implementation<br/>UTF-8/UTF-16"]
DICT["dictobject.c<br/>Dictionary implementation<br/>Hash tables"]
LIST["listobject.c<br/>Dynamic arrays<br/>List operations"]
end

subgraph "Function Objects"
FUNC["funcobject.c<br/>Function objects<br/>Closure support"]
METHOD["methodobject.c<br/>Method objects<br/>Bound methods"]
CLASS["classobject.c<br/>Class objects<br/>Inheritance"]
end
end

OBJECT -->|"Base Type"| TYPEOBJ
TYPEOBJ -->|"Descriptor Access"| DESCR
TYPEOBJ -->|"Type Creation"| LONG
TYPEOBJ -->|"Type Creation"| UNICODE
TYPEOBJ -->|"Type Creation"| DICT
TYPEOBJ -->|"Type Creation"| LIST
TYPEOBJ -->|"Callable Types"| FUNC
FUNC -->|"Method Binding"| METHOD
TYPEOBJ -->|"Class Creation"| CLASS
```

### Memory Management Component Code Structure

```mermaid
graph TB
subgraph "Memory Management Component Code"
subgraph "Allocation"
PYMEM["pymem.c<br/>Memory allocator<br/>Arena management"]
OBJIMPL["objimpl.h<br/>Object allocation<br/>Type-specific allocators"]
ARENA["pyarena.c<br/>Arena allocator<br/>Block management"]
end

subgraph "Garbage Collection"
GC["gc.c<br/>Reference counting<br/>Cycle detection"]
GCTHREAD["gc_free_threading.c<br/>Free-threaded GC<br/>Concurrent collection"]
WEAKREF["weakrefobject.c<br/>Weak references<br/>Callback system"]
end

subgraph "Memory Tracking"
TRACE["tracemalloc.c<br/>Memory profiling<br/>Allocation tracking"]
DEBUG["pydebug.h<br/>Debug macros<br/>Memory validation"]
end
end

PYMEM -->|"Object Allocation"| OBJIMPL
PYMEM -->|"Arena Management"| ARENA
OBJIMPL -->|"GC Integration"| GC
GC -->|"Concurrent GC"| GCTHREAD
GC -->|"Weak References"| WEAKREF
PYMEM -->|"Allocation Tracking"| TRACE
PYMEM -->|"Debug Validation"| DEBUG
```

## Key Architectural Patterns

### 1. Layered Architecture
- **Parser Layer**: Converts source code to AST
- **Compiler Layer**: Transforms AST to bytecode
- **VM Layer**: Executes bytecode
- **Object Layer**: Manages Python objects
- **Memory Layer**: Handles allocation and garbage collection

### 2. Interpreter Pattern
- **Tier 1**: Traditional bytecode interpreter with adaptive specialization
- **Tier 2**: Micro-op interpreter for hot code paths
- **JIT**: Future machine code generation for performance-critical code

### 3. Object-Oriented Design
- **Everything is an Object**: All Python values are objects
- **Type System**: Dynamic typing with runtime type checking
- **Method Resolution**: Dynamic method lookup and binding

### 4. Memory Management
- **Reference Counting**: Immediate deallocation for most objects
- **Generational GC**: Cycle detection for complex object graphs
- **Arena Allocation**: Efficient memory management for small objects

### 5. Extension System
- **C API**: Rich interface for C extensions
- **Module System**: Dynamic loading of Python and C modules
- **Import System**: Flexible module discovery and loading

## Performance Optimizations

### 1. Adaptive Specialization
- **Type Specialization**: Optimized code paths for common types
- **Inline Caching**: Fast method and attribute access
- **Superinstructions**: Combined bytecode operations

### 2. Memory Optimizations
- **Free Lists**: Object reuse to reduce allocation overhead
- **Arena Allocation**: Reduced fragmentation and improved locality
- **Copy-on-Write**: Efficient string and tuple operations

### 3. Execution Optimizations
- **Computed Gotos**: Fast opcode dispatch
- **Stack Caching**: Reduced memory access for local variables
- **Tier 2 Interpreter**: Optimized execution for hot code paths

This comprehensive C4 model provides a complete view of the CPython architecture, from high-level system interactions down to detailed code structure, enabling effective contribution to the CPython codebase.
Loading
Loading