| 1 | +# CPython C4 Model Architecture Diagrams |
| 2 | + |
| 3 | +This document presents C4 model diagrams for the CPython codebase at four levels: system context, containers, components, and code structure. |
| 4 | + |
| 5 | +## Level 1: System Context Diagram |
| 6 | + |
| 7 | +```mermaid |
| 8 | +graph TB |
| 9 | + subgraph "External Users" |
| 10 | + DEV[Python Developers] |
| 11 | + SYS[System Administrators] |
| 12 | + APP[Application Users] |
| 13 | + end |
| 14 | + |
| 15 | + subgraph "External Systems" |
| 16 | + OS[Operating System<br/>Windows/macOS/Linux] |
| 17 | + FS[File System] |
| 18 | + NET[Network Services] |
| 19 | + LIB[Third-party Libraries] |
| 20 | + EXT[C Extensions] |
| 21 | + end |
| 22 | + |
| 23 | + subgraph "CPython System" |
| 24 | + CPY[CPython Interpreter<br/>Python 3.15] |
| 25 | + end |
| 26 | + |
| 27 | + DEV -->|"Writes Python Code"| CPY |
| 28 | + SYS -->|"Configures & Deploys"| CPY |
| 29 | + APP -->|"Runs Applications"| CPY |
| 30 | + |
| 31 | + CPY -->|"System Calls"| OS |
| 32 | + CPY -->|"File I/O"| FS |
| 33 | + CPY -->|"Network Operations"| NET |
| 34 | + CPY -->|"Imports Modules"| LIB |
| 35 | + CPY -->|"Loads Extensions"| EXT |
| 36 | + |
| 37 | + OS -->|"Process Management"| CPY |
| 38 | + FS -->|"File Access"| CPY |
| 39 | + NET -->|"Network Data"| CPY |
| 40 | + LIB -->|"Python Packages"| CPY |
| 41 | + EXT -->|"Native Code"| CPY |
| 42 | +``` |
| 43 | + |
| 44 | +## Level 2: Container Diagram |
| 45 | + |
| 46 | +```mermaid |
| 47 | +graph TB |
| 48 | + subgraph "CPython Runtime System" |
| 49 | + subgraph "Core Interpreter" |
| 50 | + PARSER[Parser<br/>PEG Grammar<br/>AST Generation] |
| 51 | + COMPILER[Compiler<br/>AST to Bytecode<br/>Optimization] |
| 52 | + VM[Virtual Machine<br/>Bytecode Execution<br/>Tier 1 & Tier 2] |
| 53 | + end |
| 54 | + |
| 55 | + subgraph "Runtime Services" |
| 56 | + GC[Garbage Collector<br/>Reference Counting<br/>Cycle Detection] |
| 57 | + MEM[Memory Manager<br/>Object Allocation<br/>Free Lists] |
| 58 | + THREAD[Threading System<br/>GIL Management<br/>Thread States] |
| 59 | + end |
| 60 | + |
| 61 | + subgraph "Object System" |
| 62 | + OBJ[Object Model<br/>Type System<br/>Method Resolution] |
| 63 | + BUILTIN[Built-in Types<br/>int, str, list, dict<br/>etc.] |
| 64 | + end |
| 65 | + |
| 66 | + subgraph "Module System" |
| 67 | + IMPORT[Import System<br/>Module Loading<br/>Path Resolution] |
| 68 | + STDLIB[Standard Library<br/>Built-in Modules<br/>Extension Modules] |
| 69 | + end |
| 70 | + |
| 71 | + subgraph "C API" |
| 72 | + CAPI[C API<br/>Extension Interface<br/>Embedding Support] |
| 73 | + end |
| 74 | + end |
| 75 | + |
| 76 | + subgraph "External Dependencies" |
| 77 | + OS2[Operating System] |
| 78 | + LIBS[System Libraries] |
| 79 | + end |
| 80 | + |
| 81 | + PARSER -->|"AST"| COMPILER |
| 82 | + COMPILER -->|"Bytecode"| VM |
| 83 | + VM -->|"Object Operations"| OBJ |
| 84 | + VM -->|"Memory Requests"| MEM |
| 85 | + VM -->|"GC Triggers"| GC |
| 86 | + VM -->|"Thread Management"| THREAD |
| 87 | + |
| 88 | + OBJ -->|"Type Operations"| BUILTIN |
| 89 | + IMPORT -->|"Module Loading"| STDLIB |
| 90 | + IMPORT -->|"Extension Loading"| CAPI |
| 91 | + |
| 92 | + MEM -->|"System Calls"| OS2 |
| 93 | + THREAD -->|"Thread APIs"| OS2 |
| 94 | + STDLIB -->|"System Libraries"| LIBS |
| 95 | + CAPI -->|"Extension Interface"| LIBS |
| 96 | +``` |
| 97 | + |
| 98 | +## Level 3: Component Diagrams |
| 99 | + |
| 100 | +### Parser Container Components |
| 101 | + |
| 102 | +```mermaid |
| 103 | +graph TB |
| 104 | + subgraph "Parser Container" |
| 105 | + TOKENIZER[Tokenizer<br/>Lexical Analysis<br/>Token Generation] |
| 106 | + PEG[PEG Parser<br/>Grammar Rules<br/>Syntax Analysis] |
| 107 | + AST[AST Builder<br/>Abstract Syntax Tree<br/>Validation] |
| 108 | + ERROR[Error Handler<br/>Syntax Errors<br/>Diagnostics] |
| 109 | + end |
| 110 | + |
| 111 | + TOKENIZER -->|"Tokens"| PEG |
| 112 | + PEG -->|"Grammar Actions"| AST |
| 113 | + PEG -->|"Error Info"| ERROR |
| 114 | + AST -->|"Validated AST"| COMPILER |
| 115 | +``` |
| 116 | + |
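The parser stages in the diagram above can be observed from Python itself. The following is a minimal sketch, assuming the standard-library `tokenize` and `ast` modules are an acceptable stand-in for the C tokenizer and PEG parser (they expose the same stages but are not the internal C entry points); the variable names are illustrative only.

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# Tokenizer stage: source text -> token stream.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_names = [tokenize.tok_name[tok.type] for tok in tokens]

# Parser + AST builder stage: source text -> validated AST.
tree = ast.parse(source)
assign = tree.body[0]
node_names = [type(assign).__name__, type(assign.value).__name__]

# Error handler stage: malformed input surfaces as SyntaxError.
try:
    ast.parse("x = (1 +\n")
except SyntaxError as exc:
    syntax_error = exc
else:
    syntax_error = None

print(token_names)
print(node_names)
print(type(syntax_error).__name__)
```

The token list ends with `NEWLINE` and `ENDMARKER`, which the grammar consumes like any other token.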
| 117 | +### Virtual Machine Container Components |
| 118 | + |
| 119 | +```mermaid |
| 120 | +graph TB |
| 121 | + subgraph "Virtual Machine Container" |
| 122 | + subgraph "Tier 1 Interpreter" |
| 123 | + EVAL[Bytecode Evaluator<br/>Main Execution Loop<br/>Opcode Dispatch] |
| 124 | + FRAME[Frame Management<br/>Call Stack<br/>Local Variables] |
| 125 | + STACK[Evaluation Stack<br/>Value Storage<br/>Stack Operations] |
| 126 | + end |
| 127 | + |
| 128 | + subgraph "Tier 2 Interpreter" |
| 129 | + UOP[Micro-op Interpreter<br/>Optimized Execution<br/>Superblocks] |
| 130 | + OPT[Optimizer<br/>Bytecode Analysis<br/>Hot Path Detection] |
| 131 | + end |
| 132 | + |
| 133 | + subgraph "Execution Support" |
| 134 | + BREAK[Eval Breaker<br/>Signal Handling<br/>Interruption] |
| 135 | + TRACE[Tracing System<br/>Profiling<br/>Debugging] |
| 136 | + end |
| 137 | + end |
| 138 | + |
| 139 | + EVAL -->|"Frame Operations"| FRAME |
| 140 | + EVAL -->|"Stack Operations"| STACK |
| 141 | + EVAL -->|"Hot Code Detection"| OPT |
| 142 | + OPT -->|"Optimized Code"| UOP |
| 143 | + EVAL -->|"Interruption Checks"| BREAK |
| 144 | + EVAL -->|"Trace Events"| TRACE |
| 145 | +``` |
| 146 | + |
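Frame management and the tracing system from the diagram above are both visible through the `sys` module. This is a rough sketch, assuming CPython (`sys._getframe` is a CPython-specific helper); `inner`, `outer`, and `tracer` are hypothetical names used only for illustration.

```python
import sys

def inner():
    # Walk the frame chain from the current frame back toward the caller.
    frame = sys._getframe()
    chain = []
    while frame is not None and len(chain) < 2:
        chain.append(frame.f_code.co_name)
        frame = frame.f_back
    return chain

def outer():
    return inner()

call_chain = outer()

events = []

def tracer(frame, event, arg):
    # Record only the frames this example cares about.
    name = frame.f_code.co_name
    if name in ("inner", "outer"):
        events.append((event, name))
    return None  # returning None skips per-line tracing inside the frame

previous = sys.gettrace()
sys.settrace(tracer)
try:
    outer()
finally:
    sys.settrace(previous)  # restore whatever tracer was installed before

print(call_chain)
print(events)
```

Each Python-level call creates a frame linked to its caller through `f_back`, and the trace hook receives a `call` event as each frame is entered.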
| 147 | +### Object System Container Components |
| 148 | + |
| 149 | +```mermaid |
| 150 | +graph TB |
| 151 | + subgraph "Object System Container" |
| 152 | + subgraph "Core Object Model" |
| 153 | + OBJHDR[Object Header<br/>Reference Count<br/>Type Pointer] |
| 154 | + TYPE[Type System<br/>Metaclass<br/>Method Resolution] |
| 155 | + DESC[Descriptor Protocol<br/>Property Access<br/>Method Binding] |
| 156 | + end |
| 157 | + |
| 158 | + subgraph "Built-in Types" |
| 159 | + NUMERIC[Numeric Types<br/>int, float, complex] |
| 160 | + SEQUENCE[Sequence Types<br/>str, list, tuple] |
| 161 | + MAPPING[Mapping and Set Types<br/>dict, set] |
| 162 | + CALLABLE[Callable Types<br/>function, method] |
| 163 | + end |
| 164 | + |
| 165 | + subgraph "Special Objects" |
| 166 | + MODULE[Module Objects<br/>Namespace<br/>Import State] |
| 167 | + CLASS[Class Objects<br/>Inheritance<br/>Instance Creation] |
| 168 | + EXCEPTION[Exception Objects<br/>Error Handling<br/>Stack Traces] |
| 169 | + end |
| 170 | + end |
| 171 | + |
| 172 | + OBJHDR -->|"Type Info"| TYPE |
| 173 | + TYPE -->|"Method Lookup"| DESC |
| 174 | + TYPE -->|"Instance Creation"| NUMERIC |
| 175 | + TYPE -->|"Instance Creation"| SEQUENCE |
| 176 | + TYPE -->|"Instance Creation"| MAPPING |
| 177 | + TYPE -->|"Instance Creation"| CALLABLE |
| 178 | + TYPE -->|"Module Creation"| MODULE |
| 179 | + TYPE -->|"Class Creation"| CLASS |
| 180 | + TYPE -->|"Exception Creation"| EXCEPTION |
| 181 | +``` |
| 182 | + |
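The type pointer and descriptor protocol shown above can be exercised at the Python level. The sketch below is illustrative only: `Positive` and `Account` are hypothetical classes, not part of CPython.

```python
class Positive:
    """Hypothetical data descriptor that rejects non-positive values."""

    def __set_name__(self, owner, name):
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("must be positive")
        setattr(instance, self.storage, value)

class Account:
    balance = Positive()

    def __init__(self, balance):
        self.balance = balance  # routed through Positive.__set__

    def describe(self):
        return f"balance={self.balance}"

acct = Account(10)

# Every object carries a type pointer; classes are instances of type.
type_chain = (type(acct) is Account, type(Account) is type)

# Plain functions are non-data descriptors: __get__ yields a bound method.
bound = Account.__dict__["describe"].__get__(acct, Account)
print(bound())
```

Attribute access on an instance goes through the type first, which is why the descriptor on `Account` intercepts both reads and writes of `balance`.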
| 183 | +### Memory Management Container Components |
| 184 | + |
| 185 | +```mermaid |
| 186 | +graph TB |
| 187 | + subgraph "Memory Management Container" |
| 188 | + subgraph "Allocation" |
| 189 | + ALLOC[Object Allocator<br/>Memory Pools<br/>Arena Management] |
| 190 | + FREELIST[Free Lists<br/>Object Reuse<br/>Size Classes] |
| 191 | + ARENA[Arena Manager<br/>Memory Blocks<br/>Fragmentation Control] |
| 192 | + end |
| 193 | + |
| 194 | + subgraph "Garbage Collection" |
| 195 | + REFCOUNT[Reference Counting<br/>Immediate Deallocation<br/>Py_INCREF / Py_DECREF] |
| 196 | + GENERATIONAL[Generational GC<br/>Young/Old Generations<br/>Collection Cycles] |
| 197 | + WEAKREF[Weak References<br/>Non-owning References<br/>Callback System] |
| 198 | + end |
| 199 | + |
| 200 | + subgraph "Memory Tracking" |
| 201 | + TRACEMALLOC[Tracemalloc<br/>Memory Profiling<br/>Allocation Tracking] |
| 202 | + DEBUG[Debug Allocator<br/>Memory Validation<br/>Leak Detection] |
| 203 | + end |
| 204 | + end |
| 205 | + |
| 206 | + ALLOC -->|"Memory Requests"| ARENA |
| 207 | + ALLOC -->|"Object Reuse"| FREELIST |
| 208 | + REFCOUNT -->|"Unreclaimed Cycles"| GENERATIONAL |
| 209 | + GENERATIONAL -->|"Weak References"| WEAKREF |
| 210 | + ALLOC -->|"Allocation Events"| TRACEMALLOC |
| 211 | + ALLOC -->|"Validation"| DEBUG |
| 212 | +``` |
| 213 | + |
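The memory tracking components above are reachable through the standard-library `tracemalloc` module. A small sketch, assuming the default single-frame traceback depth is enough; the allocation sizes are arbitrary.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Allocate roughly 1 MB of bytes objects so the difference is easy to see.
data = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
current, peak = tracemalloc.get_traced_memory()  # read before stop() clears it

# Largest allocation differences, grouped by source line.
top = after.compare_to(before, "lineno")[:3]
tracemalloc.stop()

print(f"current={current} peak={peak}")
for stat in top:
    print(stat)
```

Tracing adds overhead to every allocation, so it is normally enabled only while investigating memory use.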
| 214 | +## Level 4: Code Diagrams |
| 215 | + |
| 216 | +### Bytecode Evaluator Component Code Structure |
| 217 | + |
| 218 | +```mermaid |
| 219 | +graph TB |
| 220 | + subgraph "Bytecode Evaluator Code" |
| 221 | + subgraph "Core Files" |
| 222 | + CEVAL["ceval.c<br/>Main evaluation loop<br/>Opcode dispatch"] |
| 223 | + BYTECODES["bytecodes.c<br/>Opcode definitions<br/>Instruction semantics"] |
| 224 | + MACROS["ceval_macros.h<br/>Evaluation macros<br/>Stack operations"] |
| 225 | + end |
| 226 | + |
| 227 | + subgraph "Optimization" |
| 228 | + SPECIALIZE["specialize.c<br/>Adaptive specialization<br/>Type specialization"] |
| 229 | + OPTIMIZER["optimizer.c<br/>Bytecode optimization<br/>Hot path analysis"] |
| 230 | + CASES["generated_cases.c.h<br/>Generated opcode cases<br/>Fast paths"] |
| 231 | + end |
| 232 | + |
| 233 | + subgraph "Tier 2" |
| 234 | + TIER2["executor_cases.c.h<br/>Generated micro-op cases<br/>Superblock execution"] |
| 235 | + UOP["optimizer_bytecodes.c<br/>Abstract interpretation rules<br/>Micro-op optimization"] |
| 236 | + end |
| 237 | + end |
| 238 | + |
| 239 | + CEVAL -->|"Opcode Definitions"| BYTECODES |
| 240 | + CEVAL -->|"Macro Usage"| MACROS |
| 241 | + CEVAL -->|"Specialization"| SPECIALIZE |
| 242 | + SPECIALIZE -->|"Optimized Code"| OPTIMIZER |
| 243 | + BYTECODES -->|"Generated Cases"| CASES |
| 244 | + OPTIMIZER -->|"Tier 2 Code"| TIER2 |
| 245 | + TIER2 -->|"Micro-ops"| UOP |
| 246 | +``` |
| 247 | + |
| 248 | +### Parser Component Code Structure |
| 249 | + |
| 250 | +```mermaid |
| 251 | +graph TB |
| 252 | + subgraph "Parser Component Code" |
| 253 | + subgraph "Core Parser" |
| 254 | + PEGEN["pegen.c<br/>PEG parser implementation<br/>Grammar execution"] |
| 255 | + PEGAPI["peg_api.c<br/>Parser API<br/>AST generation"] |
| 256 | + GRAMMAR["python.gram<br/>Grammar definition<br/>PEG rules"] |
| 257 | + end |
| 258 | + |
| 259 | + subgraph "Tokenization" |
| 260 | + TOKEN["token.c<br/>Token definitions<br/>Token types"] |
| 261 | + LEXER["lexer/<br/>Lexical analysis<br/>Token generation"] |
| 262 | + end |
| 263 | + |
| 264 | + subgraph "AST" |
| 265 | + ASDL["Python.asdl<br/>AST definition<br/>Node types"] |
| 266 | + ASTC["asdl_c.py<br/>AST code generation<br/>C structures"] |
| 267 | + end |
| 268 | + end |
| 269 | + |
| 270 | + PEGEN -->|"Grammar Rules"| GRAMMAR |
| 271 | + PEGAPI -->|"Parser Calls"| PEGEN |
| 272 | + PEGEN -->|"Token Stream"| TOKEN |
| 273 | + TOKEN -->|"Lexical Analysis"| LEXER |
| 274 | + PEGEN -->|"AST Nodes"| ASDL |
| 275 | + ASDL -->|"Code Generation"| ASTC |
| 276 | +``` |
| 277 | + |
| 278 | +### Object System Component Code Structure |
| 279 | + |
| 280 | +```mermaid |
| 281 | +graph TB |
| 282 | + subgraph "Object System Component Code" |
| 283 | + subgraph "Core Objects" |
| 284 | + OBJECT["object.c<br/>Base object implementation<br/>Reference counting"] |
| 285 | + TYPEOBJ["typeobject.c<br/>Type system<br/>Metaclass implementation"] |
| 286 | + DESCR["descrobject.c<br/>Descriptor protocol<br/>Property access"] |
| 287 | + end |
| 288 | + |
| 289 | + subgraph "Built-in Types" |
| 290 | + LONG["longobject.c<br/>Arbitrary precision integers"] |
| 291 | + UNICODE["unicodeobject.c<br/>String implementation<br/>PEP 393 flexible storage"] |
| 292 | + DICT["dictobject.c<br/>Dictionary implementation<br/>Hash tables"] |
| 293 | + LIST["listobject.c<br/>Dynamic arrays<br/>List operations"] |
| 294 | + end |
| 295 | + |
| 296 | + subgraph "Function Objects" |
| 297 | + FUNC["funcobject.c<br/>Function objects<br/>Closure support"] |
| 298 | + METHOD["methodobject.c<br/>Built-in function objects<br/>PyCFunction"] |
| 299 | + CLASS["classobject.c<br/>Bound method objects<br/>Instance methods"] |
| 300 | + end |
| 301 | + end |
| 302 | + |
| 303 | + OBJECT -->|"Base Type"| TYPEOBJ |
| 304 | + TYPEOBJ -->|"Descriptor Access"| DESCR |
| 305 | + TYPEOBJ -->|"Type Creation"| LONG |
| 306 | + TYPEOBJ -->|"Type Creation"| UNICODE |
| 307 | + TYPEOBJ -->|"Type Creation"| DICT |
| 308 | + TYPEOBJ -->|"Type Creation"| LIST |
| 309 | + TYPEOBJ -->|"Callable Types"| FUNC |
| 310 | + FUNC -->|"Method Binding"| CLASS |
| 311 | + TYPEOBJ -->|"Built-in Methods"| METHOD |
| 312 | +``` |
| 313 | + |
| 314 | +### Memory Management Component Code Structure |
| 315 | + |
| 316 | +```mermaid |
| 317 | +graph TB |
| 318 | + subgraph "Memory Management Component Code" |
| 319 | + subgraph "Allocation" |
| 320 | + PYMEM["obmalloc.c<br/>pymalloc allocator<br/>Arenas and pools"] |
| 321 | + OBJIMPL["objimpl.h<br/>Object allocation<br/>Type-specific allocators"] |
| 322 | + ARENA["pyarena.c<br/>Compiler arena allocator<br/>Block management"] |
| 323 | + end |
| 324 | + |
| 325 | + subgraph "Garbage Collection" |
| 326 | + GC["gc.c<br/>Cyclic garbage collector<br/>Cycle detection"] |
| 327 | + GCTHREAD["gc_free_threading.c<br/>Free-threaded GC<br/>Stop-the-world collection"] |
| 328 | + WEAKREF["weakrefobject.c<br/>Weak references<br/>Callback system"] |
| 329 | + end |
| 330 | + |
| 331 | + subgraph "Memory Tracking" |
| 332 | + TRACE["tracemalloc.c<br/>Memory profiling<br/>Allocation tracking"] |
| 333 | + DEBUG["pydebug.h<br/>Debug macros<br/>Memory validation"] |
| 334 | + end |
| 335 | + end |
| 336 | + |
| 337 | + PYMEM -->|"Object Allocation"| OBJIMPL |
| 338 | + PYMEM -->|"Arena Management"| ARENA |
| 339 | + OBJIMPL -->|"GC Integration"| GC |
| 340 | + GC -->|"Free-threaded Build"| GCTHREAD |
| 341 | + GC -->|"Weak References"| WEAKREF |
| 342 | + PYMEM -->|"Allocation Tracking"| TRACE |
| 343 | + PYMEM -->|"Debug Validation"| DEBUG |
| 344 | +``` |
| 345 | + |
| 346 | +## Key Architectural Patterns |
| 347 | + |
| 348 | +### 1. Layered Architecture |
| 349 | +- **Parser Layer**: Converts source code to AST |
| 350 | +- **Compiler Layer**: Transforms AST to bytecode |
| 351 | +- **VM Layer**: Executes bytecode |
| 352 | +- **Object Layer**: Manages Python objects |
| 353 | +- **Memory Layer**: Handles allocation and garbage collection |
| 354 | + |
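The first three layers can be driven one at a time from Python. This is a minimal sketch using the built-in `ast.parse`, `compile`, and `exec`; the source string and the `<example>` filename are placeholders.

```python
import ast

source = "result = 6 * 7"

# Parser layer: source text -> AST.
tree = ast.parse(source, filename="<example>", mode="exec")

# Compiler layer: AST -> code object holding bytecode.
code = compile(tree, filename="<example>", mode="exec")

# VM layer: execute the bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)
print(type(code).__name__)
print(namespace["result"])
```

Each stage hands a distinct artifact to the next (`ast.Module`, then a code object), which is what makes it possible to inspect or transform a program between layers.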
| 355 | +### 2. Interpreter Pattern |
| 356 | +- **Tier 1**: Traditional bytecode interpreter with adaptive specialization |
| 357 | +- **Tier 2**: Micro-op interpreter for hot code paths |
| 358 | +- **JIT**: Experimental copy-and-patch machine code generation for hot Tier 2 traces |
| 359 | + |
| 360 | +### 3. Object-Oriented Design |
| 361 | +- **Everything is an Object**: All Python values are objects |
| 362 | +- **Type System**: Dynamic typing with runtime type checking |
| 363 | +- **Method Resolution**: Dynamic method lookup and binding |
| 364 | + |
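A short sketch of these three points, using a hypothetical diamond-shaped class hierarchy (`Base`, `Left`, `Right`, `Child` are illustrative names):

```python
class Base:
    def greet(self):
        return "base"

class Left(Base):
    def greet(self):
        return "left"

class Right(Base):
    pass

class Child(Left, Right):
    pass

# Everything is an object: integers, strings, functions, classes, even type.
all_objects = all(isinstance(v, object) for v in (42, "text", len, Child, type))

# Method resolution order, computed when the class is created.
mro_names = [cls.__name__ for cls in Child.__mro__]

# Dynamic lookup walks the MRO and stops at the first class defining greet.
greeting = Child().greet()

print(all_objects)
print(mro_names)
print(greeting)
```

Because lookup happens at call time, rebinding `Left.greet` later would change what `Child().greet()` returns without touching `Child`.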
| 365 | +### 4. Memory Management |
| 366 | +- **Reference Counting**: Immediate deallocation once an object's reference count reaches zero |
| 367 | +- **Generational GC**: Detects and reclaims reference cycles that reference counting cannot free |
| 368 | +- **Arena Allocation**: Efficient memory management for small objects |
| 369 | + |
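The split between reference counting and the cycle collector can be sketched with weak references as probes. This assumes CPython's behavior (other implementations may free objects later); `Node` is a hypothetical class.

```python
import gc
import weakref

class Node:
    pass

# Reference counting: dropping the last reference frees the object at once.
obj = Node()
probe = weakref.ref(obj)
del obj
freed_immediately = probe() is None

# A reference cycle keeps both counts above zero, so only the cyclic
# collector can reclaim it. Automatic collection is paused to make that visible.
was_enabled = gc.isenabled()
gc.disable()
try:
    a, b = Node(), Node()
    a.other, b.other = b, a
    cycle_probe = weakref.ref(a)
    del a, b
    alive_before = cycle_probe() is not None
    gc.collect()
    alive_after = cycle_probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(freed_immediately, alive_before, alive_after)
```

The weak references do not keep the objects alive, so they report exactly when each mechanism releases the memory.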
| 370 | +### 5. Extension System |
| 371 | +- **C API**: Rich interface for C extensions |
| 372 | +- **Module System**: Dynamic loading of Python and C modules |
| 373 | +- **Import System**: Flexible module discovery and loading |
| 374 | + |
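Module discovery can be inspected without importing anything new. A minimal sketch using `importlib`; the exact paths, loaders, and suffixes printed depend on the platform and build, so none are asserted here.

```python
import importlib.machinery
import importlib.util
import sys

# A pure-Python standard library package, located through the path finders.
json_spec = importlib.util.find_spec("json")

# A module compiled into the interpreter itself.
sys_spec = importlib.util.find_spec("sys")

# Finders consulted in order for every import statement.
finder_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]

print(json_spec.origin)
print(sys_spec.origin)
print(finder_names)
# File suffixes the import system accepts for C extension modules.
print(importlib.machinery.EXTENSION_SUFFIXES)
```

The same `ModuleSpec` structure describes Python source modules, built-in modules, and C extensions; only the loader attached to the spec differs.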
| 375 | +## Performance Optimizations |
| 376 | + |
| 377 | +### 1. Adaptive Specialization |
| 378 | +- **Type Specialization**: Optimized code paths for common types |
| 379 | +- **Inline Caching**: Fast method and attribute access |
| 380 | +- **Superinstructions**: Combined bytecode operations |
| 381 | + |
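Specialization can be observed with the `dis` module. The sketch below assumes Python 3.11 or later for the `adaptive` flag and falls back to ordinary disassembly otherwise; which specialized opcode names appear (if any) varies by version, so the output is not shown. `add` is a hypothetical function.

```python
import dis

def add(a, b):
    return a + b

# Warm the function up so the interpreter has a chance to specialize it.
for _ in range(1000):
    add(1, 2)

try:
    # adaptive=True shows the specialized instructions currently in place.
    instructions = list(dis.get_instructions(add, adaptive=True))
except TypeError:
    # Older Python versions have no adaptive flag.
    instructions = list(dis.get_instructions(add))

opnames = [instr.opname for instr in instructions]
print(opnames)
```

Comparing this list with `dis.get_instructions(add)` on the same function shows which generic instructions were replaced after warm-up.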
| 382 | +### 2. Memory Optimizations |
| 383 | +- **Free Lists**: Object reuse to reduce allocation overhead |
| 384 | +- **Arena Allocation**: Reduced fragmentation and improved locality |
| 385 | +- **Immutable Sharing**: Strings and tuples are immutable, so they can be shared and interned instead of copied |
| 386 | + |
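The string and tuple point can be checked from Python: because both types are immutable, CPython can hand back an existing object rather than a copy. This sketch relies on CPython implementation details (identity of `tuple(t)` and of interned strings) that the language itself does not guarantee.

```python
import sys

t = (1, 2, 3)
# CPython detail: converting an exact tuple to a tuple returns the same object.
same_tuple = tuple(t) is t

lst = [1, 2, 3]
# Lists are mutable, so list() must produce a distinct copy.
distinct_list = list(lst) is not lst

# Interning maps equal strings to one shared object.
a = sys.intern("".join(["py", "thon"]))
b = sys.intern("python")
interned_same = a is b

print(same_tuple, distinct_list, interned_same)
```

Code should still compare values with `==`; the identity checks here exist only to make the sharing visible.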
| 387 | +### 3. Execution Optimizations |
| 388 | +- **Computed Gotos**: Fast opcode dispatch |
| 389 | +- **Stack Caching**: Reduced memory access for local variables |
| 390 | +- **Tier 2 Interpreter**: Optimized execution for hot code paths |
| 391 | + |
| 392 | +Together, these diagrams trace the CPython architecture from system-level interactions down to individual source files, giving contributors a map of where each subsystem lives. |