Commit d7d6f1f: added c4 diagrams (parent 161b306)

2 files changed (+924, -0 lines)


C4/c4_diagrams.md

Lines changed: 392 additions & 0 deletions

# CPython C4 Model Architecture Diagrams

This document contains C4 model diagrams for the CPython codebase, from the high-level system context down to the source files behind key components.

## Level 1: System Context Diagram

```mermaid
graph TB
    subgraph "External Users"
        DEV[Python Developers]
        SYS[System Administrators]
        APP[Application Users]
    end

    subgraph "External Systems"
        OS[Operating System<br/>Windows/macOS/Linux]
        FS[File System]
        NET[Network Services]
        LIB[Third-party Libraries]
        EXT[C Extensions]
    end

    subgraph "CPython System"
        CPY[CPython Interpreter<br/>Python 3.15]
    end

    DEV -->|"Writes Python Code"| CPY
    SYS -->|"Configures & Deploys"| CPY
    APP -->|"Runs Applications"| CPY

    CPY -->|"System Calls"| OS
    CPY -->|"File I/O"| FS
    CPY -->|"Network Operations"| NET
    CPY -->|"Imports Modules"| LIB
    CPY -->|"Loads Extensions"| EXT

    OS -->|"Process Management"| CPY
    FS -->|"File Access"| CPY
    NET -->|"Network Data"| CPY
    LIB -->|"Library Modules"| CPY
    EXT -->|"Native Code"| CPY
```

## Level 2: Container Diagram

```mermaid
graph TB
    subgraph "CPython Runtime System"
        subgraph "Core Interpreter"
            PARSER[Parser<br/>PEG Grammar<br/>AST Generation]
            COMPILER[Compiler<br/>AST to Bytecode<br/>Optimization]
            VM[Virtual Machine<br/>Bytecode Execution<br/>Tier 1 & Tier 2]
        end

        subgraph "Runtime Services"
            GC[Garbage Collector<br/>Reference Counting<br/>Cycle Detection]
            MEM[Memory Manager<br/>Object Allocation<br/>Free Lists]
            THREAD[Threading System<br/>GIL Management<br/>Thread States]
        end

        subgraph "Object System"
            OBJ[Object Model<br/>Type System<br/>Method Resolution]
            BUILTIN[Built-in Types<br/>int, str, list, dict<br/>etc.]
        end

        subgraph "Module System"
            IMPORT[Import System<br/>Module Loading<br/>Path Resolution]
            STDLIB[Standard Library<br/>Built-in Modules<br/>Extension Modules]
        end

        subgraph "C API"
            CAPI[C API<br/>Extension Interface<br/>Embedding Support]
        end
    end

    subgraph "External Dependencies"
        OS2[Operating System]
        LIBS[System Libraries]
    end

    PARSER -->|"AST"| COMPILER
    COMPILER -->|"Bytecode"| VM
    VM -->|"Object Operations"| OBJ
    VM -->|"Memory Requests"| MEM
    VM -->|"GC Triggers"| GC
    VM -->|"Thread Management"| THREAD
    VM -->|"Import Statements"| IMPORT

    OBJ -->|"Type Operations"| BUILTIN
    IMPORT -->|"Source Modules"| PARSER
    IMPORT -->|"Module Loading"| STDLIB
    IMPORT -->|"Extension Loading"| CAPI

    MEM -->|"System Calls"| OS2
    THREAD -->|"Thread APIs"| OS2
    STDLIB -->|"System Libraries"| LIBS
    CAPI -->|"Extension Interface"| LIBS
```
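
The Parser, Compiler, and Virtual Machine path in this diagram can be exercised from Python through the standard `ast`, `compile()`, `exec()`, and `dis` entry points. This is a minimal sketch: `run_pipeline` is a hypothetical helper, not a CPython API, and the listed opcode names differ between CPython versions.

```python
import ast
import dis


def run_pipeline(source: str) -> dict:
    """Hypothetical helper: push source through parse, compile, and execute."""
    # Parser container: source text -> AST.
    tree = ast.parse(source, filename="<pipeline>", mode="exec")

    # Compiler container: AST -> code object containing bytecode.
    code = compile(tree, filename="<pipeline>", mode="exec")

    # Virtual Machine container: execute the bytecode in a fresh namespace.
    namespace: dict = {}
    exec(code, namespace)

    return {
        "ast_root": type(tree).__name__,
        "opnames": [ins.opname for ins in dis.get_instructions(code)],
        "namespace": namespace,
    }


if __name__ == "__main__":
    result = run_pipeline("x = 2 + 3\ny = x * 10\n")
    print(result["ast_root"])        # Module
    print(result["namespace"]["y"])  # 50
    print(result["opnames"])         # version-dependent opcode names
```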

## Level 3: Component Diagrams

### Parser Container Components

```mermaid
graph TB
    subgraph "Parser Container"
        TOKENIZER[Tokenizer<br/>Lexical Analysis<br/>Token Generation]
        PEG[PEG Parser<br/>Grammar Rules<br/>Syntax Analysis]
        AST[AST Builder<br/>Abstract Syntax Tree<br/>Validation]
        ERROR[Error Handler<br/>Syntax Errors<br/>Diagnostics]
    end

    COMPILER[Compiler<br/>Separate Container]

    TOKENIZER -->|"Tokens"| PEG
    PEG -->|"AST Nodes via Grammar Actions"| AST
    PEG -->|"Error Info"| ERROR
    AST -->|"Validated AST"| COMPILER
```
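
The Tokenizer and Error Handler components can be observed from Python: `tokenize` exposes a token stream close to what the parser consumes, and a `SyntaxError` raised by `ast.parse` carries the location the parser reported. A sketch with hypothetical helper names (`tokens_of`, `parse_or_report`):

```python
import ast
import io
import tokenize


def tokens_of(source: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (token type name, token text) pairs."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


def parse_or_report(source: str) -> str:
    """Hypothetical helper: dump the first statement's AST, or describe the SyntaxError."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Location details travel with the exception raised by the parser.
        return f"SyntaxError at line {exc.lineno}, column {exc.offset}: {exc.msg}"
    return ast.dump(tree.body[0])


if __name__ == "__main__":
    print(tokens_of("x = 1\n"))
    print(parse_or_report("x = 1\n"))
    print(parse_or_report("x = = 1\n"))
```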

### Virtual Machine Container Components

```mermaid
graph TB
    subgraph "Virtual Machine Container"
        subgraph "Tier 1 Interpreter"
            EVAL[Bytecode Evaluator<br/>Main Execution Loop<br/>Opcode Dispatch]
            FRAME[Frame Management<br/>Call Stack<br/>Local Variables]
            STACK[Evaluation Stack<br/>Value Storage<br/>Stack Operations]
        end

        subgraph "Tier 2 Interpreter"
            UOP[Micro-op Interpreter<br/>Optimized Execution<br/>Superblocks]
            OPT[Optimizer<br/>Bytecode Analysis<br/>Hot Path Detection]
        end

        subgraph "Execution Support"
            BREAK[Eval Breaker<br/>Signal Handling<br/>Interruption]
            TRACE[Tracing System<br/>Profiling<br/>Debugging]
        end
    end

    EVAL -->|"Frame Operations"| FRAME
    EVAL -->|"Stack Operations"| STACK
    EVAL -->|"Hot Code Detection"| OPT
    OPT -->|"Optimized Code"| UOP
    EVAL -->|"Interruption Checks"| BREAK
    EVAL -->|"Trace Events"| TRACE
```
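
Frame creation and the Tracing System can be observed with `sys.settrace`: the interpreter reports a `"call"` event when a new frame starts, then delivers `"line"` and `"return"` events to that frame's local trace function. A minimal sketch; `traced_events` and `square` are hypothetical names used only for illustration.

```python
import sys


def traced_events(func, *args):
    """Hypothetical helper: run func(*args) under sys.settrace and record (event, name) pairs."""
    events = []

    def tracer(frame, event, arg):
        # Called with "call" for each new frame; returning tracer also makes it the
        # frame's local trace function, so "line" and "return" events arrive too.
        events.append((event, frame.f_code.co_name))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore any tracer that was already installed
    return result, events


def square(x):
    y = x * x
    return y


if __name__ == "__main__":
    value, events = traced_events(square, 4)
    print(value)  # 16
    print(events)
```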

### Object System Container Components

```mermaid
graph TB
    subgraph "Object System Container"
        subgraph "Core Object Model"
            OBJHDR[Object Header<br/>Reference Count<br/>Type Pointer]
            TYPE[Type System<br/>Metaclass<br/>Method Resolution]
            DESC[Descriptor Protocol<br/>Property Access<br/>Method Binding]
        end

        subgraph "Built-in Types"
            NUMERIC[Numeric Types<br/>int, float, complex]
            SEQUENCE[Sequence Types<br/>str, list, tuple]
            MAPPING[Mapping and Set Types<br/>dict, set, frozenset]
            CALLABLE[Callable Types<br/>function, method]
        end

        subgraph "Special Objects"
            MODULE[Module Objects<br/>Namespace<br/>Import State]
            CLASS[Class Objects<br/>Inheritance<br/>Instance Creation]
            EXCEPTION[Exception Objects<br/>Error Handling<br/>Tracebacks]
        end
    end

    OBJHDR -->|"Type Info"| TYPE
    TYPE -->|"Method Lookup"| DESC
    TYPE -->|"Instance Creation"| NUMERIC
    TYPE -->|"Instance Creation"| SEQUENCE
    TYPE -->|"Instance Creation"| MAPPING
    TYPE -->|"Instance Creation"| CALLABLE
    TYPE -->|"Module Creation"| MODULE
    TYPE -->|"Class Creation"| CLASS
    TYPE -->|"Exception Creation"| EXCEPTION
```
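
The Type System and Descriptor Protocol components meet at attribute access: a class attribute that defines `__get__` and `__set__` (a data descriptor) intercepts reads and writes on instances, and plain functions bind to instances through their own `__get__`. A sketch with hypothetical classes `Celsius` and `Reading`:

```python
class Celsius:
    """Hypothetical data descriptor: defines __get__ and __set__, so it wins over instance __dict__."""

    def __set_name__(self, owner, name):
        self.storage = "_" + name  # where the real value lives on each instance

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        setattr(obj, self.storage, value)


class Reading:
    temperature = Celsius()

    def __init__(self, temperature):
        self.temperature = temperature  # routed through Celsius.__set__

    def describe(self):
        return f"{self.temperature} C"


if __name__ == "__main__":
    r = Reading(21.5)
    print(type(r) is Reading, type(Reading) is type)  # True True
    print(Reading.__mro__)
    print(r.describe())                               # 21.5 C
    # Plain functions are non-data descriptors: lookup binds them via __get__.
    bound = Reading.__dict__["describe"].__get__(r, Reading)
    print(bound() == r.describe())                    # True
```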

### Memory Management Container Components

```mermaid
graph TB
    subgraph "Memory Management Container"
        subgraph "Allocation"
            ALLOC[Object Allocator<br/>Memory Pools<br/>Arena Management]
            FREELIST[Free Lists<br/>Object Reuse<br/>Per-type Caches]
            ARENA[Arena Manager<br/>Memory Blocks<br/>Fragmentation Control]
        end

        subgraph "Garbage Collection"
            REFCOUNT[Reference Counting<br/>Immediate Deallocation]
            GENERATIONAL[Generational GC<br/>Young/Old Generations<br/>Cycle Detection]
            WEAKREF[Weak References<br/>Non-owning References<br/>Callback System]
        end

        subgraph "Memory Tracking"
            TRACEMALLOC[Tracemalloc<br/>Memory Profiling<br/>Allocation Tracking]
            DEBUG[Debug Allocator Hooks<br/>Memory Validation<br/>Buffer Overflow Detection]
        end
    end

    ALLOC -->|"Memory Requests"| ARENA
    ALLOC -->|"Object Reuse"| FREELIST
    REFCOUNT -->|"Unreclaimable Cycles"| GENERATIONAL
    GENERATIONAL -->|"Clears Weak References"| WEAKREF
    ALLOC -->|"Allocation Events"| TRACEMALLOC
    ALLOC -->|"Validation"| DEBUG
```
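
The split between reference counting and the cycle collector can be seen with `weakref` callbacks: an acyclic object is freed as soon as its last reference goes away, while a reference cycle survives until `gc.collect()` runs. This sketch relies on CPython's reference-counting behavior; the `Node` class and both helper functions are hypothetical.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at a partner, forming a cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def freed_by_refcount():
    """Acyclic object: freed as soon as its last reference disappears."""
    log = []
    n = Node("solo")
    weakref.finalize(n, log.append, "solo freed")
    del n  # reference count drops to zero -> immediate deallocation
    return log


def freed_by_cycle_gc():
    """Two objects referencing each other: only the cycle collector frees them."""
    log = []
    a, b = Node("a"), Node("b")
    a.partner, b.partner = b, a
    probe = weakref.ref(a, lambda ref: log.append("cycle freed"))
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from running early
    try:
        del a, b
        before = list(log)  # still alive: the cycle keeps both counts above zero
        gc.collect()
        return before, list(log), probe() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(freed_by_refcount())  # ['solo freed']
    print(freed_by_cycle_gc())  # ([], ['cycle freed'], True)
```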

## Level 4: Code Diagrams

### Bytecode Evaluator Component Code Structure

```mermaid
graph TB
    subgraph "Bytecode Evaluator Code"
        subgraph "Core Files"
            CEVAL["ceval.c<br/>Main evaluation loop<br/>Opcode dispatch"]
            BYTECODES["bytecodes.c<br/>Instruction and micro-op definitions<br/>Instruction semantics"]
            MACROS["ceval_macros.h<br/>Evaluation macros<br/>Stack operations"]
            CASES["generated_cases.c.h<br/>Generated Tier 1 opcode cases<br/>Included by ceval.c"]
        end

        subgraph "Optimization"
            SPECIALIZE["specialize.c<br/>Adaptive specialization<br/>Type specialization"]
            OPTIMIZER["optimizer.c<br/>Trace projection<br/>Executor creation"]
        end

        subgraph "Tier 2"
            TIER2["executor_cases.c.h<br/>Generated micro-op cases<br/>Tier 2 interpreter"]
            UOP["optimizer_bytecodes.c<br/>Abstract interpretation of micro-ops<br/>Tier 2 optimizer pass"]
        end
    end

    BYTECODES -->|"Cases Generator"| CASES
    BYTECODES -->|"Cases Generator"| TIER2
    CEVAL -->|"Includes"| CASES
    CEVAL -->|"Macro Usage"| MACROS
    CEVAL -->|"Specialization"| SPECIALIZE
    SPECIALIZE -->|"Specialized Bytecode"| OPTIMIZER
    OPTIMIZER -->|"Trace Optimization"| UOP
    OPTIMIZER -->|"Executors"| TIER2
```
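
The evaluation loop dispatches on CPython's wordcode: since Python 3.6 every code unit is two bytes, an opcode followed by its argument, with larger arguments built up through `EXTENDED_ARG` prefixes. The sketch below splits a function's raw `co_code` the same way; `decode_wordcode` is a hypothetical helper, and it deliberately ignores `EXTENDED_ARG` combining.

```python
import dis


def add(a, b):
    return a + b


def decode_wordcode(func):
    """Hypothetical helper: split raw bytecode into (opname, oparg) code units, 2 bytes each."""
    raw = func.__code__.co_code
    units = []
    for offset in range(0, len(raw), 2):
        opcode, oparg = raw[offset], raw[offset + 1]
        # On 3.11+, inline-cache slots show up as CACHE units after some instructions.
        units.append((dis.opname[opcode], oparg))
    return units


if __name__ == "__main__":
    for name, arg in decode_wordcode(add):
        print(f"{name:<24} {arg}")
```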

### Parser Component Code Structure

```mermaid
graph TB
    subgraph "Parser Component Code"
        subgraph "Core Parser"
            PEGEN["pegen.c<br/>PEG parser runtime<br/>Drives generated parser.c"]
            PEGAPI["peg_api.c<br/>Parser API<br/>AST generation"]
            GRAMMAR["python.gram<br/>Grammar definition<br/>PEG rules"]
        end

        subgraph "Tokenization"
            TOKEN["token.c<br/>Token definitions<br/>Token types"]
            LEXER["lexer/<br/>Lexical analysis<br/>Token generation"]
        end

        subgraph "AST"
            ASDL["Python.asdl<br/>AST definition<br/>Node types"]
            ASTC["asdl_c.py<br/>AST code generation<br/>C structures"]
        end
    end

    PEGEN -->|"Rules via Generated parser.c"| GRAMMAR
    PEGAPI -->|"Parser Calls"| PEGEN
    PEGEN -->|"Token Requests"| LEXER
    LEXER -->|"Token Types"| TOKEN
    PEGEN -->|"AST Nodes"| ASDL
    ASDL -->|"Code Generation"| ASTC
```
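
The node classes in the `ast` module are generated from `Python.asdl` by `asdl_c.py`, and each class's `_fields` mirrors the fields of its ASDL constructor. A sketch that counts the node types the parser produces for a snippet; `node_census` is a hypothetical helper:

```python
import ast
from collections import Counter


def node_census(source: str) -> Counter:
    """Hypothetical helper: count AST node classes produced for a snippet."""
    tree = ast.parse(source)
    # ast.walk visits every node, including operators and Load/Store contexts.
    return Counter(type(node).__name__ for node in ast.walk(tree))


if __name__ == "__main__":
    print(ast.BinOp._fields)  # ('left', 'op', 'right')
    census = node_census("total = price * qty")
    print(census["Name"], census["BinOp"], census["Assign"])  # 3 1 1
```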

### Object System Component Code Structure

```mermaid
graph TB
    subgraph "Object System Component Code"
        subgraph "Core Objects"
            OBJECT["object.c<br/>Base object implementation<br/>Reference counting"]
            TYPEOBJ["typeobject.c<br/>Type system and class creation<br/>Metaclass implementation"]
            DESCR["descrobject.c<br/>Descriptor protocol<br/>Property access"]
        end

        subgraph "Built-in Types"
            LONG["longobject.c<br/>Arbitrary precision integers"]
            UNICODE["unicodeobject.c<br/>String implementation<br/>PEP 393 Latin-1/UCS-2/UCS-4 storage"]
            DICT["dictobject.c<br/>Dictionary implementation<br/>Hash tables"]
            LIST["listobject.c<br/>Dynamic arrays<br/>List operations"]
        end

        subgraph "Function Objects"
            FUNC["funcobject.c<br/>Python function objects<br/>Closure support"]
            METHOD["methodobject.c<br/>Built-in functions and methods<br/>PyCFunction objects"]
            BOUND["classobject.c<br/>Bound method objects<br/>PyMethod_Type"]
        end
    end

    OBJECT -->|"Base Type"| TYPEOBJ
    TYPEOBJ -->|"Descriptor Access"| DESCR
    TYPEOBJ -->|"Type Creation"| LONG
    TYPEOBJ -->|"Type Creation"| UNICODE
    TYPEOBJ -->|"Type Creation"| DICT
    TYPEOBJ -->|"Type Creation"| LIST
    TYPEOBJ -->|"Callable Types"| FUNC
    TYPEOBJ -->|"Built-in Callables"| METHOD
    FUNC -->|"Method Binding via __get__"| BOUND
```
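
In CPython, Python functions are implemented in `funcobject.c` (type `function`), built-in functions and built-in bound methods in `methodobject.c` (`builtin_function_or_method`), unbound built-in methods in `descrobject.c` (`method_descriptor`), and bound Python methods in `classobject.c` (`method`). The split shows up in the type names Python reports. A sketch; `greet`, `Greeter`, and `callable_kinds` are hypothetical names:

```python
import types


def greet(name):
    return f"hello {name}"


class Greeter:
    def greet(self, name):
        return f"hi {name}"


def callable_kinds():
    """Hypothetical helper: map example callables to their runtime type names."""
    g = Greeter()
    return {
        "funcobject.c": type(greet).__name__,         # Python function
        "methodobject.c": type(len).__name__,         # built-in function
        "descrobject.c": type(list.append).__name__,  # unbound built-in method
        "classobject.c": type(g.greet).__name__,      # bound Python method
    }


if __name__ == "__main__":
    for source_file, type_name in callable_kinds().items():
        print(f"{source_file:<16} {type_name}")
    # Attribute lookup binds a function by calling its __get__.
    bound = Greeter.__dict__["greet"].__get__(Greeter(), Greeter)
    print(isinstance(bound, types.MethodType), bound("there"))  # True hi there
```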

### Memory Management Component Code Structure

```mermaid
graph TB
    subgraph "Memory Management Component Code"
        subgraph "Allocation"
            PYMEM["obmalloc.c<br/>PyMem and PyObject allocators<br/>pymalloc arenas and pools"]
            OBJIMPL["objimpl.h<br/>Object allocation API<br/>GC allocation API"]
            ARENA["pyarena.c<br/>PyArena for parser and compiler<br/>Block management"]
        end

        subgraph "Garbage Collection"
            GC["gc.c<br/>Cyclic garbage collector<br/>Default build"]
            GCTHREAD["gc_free_threading.c<br/>Free-threaded GC<br/>Stop-the-world collection"]
            WEAKREF["weakrefobject.c<br/>Weak references<br/>Callback system"]
        end

        subgraph "Memory Tracking"
            TRACE["tracemalloc.c<br/>Memory profiling<br/>Allocation tracking"]
            DEBUG["obmalloc.c debug hooks<br/>PYTHONMALLOC=debug<br/>Memory validation"]
        end
    end

    OBJIMPL -->|"Implemented By"| PYMEM
    ARENA -->|"Block Allocation"| PYMEM
    OBJIMPL -->|"GC Integration"| GC
    GC -->|"Alternative for Free-threaded Builds"| GCTHREAD
    GC -->|"Weak References"| WEAKREF
    PYMEM -->|"Allocation Tracking"| TRACE
    PYMEM -->|"Debug Validation"| DEBUG
```
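
The tracemalloc component hooks the allocators and records where each traced block was allocated. A sketch using the standard `tracemalloc` API; the helper `traced_allocation` and its default sizes are hypothetical.

```python
import tracemalloc


def traced_allocation(count: int = 100, size: int = 1000) -> tuple[int, int]:
    """Hypothetical helper: allocate `count` bytes objects and return (current, peak) traced bytes."""
    tracemalloc.start()
    try:
        blocks = [bytes(size) for _ in range(count)]  # kept alive while measuring
        current, peak = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()

    # statistics("lineno") groups traced blocks by allocation site, largest first.
    top = snapshot.statistics("lineno")[0]
    frame = top.traceback[0]
    print(f"largest site: {frame.filename}:{frame.lineno} ({top.size} bytes, {top.count} blocks)")
    del blocks
    return current, peak


if __name__ == "__main__":
    current, peak = traced_allocation()
    print(f"current={current} peak={peak}")
```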

## Key Architectural Patterns

### 1. Layered Architecture
- **Parser Layer**: Converts source code to AST
- **Compiler Layer**: Transforms AST to bytecode
- **VM Layer**: Executes bytecode
- **Object Layer**: Manages Python objects
- **Memory Layer**: Handles allocation and garbage collection

### 2. Interpreter Pattern
- **Tier 1**: Traditional bytecode interpreter with adaptive specialization
- **Tier 2**: Micro-op interpreter for hot code paths
- **JIT**: Experimental copy-and-patch JIT, added in Python 3.13 as a build-time option, that compiles Tier 2 micro-op traces to machine code

### 3. Object-Oriented Design
- **Everything is an Object**: All Python values are objects
- **Type System**: Dynamic typing; every object points to its type (`ob_type`), and operations dispatch on that type at runtime
- **Method Resolution**: Dynamic method lookup and binding
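
Method resolution follows the C3 linearization stored in each class's `__mro__` (computed in `typeobject.c`), and `super()` walks that same order. A small diamond with hypothetical class names:

```python
class Base:
    def who(self):
        return "Base"


class Left(Base):
    def who(self):
        return "Left -> " + super().who()


class Right(Base):
    def who(self):
        return "Right -> " + super().who()


class Diamond(Left, Right):
    pass  # inherits who() from Left, whose super() call continues to Right


if __name__ == "__main__":
    print([cls.__name__ for cls in Diamond.__mro__])  # ['Diamond', 'Left', 'Right', 'Base', 'object']
    print(Diamond().who())                             # Left -> Right -> Base
```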

### 4. Memory Management
- **Reference Counting**: Objects are deallocated as soon as their reference count drops to zero
- **Generational GC**: Detects and frees reference cycles that reference counting alone cannot reclaim
- **Arena Allocation**: pymalloc serves small requests (512 bytes or less) from pools carved out of larger arenas

### 5. Extension System
- **C API**: Rich interface for C extensions
- **Module System**: Dynamic loading of Python and C modules
- **Import System**: Flexible module discovery and loading
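
Module discovery is pluggable through `sys.meta_path`: each finder is asked for a module spec in turn, and the spec's loader executes the module. A sketch of a hypothetical finder/loader that serves modules from in-memory source strings; `InMemoryFinder`, `import_from_memory`, and the module names are illustrative only.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder and loader backed by a dict of module source strings."""

    def __init__(self, sources: dict[str, str]):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None  # let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, f"<in-memory {module.__name__}>", "exec")
        exec(code, module.__dict__)


def import_from_memory(name: str, source: str):
    """Hypothetical helper: import `name` from `source`, then undo the global changes."""
    finder = InMemoryFinder({name: source})
    sys.meta_path.insert(0, finder)
    try:
        return importlib.import_module(name)
    finally:
        sys.meta_path.remove(finder)
        sys.modules.pop(name, None)  # keep the demo free of side effects


if __name__ == "__main__":
    mod = import_from_memory("demo_config", "DEBUG = True\nLEVEL = 3\n")
    print(mod.DEBUG, mod.LEVEL)  # True 3
```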

## Performance Optimizations

### 1. Adaptive Specialization
- **Type Specialization**: Optimized code paths for common types
- **Inline Caching**: Fast method and attribute access
- **Superinstructions**: Combined bytecode operations
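
Adaptive specialization (PEP 659, CPython 3.11+) rewrites instructions in place once they have run often enough, and `dis` can show the specialized form with `adaptive=True`. A sketch; `adaptive_opnames` is a hypothetical helper, the warm-up count of 1000 is an assumption rather than a documented threshold, and builds that disable or limit specialization (such as some free-threaded builds) may still show the generic instructions.

```python
import dis
import sys


def add(a, b):
    return a + b


def adaptive_opnames(func, warmup: int = 1000) -> list[str]:
    """Hypothetical helper: warm func up with int arguments, then list its adaptive opcode names."""
    for i in range(warmup):
        func(i, 1)
    if sys.version_info < (3, 11):
        # PEP 659 specialization arrived in 3.11; older versions have no adaptive view.
        return [ins.opname for ins in dis.get_instructions(func)]
    return [ins.opname for ins in dis.get_instructions(func, adaptive=True)]


if __name__ == "__main__":
    # Where specialization is active this typically includes a form such as
    # BINARY_OP_ADD_INT; exact names vary between versions.
    print(adaptive_opnames(add))
```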

### 2. Memory Optimizations
- **Free Lists**: Object reuse to reduce allocation overhead
- **Arena Allocation**: Reduced fragmentation and improved locality
- **Immutable Sharing**: `str`, `tuple`, and `int` objects are immutable, so references are shared rather than copied; small integers are cached and strings can be interned
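
CPython does not copy `str` or `tuple` objects on write; because they are immutable, references are simply shared. Two related caches are observable from Python: a small-integer cache (currently -5 to 256) and string interning via `sys.intern`. Both are CPython implementation details, shown here with hypothetical helper names only for illustration.

```python
import sys


def small_int_cached(value: int) -> bool:
    """Hypothetical helper: True if two independently created ints with this value are one object."""
    # int(str(...)) forces a fresh conversion instead of reusing a code constant.
    return int(str(value)) is int(str(value))


def interned_alike(a_parts: list[str], b_parts: list[str]) -> bool:
    """Hypothetical helper: build two equal strings at runtime, intern both, compare identity."""
    a = "".join(a_parts)
    b = "".join(b_parts)
    return sys.intern(a) is sys.intern(b)


if __name__ == "__main__":
    print(small_int_cached(256), small_int_cached(257))  # True False on CPython
    print(interned_alike(["ab", "cd"], ["a", "bcd"]))     # True
```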

### 3. Execution Optimizations
- **Computed Gotos**: Fast opcode dispatch on compilers that support them
- **Fast Locals**: Local variables live in a fixed array in the frame and are accessed by index rather than by dictionary lookup
- **Tier 2 Interpreter**: Optimized execution for hot code paths
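
The compiler resolves local variable names to fixed slots in the frame, so locals are read with instructions from the `LOAD_FAST` family (indexed by slot), while module-level and built-in names go through `LOAD_GLOBAL` or `LOAD_NAME`. A sketch; `scale` and `local_access_summary` are hypothetical, and exact opcode names (for example superinstructions such as `LOAD_FAST_LOAD_FAST`) vary between versions.

```python
import dis


def scale(values, factor):
    total = 0
    for v in values:
        total += v * factor
    return total


def local_access_summary(func):
    """Hypothetical helper: report a function's local slots and which load opcodes it uses."""
    code = func.__code__
    ops = {ins.opname for ins in dis.get_instructions(func)}
    return {
        "slots": code.co_varnames,  # arguments first, then other locals
        "fast_loads": sorted(op for op in ops if op.startswith("LOAD_FAST")),
        "dict_loads": sorted(op for op in ops if op in ("LOAD_NAME", "LOAD_GLOBAL")),
    }


if __name__ == "__main__":
    summary = local_access_summary(scale)
    print(summary["slots"])  # ('values', 'factor', 'total', 'v')
    print(summary["fast_loads"], summary["dict_loads"])
```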

These diagrams give an overview of the CPython architecture, from system-level interactions down to the source files behind key components, as a starting point for contributing to the codebase.
