-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
gh-140521: Added comprehensive c4 diagrams #140522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,392 @@ | ||
| # CPython C4 Model Architecture Diagrams | ||
|
|
||
| This document contains comprehensive C4 model diagrams for the CPython codebase, from high-level system context down to detailed code structure. | ||
|
|
||
| ## Level 1: System Context Diagram | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "External Users" | ||
| DEV[Python Developers] | ||
| SYS[System Administrators] | ||
| APP[Application Users] | ||
| end | ||
|
|
||
| subgraph "External Systems" | ||
| OS[Operating System<br/>Windows/macOS/Linux] | ||
| FS[File System] | ||
| NET[Network Services] | ||
| LIB[Third-party Libraries] | ||
| EXT[C Extensions] | ||
| end | ||
|
|
||
| subgraph "CPython System" | ||
| CPY[CPython Interpreter<br/>Python 3.15] | ||
| end | ||
|
|
||
| DEV -->|"Writes Python Code"| CPY | ||
| SYS -->|"Configures & Deploys"| CPY | ||
| APP -->|"Runs Applications"| CPY | ||
|
|
||
| CPY -->|"System Calls"| OS | ||
| CPY -->|"File I/O"| FS | ||
| CPY -->|"Network Operations"| NET | ||
| CPY -->|"Imports Modules"| LIB | ||
| CPY -->|"Loads Extensions"| EXT | ||
|
|
||
| OS -->|"Process Management"| CPY | ||
| FS -->|"File Access"| CPY | ||
| NET -->|"Network Data"| CPY | ||
| LIB -->|"Standard Library"| CPY | ||
| EXT -->|"Native Code"| CPY | ||
|
Comment on lines
+31
to
+41
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really see what this adds, this is just general information?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The C4 model proposes starting from the top and working its way down to specific implementations Context, Component, Container, Code. |
||
| ``` | ||
|
|
||
| ## Level 2: Container Diagram | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "CPython Runtime System" | ||
| subgraph "Core Interpreter" | ||
| PARSER[Parser<br/>PEG Grammar<br/>AST Generation] | ||
| COMPILER[Compiler<br/>AST to Bytecode<br/>Optimization] | ||
| VM[Virtual Machine<br/>Bytecode Execution<br/>Tier 1 & Tier 2] | ||
| end | ||
|
|
||
| subgraph "Runtime Services" | ||
| GC[Garbage Collector<br/>Reference Counting<br/>Cycle Detection] | ||
| MEM[Memory Manager<br/>Object Allocation<br/>Free Lists] | ||
| THREAD[Threading System<br/>GIL Management<br/>Thread States] | ||
| end | ||
|
|
||
| subgraph "Object System" | ||
| OBJ[Object Model<br/>Type System<br/>Method Resolution] | ||
| BUILTIN[Built-in Types<br/>int, str, list, dict<br/>etc.] | ||
| end | ||
|
|
||
| subgraph "Module System" | ||
| IMPORT[Import System<br/>Module Loading<br/>Path Resolution] | ||
| STDLIB[Standard Library<br/>Built-in Modules<br/>Extension Modules] | ||
| end | ||
|
|
||
| subgraph "C API" | ||
| CAPI[C API<br/>Extension Interface<br/>Embedding Support] | ||
| end | ||
| end | ||
|
|
||
| subgraph "External Dependencies" | ||
| OS2[Operating System] | ||
| LIBS[System Libraries] | ||
| end | ||
|
|
||
| PARSER -->|"AST"| COMPILER | ||
| COMPILER -->|"Bytecode"| VM | ||
| VM -->|"Object Operations"| OBJ | ||
| VM -->|"Memory Requests"| MEM | ||
| VM -->|"GC Triggers"| GC | ||
| VM -->|"Thread Management"| THREAD | ||
|
|
||
| OBJ -->|"Type Operations"| BUILTIN | ||
| IMPORT -->|"Module Loading"| STDLIB | ||
| IMPORT -->|"Extension Loading"| CAPI | ||
|
|
||
| MEM -->|"System Calls"| OS2 | ||
| THREAD -->|"Thread APIs"| OS2 | ||
| STDLIB -->|"System Libraries"| LIBS | ||
| CAPI -->|"Extension Interface"| LIBS | ||
| ``` | ||
|
|
||
| ## Level 3: Component Diagrams | ||
|
|
||
| ### Parser Container Components | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Parser Container" | ||
| TOKENIZER[Tokenizer<br/>Lexical Analysis<br/>Token Generation] | ||
| PEG[PEG Parser<br/>Grammar Rules<br/>Syntax Analysis] | ||
| AST[AST Builder<br/>Abstract Syntax Tree<br/>Validation] | ||
| ERROR[Error Handler<br/>Syntax Errors<br/>Diagnostics] | ||
| end | ||
|
|
||
| TOKENIZER -->|"Tokens"| PEG | ||
| PEG -->|"Parse Tree"| AST | ||
| PEG -->|"Error Info"| ERROR | ||
| AST -->|"Validated AST"| COMPILER | ||
| ``` | ||
|
|
||
| ### Virtual Machine Container Components | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Virtual Machine Container" | ||
| subgraph "Tier 1 Interpreter" | ||
| EVAL[Bytecode Evaluator<br/>Main Execution Loop<br/>Opcode Dispatch] | ||
| FRAME[Frame Management<br/>Call Stack<br/>Local Variables] | ||
| STACK[Evaluation Stack<br/>Value Storage<br/>Stack Operations] | ||
| end | ||
|
|
||
| subgraph "Tier 2 Interpreter" | ||
| UOP[Micro-op Interpreter<br/>Optimized Execution<br/>Superblocks] | ||
| OPT[Optimizer<br/>Bytecode Analysis<br/>Hot Path Detection] | ||
| end | ||
|
|
||
| subgraph "Execution Support" | ||
| BREAK[Eval Breaker<br/>Signal Handling<br/>Interruption] | ||
| TRACE[Tracing System<br/>Profiling<br/>Debugging] | ||
| end | ||
| end | ||
|
|
||
| EVAL -->|"Frame Operations"| FRAME | ||
| EVAL -->|"Stack Operations"| STACK | ||
| EVAL -->|"Hot Code Detection"| OPT | ||
| OPT -->|"Optimized Code"| UOP | ||
| EVAL -->|"Interruption Checks"| BREAK | ||
| EVAL -->|"Trace Events"| TRACE | ||
| ``` | ||
|
|
||
| ### Object System Container Components | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Object System Container" | ||
| subgraph "Core Object Model" | ||
| OBJHDR[Object Header<br/>Reference Count<br/>Type Pointer] | ||
| TYPE[Type System<br/>Metaclass<br/>Method Resolution] | ||
| DESC[Descriptor Protocol<br/>Property Access<br/>Method Binding] | ||
| end | ||
|
|
||
| subgraph "Built-in Types" | ||
| NUMERIC[Numeric Types<br/>int, float, complex] | ||
| SEQUENCE[Sequence Types<br/>str, list, tuple] | ||
| MAPPING[Mapping Types<br/>dict, set] | ||
| CALLABLE[Callable Types<br/>function, method] | ||
| end | ||
|
|
||
| subgraph "Special Objects" | ||
| MODULE[Module Objects<br/>Namespace<br/>Import State] | ||
| CLASS[Class Objects<br/>Inheritance<br/>Instance Creation] | ||
| EXCEPTION[Exception Objects<br/>Error Handling<br/>Stack Traces] | ||
| end | ||
| end | ||
|
|
||
| OBJHDR -->|"Type Info"| TYPE | ||
| TYPE -->|"Method Lookup"| DESC | ||
| TYPE -->|"Instance Creation"| NUMERIC | ||
| TYPE -->|"Instance Creation"| SEQUENCE | ||
| TYPE -->|"Instance Creation"| MAPPING | ||
| TYPE -->|"Instance Creation"| CALLABLE | ||
| TYPE -->|"Module Creation"| MODULE | ||
| TYPE -->|"Class Creation"| CLASS | ||
| TYPE -->|"Exception Creation"| EXCEPTION | ||
| ``` | ||
|
|
||
| ### Memory Management Container Components | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Memory Management Container" | ||
| subgraph "Allocation" | ||
| ALLOC[Object Allocator<br/>Memory Pools<br/>Arena Management] | ||
| FREELIST[Free Lists<br/>Object Reuse<br/>Size Classes] | ||
| ARENA[Arena Manager<br/>Memory Blocks<br/>Fragmentation Control] | ||
| end | ||
|
|
||
| subgraph "Garbage Collection" | ||
| REFCOUNT[Reference Counting<br/>Immediate Deallocation<br/>Cycle Detection] | ||
| GENERATIONAL[Generational GC<br/>Young/Old Generations<br/>Collection Cycles] | ||
| WEAKREF[Weak References<br/>Non-owning References<br/>Callback System] | ||
| end | ||
|
|
||
| subgraph "Memory Tracking" | ||
| TRACEMALLOC[Tracemalloc<br/>Memory Profiling<br/>Allocation Tracking] | ||
| DEBUG[Debug Allocator<br/>Memory Validation<br/>Leak Detection] | ||
| end | ||
| end | ||
|
|
||
| ALLOC -->|"Memory Requests"| ARENA | ||
| ALLOC -->|"Object Reuse"| FREELIST | ||
| REFCOUNT -->|"Cycle Detection"| GENERATIONAL | ||
| GENERATIONAL -->|"Weak References"| WEAKREF | ||
| ALLOC -->|"Allocation Events"| TRACEMALLOC | ||
| ALLOC -->|"Validation"| DEBUG | ||
| ``` | ||
|
|
||
| ## Level 4: Code Diagrams | ||
|
|
||
| ### Bytecode Evaluator Component Code Structure | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Bytecode Evaluator Code" | ||
| subgraph "Core Files" | ||
| CEVAL["ceval.c<br/>Main evaluation loop<br/>Opcode dispatch"] | ||
| BYTECODES["bytecodes.c<br/>Opcode definitions<br/>Instruction semantics"] | ||
| MACROS["ceval_macros.h<br/>Evaluation macros<br/>Stack operations"] | ||
| end | ||
|
|
||
| subgraph "Optimization" | ||
| SPECIALIZE["specialize.c<br/>Adaptive specialization<br/>Type specialization"] | ||
| OPTIMIZER["optimizer.c<br/>Bytecode optimization<br/>Hot path analysis"] | ||
| CASES["generated_cases.c.h<br/>Generated opcode cases<br/>Fast paths"] | ||
| end | ||
|
|
||
| subgraph "Tier 2" | ||
| TIER2["tier2_engine.md<br/>Micro-op interpreter<br/>Superblock execution"] | ||
| UOP["optimizer_bytecodes.c<br/>Micro-op definitions<br/>Tier 2 IR"] | ||
| end | ||
| end | ||
|
|
||
| CEVAL -->|"Opcode Definitions"| BYTECODES | ||
| CEVAL -->|"Macro Usage"| MACROS | ||
| CEVAL -->|"Specialization"| SPECIALIZE | ||
| SPECIALIZE -->|"Optimized Code"| OPTIMIZER | ||
| OPTIMIZER -->|"Generated Cases"| CASES | ||
| OPTIMIZER -->|"Tier 2 Code"| TIER2 | ||
| TIER2 -->|"Micro-ops"| UOP | ||
| ``` | ||
|
|
||
| ### Parser Component Code Structure | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Parser Component Code" | ||
| subgraph "Core Parser" | ||
| PEGEN["pegen.c<br/>PEG parser implementation<br/>Grammar execution"] | ||
| PEGAPI["peg_api.c<br/>Parser API<br/>AST generation"] | ||
| GRAMMAR["python.gram<br/>Grammar definition<br/>PEG rules"] | ||
| end | ||
|
|
||
| subgraph "Tokenization" | ||
| TOKEN["token.c<br/>Token definitions<br/>Token types"] | ||
| LEXER["lexer/<br/>Lexical analysis<br/>Token generation"] | ||
| end | ||
|
|
||
| subgraph "AST" | ||
| ASDL["Python.asdl<br/>AST definition<br/>Node types"] | ||
| ASTC["asdl_c.py<br/>AST code generation<br/>C structures"] | ||
| end | ||
| end | ||
|
|
||
| PEGEN -->|"Grammar Rules"| GRAMMAR | ||
| PEGAPI -->|"Parser Calls"| PEGEN | ||
| PEGEN -->|"Token Stream"| TOKEN | ||
| TOKEN -->|"Lexical Analysis"| LEXER | ||
| PEGEN -->|"AST Nodes"| ASDL | ||
| ASDL -->|"Code Generation"| ASTC | ||
| ``` | ||
|
|
||
| ### Object System Component Code Structure | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Object System Component Code" | ||
| subgraph "Core Objects" | ||
| OBJECT["object.c<br/>Base object implementation<br/>Reference counting"] | ||
| TYPEOBJ["typeobject.c<br/>Type system<br/>Metaclass implementation"] | ||
| DESCR["descrobject.c<br/>Descriptor protocol<br/>Property access"] | ||
| end | ||
|
|
||
| subgraph "Built-in Types" | ||
| LONG["longobject.c<br/>Arbitrary precision integers"] | ||
| UNICODE["unicodeobject.c<br/>String implementation<br/>UTF-8/UTF-16"] | ||
| DICT["dictobject.c<br/>Dictionary implementation<br/>Hash tables"] | ||
| LIST["listobject.c<br/>Dynamic arrays<br/>List operations"] | ||
| end | ||
|
|
||
| subgraph "Function Objects" | ||
| FUNC["funcobject.c<br/>Function objects<br/>Closure support"] | ||
| METHOD["methodobject.c<br/>Method objects<br/>Bound methods"] | ||
| CLASS["classobject.c<br/>Class objects<br/>Inheritance"] | ||
| end | ||
| end | ||
|
|
||
| OBJECT -->|"Base Type"| TYPEOBJ | ||
| TYPEOBJ -->|"Descriptor Access"| DESCR | ||
| TYPEOBJ -->|"Type Creation"| LONG | ||
| TYPEOBJ -->|"Type Creation"| UNICODE | ||
| TYPEOBJ -->|"Type Creation"| DICT | ||
| TYPEOBJ -->|"Type Creation"| LIST | ||
| TYPEOBJ -->|"Callable Types"| FUNC | ||
| FUNC -->|"Method Binding"| METHOD | ||
| TYPEOBJ -->|"Class Creation"| CLASS | ||
| ``` | ||
|
|
||
| ### Memory Management Component Code Structure | ||
|
|
||
| ```mermaid | ||
| graph TB | ||
| subgraph "Memory Management Component Code" | ||
| subgraph "Allocation" | ||
| PYMEM["pymem.c<br/>Memory allocator<br/>Arena management"] | ||
| OBJIMPL["objimpl.h<br/>Object allocation<br/>Type-specific allocators"] | ||
| ARENA["pyarena.c<br/>Arena allocator<br/>Block management"] | ||
| end | ||
|
|
||
| subgraph "Garbage Collection" | ||
| GC["gc.c<br/>Reference counting<br/>Cycle detection"] | ||
| GCTHREAD["gc_free_threading.c<br/>Free-threaded GC<br/>Concurrent collection"] | ||
| WEAKREF["weakrefobject.c<br/>Weak references<br/>Callback system"] | ||
| end | ||
|
|
||
| subgraph "Memory Tracking" | ||
| TRACE["tracemalloc.c<br/>Memory profiling<br/>Allocation tracking"] | ||
| DEBUG["pydebug.h<br/>Debug macros<br/>Memory validation"] | ||
| end | ||
| end | ||
|
|
||
| PYMEM -->|"Object Allocation"| OBJIMPL | ||
| PYMEM -->|"Arena Management"| ARENA | ||
| OBJIMPL -->|"GC Integration"| GC | ||
| GC -->|"Concurrent GC"| GCTHREAD | ||
| GC -->|"Weak References"| WEAKREF | ||
| PYMEM -->|"Allocation Tracking"| TRACE | ||
| PYMEM -->|"Debug Validation"| DEBUG | ||
| ``` | ||
|
|
||
| ## Key Architectural Patterns | ||
|
|
||
| ### 1. Layered Architecture | ||
| - **Parser Layer**: Converts source code to AST | ||
| - **Compiler Layer**: Transforms AST to bytecode | ||
| - **VM Layer**: Executes bytecode | ||
| - **Object Layer**: Manages Python objects | ||
| - **Memory Layer**: Handles allocation and garbage collection | ||
|
|
||
| ### 2. Interpreter Pattern | ||
| - **Tier 1**: Traditional bytecode interpreter with adaptive specialization | ||
| - **Tier 2**: Micro-op interpreter for hot code paths | ||
| - **JIT**: Future machine code generation for performance-critical code | ||
|
|
||
| ### 3. Object-Oriented Design | ||
| - **Everything is an Object**: All Python values are objects | ||
| - **Type System**: Dynamic typing with runtime type checking | ||
| - **Method Resolution**: Dynamic method lookup and binding | ||
|
|
||
| ### 4. Memory Management | ||
| - **Reference Counting**: Immediate deallocation for most objects | ||
| - **Generational GC**: Cycle detection for complex object graphs | ||
| - **Arena Allocation**: Efficient memory management for small objects | ||
|
|
||
| ### 5. Extension System | ||
| - **C API**: Rich interface for C extensions | ||
| - **Module System**: Dynamic loading of Python and C modules | ||
| - **Import System**: Flexible module discovery and loading | ||
|
|
||
| ## Performance Optimizations | ||
|
|
||
| ### 1. Adaptive Specialization | ||
| - **Type Specialization**: Optimized code paths for common types | ||
| - **Inline Caching**: Fast method and attribute access | ||
| - **Superinstructions**: Combined bytecode operations | ||
|
|
||
| ### 2. Memory Optimizations | ||
| - **Free Lists**: Object reuse to reduce allocation overhead | ||
| - **Arena Allocation**: Reduced fragmentation and improved locality | ||
| - **Copy-on-Write**: Efficient string and tuple operations | ||
|
|
||
| ### 3. Execution Optimizations | ||
| - **Computed Gotos**: Fast opcode dispatch | ||
| - **Stack Caching**: Reduced memory access for local variables | ||
| - **Tier 2 Interpreter**: Optimized execution for hot code paths | ||
|
|
||
| This comprehensive C4 model provides a complete view of the CPython architecture, from high-level system interactions down to detailed code structure, enabling effective contribution to the CPython codebase. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't future proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted