1-
21Compiler design
32===============
43
76
87In CPython, the compilation from source code to bytecode involves several steps:
98
10- 1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
11- and [Parser/tokenizer/](../Parser/tokenizer/).
9+ 1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
10+ and [Parser/tokenizer/](../Parser/tokenizer).
1211 2. Parse the stream of tokens into an Abstract Syntax Tree
1312 [Parser/parser.c](../Parser/parser.c).
1413 3. Transform AST into an instruction sequence
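
The first steps listed above can be observed from Python itself. The sketch below is only an illustration: it uses the standard-library `tokenize` and `ast` modules and the built-in `compile()` rather than the C code under `Parser/`, and the sample source string and variable names are made up for the example.

```python
import ast
import io
import tokenize

source = "x = 1 + 2\n"

# Step 1: tokenize the source (a Python-level view of the token stream).
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Step 2: parse the source into an Abstract Syntax Tree.
tree = ast.parse(source, "<example>")

# Remaining steps: compile the AST into a code object holding bytecode.
code = compile(tree, "<example>", "exec")

# Run the code object to confirm that it behaves like the source.
namespace = {}
exec(code, namespace)
print(tokens[0].string, type(tree).__name__, namespace["x"])
```

Passing the AST to `compile()` instead of the source string makes the boundary between parsing and code generation visible.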
@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization. The
134133`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
135134initializes the *name*, *args*, *body*, and *attributes* fields.
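
The same fields can be inspected from Python, since the `ast` module exposes node classes generated from the same ASDL description. This is a minimal sketch, assuming a made-up one-line function `f`; it shows the Python-level node, not the C `stmt_ty` struct.

```python
import ast

tree = ast.parse("def f(a):\n    return a\n")
node = tree.body[0]

# The Python class plays the role of the C-level 'kind' tag.
print(type(node).__name__)

# The name, args, and body fields of the FunctionDef node.
arg_names = [arg.arg for arg in node.args.args]
print(node.name, arg_names, len(node.body))

# Source-position attributes shared by all statement nodes.
print(node.lineno, node.col_offset)
```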
136135
137- See also
138- [Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
139- by Thomas Kluyver.
136+ See also [Green Tree Snakes - The missing Python AST docs](
137+ https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.
140138
141139Memory management
142140=================
@@ -260,33 +258,33 @@ manually -- `generic`, `identifier` and `int`. These types are found in
260258[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
261259Functions and macros for creating `asdl_xx_seq *` types are as follows:
262260
263- `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
264- Allocate memory for an `asdl_generic_seq` of the specified length
265- `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
266- Allocate memory for an `asdl_identifier_seq` of the specified length
267- `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
268- Allocate memory for an `asdl_int_seq` of the specified length
261+ * `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
262+ Allocate memory for an `asdl_generic_seq` of the specified length
263+ * `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
264+ Allocate memory for an `asdl_identifier_seq` of the specified length
265+ * `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
266+ Allocate memory for an `asdl_int_seq` of the specified length
269267
270268In addition to the three types mentioned above, some ASDL sequence types are
271269automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
272270[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
273271Macros for using both manually defined and automatically generated ASDL
274272sequence types are as follows:
275273
276- `asdl_seq_GET(asdl_xx_seq *, int)`
277- Get item held at a specific position in an `asdl_xx_seq`
278- `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
279- Set a specific index in an `asdl_xx_seq` to the specified value
274+ * `asdl_seq_GET(asdl_xx_seq *, int)`:
275+ Get item held at a specific position in an `asdl_xx_seq`
276+ * `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
277+ Set a specific index in an `asdl_xx_seq` to the specified value
280278
281- Untyped counterparts exist for some of the typed macros. These are useful
279+ Untyped counterparts exist for some of the typed macros. These are useful
282280when a function needs to manipulate a generic ASDL sequence:
283281
284- `asdl_seq_GET_UNTYPED(asdl_seq *, int)`
285- Get item held at a specific position in an `asdl_seq`
286- `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
287- Set a specific index in an `asdl_seq` to the specified value
288- `asdl_seq_LEN(asdl_seq *)`
289- Return the length of an `asdl_seq` or `asdl_xx_seq`
282+ * `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
283+ Get item held at a specific position in an `asdl_seq`
284+ * `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
285+ Set a specific index in an `asdl_seq` to the specified value
286+ * `asdl_seq_LEN(asdl_seq *)`:
287+ Return the length of an `asdl_seq` or `asdl_xx_seq`
290288
291289Note that typed macros and functions are recommended over their untyped
292290counterparts. Typed macros carry out checks in debug mode and aid
@@ -379,33 +377,33 @@ arguments to a node that used the '*' modifier).
379377
380378Emission of bytecode is handled by the following macros:
381379
382- * `ADDOP(struct compiler *, location, int)`
383- add a specified opcode
384- * `ADDOP_IN_SCOPE(struct compiler *, location, int)`
385- like `ADDOP`, but also exits current scope; used for adding return value
386- opcodes in lambdas and closures
387- * `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
388- add an opcode that takes an integer argument
389- * `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
390- add an opcode with the proper argument based on the position of the
391- specified PyObject in PyObject sequence object, but with no handling of
392- mangled names; used for when you
393- need to do named lookups of objects such as globals, consts, or
394- parameters where name mangling is not possible and the scope of the
395- name is known; *TYPE* is the name of PyObject sequence
396- (`names` or `varnames`)
397- * `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
398- just like `ADDOP_O`, but steals a reference to PyObject
399- * `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
400- just like `ADDOP_O`, but name mangling is also handled; used for
401- attribute loading or importing based on name
402- * `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
403- add the `LOAD_CONST` opcode with the proper argument based on the
404- position of the specified PyObject in the consts table.
405- * `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
406- just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
407- * `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
408- create a jump to a basic block
380+ * `ADDOP(struct compiler *, location, int)`:
381+ add a specified opcode
382+ * `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
383+ like `ADDOP`, but also exits current scope; used for adding return value
384+ opcodes in lambdas and closures
385+ * `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
386+ add an opcode that takes an integer argument
387+ * `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
388+ add an opcode with the proper argument based on the position of the
389+ specified PyObject in PyObject sequence object, but with no handling of
390+ mangled names; used for when you
391+ need to do named lookups of objects such as globals, consts, or
392+ parameters where name mangling is not possible and the scope of the
393+ name is known; *TYPE* is the name of PyObject sequence
394+ (`names` or `varnames`)
395+ * `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
396+ just like `ADDOP_O`, but steals a reference to PyObject
397+ * `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
398+ just like `ADDOP_O`, but name mangling is also handled; used for
399+ attribute loading or importing based on name
400+ * `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
401+ add the `LOAD_CONST` opcode with the proper argument based on the
402+ position of the specified PyObject in the consts table.
403+ * `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
404+ just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
405+ * `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
406+ create a jump to a basic block
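
The idea that an opcode's argument is a position in a table (the consts table for `LOAD_CONST`, the `names` sequence for name-based opcodes) can be checked from Python with the `dis` module. The sketch below is an illustration under assumptions: the source string is made up, and the exact opcodes emitted for it may differ between CPython versions, so the loop only reports the opcodes it actually finds.

```python
import dis

code = compile('greeting = "hello, compiler"\n', "<example>", "exec")

for instr in dis.get_instructions(code):
    if instr.opname == "LOAD_CONST":
        # The argument is an index into the code object's consts table.
        print(instr.opname, instr.arg, repr(code.co_consts[instr.arg]))
    elif instr.opname == "STORE_NAME":
        # The argument is an index into the code object's names table.
        print(instr.opname, instr.arg, code.co_names[instr.arg])
```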
409407
410408The `location` argument is a struct with the source location to be
411409associated with this instruction. It is typically extracted from an
@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual
433431bytecode. This includes transforming pseudo instructions into actual instructions,
434432converting jump targets from logical labels to relative offsets, and
435433construction of the [exception table](exception_handling.md) and
436- [locations table](locations.md).
434+ [locations table](code_objects.md#source-code-locations).
437435The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
438436metadata, including the `consts` and `names` arrays, information about the function,
439437and a reference to the source code (filename, etc.). All of this is implemented by
@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).
453451Important files
454452===============
455453
456- * [Parser/](../Parser/)
454+ * [Parser/](../Parser)
457455
458456 * [Parser/Python.asdl](../Parser/Python.asdl):
459457 ASDL syntax file.
@@ -534,7 +532,7 @@ Important files
534532 * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
535533 A data structure representing a sequence of bytecode-like pseudo-instructions.
536534
537- * [Include/](../Include/)
535+ * [Include/](../Include)
538536
539537 * [Include/cpython/code.h](../Include/cpython/code.h)
540538 : Header file for [Objects/codeobject.c](../Objects/codeobject.c);
@@ -556,7 +554,7 @@ Important files
556554 : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).
557555
558556 * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
559- : Header for [Python/symtable.c](../Python/symtable.c).
557+ : Header for [Python/symtable.c](../Python/symtable.c).
560558 `struct symtable` and `PySTEntryObject` are defined here.
561559
562560 * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
@@ -570,7 +568,7 @@ Important files
570568 by
571569 [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).
572570
573- * [Objects/](../Objects/)
571+ * [Objects/](../Objects)
574572
575573 * [Objects/codeobject.c](../Objects/codeobject.c)
576574 : Contains PyCodeObject-related code.
@@ -579,7 +577,7 @@ Important files
579577 : Contains the `frame_setlineno()` function which should determine whether it is allowed
580578 to make a jump between two points in a bytecode.
581579
582- * [Lib/](../Lib/)
580+ * [Lib/](../Lib)
583581
584582 * [Lib/opcode.py](../Lib/opcode.py)
585583 : opcode utilities exposed to Python.
@@ -591,7 +589,7 @@ Important files
591589Objects
592590=======
593591
594- * [Locations](locations.md): Describes the location table
592+ * [Locations](code_objects.md#source-code-locations): Describes the location table
595593* [Frames](frames.md): Describes frames and the frame stack
596594* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
597595* [Exception Handling](exception_handling.md): Describes the exception table