# Chapter Summary

- Intermediate Representation (IR) serves as one of the fundamental data structures of a compiler. It acts as the bridge between the source language and the target language during the process of program compilation.

- Classical compilers categorize IRs into three types based on their structure: linear IR, graphical IR, and hybrid IR.

- The demands imposed by machine learning frameworks necessitate new forms of IRs, as classical IRs fail to fully satisfy these requirements. Therefore, innovative IRs that are more compatible with these frameworks must be developed on the basis of classical IRs.

- The central principle of automatic differentiation is the decomposition of a program's arithmetic operations into a finite set of basic operations. Because the derivative evaluation rules for these basic operations are known, the derivative of each basic operation can be calculated, and these results are then aggregated using the chain rule to obtain the derivative of the entire program (a minimal sketch of this idea appears after this list).

- Automatic differentiation operates in two modes, forward mode and reverse mode, depending on the order in which the chain rule combines derivatives.

- Forward-mode automatic differentiation is applied when evaluating the derivative of a network whose input dimension is smaller than its output dimension. In contrast, reverse-mode automatic differentiation is employed when the output dimension of a network is smaller than its input dimension.

- Implementation methods for automatic differentiation encompass elemental libraries, operator overloading, and source transformation.

- A type system defines the available types, the operations permitted on each type, and the ways in which types interact. Comprising a set of types and the type-based rules that govern program behavior, type systems are extensively used in compilers, interpreters, and static checking tools.

- Static analysis involves the inspection and verification of code through lexical analysis, syntactic analysis, control flow analysis, and data flow analysis, all of which are conducted without executing the programs.

- The objective of compilation optimization is to boost the efficiency of the IRs generated during the compilation process. Notably, compilation optimization conducted at the frontend is hardware-agnostic.
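
To make the decomposition-plus-chain-rule idea concrete, the following is a minimal forward-mode sketch using dual numbers. The `Dual` class and the function `f` are hypothetical names introduced only for this illustration; they are not part of any framework API discussed in this chapter.

```python
# Minimal forward-mode AD sketch using dual numbers (illustrative only).
# Each Dual carries a value and the derivative of that value with respect
# to a chosen input; each basic operation propagates both via its local rule.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value      # primal value
        self.deriv = deriv      # derivative w.r.t. the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def f(x, y):
    return x * x + x * y    # df/dx = 2x + y


x = Dual(3.0, deriv=1.0)    # seed: differentiate with respect to x
y = Dual(2.0)
print(f(x, y).value, f(x, y).deriv)   # prints: 15.0 8.0
```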
# Frontend Compilation Optimization

Much like classical compilers, AI compilers implement compilation optimization to enhance the effectiveness of the IRs generated during the compilation process. This strategy not only reduces the length of the code and the time required for its compilation and execution but also diminishes the energy usage of processors during execution. Compilation optimization techniques can be divided into two categories: hardware-agnostic optimization and hardware-specific optimization. However, all optimization techniques applied at the frontend are inherently hardware-agnostic, as the frontend remains oblivious to the backend hardware specifics.

## Process of Compilation Optimization

Typically, a compilation optimizer executes a sequence of optimization passes. Each pass takes an IR as input and produces a revised IR as output. A single pass may incorporate several sub-passes and can be run once or multiple times.

The overall success of compilation optimization depends significantly on the selection and ordering of optimization operations. The compiler not only executes various optimization operations as needed but can also adjust the number of optimization passes as well as the types and sequence of optimization operations. These adjustments are contingent upon the configured level of compilation optimization, as illustrated in Figure :numref:`ch06/ch06-opt-pass`.

![Structural layout of an optimization pass in compilation optimization](../img/ch04/optimization_pass.png)
:label:`ch06/ch06-opt-pass`
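
To make the pass structure concrete, the following is a minimal Python sketch of such a pipeline. The `Pass` type alias, the `run_pipeline` helper, and the toy list-of-instructions IR are hypothetical constructs chosen for illustration; they do not correspond to the API of any particular compiler.

```python
# Minimal sketch of an optimization pipeline: each pass maps an IR to a new IR.
# Here the "IR" is just a list of instruction strings; real compilers use
# richer graph- or SSA-based structures.
from typing import Callable, List

IR = List[str]
Pass = Callable[[IR], IR]


def remove_nops(ir: IR) -> IR:
    """A trivial pass that drops no-op instructions."""
    return [inst for inst in ir if inst != "nop"]


def fold_identity(ir: IR) -> IR:
    """A trivial pass that drops additions of zero (x + 0 == x)."""
    return [inst for inst in ir if inst != "add x, 0"]


def run_pipeline(ir: IR, passes: List[Pass], repeat: int = 1) -> IR:
    """Apply each pass in order; the whole sequence may run multiple times."""
    for _ in range(repeat):
        for p in passes:
            ir = p(ir)
    return ir


program = ["load x", "nop", "add x, 0", "store x"]
print(run_pipeline(program, [remove_nops, fold_identity]))
# ['load x', 'store x']
```

Adjusting the contents and ordering of the `passes` list, or the `repeat` count, mirrors how a compiler varies its pass selection according to the configured optimization level.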
## Prevalent Optimization Methods

Today, a wide array of frontend compilation optimization methods exists. Machine learning frameworks likewise employ various optimization methods, although these diverge from those found in classical compilers. This section details three frequently employed and versatile frontend compilation optimization methods.

### Elimination of Dead Code and Unreachable Code

Dead code refers to segments of code that yield outputs not utilized by any other code, while unreachable code refers to segments of code that are not included in any valid control flow path. Figure :numref:`ch06/ch06-opt-pass-useless-code0-elimination` demonstrates these two types of code. The removal of dead or unreachable code can decrease the size of IRs and expedite both the compilation and execution of a program. These types of code can result from human error or may arise during other compilation optimizations.

![Elimination of dead code](../img/ch04/dead_code_elimination.png)
:label:`ch06/ch06-opt-pass-useless-code0-elimination`
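As a small, purely illustrative Python example of both kinds of code, consider the hypothetical function below; the variable and function names are invented for this sketch.

```python
# Before optimization:
def compute(x):
    unused = x * 2        # dead code: the result is never used by any other code
    y = x + 1
    return y
    print("done")         # unreachable code: follows an unconditional return

# After eliminating the dead and unreachable code, the function reduces to:
def compute_optimized(x):
    return x + 1
```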
As mentioned in the earlier section on the [conversion between and combination of dynamic and static graphs](#subsec:conversion_between_and_combination_of_dynamic_and_static_graphs), the tracing method can be employed when converting dynamic graphs into static graphs. Tracing is highly effective at identifying dead code and unreachable code, so this elimination step is often incorporated into the graph conversion procedure.
### Constant Propagation and Constant Folding

Constant propagation is a process that replaces variables known to hold constant values with those values during compilation. Constant folding, in contrast, is a process that replaces an expression with a single constant when the results of its operations can be computed directly during compilation. Figure :numref:`ch06/ch06-opt-pass-constant-broadcast` depicts these two methods.

![Constant propagation and constant folding techniques](../img/ch04/constant_propagation_and_constant_folding.png)
:label:`ch06/ch06-opt-pass-constant-broadcast`
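A small illustrative Python example (with invented variable and function names) shows how the two transformations combine:

```python
# Before optimization:
def area():
    width = 3             # known constant
    height = 4            # known constant
    return width * height

# Constant propagation substitutes the known values of width and height,
# and constant folding then evaluates 3 * 4 at compile time:
def area_optimized():
    return 12
```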
### Common Subexpression Elimination

If an expression E has already been computed and the values of all its variables remain unchanged since that computation, E is identified as a common subexpression. In that case, E does not need to be computed again; it can be replaced directly with the result obtained from the preceding computation. This concept is visualized in Figure :numref:`ch06/ch06-opt-pass-CSE`.

![Common subexpression elimination process](../img/ch04/common_subexpression_elimination.png)
:label:`ch06/ch06-opt-pass-CSE`
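The sketch below applies this idea to straight-line code using a simple lookup table of previously computed expressions; the three-address instruction format and the helper name `eliminate_cse` are hypothetical and chosen only for illustration.

```python
# Minimal common subexpression elimination over straight-line three-address
# code. Each instruction is (dest, op, lhs, rhs); if the same (op, lhs, rhs)
# has already been computed, reuse the earlier destination via a copy.
# (For simplicity this assumes no variable is reassigned between the two uses.)
def eliminate_cse(instructions):
    seen = {}          # (op, lhs, rhs) -> variable holding the result
    optimized = []
    for dest, op, lhs, rhs in instructions:
        key = (op, lhs, rhs)
        if key in seen:
            optimized.append((dest, "copy", seen[key], None))
        else:
            seen[key] = dest
            optimized.append((dest, op, lhs, rhs))
    return optimized


program = [
    ("t1", "add", "a", "b"),
    ("t2", "mul", "t1", "c"),
    ("t3", "add", "a", "b"),   # common subexpression: a + b already in t1
]
for inst in eliminate_cse(program):
    print(inst)
# ('t1', 'add', 'a', 'b')
# ('t2', 'mul', 't1', 'c')
# ('t3', 'copy', 't1', None)
```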
Common subexpression elimination, like the elimination of dead code and unreachable code, is typically carried out during the graph conversion process. In PyTorch, the TorchScript module provides a dedicated API for common subexpression elimination; this is a natural fit because common subexpressions are straightforward to identify within the TorchScript IR.
# Further Reading

1. The Microsoft team proposes a series of modern AI compilers to enhance the utilization of diverse heterogeneous hardware. The work introduces four distinct optimizations: hardware parallel utilization, compilation efficiency, memory access improvement for enhanced computing efficiency, and efficient control flow execution on accelerators. For more details, see *AI Compiler Quartet*.[^1]

[^1]: <https://www.microsoft.com/en-us/research/blog/building-a-heavy-metal-quartet-of-ai-compilers/>

# AI Compiler Frontend

Tailored for machine learning frameworks, an AI compiler is designed to convert Python-based machine learning programs into their optimized forms, enabling efficient native execution on heterogeneous processors. This chapter first outlines the typical architecture of an AI compiler before delving into the design of the compiler's frontend. The compiler frontend incorporates various techniques, including intermediate representations (IRs), automatic differentiation, type systems, static analysis, and compilation optimization.

The learning objectives of this chapter include:

- Understanding the typical architecture of an AI compiler.

- Understanding the types and implementation of IRs in machine learning frameworks.

- Understanding the methods of automatic differentiation implemented in AI compilers.

- Understanding type systems and static analysis in AI compilers.

- Understanding common frontend compilation optimization methods used by AI compilers.
