# Chapter Summary

- Intermediate Representation (IR) serves as one of the fundamental data structures of a compiler. It acts as the bridge between the source language and the target language during the process of program compilation.

- Classical compilers categorize IRs into three types based on their structure: linear IR, graphical IR, and hybrid IR.

- The demands imposed by machine learning frameworks necessitate new forms of IRs, as classical IRs fail to fully satisfy these requirements. Therefore, innovative IRs that are more compatible with these frameworks must be developed on the basis of classical IRs.

- The central principle of automatic differentiation is the decomposition of a program's arithmetic operations into a finite set of basic operations. Because the derivative evaluation rules for these basic operations are known, the derivative of each basic operation can be calculated, and these results are then aggregated using the chain rule to obtain the derivative of the entire program (a minimal sketch of this idea appears after this list).

- Automatic differentiation operates in two modes, forward mode and reverse mode, depending on the order in which the chain rule combines derivatives.

- Forward-mode automatic differentiation is applied when evaluating the derivative of a network whose input dimension is smaller than its output dimension. In contrast, reverse-mode automatic differentiation is employed when the output dimension of a network is smaller than its input dimension.

- Implementation methods for automatic differentiation encompass elemental libraries, operator overloading, and source transformation.

- A type system defines the available types, the operations permitted on each type, and the ways in which types interact. Comprising a set of types and the type-based rules that govern program behavior, type systems are extensively used in compilers, interpreters, and static checking tools.

- Static analysis involves the inspection and verification of code through lexical analysis, syntactic analysis, control flow analysis, and data flow analysis, all of which are conducted without executing the programs.

- The objective of compilation optimization is to boost the efficiency of the IRs generated during the compilation process. Notably, compilation optimization conducted at the frontend is hardware-agnostic.
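
To make the decomposition-plus-chain-rule idea concrete, the following is a minimal forward-mode sketch using dual numbers. The `Dual` class and the function `f` are hypothetical names introduced only for this illustration; they are not part of any framework API discussed in this chapter.

```python
# Minimal forward-mode AD sketch using dual numbers (illustrative only).
# Each Dual carries a value and the derivative of that value with respect
# to a chosen input; each basic operation propagates both via its local rule.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value      # primal value
        self.deriv = deriv      # derivative w.r.t. the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def f(x, y):
    return x * x + x * y    # df/dx = 2x + y


x = Dual(3.0, deriv=1.0)    # seed: differentiate with respect to x
y = Dual(2.0)
print(f(x, y).value, f(x, y).deriv)   # prints: 15.0 8.0
```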
# Frontend Compilation Optimization

Much like classical compilers, AI compilers implement compilation optimization to enhance the effectiveness of the IRs generated during the compilation process. This strategy not only reduces the length of the code and the time required for its compilation and execution but also diminishes the energy usage of processors during execution. Compilation optimization techniques can be divided into two categories: hardware-agnostic optimization and hardware-specific optimization. However, all optimization techniques applied at the frontend are inherently hardware-agnostic, as the frontend remains oblivious to the backend hardware specifics.

## Process of Compilation Optimization

Typically, a compilation optimizer executes a sequence of optimization passes. Each pass takes an IR as input and produces a revised IR as output. A single pass may incorporate several sub-passes and can be run once or multiple times.

The overall success of compilation optimization depends significantly on the selection and ordering of optimization operations. The compiler not only executes various optimization operations as needed but can also adjust the number of optimization passes as well as the types and sequence of optimization operations. These adjustments are contingent upon the configured level of compilation optimization, as illustrated in Figure :numref:`ch06/ch06-opt-pass`.

![Structural layout of an optimization pass in compilation optimization](../img/ch04/optimization_pass.png)
:label:`ch06/ch06-opt-pass`
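
To make the pass structure concrete, the following is a minimal Python sketch of such a pipeline. The `Pass` type alias, the `run_pipeline` helper, and the toy list-of-instructions IR are hypothetical constructs chosen for illustration; they do not correspond to the API of any particular compiler.

```python
# Minimal sketch of an optimization pipeline: each pass maps an IR to a new IR.
# Here the "IR" is just a list of instruction strings; real compilers use
# richer graph- or SSA-based structures.
from typing import Callable, List

IR = List[str]
Pass = Callable[[IR], IR]


def remove_nops(ir: IR) -> IR:
    """A trivial pass that drops no-op instructions."""
    return [inst for inst in ir if inst != "nop"]


def fold_identity(ir: IR) -> IR:
    """A trivial pass that drops additions of zero (x + 0 == x)."""
    return [inst for inst in ir if inst != "add x, 0"]


def run_pipeline(ir: IR, passes: List[Pass], repeat: int = 1) -> IR:
    """Apply each pass in order; the whole sequence may run multiple times."""
    for _ in range(repeat):
        for p in passes:
            ir = p(ir)
    return ir


program = ["load x", "nop", "add x, 0", "store x"]
print(run_pipeline(program, [remove_nops, fold_identity]))
# ['load x', 'store x']
```

Adjusting the contents and ordering of the `passes` list, or the `repeat` count, mirrors how a compiler varies its pass selection according to the configured optimization level.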
## Prevalent Optimization Methods

Today, a wide array of frontend compilation optimization methods exists. Machine learning frameworks likewise employ various optimization methods, although these diverge from those found in classical compilers. This section details three frequently employed and versatile frontend compilation optimization methods.

### Elimination of Dead Code and Unreachable Code

Dead code refers to segments of code that yield outputs not utilized by any other code, while unreachable code refers to segments of code that are not included in any valid control flow path. Figure :numref:`ch06/ch06-opt-pass-useless-code0-elimination` demonstrates these two types of code. The removal of dead or unreachable code can decrease the size of IRs and expedite both the compilation and execution of a program. These types of code can result from human error or may arise during other compilation optimizations.

![Elimination of dead code](../img/ch04/dead_code_elimination.png)
:label:`ch06/ch06-opt-pass-useless-code0-elimination`
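As a small, purely illustrative Python example of both kinds of code, consider the hypothetical function below; the variable and function names are invented for this sketch.

```python
# Before optimization:
def compute(x):
    unused = x * 2        # dead code: the result is never used by any other code
    y = x + 1
    return y
    print("done")         # unreachable code: follows an unconditional return

# After eliminating the dead and unreachable code, the function reduces to:
def compute_optimized(x):
    return x + 1
```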
As mentioned in the earlier section on the [conversion between and combination of dynamic and static graphs](#subsec:conversion_between_and_combination_of_dynamic_and_static_graphs), the tracing method can be employed when converting dynamic graphs into static graphs. Tracing is highly effective at identifying dead code and unreachable code, so this elimination step is often incorporated into the graph conversion procedure.
### Constant Propagation and Constant Folding

Constant propagation is a process that replaces variables known to hold constant values with those values during compilation. Constant folding, in contrast, is a process that replaces an expression with a single constant when the results of its operations can be computed directly during compilation. Figure :numref:`ch06/ch06-opt-pass-constant-broadcast` depicts these two methods.

![Constant propagation and constant folding techniques](../img/ch04/constant_propagation_and_constant_folding.png)
:label:`ch06/ch06-opt-pass-constant-broadcast`
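A small illustrative Python example (with invented variable and function names) shows how the two transformations combine:

```python
# Before optimization:
def area():
    width = 3             # known constant
    height = 4            # known constant
    return width * height

# Constant propagation substitutes the known values of width and height,
# and constant folding then evaluates 3 * 4 at compile time:
def area_optimized():
    return 12
```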
### Common Subexpression Elimination

If an expression E has already been computed and the values of all its variables remain unchanged since that computation, E is identified as a common subexpression. In that case, E does not need to be computed again; it can be replaced directly with the result obtained from the preceding computation. This concept is visualized in Figure :numref:`ch06/ch06-opt-pass-CSE`.

![Common subexpression elimination process](../img/ch04/common_subexpression_elimination.png)
:label:`ch06/ch06-opt-pass-CSE`
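The sketch below applies this idea to straight-line code using a simple lookup table of previously computed expressions; the three-address instruction format and the helper name `eliminate_cse` are hypothetical and chosen only for illustration.

```python
# Minimal common subexpression elimination over straight-line three-address
# code. Each instruction is (dest, op, lhs, rhs); if the same (op, lhs, rhs)
# has already been computed, reuse the earlier destination via a copy.
# (For simplicity this assumes no variable is reassigned between the two uses.)
def eliminate_cse(instructions):
    seen = {}          # (op, lhs, rhs) -> variable holding the result
    optimized = []
    for dest, op, lhs, rhs in instructions:
        key = (op, lhs, rhs)
        if key in seen:
            optimized.append((dest, "copy", seen[key], None))
        else:
            seen[key] = dest
            optimized.append((dest, op, lhs, rhs))
    return optimized


program = [
    ("t1", "add", "a", "b"),
    ("t2", "mul", "t1", "c"),
    ("t3", "add", "a", "b"),   # common subexpression: a + b already in t1
]
for inst in eliminate_cse(program):
    print(inst)
# ('t1', 'add', 'a', 'b')
# ('t2', 'mul', 't1', 'c')
# ('t3', 'copy', 't1', None)
```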
Common subexpression elimination, like the elimination of dead code and unreachable code, is typically carried out during the graph conversion process. In PyTorch, the TorchScript module provides a dedicated API for common subexpression elimination; this is a natural fit because common subexpressions are straightforward to identify within the TorchScript IR.
# Further Reading

1. The Microsoft team proposes a series of modern AI compilers to enhance the utilization of diverse heterogeneous hardware. The work introduces four distinct optimizations: hardware parallel utilization, compilation efficiency, memory access improvement for enhanced computing efficiency, and efficient control flow execution on accelerators. For more details, see *AI Compiler Quartet*.[^1]

[^1]: <https://www.microsoft.com/en-us/research/blog/building-a-heavy-metal-quartet-of-ai-compilers/>

# AI Compiler Frontend

Tailored for machine learning frameworks, an AI compiler is designed to convert Python-based machine learning programs into their optimized forms, enabling efficient native execution on heterogeneous processors. This chapter first outlines the typical architecture of an AI compiler before delving into the design of the compiler's frontend. The compiler frontend incorporates various techniques, including intermediate representations (IRs), automatic differentiation, type systems, static analysis, and compilation optimization.

The learning objectives of this chapter include:

- Understanding the typical architecture of an AI compiler.

- Understanding the types and implementation of IRs in machine learning frameworks.

- Understanding the methods of automatic differentiation implemented in AI compilers.

- Understanding type systems and static analysis in AI compilers.

- Understanding common frontend compilation optimization methods used by AI compilers.
