Commit d6c010b: Upload sections
1 parent 7210291 commit d6c010b

3 files changed: +265 additions, 0 deletions

# Overview of AI Compiler Frontends

Figure :numref:`ch04/compiler_frontend_structure` depicts the typical
structure of the AI compiler frontend within a machine learning
framework. Because AI compilers parse source programs in much the same
way as classical compilers, we will not detail the parsing process here.
Instead, we will explore a feature unique to the compiler frontend in a
machine learning framework: its automatic differentiation functionality.
To implement automatic differentiation, the machine learning framework
requires a new IR structure built upon classical IRs. Consequently, this
section concentrates on IRs and automatic differentiation, and then
provides a succinct introduction to basic compiler concepts, including
type systems, static analysis, and frontend optimization.

![Typical structure of an AI compiler frontend](../img/ch04/compiler_frontend_structure.png)
:label:`ch04/compiler_frontend_structure`

An **Intermediate Representation** is a data structure, or a form of
code, employed by a compiler to represent source code. Essentially, an
IR serves as a bridge between a source language and a target language
during the compilation process. In classical compilers, IRs are divided
into linear IRs, graphical IRs, and hybrid IRs. However, as these
classical IRs do not provide the comprehensive range of functionalities
required by machine learning frameworks, developers have extended
classical IRs and proposed numerous new IRs specifically for machine
learning frameworks.

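To make the notion of a graphical IR concrete, the following is a
minimal sketch in plain Python (the `Node` class is an illustrative
assumption, not any framework's actual IR): each node records an
operator and the nodes whose outputs it consumes, so the graph topology
itself encodes the data dependencies that compiler passes traverse.

```python
# A minimal sketch of a graphical IR for a computational graph.
# Node names and fields here are illustrative, not a framework's API.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op                # operator name, e.g. "matmul", "add"
        self.inputs = list(inputs)  # edges: nodes this node consumes

# Build the graph for y = relu(x @ w + b).
x = Node("parameter")
w = Node("parameter")
b = Node("parameter")
matmul = Node("matmul", [x, w])
add = Node("add", [matmul, b])
y = Node("relu", [add])

# A pass can now walk the graph structurally, e.g. count operators.
def count_ops(root, seen=None):
    seen = seen if seen is not None else set()
    if id(root) in seen:
        return 0
    seen.add(id(root))
    return 1 + sum(count_ops(i, seen) for i in root.inputs)

print(count_ops(y))  # 6 nodes in total
```
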
**Automatic Differentiation** is a method for computing derivatives
efficiently by resolving the symbols of a computational graph. It
combines the benefits of symbolic and numerical differentiation while
mitigating their shortcomings, making it particularly valuable for
calculating the gradient of a function. Modern AI algorithms, such as
deep learning algorithms, use vast amounts of data to learn models with
numerous parameters, and typically employ a gradient descent approach
to update these parameters. Automatic differentiation is therefore
crucial to deep learning and an integral component of training
algorithms. It generally resolves IR symbols during the frontend
optimization process to generate new IRs containing gradient functions.

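The following is a minimal sketch of reverse-mode automatic
differentiation on scalar values; the `Var` class and its fields are
illustrative assumptions rather than any framework's API, but the
tape-and-backward-sweep structure mirrors what AI compilers encode in
their IRs.

```python
# A minimal reverse-mode automatic differentiation sketch on scalars.
class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
        self._backward = lambda: None  # local gradient rule
        self._inputs = []

    def __add__(self, other):
        out = Var(self.value + other.value)
        out._inputs = [self, other]
        def _backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value)
        out._inputs = [self, other]
        def _backward():
            self.grad += other.value * out.grad  # d(a*b)/da = b
            other.grad += self.value * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

def backward(root):
    # Topologically order the tape, then apply each local rule in reverse.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for i in v._inputs:
                visit(i)
            order.append(v)
    visit(root)
    root.grad = 1.0
    for v in reversed(order):
        v._backward()

x, y = Var(2.0), Var(3.0)
z = x * y + x          # z = xy + x
backward(z)
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```
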
**Type Systems and Static Analysis** are incorporated into the compiler
frontend to help reduce potential runtime errors. A type system can
avert type errors during program execution, while static analysis
provides the information needed for compilation optimization,
effectively reducing issues such as structural errors and security
vulnerabilities in program code.

**Frontend Compilation Optimization** aims to tackle code efficiency
issues. It is a significant aspect of both classical compilers and
machine learning frameworks, and it is independent of specific hardware
types. A representative example, constant folding, is sketched below.

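The sketch below folds constant subexpressions at compile time; the
tuple-based expression encoding is an assumption made for brevity, not
a real framework's IR.

```python
# A toy constant-folding pass, a classic hardware-independent frontend
# optimization. Expressions are numbers, variable names, or
# ("op", lhs, rhs) tuples; this encoding is illustrative only.
def fold(expr):
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable name: nothing to fold
    op, lhs, rhs = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(lhs, (int, float)) and isinstance(rhs, (int, float)):
        return {"add": lhs + rhs, "mul": lhs * rhs}[op]  # fold now
    return (op, lhs, rhs)  # keep the operation for runtime

print(fold(("mul", "x", ("add", 2, 3))))  # ('mul', 'x', 5)
```
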
# Overview of AI Compilers

Like classical compilers, AI compilers convert user-written code into
efficient machine-executable code. In the following, we delve into the
intricacies of AI compilers, touching on various concepts inherent to
general-purpose compilers, such as ahead-of-time (AOT) compilation,
just-in-time (JIT) compilation, intermediate representations (IRs),
pass-based optimization, abstract syntax trees, side effects, and
closures. Our focus is primarily on the distinctive design and
functionality of AI compilers as compared with classical compilers,
rather than on definitions of these concepts, which can be found in
numerous other compiler textbooks.

The design of AI compilers is significantly influenced by classical
compilers such as the Low Level Virtual Machine (LLVM). Thus, gaining an
understanding of the basic architecture of the LLVM compiler, depicted
in Figure :numref:`ch04/llvm-basic`, will be beneficial.

![Basic architecture of the LLVM compiler](../img/ch04/LLVM_basic_architecture.png)
:label:`ch04/llvm-basic`

The LLVM compiler consists of three components: the frontend, the
intermediate representations, and the backend. The frontend converts
high-level languages into IRs. The backend then transforms these IRs
into machine instructions executable on the target hardware. As their
name implies, IRs serve as a transition phase from the frontend to the
backend, where the necessary optimizations can take place. The
architecture of the LLVM compiler ensures that IRs are reusable and
compatible with any newly introduced frontend or hardware. While IRs
can exist on one or more levels, LLVM typically uses a one-level
structure, meaning that the frontend and backend optimizations share
the same set of IRs.

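For a feel of what building LLVM's single IR level looks like, here is
a small sketch using the llvmlite Python bindings (chosen here to keep
all examples in Python; LLVM itself is a C++ library).

```python
# Building a tiny function directly in LLVM IR via the llvmlite Python
# bindings, to illustrate LLVM's single shared IR level.
from llvmlite import ir

i32 = ir.IntType(32)
module = ir.Module(name="demo")
fn = ir.Function(module, ir.FunctionType(i32, (i32, i32)), name="add")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
a, b = fn.args
builder.ret(builder.add(a, b, name="sum"))

# Printing the module yields textual LLVM IR; the same in-memory IR is
# what frontend and backend passes both operate on.
print(module)
```
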
AI compilers, on the other hand, commonly employ a multi-level IR
structure. An example is the multi-level IR (MLIR) design adopted by
TensorFlow, as depicted in Figure :numref:`ch04/TF-IR`. TensorFlow's
MLIR comprises three levels of IRs: the TensorFlow graph IR, the XLA
HLO IR, and hardware-specific LLVM IR or TPU IR. The subsequent
paragraphs briefly outline these levels and their corresponding
compilation optimization processes.

![TensorFlow's multi-level IR design](../img/ch04/TensorFlow-IR.png)
:label:`ch04/TF-IR`

The process of optimization in computational graphs is known as graph
compilation optimization. The first level of IR, the graph IR, carries
out optimization and operations (e.g., graph optimization and graph
segmentation) on an entire graph. While this whole-graph IR is suitable
for static graph execution, it is ill-suited to hardware-specific
optimization because it lacks hardware information. To address this,
hardware-specific generic compilation optimization is applied at the
mid-level IR. Platforms such as XLA, TensorRT, and MindSpore's graph
kernel fusion enhance the execution performance of various neural
networks on specific hardware by performing operator fusion and other
optimizations tailored to different hardware types.

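Operator fusion itself can be illustrated with a toy pass, a minimal
sketch under the assumption that the graph has been linearized into a
list of operator names; real systems such as XLA fuse over whole
graphs, but the goal is the same: merge adjacent elementwise operators
into one fused kernel so intermediate results never round-trip through
memory.

```python
# A toy operator-fusion pass over a linearized graph (illustrative).
ELEMENTWISE = {"add", "mul", "relu", "exp"}

def fuse(ops):
    fused, chain = [], []
    for op in ops:
        if op in ELEMENTWISE:
            chain.append(op)  # extend the current elementwise chain
        else:
            if chain:  # flush the chain as one fused kernel
                fused.append("fused(" + "+".join(chain) + ")")
                chain = []
            fused.append(op)
    if chain:  # flush any trailing chain
        fused.append("fused(" + "+".join(chain) + ")")
    return fused

print(fuse(["matmul", "add", "relu", "matmul", "exp"]))
# ['matmul', 'fused(add+relu)', 'matmul', 'fused(exp)']
```
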
The final level of IR deals exclusively with a certain type of hardware
accelerator and often comes bundled with the hardware vendor's
compiler. For instance, the TBE compiler, paired with Ascend hardware,
generates its efficient execution operators based on TVM's HalideIR.

The multi-level IR design grants IRs greater flexibility and
facilitates more efficient pass-based optimization at each specific IR
level. However, this design has limitations. First, achieving fully
compatible IR transformation across different levels is challenging,
owing to the substantial engineering effort required and the potential
information loss during transformation. Optimization carried out at one
IR level might eliminate some information, and the implications of this
removal must be evaluated at the next level. As a result, IR
transformation imposes stricter constraints on the sequence in which
optimization occurs. Second, deciding at which of two adjacent levels
certain IR optimizations should be performed presents a dilemma for
framework developers. Lastly, because different IR levels can define
different operator granularities, some accuracy might be compromised.

To mitigate these drawbacks, the AI compiler in the MindSpore machine
learning framework uses a unified IR design known as MindIR. Figure
:numref:`ch04/msflow` illustrates the internal execution process of
MindSpore's AI compiler. In this process, the compiler frontend handles
graph compilation and hardware-agnostic optimization, while the
compiler backend conducts tasks such as hardware-specific optimization
and operator selection.

![Working process of MindSpore's AI compiler](../img/ch04/compiler_process.png)
:label:`ch04/msflow`

# Type Systems and Static Analysis

In the realm of compiler frontends, type systems and static analysis
play instrumental roles in strengthening the compiler's power of
abstraction while mitigating potential errors that may arise at program
runtime. This section delves into the basic principles,
functionalities, and representative examples of type systems and static
analysis.

## Type Systems

In the context of programming languages, 'types' represent attributes
of program entities such as numerical values, expressions, or
functions. A type system defines these varied types, determines the
operations applicable to each type, and orchestrates the interactions
among types. Essentially, a type system comprises a set of types and
type-oriented rules that dictate the behavior of a program. Type
systems find extensive application in compilers, interpreters, and
static checking tools, offering the following capabilities:

1. **Precision**: Type systems in compilers deploy type checking to
   detect potential runtime errors, thus enhancing runtime safety.
   Leveraging type inference and type checking, the compiler can
   identify the majority of type-related exceptions and errors, thereby
   averting runtime errors such as those triggered by program
   exceptions. This also ensures memory safety and thwarts invalid
   computations and semantic logic errors between types.

2. **Optimization**: The information obtained from static type checking
   enables the compiler to emit more efficient instructions, thereby
   reducing the running time.

3. **Abstraction**: A type system, when employed with adept
   abstraction, can significantly boost system performance, provided
   the system remains secure. Such streamlined abstraction allows
   developers to concentrate their efforts on high-level design.

4. **Readability**: The use of explicit type declarations improves code
   readability, enabling readers to grasp the program code more
   effectively.

Machine learning frameworks frequently use Python, a dynamically and
strongly typed language, as the frontend language for describing neural
network model structures. Python's simplicity and ease of development
have earned it popularity, despite the slower execution caused by its
interpreted execution mode. The snippet below demonstrates both
properties.

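As a quick illustration (plain Python, no framework involved): a name
can be rebound to values of different types, yet incompatible types are
rejected at runtime rather than silently coerced.

```python
x = 1        # x is bound to an int value
x = "hello"  # dynamic typing: the same name may be rebound to a str

try:
    1 + "1"  # strong typing: int and str are never silently coerced
except TypeError as err:
    print(err)  # unsupported operand type(s) for +: 'int' and 'str'
```
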
While Python offers users dynamic and flexible semantics at the
frontend, the backend framework demands static and strongly typed IRs
that are optimization-friendly, in order to generate efficient backend
code. To transform Python frontend representations into equivalent
static and strongly typed IRs, we require an effective and trustworthy
static analysis method to enhance both development and execution
efficiency.

A notable example is the Hindley-Milner (HM) type system, a type system
for the simply typed lambda calculus with parametric polymorphism.
Initially proposed by J. Roger Hindley, the HM type system was
subsequently extended and validated by Robin Milner. Later, Luis Damas
conducted a comprehensive formal analysis and proof of the system,
further extending it to support polymorphic references. The HM type
system is designed to infer the type of any expression automatically,
without requiring type annotations. It employs a versatile algorithm
that represents expressions with simple symbols and infers clear,
intuitive definitions. The HM type system is widely used for type
inference and type checking in the design of programming languages such
as Haskell and OCaml.

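At the heart of HM inference is unification, which solves type
equations by substituting type variables. The sketch below uses a toy
encoding that is purely an assumption for illustration, lowercase
strings for type variables and `("fn", arg, result)` tuples for
function types, and omits the occurs check for brevity; production
inferencers in Haskell and OCaml elaborate the same idea.

```python
# A minimal unification sketch, the engine at the heart of HM inference.
def resolve(t, subst):
    # Follow substitutions until reaching a concrete type or a free variable.
    while isinstance(t, str) and t.islower() and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst):
    t1, t2 = resolve(t1, subst), resolve(t2, subst)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.islower():  # t1 is a free type variable
        return {**subst, t1: t2}
    if isinstance(t2, str) and t2.islower():
        return {**subst, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0]:
        subst = unify(t1[1], t2[1], subst)
        return unify(t1[2], t2[2], subst)
    raise TypeError(f"cannot unify {t1} with {t2}")

# Infer a and b from the equation (a -> b) = (Int -> Int).
subst = unify(("fn", "a", "b"), ("fn", "Int", "Int"), {})
print(resolve("a", subst), resolve("b", subst))  # Int Int
```
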
## Static Analysis

Once a type system has been established, we must construct a static
analysis system so that the compiler can perform static checking and
analysis of IRs. First, the syntax parser parses the program code into
an abstract syntax tree, from which the corresponding IR is generated.
Because this IR lacks the abstract information stipulated by the type
system, a static analysis module is needed to process and scrutinize
the IR. This paves the way for a statically and strongly typed IR,
which is indispensable for subsequent steps such as compilation
optimization, automatic parallelization, and automatic differentiation.
While compiling program code, the frontend compiler may perform static
analysis several times; in certain frameworks, the decision to
terminate compilation optimization is based on the outcome of static
analysis.

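The first step of this pipeline can be demonstrated with Python's
built-in ast module, which exposes the abstract syntax tree from which
a frontend would generate its initially untyped IR.

```python
import ast

# Parse source code into an abstract syntax tree, the structure from
# which a compiler frontend generates its (initially untyped) IR.
tree = ast.parse("y = x * 2 + 1")
print(ast.dump(tree.body[0], indent=2))  # indent= requires Python 3.9+
# The (abbreviated) output shows the nested structure:
#   Assign(targets=[Name(id='y')],
#          value=BinOp(left=BinOp(left=Name(id='x'), op=Mult(),
#                                 right=Constant(value=2)),
#                      op=Add(), right=Constant(value=1)))
```
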
The static analysis module is responsible for operations such as type
inference and generic specialization on IRs, using abstract
interpretation. Alongside these processes, the following operations are
also undertaken (a toy sketch follows the list):

1. **Abstract Interpretation**: An abstract interpreter creates a
   generalized abstraction of a language's semantics, gathering only
   the attributes needed for subsequent optimization and carrying out
   interpretive execution on ambiguous aspects. Abstract values
   typically capture properties such as the types and dimensions of
   variables.

2. **Type Inference**: Building on abstract interpretation, the
   compiler infers the abstract types of variables and expressions
   within the program code. This process is integral to subsequent
   compilation optimizations that hinge on type information.

3. **Generic Specialization**: During the compilation phase, the
   compiler carries out type inference, a necessary precursor to
   generic specialization, to determine the type of function to be
   invoked. The compiler then conducts type replacement (provided it
   can supply the type context), generating a distinct function method
   for each type through generic specialization.

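As a toy illustration combining abstract interpretation and type
inference, the sketch below propagates abstract values, here just
(dtype, shape) pairs, through an already topologically sorted graph.
The node encoding and inference rules are assumptions made for
illustration, not MindSpore's actual data structures.

```python
# A toy abstract interpreter: each node's abstract value is a
# (dtype, shape) pair inferred from its inputs, in topological order.
def infer(graph, inputs):
    abstract = dict(inputs)  # node name -> (dtype, shape)
    for name, op, args in graph:  # graph is already topologically sorted
        vals = [abstract[a] for a in args]
        if op == "matmul":
            (dt, (m, k)), (_, (k2, n)) = vals
            assert k == k2, "inner dimensions must agree"
            abstract[name] = (dt, (m, n))
        elif op in ("add", "relu"):
            abstract[name] = vals[0]  # elementwise ops preserve dtype/shape
    return abstract

graph = [
    ("h", "matmul", ["x", "w"]),
    ("h2", "add", ["h", "b"]),  # shape check of b elided for brevity
    ("y", "relu", ["h2"]),
]
out = infer(graph, {"x": ("f32", (32, 784)),
                    "w": ("f32", (784, 10)),
                    "b": ("f32", (32, 10))})
print(out["y"])  # ('f32', (32, 10))
```
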
To illustrate the implementation of a static analysis module, consider
the MindSpore framework. MindSpore employs abstract interpretation to
perform interpretive execution on uncertain abstract semantics, thereby
acquiring abstract values. The abstract values for each node in a
function graph represent the anticipated static program information.
Under this abstract interpretation method, interpretive execution
commences from the entry point of a top-level function graph in MindIR,
followed by topological sorting of all nodes in the function graph and
the recursive inference of the abstract value of each node based on
node semantics. If any function subgraphs are involved, interpretive
execution is carried out within each subgraph recursively. The outcome
of this process is the abstract value of the top-level function's
output node. The static analysis module in MindSpore consists of
several components, such as the abstract domain module, the cache
module, the semantics inference module, and the control flow processing
module, as illustrated in Figure
:numref:`ch04/ch04-compiler-frontend`.

![Static analysis module](../img/ch04/static_analysis_module.png)
:label:`ch04/ch04-compiler-frontend`
