Commit 2904bcd

Upload chapter3 programming model
1 parent 300a1a6 commit 2904bcd

101 files changed: +657 −0 lines changed

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@

# Bridging Python and C/C++ Functions

Developers frequently encounter the need to incorporate custom operators
into a machine learning framework. These operators implement new models,
optimizers, data processing functions, and more. Custom operators, in
particular, often require implementation in C/C++ to achieve optimized
performance. They also expose Python interfaces, making it easy for
developers to integrate custom operators with existing machine learning
workflows written in Python. This section delves into the implementation
details of this process.

The Python interpreter, being implemented in C, enables the invocation
of C and C++ functions within Python. Contemporary machine learning
frameworks such as TensorFlow, PyTorch, and MindSpore rely on pybind11
to automatically generate Python functions from underlying C and C++
functions. This mechanism is known as *Python binding*. Prior to the
advent of pybind11, Python binding was accomplished using one of the
following approaches:

1. **C-APIs in Python**: This approach necessitates the inclusion of
   `Python.h` in C++ programs and the utilization of Python's C-APIs to
   execute Python operations. To work effectively with the C-APIs,
   developers must possess a comprehensive understanding of Python's
   internal implementation, such as managing reference counting.

2. **Simplified Wrapper and Interface Generator (SWIG)**: SWIG serves
   as a bridge between C/C++ code and Python, and it played a
   significant role in the initial development of TensorFlow. Using
   SWIG involves crafting intricate interface statements and relying on
   SWIG to automatically generate C code that interfaces with Python's
   C-APIs. However, because the generated code lacks readability, its
   maintenance costs tend to be high.

3. **Python `ctypes` module**: This module provides a comprehensive
   range of the types found in the C language and allows direct
   invocation of dynamic link libraries (DLLs). However, a limitation
   of this module is its heavy reliance on native C types, which
   results in insufficient support for customized types. (A small
   sketch of this approach is given after this list.)

4. **Cython**: In basic terms, Cython can be described as the fusion of
   Python syntax with static types from the C language. It retains
   Python's syntax while automatically translating Cython functions
   into C/C++ code, which empowers developers to seamlessly incorporate
   invocations of C/C++ functions within the Cython environment.

5. **Boost::Python (a C++ library)**: Boost::Python allows C++
   functions to be exposed as Python functions. It operates on similar
   principles to Python's C-APIs but provides a more user-friendly
   interface. However, the reliance on the Boost library introduces a
   significant dependency on third-party components, which can be a
   potential drawback for Boost::Python.

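For illustration, below is a minimal `ctypes` sketch, assuming a shared
library `libcustom.so` that exports `double custom_add(double a, double b)`;
both the library name and the exported function are hypothetical.

```python
import ctypes

# Load a (hypothetical) shared library and declare the C signature.
lib = ctypes.CDLL("./libcustom.so")
lib.custom_add.argtypes = [ctypes.c_double, ctypes.c_double]
lib.custom_add.restype = ctypes.c_double

print(lib.custom_add(1.0, 2.0))  # 3.0
```
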
In comparison to the above Python binding approaches, pybind11 shares
similarities with Boost::Python in terms of simplicity and usability.
However, pybind11 stands out because it focuses on supporting C++11 and
eliminates the dependency on Boost. As a lightweight Python library,
pybind11 is particularly suitable for exposing numerous Python functions
in complex C++ projects such as the machine learning systems discussed
in this book. The combination of Code
[\[ch02/code2.5.1\]](#ch02/code2.5.1){reference-type="ref"
reference="ch02/code2.5.1"} and Code
[\[ch02/code2.5.2\]](#ch02/code2.5.2){reference-type="ref"
reference="ch02/code2.5.2"} is an example of adding a custom operator to
PyTorch with the integration of C++ and Python:\
In C++:

``` {#ch02/code2.5.1 caption="Custom Operator C++ Part" label="ch02/code2.5.1" style="Cpp"}
// custom_add.cpp
#include <torch/extension.h>
#include <pybind11/pybind11.h>

// Element-wise addition of two tensors.
torch::Tensor custom_add(torch::Tensor a, torch::Tensor b) {
  return a + b;
}

// Expose custom_add to Python as part of the extension module.
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("custom_add", &custom_add, "A custom add function");
}
```

In Python:

``` {#ch02/code2.5.2 caption="Custom Operator Python Part" label="ch02/code2.5.2"}
import torch
from torch.utils.cpp_extension import load

# Load (JIT-compile) the C++ extension
custom_extension = load(
    name='custom_extension',
    sources=['custom_add.cpp'],
    verbose=True
)

# Use the custom add function
a = torch.randn(10)
b = torch.randn(10)
c = custom_extension.custom_add(a, b)
```
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@

# Chapter Summary

1. To achieve a balance between usability and performance, modern
   machine learning systems utilize Python for frontend programming and
   C/C++ for backend programming.

2. A machine learning framework is expected to offer programming
   support for all aspects of a machine learning application workflow.
   This is usually delivered through high-level Python APIs, which
   facilitate activities such as data processing, model definition,
   loss function determination, model training, and model testing.

3. Large DNNs can be constructed by stacking neural network layers.

4. Various technologies are used to facilitate interoperability between
   Python and C, with pybind11 being a popular choice in machine
   learning frameworks.

5. Machine learning frameworks typically offer a variety of C/C++
   interfaces, allowing users to define and register operators
   implemented in C++. These operators enable users to create various
   framework extensions, such as high-performance models, data
   processing functions, and optimizers.
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@

# Functional Programming

In the following, we will discuss the reasons behind the growing trend
of incorporating functional programming into the design of machine
learning frameworks.

## Benefits of Functional Programming

Training constitutes the most critical phase in machine learning, and
the manner in which training is expressed hinges significantly on
optimizer algorithms. Predominantly, contemporary machine learning tasks
utilize first-order optimizers, favored for their ease of use. With
machine learning advancing at a rapid pace, both software and hardware
are incessantly updated to keep up. Consequently, an increasing number
of researchers are beginning to investigate higher-order optimizers,
noted for their superior convergence properties. Frequently used
second-order optimizers, such as the Newton method, quasi-Newton
methods, and AdaHessian, necessitate the computation of a Hessian matrix
that incorporates second-order derivative information. Two considerable
challenges arise from this computation: 1) how to manage such a hefty
computational load efficiently; and 2) how to express higher-order
derivatives in a programming language.

25+
include (with the number of parameters noted in parentheses) OpenAI
26+
GPT-3 (175B) in 2020; PanGu (100B), PanGu-$\alpha$ (200B), Google's
27+
Switch Transformer (1.6T), and WuDao (1.75T) in 2021; along with
28+
Facebook's NLLB-200 (54B) in 2022. The demand for ultra-large model
29+
training is escalating, and data parallelism alone cannot meet this
30+
growing requirement. Conversely, model parallelism demands manual model
31+
segmentation, a process that is time-intensive and laborious.
32+
Consequently, the main challenge future machine learning frameworks must
33+
overcome is how to actualize automatic parallelism. At its core, a
34+
machine learning model is a representation of a mathematical model.
35+
Hence, the ability to succinctly represent machine learning models has
36+
risen to a key concern in the design of programming paradigms for
37+
machine learning frameworks.
38+
39+
Recognizing the challenges presented by the practical implementation of
machine learning frameworks, researchers have identified that functional
programming could offer beneficial solutions. In computer science,
functional programming is a programming paradigm that treats computation
as the evaluation of mathematical functions, actively avoiding state
changes and data mutations. This paradigm harmonizes well with
mathematical reasoning. Neural networks are composed of interconnected
nodes, with each node performing basic mathematical operations.
Functional programming languages allow developers to express these
mathematical operations in a language that closely mirrors the
operations themselves, enhancing the readability and maintainability of
programs. At the same time, functions in functional languages are free
of side effects and independent of one another, which simplifies the
management of concurrency and parallelism.

In summary, functional programming is anticipated to confer the
following benefits on machine learning frameworks:

1. It is suited for machine learning scenarios where higher-order
   derivatives are needed.

2. It simplifies the development of parallel programming interfaces.

3. It results in a more concise code representation.

## Framework Support for Functional Programming

Machine learning frameworks provide increasing support for functional
programming. In 2018, Google rolled out JAX. Contrary to traditional
machine learning frameworks, JAX unifies neural network computation and
numerical computation. Its interfaces are compatible with native data
science libraries in Python, such as NumPy and SciPy. Moreover, JAX
supports distribution, vectorization, higher-order differentiation, and
hardware acceleration in a functional programming style, characterized
by lambda closures and the absence of side effects.

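As a minimal sketch of this functional style (illustrative only, not
taken from JAX's documentation), the transforms `jax.grad` and `jax.jit`
compose like ordinary functions over a pure loss function:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # A pure function of its inputs: no hidden state, no mutation.
    return jnp.sum((x @ w) ** 2)

grad_fn = jax.jit(jax.grad(loss))   # gradient w.r.t. w, then JIT-compiled
w = jnp.ones(3)
x = jnp.ones((4, 3))
print(grad_fn(w, x))                # gradient array of shape (3,)
```
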
In 2020, Huawei introduced MindSpore, whose functional differential
programming architecture allows users to concentrate on the native
mathematical expressions of machine learning models. In 2022, taking
inspiration from Google's JAX, PyTorch launched functorch. Functorch is
essentially a library that provides composable vmap (vectorization) and
autodiff transforms compatible with PyTorch modules and PyTorch
autograd, thereby achieving good eager-mode performance. It can be
inferred that functorch meets the requirements for distributed
parallelism in PyTorch static graphs. Code
[\[ch02/code2.4\]](#ch02/code2.4){reference-type="ref"
reference="ch02/code2.4"} gives an example of functorch.

``` {#ch02/code2.4 caption="Functorch Example" label="ch02/code2.4"}
from functorch import combine_state_for_ensemble, vmap

# MLP, data, num_models, and device are assumed to be defined earlier.
minibatches = data[:num_models]
models = [MLP().to(device) for _ in range(num_models)]

# Stack the per-model parameters/buffers so a single "functional" model
# can be vmapped across the ensemble dimension.
fmodel, params, buffers = combine_state_for_ensemble(models)
predictions1_vmap = vmap(fmodel, out_dims=1)(params, buffers, minibatches)
```

Functorch introduces *vmap*, which stands for "vectorized map". Its role
is to adapt functions designed for individual inputs so that they can
handle batches of inputs, thereby facilitating efficient vectorized
computation. Unlike the batch processing capabilities of standard
PyTorch modules, vmap can make any operation batch-aware without
altering the operation's original structure. Moreover, vmap offers
greater flexibility over batch dimensions, allowing users to specify
which dimension of the output should be treated as the batch dimension
(via the `out_dims` argument), in contrast to the default behaviour of
standard PyTorch, where the first dimension is usually chosen as the
batch dimension.

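The following minimal sketch (illustrative only) shows vmap lifting a
per-sample function over a batch, and `out_dims` moving the mapped
dimension in the output:

```python
import torch
from functorch import vmap

def dot(x, y):
    return (x * y).sum()                  # written for single vectors

xs = torch.randn(8, 5)                    # a batch of 8 vectors
ys = torch.randn(8, 5)
print(vmap(dot)(xs, ys).shape)            # torch.Size([8]); maps over dim 0 by default

def scale(x):
    return 2.0 * x                        # per-sample output has shape (5,)

print(vmap(scale, out_dims=1)(xs).shape)  # torch.Size([5, 8]); batch dim placed at position 1
```
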
By tracing the development of machine learning frameworks, it becomes
evident that the functional programming paradigm has become increasingly
popular. This can be attributed to functional programming's ability to
express machine learning models intuitively and its convenience for
implementing automatic differentiation, higher-order differentiation,
and parallel execution. Consequently, future machine learning frameworks
are likely to adopt layered frontend interfaces that are not exclusively
designed for machine learning scenarios. Instead, they will primarily
offer differential programming in their abstraction designs, making it
easy to develop gradient-based software for various applications.

chapter_programming_model/Index.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@

# Programming Model

Machine learning frameworks comprise various components that facilitate
the efficient development of algorithms, data processing, model
deployment, performance optimization, and hardware acceleration. When
designing the application programming interfaces (APIs) for these
components, a key consideration is striking the right balance between
framework performance and usability. To achieve optimal performance,
developers utilize C or C++, as these programming languages enable
efficient invocation of the APIs provided by the operating system and
hardware accelerators.

Regarding usability, machine learning framework users, including data
scientists, biologists, chemists, and physicists, often possess strong
backgrounds in their own fields and are skilled in using high-level
scripting languages like Python, MATLAB, R, and Julia. While these
languages offer remarkable programming usability, they lack the deep
optimization capabilities for underlying hardware and operating systems
that C and C++ provide. Therefore, the core design objective of machine
learning frameworks encompasses two aspects: providing easy-to-use APIs
for implementing algorithms in high-level languages like Python, and
providing low-level APIs centered around C and C++ to assist framework
developers in implementing numerous high-performance components and
executing them efficiently on hardware. This chapter describes
strategies for achieving this design objective.

28+
29+
1. Understanding the workflows and programming principles of machine
30+
learning frameworks.
31+
32+
2. Understanding the design of neural network models and layers.
33+
34+
3. Understanding how machine learning frameworks bridge Python and
35+
C/C++ functions.
36+
37+
4. Understanding the support for functional programming in machine
38+
learning frameworks.
