# Neural Network Programming

To implement an AI model, a machine learning framework typically provides a neural-network-centric programming interface. Regardless of their structures, neural networks are composed of three elements: (1) Nodes, the computational units that carry out the processing of a neural network; (2) Node Weights, the variables updated by gradients during training; and (3) Node Connections, which specify how data (for example, activations and gradients) are transmitted within a neural network.
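
As a minimal illustration of these three elements, the following sketch (written with PyTorch tensors; the values and learning rate are arbitrary) computes a single node's output from data arriving over its incoming connections and updates the node's weights with a gradient step:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])                       # data arriving over incoming connections
w = torch.tensor([0.1, -0.2, 0.3], requires_grad=True)  # node weights: trainable variables
b = torch.tensor(0.0, requires_grad=True)

y = torch.relu(w @ x + b)  # the node: a computational unit (weighted sum + activation)
y.backward()               # gradients flow back along the connections
with torch.no_grad():
    w -= 0.01 * w.grad     # weights are updated by gradients during training
    b -= 0.01 * b.grad
```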

## Neural Network Layers

To simplify the construction of a neural network, many machine learning frameworks adopt a layer-oriented approach, which organizes nodes, their weights, and their connections into cohesive neural network layers.

To illustrate this, consider fully connected layers, one common type of neural network layer. Their distinguishing characteristic is that every node in one layer is linked to every node in the succeeding layer, which yields a full linear transformation of the feature space. In this way, data can be projected from a high-dimensional space to a lower-dimensional one, or vice versa.

As shown in Figure :numref:`ch03/fc_layer_1`, a fully connected layer first transforms the *n*-dimensional input into an *m*-dimensional feature space, which is then further transformed into a *p*-dimensional feature space. It is important to highlight that the number of parameters in a fully connected layer grows substantially, from $n \times m$ in the initial transformation to $m \times p$ in the subsequent one.

<figure id="fig:ch03/fc_layer_1">
<img src="../img/ch03/fc_layer_1.png" style="width:60.0%" />
<figcaption>Fully connected layer illustration</figcaption>
</figure>
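
As a quick check on this parameter count, the sketch below (assuming PyTorch and the illustrative sizes $n=128$, $m=64$, $p=10$) builds the two transformations with `nn.Linear`; each weight matrix holds exactly $n \times m$ or $m \times p$ entries, plus one bias per output feature:

```python
import torch.nn as nn

n, m, p = 128, 64, 10        # illustrative feature sizes
fc1 = nn.Linear(n, m)        # first transformation: n -> m
fc2 = nn.Linear(m, p)        # second transformation: m -> p

assert fc1.weight.numel() == n * m and fc1.bias.numel() == m
assert fc2.weight.numel() == m * p and fc2.bias.numel() == p
print(sum(param.numel() for param in fc1.parameters()))  # 128*64 + 64 = 8256
print(sum(param.numel() for param in fc2.parameters()))  # 64*10 + 10 = 650
```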

Several types of neural network layers are widely used across applications, including fully connected, convolutional, pooling, recurrent, attention, batch normalization, and dropout layers. Recurrent layers are commonly employed for problems involving temporal dependencies in sequential data. However, they encounter difficulties with vanishing or exploding gradients as the sequence length increases during training. The Long Short-Term Memory (LSTM) model was developed as a solution to this problem, enabling the capture of long-term dependencies in sequential data. Code `ch02/code2.3.1` shows some examples of neural network layers in PyTorch:

**ch02/code2.3.1**
```python
import torch.nn as nn

fc_layer = nn.Linear(16, 5)                  # A fully connected layer with 16 input features and 5 output features
relu_layer = nn.ReLU()                       # A ReLU activation layer
conv_layer = nn.Conv2d(3, 16, 3, padding=1)  # A convolutional layer with 3 input channels, 16 output channels, and a 3x3 kernel
dropout_layer = nn.Dropout(0.2)              # A dropout layer with a 20% dropout rate
batch_norm_layer = nn.BatchNorm2d(16)        # A batch normalization layer over 16 channels
# A sequential container that chains the layers; pooling and flattening are added
# so that the convolutional output matches the fully connected layer's 16 input features.
layers = nn.Sequential(conv_layer, batch_norm_layer, relu_layer,
                       nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                       fc_layer, dropout_layer)
```
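
The recurrent and LSTM layers mentioned above are available in the same `torch.nn` module; a minimal sketch (all sizes are illustrative):

```python
import torch
import torch.nn as nn

lstm_layer = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(4, 7, 10)           # a batch of 4 sequences, each with 7 steps of 10 features
output, (h_n, c_n) = lstm_layer(x)  # output: (4, 7, 20); final hidden/cell states: (2, 4, 20)
```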

In natural language processing tasks, the Sequence-to-Sequence (Seq2Seq) architecture applies recurrent neural layers in an encoder-decoder framework. The decoder component of Seq2Seq often integrates the attention mechanism, allowing the model to concentrate on the relevant segments of the input sequence. This combination paved the way for the *Transformer* model, a pivotal element in the architectures of the Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) models. Both BERT and GPT have propelled significant progress in diverse language-related tasks.
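
The attention mechanism itself is also packaged as a layer in PyTorch; a minimal sketch of a decoder state attending over encoder outputs (shapes and sizes are illustrative):

```python
import torch
import torch.nn as nn

attention = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
decoder_state = torch.randn(2, 5, 16)    # queries: batch of 2, 5 target positions
encoder_outputs = torch.randn(2, 9, 16)  # keys and values: 9 source positions
context, weights = attention(decoder_state, encoder_outputs, encoder_outputs)
# weights has shape (2, 5, 9): how strongly each target position attends to each source position
```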

## Neural Network Implementation

As the number of network layers increases, manually managing training variables becomes progressively complex. Fortunately, most machine learning frameworks provide user-friendly APIs that encapsulate neural network layers in a base class, from which all other layers inherit. Notable examples include `mindspore.nn.Cell` in MindSpore and `torch.nn.Module` in PyTorch. Code `ch02/code2.3.2` gives an MLP implementation in PyTorch:

**ch02/code2.3.2**
```python
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout_rate=0.5):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.bn1 = nn.BatchNorm1d(hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
```

Figure :numref:`ch03/model_build` demonstrates the process of constructing a neural network. The base class plays a pivotal role in initializing training parameters, managing their status, and defining the computation process. The neural network model, in turn, implements functions to administer the network layers and their associated parameters. Both MindSpore's Cell and PyTorch's Module serve these functions efficiently. Notably, Cell and Module act not only as model abstraction methods but also as the base classes for all networks.

Existing model abstraction strategies fall into two categories. The first abstracts two separate concepts: Layer, which oversees parameter construction and forward computation for an individual neural network layer, and Model, which manages the connection and combination of neural network layers as well as the administration of layer parameters. The second category merges Layer and Model into a single abstraction that represents both an individual neural network layer and a model composed of multiple layers. The Cell and Module implementations fall into this second category.
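
A minimal sketch of this second category using the PyTorch API shown above: because a single layer and a composite model share the same base class, a model can be nested inside a larger model exactly like an individual layer (the sizes are illustrative):

```python
import torch.nn as nn

layer = nn.Linear(8, 8)                                            # an individual neural network layer
model = nn.Sequential(layer, nn.ReLU(), nn.Linear(8, 2))           # a model composed of several layers
print(isinstance(layer, nn.Module), isinstance(model, nn.Module))  # True True
bigger = nn.Sequential(model, nn.Softmax(dim=-1))                  # the model nests like a layer
```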

<figure id="fig:ch03/model_build">
<embed src="../img/ch03/model_build.pdf" style="width:90.0%" />
<figcaption>Comprehensive neural network construction process</figcaption>
</figure>

Figure :numref:`ch03/cell_abstract` portrays a general method for designing the abstraction of a neural network layer. The constructor uses the `OrderedDict` class from the Python `collections` module to store initialized neural network layers and their corresponding parameters. This yields an ordered output, which suits stacked deep learning models better than an unordered `dict`. The management of neural network layers and parameters is handled in the `__setattr__` method: upon detecting that an attribute is a neural network layer or a layer parameter, `__setattr__` records the attribute in the appropriate container.
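
The following sketch is a simplified, hypothetical `MyCell` base class (not the actual MindSpore Cell or PyTorch Module source) illustrating this registration pattern: `__setattr__` inspects each assigned attribute and files sub-layers and parameters into `OrderedDict` containers, and `parameters` then traverses the cell and its sub-layers to collect training parameters.

```python
from collections import OrderedDict

class MyParameter:
    """A minimal stand-in for a trainable weight container."""
    def __init__(self, data):
        self.data = data

class MyCell:
    """A simplified base class that registers sub-layers and parameters on assignment."""
    def __init__(self):
        # OrderedDict preserves insertion order, matching the stacking order of layers.
        object.__setattr__(self, "_layers", OrderedDict())
        object.__setattr__(self, "_params", OrderedDict())

    def __setattr__(self, name, value):
        if isinstance(value, MyCell):          # the attribute is a sub-layer
            self._layers[name] = value
        elif isinstance(value, MyParameter):   # the attribute is a layer parameter
            self._params[name] = value
        object.__setattr__(self, name, value)  # keep normal attribute access working

    def parameters(self):
        # Traverse this cell and all sub-layers to collect training parameters.
        yield from self._params.values()
        for layer in self._layers.values():
            yield from layer.parameters()
```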

The computation process is central to the neural network model. It is defined by overriding the `__call__` method when implementing neural network layers. To gather the training parameters, the base class traverses all network layers; the retrieved parameters are then passed to the optimizer through a dedicated interface that returns them. This text, however, only touches on a few of the most significant methods.
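
In PyTorch, for instance, this is why a `Module` instance is invoked directly (its `__call__` dispatches to the user-defined `forward`) and why `Module.parameters()`, which traverses all sub-modules, is what gets handed to the optimizer. A brief sketch reusing the `MLP` class from Code `ch02/code2.3.2` (the sizes, learning rate, and placeholder loss are illustrative):

```python
import torch

model = MLP(input_size=784, hidden_size=256, num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # parameters collected by traversing all layers

x = torch.randn(32, 784)  # a batch of 32 flattened inputs
loss = model(x).sum()     # calling the model triggers __call__, which runs forward()
loss.backward()           # gradients populate the registered parameters
optimizer.step()          # the optimizer updates those same parameters
```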

As for custom methods, it is often necessary to implement routines for inserting and deleting parameters, adding and removing neural network layers, and retrieving neural network model information.
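
PyTorch's `Module`, for example, already exposes such routines; a short illustration (the layer names and sizes are made up for this example):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
model.add_module("head", nn.Linear(4, 2))                       # add a neural network layer by name
model.register_parameter("scale", nn.Parameter(torch.ones(1)))  # insert a standalone parameter
for name, param in model.named_parameters():                    # retrieve model information
    print(name, tuple(param.shape))
```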

<figure id="fig:ch03/cell_abstract">
<img src="../img/ch03/cell_abstract.png" style="width:90.0%" />
<figcaption>Abstraction technique of neural network base classes</figcaption>
</figure>

For simplicity, we provide only a condensed overview of the base class implementation for neural network layers. In practice, users typically do not override the `__call__` method responsible for computation directly. Instead, a separate operation method (such as `forward` in PyTorch or `construct` in MindSpore) is defined for users to implement, and `__call__` invokes it internally.
