# Machine Learning Workflow

In machine learning systems, the fundamental design objective of programming models is to offer comprehensive workflow programming support for developers. A typical machine learning task follows the workflow depicted in Figure :numref:`ch03/workflow`: loading the training dataset, then training, testing, and debugging the model. The following APIs are defined to facilitate customization within the workflow (assuming that high-level APIs are provided as Python functions):

1. **Data Processing API:** Users first require a data processing API to read datasets from disk. They then need to preprocess the data to make it suitable as input to machine learning models. Code `ch02/code2.2.1` shows how PyTorch can be used to load data and create data loaders for both training and testing.

**ch02/code2.2.1**
```python
import pickle
from torch.utils.data import Dataset, DataLoader

data_path = '/path/to/data'
with open(data_path, 'rb') as f:
    dataset = pickle.load(f)  # Example for a pickle file
batch_size = ...  # You can make it an argument of the script

class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        label = self.labels[idx]
        return sample, label

training_dataset = CustomDataset(dataset['training_data'], dataset['training_labels'])
testing_dataset = CustomDataset(dataset['testing_data'], dataset['testing_labels'])

training_dataloader = DataLoader(training_dataset, batch_size=batch_size, shuffle=True)  # Training dataloader
testing_dataloader = DataLoader(testing_dataset, batch_size=batch_size, shuffle=False)  # Testing dataloader
```
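
Before training, it can help to pull one batch from the loader and check its shapes. This is a minimal sanity-check sketch, assuming the loaded samples and labels are tensors (or NumPy arrays that the default collate function stacks into tensors):

```python
# Draw a single batch and inspect it before starting training.
samples, labels = next(iter(training_dataloader))
print(samples.shape, labels.shape)  # e.g. (batch_size, num_features) and (batch_size,)
```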

2. **Model Definition API:** Once the data is preprocessed, users need a model definition API to define machine learning models. These models contain the model parameters and can perform inference on given data. Code `ch02/code2.2.2` shows how to create a custom model in PyTorch:

**ch02/code2.2.2**
```python
import torch.nn as nn

class CustomModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)  # A single linear layer

    def forward(self, x):
        return self.linear(x)
```
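
To verify that the model is wired correctly, it can be instantiated and applied to random input. The sizes below are hypothetical values chosen only for illustration:

```python
import torch

model = CustomModel(input_size=10, output_size=2)  # Hypothetical sizes
dummy_input = torch.randn(4, 10)  # A batch of 4 random samples with 10 features each
output = model(dummy_input)
print(output.shape)  # torch.Size([4, 2])
```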

3. **Optimizer Definition API:** The outputs of a model must be compared with the user-provided labels, and their difference is measured by a loss function. The optimizer definition API lets users define their own loss functions and import or define optimization algorithms that, based on the loss, compute gradients and update the model parameters. Code `ch02/code2.2.3` shows an optimizer definition in PyTorch:

**ch02/code2.2.3**
```python
import torch.optim as optim
import torch.nn as nn

model = CustomModel(...)
# Optimizer definition (Adam, SGD, etc.)
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # Note: Adam takes no momentum argument; use SGD for that
loss = nn.CrossEntropyLoss()  # Loss function definition
```
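
As noted above, the API also allows user-defined loss functions. Below is a minimal sketch of one, written as an `nn.Module`; the per-class weighting is an illustrative choice, not part of the original example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedCrossEntropy(nn.Module):
    """Cross-entropy loss with per-class weights, written from scratch."""
    def __init__(self, class_weights):
        super().__init__()
        self.class_weights = class_weights

    def forward(self, logits, target):
        log_probs = F.log_softmax(logits, dim=1)
        # Negative log-likelihood of the true class for each sample
        nll = -log_probs.gather(1, target.unsqueeze(1)).squeeze(1)
        weights = self.class_weights.to(target.device)[target]  # Per-sample weight
        return (weights * nll).mean()

loss = WeightedCrossEntropy(torch.tensor([1.0, 2.0]))  # Hypothetical weights for two classes
```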

4. **Training API:** Given a dataset, model, loss function, and optimizer, users require a training API to define a loop that reads data from the dataset in mini-batches, repeatedly computes gradients, and updates the model parameters accordingly. This iterative update process is known as *training*. Code `ch02/code2.2.4` shows how to train a model in PyTorch:

**ch02/code2.2.4**
```python
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"  # Select the training device
model.to(device)  # Move the model to the training device
model.train()  # Set the model to train mode
epochs = ...  # You can make it an argument of the script
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(training_dataloader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()  # Zero the parameter gradients
        output = model(data)  # Forward pass
        loss_value = loss(output, target)  # Compute the loss
        loss_value.backward()  # Backpropagation
        optimizer.step()  # Update the parameters
```
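
For monitoring, the same loop can track a running loss and report an average per epoch; printing logs like this is also the simplest debugging aid mentioned in the next step. A minimal variant, with an arbitrary print format:

```python
for epoch in range(epochs):
    running_loss = 0.0
    for data, target in training_dataloader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss_value = loss(model(data), target)
        loss_value.backward()
        optimizer.step()
        running_loss += loss_value.item()  # Accumulate the batch loss
    print(f"epoch {epoch}: mean loss {running_loss / len(training_dataloader):.4f}")
```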

5. **Testing and Debugging APIs:** Throughout the training process, users need a testing API to evaluate the accuracy of the model (training concludes when the accuracy exceeds the set goal). Additionally, a debugging API is necessary to verify the performance and correctness of the model. Code `ch02/code2.2.5` shows model evaluation in PyTorch:

**ch02/code2.2.5**
```python
model.eval()  # Set the model to evaluation mode
overall_accuracy = []
with torch.no_grad():  # No gradients are needed during evaluation
    for batch_idx, (data, target) in enumerate(testing_dataloader):
        data, target = data.to(device), target.to(device)
        output = model(data)  # Forward pass
        accuracy = your_metrics(output, target)  # Compute the accuracy
        overall_accuracy.append(accuracy)  # Collect the per-batch accuracy
# For debugging, you can print logs inside the training or evaluation loop, or use the Python debugger.
```
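
`your_metrics` above is a placeholder. For a classification model, one possible implementation is the following sketch, which assumes `output` holds one logit per class:

```python
def classification_accuracy(output, target):
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = output.argmax(dim=1)
    return (predictions == target).float().mean().item()

# Aggregate the per-batch accuracies collected above (a simple mean,
# which weights a smaller final batch the same as full batches).
print(f"test accuracy: {sum(overall_accuracy) / len(overall_accuracy):.4f}")
```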

![Workflow within a machine learning system](../img/ch03/workflow.pdf)
:label:`ch03/workflow`
