
Commit ef7e353

YanKE01 authored and laride committed
feat(touch_digit): add digital prediction model based on esp_dl
1 parent d505abf commit ef7e353

25 files changed: 1799 additions, 12 deletions

docs/en/ai/index.rst

Lines changed: 2 additions & 1 deletion
@@ -7,4 +7,5 @@ AI
 .. toctree::
     :maxdepth: 1

-    OpenAI <openai>
+    OpenAI <openai>
+    TouchDigitRecognition <touch_digit_recognition>
Lines changed: 310 additions & 0 deletions
@@ -0,0 +1,310 @@
1+
Touch Digit Recognition
2+
=========================
3+
4+
:link_to_translation:`zh_CN:[中文]`
5+
6+
Touch Principle and Data Acquisition
7+
---------------------------------------
8+
9+
Touch Principle
10+
^^^^^^^^^^^^^^^^^^
11+
12+
13+
Data Collection
14+
^^^^^^^^^^^^^^^^^^^
15+
16+
Digits drawn on a touchpad often look different from the handwritten digits in the MNIST dataset, so models trained only on MNIST perform poorly on real touch input. A custom dataset of digits drawn on a touchpad is therefore collected and used for training.
17+
18+
.. figure:: ../../_static/ai/touch_hw_real_data.png
19+
:align: center
20+
21+
Real Dataset Based on Touchpad Drawn Input
22+
23+
.. note:: After interpolation, each handwritten-digit image is 30×25 pixels (width × height).
24+
25+
Download the dataset used in this example: :download:`touch_dataset.zip <https://dl.espressif.com/AE/esp-iot-solution/touch_dataset.zip>`
26+
27+
Model Training and Deployment
28+
--------------------------------
29+
30+
Model Construction
31+
^^^^^^^^^^^^^^^^^^^^
32+
33+
The model is a small convolutional neural network built with PyTorch for touch-based handwritten digit recognition. Its architecture is as follows:
34+
35+
.. code-block:: python

    import torch


    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.model = torch.nn.Sequential(
                # Input: 1 x 25 x 30 (channels x height x width)
                torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=2, stride=2),  # -> 16 x 12 x 15

                torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=2, stride=2),  # -> 32 x 6 x 7

                torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),  # -> 64 x 6 x 7

                torch.nn.Flatten(),
                torch.nn.Linear(in_features=7 * 6 * 64, out_features=256),
                torch.nn.ReLU(),
                torch.nn.Dropout(p=0.5),
                torch.nn.Linear(in_features=256, out_features=10),
                # The network outputs class probabilities, not raw logits.
                torch.nn.Softmax(dim=1)
            )

        def forward(self, x):
            output = self.model(x)
            return output
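The ``in_features=7 * 6 * 64`` value of the first ``Linear`` layer follows from the input size and the layer settings. The sketch below recomputes it with the standard output-size formulas for convolution and pooling, assuming a 25×30 (height × width) input as used in the quantization script. The helper names are illustrative only and are not part of the example project.

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    """Length of one spatial dimension after a Conv2d layer."""
    return (size + 2 * padding - kernel) // stride + 1


def maxpool2d_out(size, kernel=2, stride=2):
    """Length of one spatial dimension after MaxPool2d (default floor mode)."""
    return (size - kernel) // stride + 1


def flattened_features(height=25, width=30, out_channels=64):
    """Trace the feature-map size through the three convolution blocks."""
    # Blocks 1 and 2: convolution (size preserved) followed by 2x2 pooling.
    for _ in range(2):
        height, width = conv2d_out(height), conv2d_out(width)
        height, width = maxpool2d_out(height), maxpool2d_out(width)
    # Block 3: convolution only, no pooling.
    height, width = conv2d_out(height), conv2d_out(width)
    return height, width, out_channels * height * width


print(flattened_features())  # (6, 7, 2688)
```

The result, 64 channels of 6×7 feature maps, gives 2688 flattened features, which matches ``7 * 6 * 64``. If the input size changes, this value has to be recomputed.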
63+
64+
Model Training
65+
^^^^^^^^^^^^^^^^^^
66+
67+
The training process of the model includes dataset loading and preprocessing, configuration of training parameters, monitoring of the training progress, and saving of the trained model.
68+
69+
Data Loading and Preprocessing
70+
""""""""""""""""""""""""""""""""
71+
72+
The images for each digit are stored under the ``dataset/extra`` directory, in a separate subfolder named after the digit. Preprocessing is defined with ``transforms.Compose`` and consists of grayscale conversion, random rotation and translation, conversion to a tensor, and normalization. The whole dataset is then loaded with ``ImageFolder`` and split into training and test sets in an 8:2 ratio. Finally, ``DataLoader`` builds the batch loaders used for training and evaluation.
73+
74+
.. code-block:: python

    import matplotlib.pyplot as plt
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ])

    dataset = datasets.ImageFolder(root='./dataset/extra', transform=transform)

    train_size = int(0.8 * len(dataset))
    test_size = len(dataset) - train_size
    train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

    train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(dataset=test_dataset, batch_size=32, shuffle=False)
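Two details of this loading step are easy to check in isolation: how ``ImageFolder`` turns subfolder names into class indices, and how the 8:2 split sizes are derived. The sketch below mimics both in plain Python. The function names and the dataset size of 1000 are hypothetical and chosen only for illustration.

```python
def class_to_idx(folder_names):
    """Mimic ImageFolder: sort folder names as strings, then enumerate them."""
    return {name: idx for idx, name in enumerate(sorted(folder_names))}


def split_sizes(num_samples, train_ratio=0.8):
    """Same arithmetic as the script: truncate the train size, rest goes to test."""
    train_size = int(train_ratio * num_samples)
    return train_size, num_samples - train_size


digit_folders = [str(d) for d in range(10)]
mapping = class_to_idx(digit_folders)
print(mapping["9"])                # 9: folder "9" maps to label 9
print(class_to_idx(["2", "10"]))   # {'10': 0, '2': 1}: string order, not numeric
print(split_sizes(1000))           # (800, 200)
```

With single-character folder names ``0`` to ``9``, the label index equals the digit, so the predicted class index can be read directly as the recognized digit. Folder names with more than one character would be sorted as strings, which is worth keeping in mind if the dataset is extended.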
98+
99+
Model Training Parameter Configuration
100+
""""""""""""""""""""""""""""""""""""""""""
101+
102+
The training configuration covers the loss function, the optimizer, and the learning rate. This example uses cross-entropy as the loss function and the Adam optimizer, with a learning rate of 0.001, to update the model parameters.
103+
104+
.. code-block:: python

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = Net().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
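One consequence of this configuration is worth knowing when reading the training log. ``nn.CrossEntropyLoss`` applies log-softmax to its input internally, and ``Net`` already ends with a ``Softmax`` layer, so the loss receives probabilities rather than raw logits. The plain-Python sketch below, which reimplements the per-sample loss for illustration only, shows that the reported loss then has a lower bound of about 1.46 for ten classes, even for a perfectly confident and correct prediction.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(values)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in values))
    return [v - log_sum for v in values]


def cross_entropy(values, target):
    """Per-sample cross-entropy as computed on the values passed to the loss."""
    return -log_softmax(values)[target]


# A perfectly confident probability vector for digit 9 (softmax output).
probabilities = [0.0] * 9 + [1.0]
loss_on_probabilities = cross_entropy(probabilities, 9)

# Raw logits strongly favouring digit 9 (no softmax applied beforehand).
logits = [0.0] * 9 + [20.0]
loss_on_logits = cross_entropy(logits, 9)

print(f"{loss_on_probabilities:.2f}")  # 1.46
print(loss_on_logits < 1e-6)           # True
```

A training loss that levels off near 1.46 therefore does not by itself indicate a problem with this model. The accuracy values are unaffected by this, because the arg-max of the probabilities is the same as the arg-max of the logits.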
110+
111+
Model Training and Saving
112+
""""""""""""""""""""""""""""
113+
114+
The number of training epochs is set to 100. During training, the model parameters are updated using the training set, while the test set is used to evaluate the model's performance after each epoch. Once training is complete, the model parameters are saved to the file ``./models/final_model.pth``.
115+
116+
.. code-block:: python

    import os


    def train_epoch(model, train_loader, criterion, optimizer, device):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct / total
        return epoch_loss, epoch_acc


    def test_epoch(model, test_loader, criterion, device):
        model.eval()
        running_loss = 0.0
        correct = 0
        total = 0

        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = model(inputs)
                loss = criterion(outputs, labels)

                running_loss += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(test_loader)
        epoch_acc = 100 * correct / total
        return epoch_loss, epoch_acc

    num_epochs = 100
    train_acc_array = []
    test_acc_array = []
    for epoch in range(num_epochs):
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        test_loss, test_acc = test_epoch(model, test_loader, criterion, device)

        print(f'Epoch [{epoch + 1}/{num_epochs}], '
              f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, '
              f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%')
        train_acc_array.append(train_acc)
        test_acc_array.append(test_acc)

    # torch.save does not create missing directories, so make sure ./models exists.
    os.makedirs('./models', exist_ok=True)
    torch.save(model.state_dict(), './models/final_model.pth')
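A note on how the epoch statistics are computed: accuracy is accumulated per sample, while the epoch loss is the mean of the per-batch mean losses. When the last batch is smaller than the others, the two ways of averaging differ slightly. The sketch below illustrates the difference with hypothetical batch losses and sizes. It is not part of the training script.

```python
def mean_of_batch_means(batches):
    """Averaging used in the script: sum of batch mean losses / number of batches."""
    return sum(loss for loss, _ in batches) / len(batches)


def sample_weighted_mean(batches):
    """Alternative: weight every batch mean by the number of samples in the batch."""
    total_samples = sum(size for _, size in batches)
    return sum(loss * size for loss, size in batches) / total_samples


# Hypothetical (mean loss, batch size) pairs; the last batch is incomplete.
batches = [(1.50, 32), (1.50, 32), (2.00, 8)]

print(round(mean_of_batch_means(batches), 4))   # 1.6667
print(round(sample_weighted_mean(batches), 4))  # 1.5556
```

For large datasets with a batch size of 32 the difference is usually negligible, so the simpler average in the script is a reasonable choice for monitoring.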
179+
180+
During the training process, the accuracy curves of the training and test sets evolve as follows:
181+
182+
.. figure:: ../../_static/ai/touch_train_acc.png
183+
:align: center
184+
185+
Accuracy Curves of the Training and Test Sets
186+
187+
Model Deployment
188+
^^^^^^^^^^^^^^^^^^^
189+
190+
ESP-PPQ Environment Configuration
191+
""""""""""""""""""""""""""""""""""""""
192+
193+
``ESP-PPQ`` is a quantization tool based on ``ppq``. Uninstall any existing ``ppq`` package first, then install ``ESP-PPQ`` with the following commands:
194+
195+
.. code-block:: bash

    pip uninstall ppq
    pip install git+https://github.com/espressif/esp-ppq.git
199+
200+
Model Quantization and Deployment
201+
""""""""""""""""""""""""""""""""""""""
202+
203+
Refer to `How to quantize model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_quantize_model.rst>`_ for model quantization and export. The script below targets ESP32-S3. To export a model for ESP32-P4, set ``TARGET`` to ``esp32p4``.
204+
205+
.. code-block:: python

    import torch
    from PIL import Image
    from ppq.api import espdl_quantize_torch
    from torch.utils.data import Dataset
    from torch.utils.data import random_split
    from torchvision import transforms, datasets

    # Note: the Net class from "Model Construction" must be defined or imported here.

    DEVICE = "cpu"

    class FeatureOnlyDataset(Dataset):
        def __init__(self, original_dataset):
            # Keep only the image tensors; labels are not needed for calibration.
            self.features = []
            for item in original_dataset:
                self.features.append(item[0])

        def __len__(self):
            return len(self.features)

        def __getitem__(self, idx):
            return self.features[idx]


    def collate_fn2(batch):
        features = torch.stack(batch)
        return features.to(DEVICE)


    if __name__ == '__main__':
        BATCH_SIZE = 32
        INPUT_SHAPE = [1, 25, 30]
        TARGET = "esp32s3"
        NUM_OF_BITS = 8
        ESPDL_MODEL_PATH = "./s3/touch_recognition.espdl"

        transform = transforms.Compose([
            transforms.Grayscale(num_output_channels=1),
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,)),
        ])

        dataset = datasets.ImageFolder(root="../dataset/extra", transform=transform)
        train_size = int(0.8 * len(dataset))
        test_size = len(dataset) - train_size
        train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

        # A single sample image used as test input for the exported model.
        image = Image.open("../dataset/extra/9/20250225_140331.png").convert('L')
        input_tensor = transform(image).unsqueeze(0)
        print(input_tensor)

        feature_only_test_data = FeatureOnlyDataset(test_dataset)

        testDataLoader = torch.utils.data.DataLoader(dataset=feature_only_test_data, batch_size=BATCH_SIZE, shuffle=False,
                                                     collate_fn=collate_fn2)

        model = Net().to(DEVICE)
        model.load_state_dict(torch.load("./final_model.pth", map_location=DEVICE))
        model.eval()

        quant_ppq_graph = espdl_quantize_torch(
            model=model,
            espdl_export_file=ESPDL_MODEL_PATH,
            calib_dataloader=testDataLoader,
            calib_steps=8,
            input_shape=[1] + INPUT_SHAPE,
            inputs=[input_tensor],
            target=TARGET,
            num_of_bits=NUM_OF_BITS,
            device=DEVICE,
            error_report=True,
            skip_export=False,
            export_test_values=True,
            verbose=1,
            dispatching_override=None
        )
281+
282+
To facilitate model debugging, ESP-DL supports adding test data during quantization so that the inference results can be checked on the PC side. In the script above, the sample ``image`` is converted to ``input_tensor`` and passed to ``espdl_quantize_torch`` through the ``inputs`` argument. After model conversion is complete, the inference results for this test input are saved in a file with the ``*.info`` extension:
283+
284+
.. code-block:: text

    test outputs value:
    %23, shape: [1, 10], exponents: [0],
    value: array([9.85415445e-34, 1.92874989e-22, 7.46892081e-43, 1.60381094e-28,
           3.22134028e-27, 1.05306175e-20, 4.07960022e-41, 1.42516404e-21,
           2.38026637e-26, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
           dtype=float32)
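The recognized digit is the index of the largest value in this output. The sketch below reads it off from the values listed above. The listing contains twelve values although the reported shape is ``[1, 10]``, so the sketch assumes that the first ten entries are the class scores and that the trailing two are padding. The function name is illustrative only.

```python
NUM_CLASSES = 10

# Values copied from the test output listing above.
test_output = [
    9.85415445e-34, 1.92874989e-22, 7.46892081e-43, 1.60381094e-28,
    3.22134028e-27, 1.05306175e-20, 4.07960022e-41, 1.42516404e-21,
    2.38026637e-26, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00,
]


def predicted_digit(outputs, num_classes=NUM_CLASSES):
    """Return the index of the highest score among the first num_classes values."""
    scores = outputs[:num_classes]  # assumption: anything beyond this is padding
    return max(range(len(scores)), key=lambda i: scores[i])


print(predicted_digit(test_output))  # 9
```

The result agrees with the test image, which was taken from the ``9`` subfolder of the dataset.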
292+
293+
.. important:: During model quantization and deployment, set the ``shuffle`` parameter of ``torch.utils.data.DataLoader`` to ``False``.
294+
295+
On-device Inference
296+
---------------------
297+
298+
Refer to `How to load test profile model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_load_test_profile_model.rst>`_ and `How to run model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_run_model.rst>`_ for implementing model loading and inference.
299+
300+
Note that in this example the Touch driver reports the pressed and unpressed states as 1 and 0, while the model was trained on normalized image data: ``Normalize((0.5,), (0.5,))`` maps a pixel value of 0 to -1 and a pixel value of 1 to 1. The data reported by the Touch driver must therefore be mapped to the same range and quantized before inference:
301+
302+
.. code-block:: c

    for (size_t i = 0; i < m_feature_size; i++) {
        int8_t value = (input_data[i] == 0 ? -1 : 1);
        quant_buffer[i] = dl::quantize<int8_t>((float)value, m_input_scale);
    }
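The same preprocessing can be reproduced on the PC side to compare against the on-device input buffer. The sketch below is a minimal illustration in Python. ``normalize`` follows the ``Normalize((0.5,), (0.5,))`` transform used during training, while ``quantize_int8`` is a generic symmetric int8 quantizer and an assumption made for this sketch. It is not ESP-DL's ``dl::quantize`` implementation, and the scale of ``2 ** -7`` is a hypothetical value, since the real input scale comes from the exported model.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize((0.5,), (0.5,)) for one value."""
    return (pixel - mean) / std


def quantize_int8(value, scale):
    """Generic symmetric int8 quantization (assumption, not ESP-DL code)."""
    quantized = round(value / scale)
    return max(-128, min(127, quantized))  # saturate to the int8 range


# Touch states as reported by the driver: 1 = pressed, 0 = not pressed.
touch_states = [0, 1, 1, 0]

normalized = [normalize(float(state)) for state in touch_states]
print(normalized)  # [-1.0, 1.0, 1.0, -1.0]

scale = 2 ** -7  # hypothetical input scale
print([quantize_int8(value, scale) for value in normalized])  # [-128, 127, 127, -128]
```

The normalized values match the ``-1`` and ``1`` produced by the C loop above, which is why the firmware can skip the floating-point normalization and map the two touch states directly.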
308+
309+
310+
For the complete project, please refer to: :example:`ai/esp_dl/touchpad_digit_recognition`

docs/zh_CN/ai/index.rst

Lines changed: 1 addition & 1 deletion
@@ -8,4 +8,4 @@
     :maxdepth: 1

     OpenAI <openai>
-
+    Touch 手写数字识别 <touch_digit_recognition>
