Commit ab2ce06

Merge branch 'feat/add_espdl_touchpad_digit_recognition_example' into 'master'

feat(esp_dl): add espdl touchpad digit recognition example

Closes AEG-2358

See merge request ae_group/esp-iot-solution!1279

2 parents 01eb468 + 9ccb7e0

35 files changed: +2546 -2 lines
.gitlab/ci/build.yml

Lines changed: 8 additions & 0 deletions

@@ -69,6 +69,14 @@ build_example_ai_esp_dl_human_activity_recognition:
     IMAGE: espressif/idf:release-v5.3
     EXAMPLE_DIR: examples/ai/esp_dl/human_activity_recognition
 
+build_example_ai_esp_dl_touchpad_digit_recognition:
+  extends:
+    - .build_examples_template
+    - .rules:build:example_ai_esp_dl_touchpad_digit_recognition
+  variables:
+    IMAGE: espressif/idf:release-v5.3
+    EXAMPLE_DIR: examples/ai/esp_dl/touchpad_digit_recognition
+
 build_example_audio_wav_player:
   extends:
     - .build_examples_template
.gitlab/ci/rules.yml

Lines changed: 13 additions & 0 deletions

@@ -427,6 +427,9 @@
 .patterns-example_ai_esp_dl_human_activity_recognition: &patterns-example_ai_esp_dl_human_activity_recognition
   - "examples/ai/esp_dl/human_activity_recognition/**/*"
 
+.patterns-example_ai_esp_dl_touchpad_digit_recognition: &patterns-example_ai_esp_dl_touchpad_digit_recognition
+  - "examples/ai/esp_dl/touchpad_digit_recognition/**/*"
+
 .patterns-example_audio_wav_player: &patterns-example_audio_wav_player
   - "examples/audio/**/*"

@@ -725,6 +728,16 @@
   - <<: *if-dev-push
     changes: *patterns-example_ai_esp_dl_human_activity_recognition
 
+.rules:build:example_ai_esp_dl_touchpad_digit_recognition:
+  rules:
+    - <<: *if-protected
+    - <<: *if-label-build
+    - <<: *if-trigger-job
+    - <<: *if-dev-push
+      changes: *patterns-build_system
+    - <<: *if-dev-push
+      changes: *patterns-example_ai_esp_dl_touchpad_digit_recognition
+
 .rules:build:example_audio_wav_player:
   rules:
     - <<: *if-protected
docs/_static/ai/touch_kit_pad.png

Binary file added (4.01 MB)
docs/en/ai/index.rst

Lines changed: 2 additions & 1 deletion

@@ -7,4 +7,5 @@ AI
 .. toctree::
    :maxdepth: 1
 
-   OpenAI <openai>
+   OpenAI <openai>
+   TouchDigitRecognition <touch_digit_recognition>
docs/en/ai/touch_digit_recognition.rst (new file)

Lines changed: 339 additions & 0 deletions

Touch Digit Recognition
=========================

:link_to_translation:`zh_CN:[中文]`

Touch Principle and Data Acquisition
---------------------------------------

Touch Principle
^^^^^^^^^^^^^^^^^^

The `ESP_Touch_Kit_Touchpad <https://dl.espressif.com/dl/schematics/SCH_ESP-Touch-Kit-Touchpad_V1.0_20210406.pdf>`_ is used as the touch pad. It consists of a 6*7 array of touch channels.

.. figure:: ../../_static/ai/touch_kit_pad.png
    :align: center

    Physical Touch Pad

When a finger moves across the touch pad, it changes the capacitance of the touch channels, so the finger's position can be detected by monitoring these capacitance changes.

Detection Algorithm
^^^^^^^^^^^^^^^^^^^^

1. Because of hardware variation, each channel has different maximum and minimum trigger values, so the capacitance readings must be normalized. This is done by recording each channel's maximum and minimum values as the finger slides across the pad.

2. During finger movement, one channel shows the highest rate of capacitance change. Identifying this channel and the adjacent channel with the next-highest change rate yields two neighboring channels.

3. A ratio-based calculation on these two channels' values gives the finger's relative coordinate along that axis (the offset from the midpoint between the two channels):

.. math::

    x = \frac{Fa - Fb}{Fa + Fb}

4. Applying the above steps along both axes yields relative coordinates in both directions, and therefore the finger's position (see the sketch after this list).

.. note::

    An appropriate trigger threshold must be set to determine whether the finger is actually moving on the touch pad.

5. When finger lift-off is detected, the drawn data can be saved.
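A minimal Python sketch of steps 1-4 along one axis is shown below; the function name, the threshold value, and the mapping back to channel units are illustrative assumptions, not the example project's implementation:

.. code-block:: python

    import numpy as np

    def estimate_position(raw, cal_min, cal_max, threshold=0.3):
        """Estimate the finger coordinate along one axis, in channel units.

        raw, cal_min, cal_max: 1-D arrays holding the current readings and
        the recorded per-channel extremes for the channels along this axis.
        Returns None when no channel exceeds the trigger threshold.
        """
        # Step 1: normalize each channel into [0, 1] with its own extremes.
        norm = (raw - cal_min) / np.maximum(cal_max - cal_min, 1e-6)

        a = int(np.argmax(norm))          # step 2: strongest channel
        if norm[a] < threshold:           # finger is not on the pad
            return None

        # Step 2 (cont.): the stronger of the two adjacent channels.
        neighbours = [i for i in (a - 1, a + 1) if 0 <= i < len(norm)]
        b = max(neighbours, key=lambda i: norm[i])

        # Step 3: ratio-based offset x = (Fa - Fb) / (Fa + Fb).
        fa, fb = norm[a], norm[b]
        x = (fa - fb) / (fa + fb)

        # x = 1 -> directly over channel a; x = 0 -> midway between a and b.
        return a + (b - a) * (1.0 - x) / 2.0
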
Data Collection
^^^^^^^^^^^^^^^^^^^

In real-world scenarios, the digits drawn on a touchpad are often visually different from the handwritten digits in the MNIST dataset. As a result, models trained directly on MNIST perform poorly on actual touch input. It is therefore necessary to collect a custom dataset of digits drawn on the touchpad and use it for training.

.. figure:: ../../_static/ai/touch_hw_real_data.png
    :align: center

    Real Dataset Based on Touchpad Drawn Input

.. note:: The size of the handwritten data image after interpolation is 30×25.

Click here to download the dataset used in this example: :download:`touch_dataset.zip <https://dl.espressif.com/AE/esp-iot-solution/touch_dataset.zip>`
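As a rough illustration of that note, the sketch below upscales a trace recorded on the channel grid to the network input size with Pillow; the grid orientation, the stroke, and the resampling filter are assumptions (the quantization script later uses ``INPUT_SHAPE = [1, 25, 30]``, i.e. height 25 and width 30):

.. code-block:: python

    import numpy as np
    from PIL import Image

    # Fake trace on the pad's channel grid; (rows, cols) = (6, 7) is an
    # assumption about how the 6*7 channels map to the image axes.
    trace = np.zeros((6, 7), dtype=np.float32)
    trace[2, 1:6] = 1.0                       # a made-up horizontal stroke

    img = Image.fromarray((trace * 255).astype(np.uint8))   # mode "L"
    # PIL's resize takes (width, height); Resampling needs Pillow >= 9.1.
    img = img.resize((30, 25), resample=Image.Resampling.BILINEAR)
    img.save("digit_sample.png")              # hypothetical output file
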
Model Training and Deployment
--------------------------------

Model Construction
^^^^^^^^^^^^^^^^^^^^

A neural network suitable for touch-based handwritten digit recognition is built on the PyTorch framework. The architecture is as follows:

.. code-block:: python

    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.model = torch.nn.Sequential(
                torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=2, stride=2),

                torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=2, stride=2),

                torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),

                torch.nn.Flatten(),
                torch.nn.Linear(in_features=7 * 6 * 64, out_features=256),
                torch.nn.ReLU(),
                torch.nn.Dropout(p=0.5),
                torch.nn.Linear(in_features=256, out_features=10),
                torch.nn.Softmax(dim=1)
            )

        def forward(self, x):
            output = self.model(x)
            return output
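The two pooling layers reduce the 25×30 input to a 6×7 feature map with 64 channels, which is where the flattened size ``7 * 6 * 64`` comes from. A quick shape sanity check (assuming ``torch`` is imported and ``Net`` is defined as above):

.. code-block:: python

    net = Net()
    dummy = torch.randn(1, 1, 25, 30)   # (N, C, H, W): one 25x30 grayscale image
    print(net(dummy).shape)             # -> torch.Size([1, 10])
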
Model Training
^^^^^^^^^^^^^^^^^^

Training the model involves loading and preprocessing the dataset, configuring the training parameters, monitoring training progress, and saving the trained model.

Data Loading and Preprocessing
""""""""""""""""""""""""""""""""

The images for the different digits are organized under the ``dataset/extra`` directory, with each digit stored in a separate subfolder named after that digit. Image preprocessing is performed with ``transforms.Compose`` and includes grayscale conversion, random rotation and translation, and normalization. The entire dataset is then loaded with ``ImageFolder``, which assigns class indices from the sorted folder names, so the folders ``0``-``9`` map directly to labels 0-9. The dataset is split into training and test sets in an 8:2 ratio, and ``DataLoader`` is used to build batch loaders for subsequent training and evaluation.

.. code-block:: python

    import matplotlib.pyplot as plt
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ])

    dataset = datasets.ImageFolder(root='./dataset/extra', transform=transform)

    train_size = int(0.8 * len(dataset))
    test_size = len(dataset) - train_size
    train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

    train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(dataset=test_dataset, batch_size=32, shuffle=False)
Model Training Parameter Configuration
""""""""""""""""""""""""""""""""""""""""""

The training parameters include the learning rate, optimizer, and loss function. Here, cross-entropy is used as the loss function, and the Adam optimizer is employed to update the model parameters.

.. code-block:: python

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = Net().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
Model Training and Saving
""""""""""""""""""""""""""""

The number of training epochs is set to 100. During training, the model parameters are updated on the training set, while the test set is used to evaluate the model's performance after each epoch. Once training is complete, the model parameters are saved to ``./models/final_model.pth``.

.. code-block:: python

    def train_epoch(model, train_loader, criterion, optimizer, device):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct / total
        return epoch_loss, epoch_acc


    def test_epoch(model, test_loader, criterion, device):
        model.eval()
        running_loss = 0.0
        correct = 0
        total = 0

        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = model(inputs)
                loss = criterion(outputs, labels)

                running_loss += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(test_loader)
        epoch_acc = 100 * correct / total
        return epoch_loss, epoch_acc

    num_epochs = 100
    train_acc_array = []
    test_acc_array = []
    for epoch in range(num_epochs):
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        test_loss, test_acc = test_epoch(model, test_loader, criterion, device)

        print(f'Epoch [{epoch + 1}/{num_epochs}], '
              f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, '
              f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%')
        train_acc_array.append(train_acc)
        test_acc_array.append(test_acc)

    torch.save(model.state_dict(), './models/final_model.pth')
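The accuracy histories collected above can be plotted with the already-imported ``matplotlib``; a minimal sketch (the output path is an assumption) that produces curves like the figure below:

.. code-block:: python

    plt.figure()
    plt.plot(train_acc_array, label='Train Acc')
    plt.plot(test_acc_array, label='Test Acc')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy (%)')
    plt.legend()
    plt.savefig('./models/train_acc.png')   # hypothetical output path
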
During training, the accuracy curves of the training and test sets evolve as follows:

.. figure:: ../../_static/ai/touch_train_acc.png
    :align: center

    Accuracy Curves of the Training and Test Sets
Model Deployment
^^^^^^^^^^^^^^^^^^^

ESP-PPQ Environment Configuration
""""""""""""""""""""""""""""""""""""""

``ESP-PPQ`` is a quantization tool based on ``ppq``. Install ``ESP-PPQ`` with the following commands:

.. code-block:: bash

    pip uninstall ppq
    pip install git+https://github.com/espressif/esp-ppq.git
Model Quantization and Deployment
""""""""""""""""""""""""""""""""""""""

Refer to `How to quantize model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_quantize_model.rst>`_ for model quantization and export. To export a model for the ESP32-P4, set ``TARGET`` to ``esp32p4``.

.. code-block:: python

    import torch
    from PIL import Image
    from ppq.api import espdl_quantize_torch
    from torch.utils.data import Dataset
    from torch.utils.data import random_split
    from torchvision import transforms, datasets

    DEVICE = "cpu"

    class FeatureOnlyDataset(Dataset):
        """Wraps a labelled dataset and keeps only the image tensors,
        since calibration does not need the labels."""
        def __init__(self, original_dataset):
            self.features = []
            for item in original_dataset:
                self.features.append(item[0])

        def __len__(self):
            return len(self.features)

        def __getitem__(self, idx):
            return self.features[idx]


    def collate_fn2(batch):
        features = torch.stack(batch)
        return features.to(DEVICE)


    if __name__ == '__main__':
        BATCH_SIZE = 32
        INPUT_SHAPE = [1, 25, 30]
        TARGET = "esp32s3"
        NUM_OF_BITS = 8
        ESPDL_MODEL_PATH = "./s3/touch_recognition.espdl"

        transform = transforms.Compose([
            transforms.Grayscale(num_output_channels=1),
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,)),
        ])

        dataset = datasets.ImageFolder(root="../dataset/extra", transform=transform)
        train_size = int(0.8 * len(dataset))
        test_size = len(dataset) - train_size
        train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

        # A single known image is embedded as test data for on-PC debugging.
        image = Image.open("../dataset/extra/9/20250225_140331.png").convert('L')
        input_tensor = transform(image).unsqueeze(0)
        print(input_tensor)

        feature_only_test_data = FeatureOnlyDataset(test_dataset)

        testDataLoader = torch.utils.data.DataLoader(dataset=feature_only_test_data, batch_size=BATCH_SIZE,
                                                     shuffle=False, collate_fn=collate_fn2)

        # Net is the model class defined in the Model Construction section.
        model = Net().to(DEVICE)
        model.load_state_dict(torch.load("./final_model.pth", map_location=DEVICE))
        model.eval()

        quant_ppq_graph = espdl_quantize_torch(
            model=model,
            espdl_export_file=ESPDL_MODEL_PATH,
            calib_dataloader=testDataLoader,
            calib_steps=8,
            input_shape=[1] + INPUT_SHAPE,
            inputs=[input_tensor],
            target=TARGET,
            num_of_bits=NUM_OF_BITS,
            device=DEVICE,
            error_report=True,
            skip_export=False,
            export_test_values=True,
            verbose=1,
            dispatching_override=None
        )
To facilitate model debugging, ESP-DL can embed test data during quantization so that inference results can be inspected on the PC side. In the code above, ``image`` is passed to ``espdl_quantize_torch`` as the test input. After model conversion completes, the inference results for the test data are saved in a file with the ``*.info`` extension:

.. code-block:: bash

    test outputs value:
    %23, shape: [1, 10], exponents: [0],
    value: array([9.85415445e-34, 1.92874989e-22, 7.46892081e-43, 1.60381094e-28,
           3.22134028e-27, 1.05306175e-20, 4.07960022e-41, 1.42516404e-21,
           2.38026637e-26, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
          dtype=float32)

.. important:: During model quantization and deployment, set the ``shuffle`` parameter of ``torch.utils.data.DataLoader`` to ``False``.
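The largest value sits at index 9, matching the test image taken from the ``9`` folder. A small sketch for reading such a dump (treating the first ten entries as the class probabilities is an assumption, since the dump shows two trailing zeros beyond the declared ``[1, 10]`` shape):

.. code-block:: python

    import numpy as np

    probs = np.array([9.85415445e-34, 1.92874989e-22, 7.46892081e-43,
                      1.60381094e-28, 3.22134028e-27, 1.05306175e-20,
                      4.07960022e-41, 1.42516404e-21, 2.38026637e-26,
                      1.00000000e+00], dtype=np.float32)
    print(int(np.argmax(probs)))   # -> 9, the predicted digit
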
On-device Inference
---------------------

Refer to `How to load test profile model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_load_test_profile_model.rst>`_ and `How to run model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_run_model.rst>`_ for model loading and inference.

Note that in this example the Touch driver reports the pressed and unpressed states as 1 and 0, while the model expects normalized image data. The data reported by the Touch driver therefore has to be preprocessed; mapping 0/1 to -1/+1 mirrors the training-time ``Normalize((0.5,), (0.5,))``, which sends pixel values 0 and 1 to -1 and +1:

.. code-block:: c

    for (size_t i = 0; i < m_feature_size; i++) {
        int8_t value = (input_data[i] == 0 ? -1 : 1);
        quant_buffer[i] = dl::quantize<int8_t>((float)value, m_input_scale);
    }

For the complete project, please refer to: :example:`ai/esp_dl/touchpad_digit_recognition`

docs/zh_CN/ai/index.rst

Lines changed: 1 addition & 1 deletion

@@ -8,4 +8,4 @@
    :maxdepth: 1
 
    OpenAI <openai>
-
+   Touch 手写数字识别 <touch_digit_recognition>