Touch Digit Recognition
=========================

:link_to_translation:`zh_CN:[中文]`

Touch Principle and Data Acquisition
---------------------------------------

Touch Principle
^^^^^^^^^^^^^^^^^^

The `ESP_Touch_Kit_Touchpad <https://dl.espressif.com/dl/schematics/SCH_ESP-Touch-Kit-Touchpad_V1.0_20210406.pdf>`_ is used as the touch pad. This touchpad consists of a 6 × 7 array of touch channels.

.. figure:: ../../_static/ai/touch_kit_pad.png
    :align: center

    Physical Touch Pad

When a finger moves across the touch pad, it changes the capacitance values of the touch channels, allowing the finger position to be detected by monitoring these capacitance changes.

Detection Algorithm
^^^^^^^^^^^^^^^^^^^^

1. Due to hardware variations, each channel has different maximum and minimum trigger values, so the raw capacitance readings need to be normalized. This is done by recording the maximum and minimum value of each channel as the finger slides across the pad.

2. During finger movement, one channel shows the highest rate of capacitance change. By identifying this channel and the adjacent channel with the next highest change rate, we obtain a pair of neighboring channels.

3. A ratio-based calculation on these two channels' values gives the relative coordinate of the finger along that direction, i.e. the offset from the center point between the two channels, where :math:`F_a` and :math:`F_b` are the normalized values of the strongest channel and its neighbor (a code sketch of this calculation follows the list):

.. math::

    x = \frac{F_a - F_b}{F_a + F_b}

4. By applying the above steps in both directions, we obtain two relative coordinates and thus the finger's position.

.. note::

    An appropriate trigger threshold needs to be set to determine whether a finger is actually moving on the touch pad.

5. When finger lift-off is detected, the drawn data can be saved.

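
The position calculation described in steps 1 to 3 can be illustrated with a short host-side sketch. The helper names, the channel layout, and the example readings below are hypothetical and only serve to show the arithmetic; the actual implementation lives in the example firmware.

.. code-block:: python

    def normalize(raw, raw_min, raw_max):
        """Map a raw channel reading into [0, 1] using the recorded extremes."""
        if raw_max <= raw_min:
            return 0.0
        return (raw - raw_min) / (raw_max - raw_min)

    def estimate_position(values):
        """Estimate the finger position along one axis.

        ``values`` holds one normalized reading per channel of this axis.
        The result is a fractional channel index: the midpoint of the two
        strongest neighboring channels plus the offset x = (Fa - Fb) / (Fa + Fb).
        """
        a = max(range(len(values)), key=lambda i: values[i])            # strongest channel
        neighbors = [i for i in (a - 1, a + 1) if 0 <= i < len(values)]
        b = max(neighbors, key=lambda i: values[i])                     # stronger neighbor
        fa, fb = values[a], values[b]
        if fa + fb == 0:
            return float(a)
        x = (fa - fb) / (fa + fb)           # 0 -> midway between a and b, 1 -> centered on a
        midpoint = (a + b) / 2.0            # center point between the two channels
        direction = 1.0 if a > b else -1.0  # offset points from the midpoint towards channel a
        return midpoint + direction * x / 2.0

    # Hypothetical readings: the finger sits between channels 2 and 3, closer to 2.
    print(estimate_position([0.05, 0.20, 0.80, 0.60, 0.10, 0.02]))
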

Data Collection
^^^^^^^^^^^^^^^^^^^

In real-world scenarios, the digits drawn on a touchpad are often visually different from the handwritten digits in the MNIST dataset. As a result, models trained directly on MNIST perform poorly when applied to actual touch input. Therefore, it is necessary to collect a custom dataset of digits drawn on a touchpad and use it for training.

.. figure:: ../../_static/ai/touch_hw_real_data.png
    :align: center

    Real Dataset Based on Touchpad Drawn Input

.. note:: The size of each handwritten digit image after interpolation is 30 × 25.

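
The interpolation step itself is straightforward. The sketch below is only an illustration and not part of the example project: it assumes the raw drawing has already been rendered into a small 2-D array on the host, and simply resizes it to the 30 × 25 input resolution with Pillow.

.. code-block:: python

    import numpy as np
    from PIL import Image

    # Assumed coarse frame (rows x columns); in the real pipeline this would come
    # from the recorded touch data, not from hand-written values.
    raw_frame = np.zeros((7, 6), dtype=np.float32)
    raw_frame[2, 1:4] = 1.0                       # pretend the finger crossed three channels

    img = Image.fromarray((raw_frame * 255).astype(np.uint8))
    img = img.resize((30, 25), Image.BILINEAR)    # width x height = 30 x 25, as noted above
    img.save("digit_sample.png")
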

Click here to download the dataset used in this example: :download:`touch_dataset.zip <https://dl.espressif.com/AE/esp-iot-solution/touch_dataset.zip>`

Model Training and Deployment
--------------------------------

Model Construction
^^^^^^^^^^^^^^^^^^^^

Based on the PyTorch framework, a neural network model suitable for touch-based handwritten digit recognition is constructed. The architecture is as follows:

.. code-block:: python

    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.model = torch.nn.Sequential(
                # Input: 1 x 25 x 30 grayscale image
                torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=2, stride=2),

                torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),
                torch.nn.MaxPool2d(kernel_size=2, stride=2),

                torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),
                torch.nn.ReLU(),

                # After two 2x2 poolings the feature maps are 6 x 7, hence 7 * 6 * 64 features
                torch.nn.Flatten(),
                torch.nn.Linear(in_features=7 * 6 * 64, out_features=256),
                torch.nn.ReLU(),
                torch.nn.Dropout(p=0.5),
                torch.nn.Linear(in_features=256, out_features=10),
                torch.nn.Softmax(dim=1)
            )

        def forward(self, x):
            output = self.model(x)
            return output
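
After the two 2 × 2 max-pooling stages, a 25 × 30 input is reduced to 6 × 7 feature maps, which is where the ``7 * 6 * 64`` input features of the first fully connected layer come from. A quick shape check (not part of the original example, and reusing the ``Net`` class and ``torch`` import from above) confirms this:

.. code-block:: python

    model = Net()
    dummy = torch.randn(1, 1, 25, 30)   # batch, channel, height, width
    print(model(dummy).shape)           # torch.Size([1, 10])
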

Model Training
^^^^^^^^^^^^^^^^^^

Training the model involves loading and preprocessing the dataset, configuring the training parameters, monitoring the training progress, and saving the trained model.

Data Loading and Preprocessing
""""""""""""""""""""""""""""""""

The images for the different digits are organized under the ``dataset/extra`` directory, with each digit stored in a separate subfolder named after that digit. Image preprocessing is performed with ``transforms.Compose`` and includes grayscale conversion, random rotation and translation, and normalization. The entire dataset is then loaded using ``ImageFolder`` and split into training and test sets in an 8:2 ratio. Finally, ``DataLoader`` is used to construct batch loaders for subsequent model training and evaluation.

.. code-block:: python

    import matplotlib.pyplot as plt
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),
    ])

    dataset = datasets.ImageFolder(root='./dataset/extra', transform=transform)

    train_size = int(0.8 * len(dataset))
    test_size = len(dataset) - train_size
    train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

    train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(dataset=test_dataset, batch_size=32, shuffle=False)

Model Training Parameter Configuration
""""""""""""""""""""""""""""""""""""""""""

The training parameters include the learning rate, the optimizer, the loss function, and others. In this example, cross-entropy is used as the loss function, and the Adam optimizer is employed to update the model parameters.

.. code-block:: python

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = Net().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

Model Training and Saving
""""""""""""""""""""""""""""

The number of training epochs is set to 100. During training, the model parameters are updated using the training set, while the test set is used to evaluate the model's performance after each epoch. Once training is complete, the model parameters are saved to ``./models/final_model.pth``.

.. code-block:: python

    def train_epoch(model, train_loader, criterion, optimizer, device):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct / total
        return epoch_loss, epoch_acc


    def test_epoch(model, test_loader, criterion, device):
        model.eval()
        running_loss = 0.0
        correct = 0
        total = 0

        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = model(inputs)
                loss = criterion(outputs, labels)

                running_loss += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(test_loader)
        epoch_acc = 100 * correct / total
        return epoch_loss, epoch_acc


    num_epochs = 100
    train_acc_array = []
    test_acc_array = []
    for epoch in range(num_epochs):
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        test_loss, test_acc = test_epoch(model, test_loader, criterion, device)

        print(f'Epoch [{epoch + 1}/{num_epochs}], '
              f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, '
              f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%')
        train_acc_array.append(train_acc)
        test_acc_array.append(test_acc)

    torch.save(model.state_dict(), './models/final_model.pth')


During the training process, the accuracy curves of the training and test sets evolve as follows:

.. figure:: ../../_static/ai/touch_train_acc.png
    :align: center

    Accuracy Curves of the Training and Test Sets

Model Deployment
^^^^^^^^^^^^^^^^^^^

ESP-PPQ Environment Configuration
""""""""""""""""""""""""""""""""""""""

``ESP-PPQ`` is a quantization tool based on ``ppq``. Please use the following commands to install ``ESP-PPQ``:

.. code-block:: bash

    pip uninstall ppq
    pip install git+https://github.com/espressif/esp-ppq.git

Model Quantization and Deployment
""""""""""""""""""""""""""""""""""""""

Refer to `How to quantize model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_quantize_model.rst>`_ for model quantization and export. If you need to export a model for the ESP32-P4, set ``TARGET`` to ``esp32p4``.

.. code-block:: python

    import torch
    from PIL import Image
    from ppq.api import espdl_quantize_torch
    from torch.utils.data import Dataset
    from torch.utils.data import random_split
    from torchvision import transforms, datasets

    DEVICE = "cpu"

    class FeatureOnlyDataset(Dataset):
        """Wraps a dataset so that only the image tensors are returned; labels are not needed for calibration."""
        def __init__(self, original_dataset):
            self.features = []
            for item in original_dataset:
                self.features.append(item[0])

        def __len__(self):
            return len(self.features)

        def __getitem__(self, idx):
            return self.features[idx]


    def collate_fn2(batch):
        features = torch.stack(batch)
        return features.to(DEVICE)


    if __name__ == '__main__':
        BATCH_SIZE = 32
        INPUT_SHAPE = [1, 25, 30]
        TARGET = "esp32s3"
        NUM_OF_BITS = 8
        ESPDL_MODEL_PATH = "./s3/touch_recognition.espdl"

        transform = transforms.Compose([
            transforms.Grayscale(num_output_channels=1),
            transforms.ToTensor(),
            transforms.Normalize((0.5,), (0.5,)),
        ])

        dataset = datasets.ImageFolder(root="../dataset/extra", transform=transform)
        train_size = int(0.8 * len(dataset))
        test_size = len(dataset) - train_size
        train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

        image = Image.open("../dataset/extra/9/20250225_140331.png").convert('L')
        input_tensor = transform(image).unsqueeze(0)
        print(input_tensor)

        feature_only_test_data = FeatureOnlyDataset(test_dataset)

        testDataLoader = torch.utils.data.DataLoader(dataset=feature_only_test_data, batch_size=BATCH_SIZE,
                                                     shuffle=False, collate_fn=collate_fn2)

        model = Net().to(DEVICE)
        model.load_state_dict(torch.load("./final_model.pth", map_location=DEVICE))
        model.eval()

        quant_ppq_graph = espdl_quantize_torch(
            model=model,
            espdl_export_file=ESPDL_MODEL_PATH,
            calib_dataloader=testDataLoader,
            calib_steps=8,
            input_shape=[1] + INPUT_SHAPE,
            inputs=[input_tensor],
            target=TARGET,
            num_of_bits=NUM_OF_BITS,
            device=DEVICE,
            error_report=True,
            skip_export=False,
            export_test_values=True,
            verbose=1,
            dispatching_override=None
        )

To facilitate model debugging, ESP-DL can embed test data during quantization so that inference results can be checked on the PC side. In the code above, ``image`` is passed to ``espdl_quantize_torch`` as test input. After model conversion is complete, the inference results for the test data are saved in a file with the ``*.info`` extension:

.. code-block:: bash

    test outputs value:
    %23, shape: [1, 10], exponents: [0],
    value: array([9.85415445e-34, 1.92874989e-22, 7.46892081e-43, 1.60381094e-28,
           3.22134028e-27, 1.05306175e-20, 4.07960022e-41, 1.42516404e-21,
           2.38026637e-26, 1.00000000e+00, 0.00000000e+00, 0.00000000e+00],
          dtype=float32)

.. important:: During model quantization and deployment, please set the ``shuffle`` parameter of ``torch.utils.data.DataLoader`` to ``False``.

On-device Inference
---------------------

Refer to `How to load test profile model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_load_test_profile_model.rst>`_ and `How to run model <https://github.com/espressif/esp-dl/blob/master/docs/en/tutorials/how_to_run_model.rst>`_ for implementing model loading and inference.


Note that in this example the Touch driver reports the pressed and unpressed states as 1 and 0, while the model expects normalized image data as input. The data reported by the Touch driver therefore needs to be preprocessed:

.. code-block:: c

    for (size_t i = 0; i < m_feature_size; i++) {
        /* Map the touch state to the range used during training: pressed -> 1, unpressed -> -1 */
        int8_t value = (input_data[i] == 0 ? -1 : 1);
        /* Quantize the value with the input scale expected by the deployed model */
        quant_buffer[i] = dl::quantize<int8_t>((float)value, m_input_scale);
    }

For the complete project, please refer to: :example:`ai/esp_dl/touchpad_digit_recognition`