---
title: MaixCAM MaixPy Facial Expression Recognition, Gender, Mask, Age, and More
update:
  - date: 2025-01-10
    version: v1.0
    author: neucrack
    content: Added source code, documentation, and examples for facial emotion recognition.
---

## Introduction

In the previous articles, [Facial Detection and Keypoint Detection](./face_detection.md) and [Facial Multi-Keypoint Detection], we covered how to detect faces and facial keypoints and how to recognize faces. This article focuses on recognizing facial emotions (expressions); the same approach also extends to other characteristics, such as gender, mask-wearing, and age.

![image](../../assets/face_emotion.jpg)

Demonstration video on MaixCAM:
<video playsinline controls autoplay loop muted preload src="/static/video/maixcam_face_emotion.mp4" type="video/mp4">
Classifier Result video
</video>

> Video source: [oarriaga/face_classification](https://github.com/oarriaga/face_classification)

## Using Facial Emotion Recognition in MaixCAM MaixPy

MaixPy provides a default emotion recognition model with seven categories:
* angry
* disgust
* fear
* happy
* sad
* surprise
* neutral

Emotion recognition involves three steps:
1. Detect the face.
2. Crop the face into a standard format, as shown in the small image in the top-left corner of the demo above.
3. Classify the cropped face image with a simple classification model.

In MaixPy, the `yolov8n_face` model detects the face and eye positions, the face is then cropped to the classifier's standard input, and the classifier assigns an emotion. Below is the full code, which is also available in the [MaixPy](https://github.com/sipeed/maixpy) `examples` directory:

```python
from maix import camera, display, image, nn, app

detect_conf_th = 0.5    # face detection confidence threshold
detect_iou_th = 0.45    # face detection NMS IoU threshold
emotion_conf_th = 0.5   # emotion classification confidence threshold
max_face_num = -1       # max faces to classify per frame, -1 means no limit
crop_scale = 1.2        # enlarge the detected face box before cropping

# face detection model
detector = nn.YOLOv8(model="/root/models/yolov8n_face.mud", dual_buff=False)
# landmarks detector, used here only to crop standardized face images
landmarks_detector = nn.FaceLandmarks(model="")
# emotion classification model
classifier = nn.Classifier(model="/root/models/face_emotion.mud", dual_buff=False)

cam = camera.Camera(detector.input_width(), detector.input_height(), detector.input_format())
disp = display.Display()

# find the widest label so all score bars can start at the same x offset
max_labels_length = 0
for label in classifier.labels:
    size = image.string_size(label)
    if size.width() > max_labels_length:
        max_labels_length = size.width()

max_score_length = cam.width() / 4

while not app.need_exit():
    img = cam.read()
    results = []
    objs = detector.detect(img, conf_th=detect_conf_th, iou_th=detect_iou_th, sort=1)
    count = 0
    idxes = []
    img_std_first: image.Image = None
    for i, obj in enumerate(objs):
        # crop the face into the classifier's standard input size
        img_std = landmarks_detector.crop_image(img, obj.x, obj.y, obj.w, obj.h, obj.points,
                                                classifier.input_width(), classifier.input_height(), crop_scale)
        if img_std:
            # the emotion model expects grayscale input
            img_std_gray = img_std.to_format(image.Format.FMT_GRAYSCALE)
            res = classifier.classify(img_std_gray, softmax=True)
            results.append(res)
            idxes.append(i)
            if i == 0:
                img_std_first = img_std
            count += 1
            if max_face_num > 0 and count >= max_face_num:
                break
    for i, res in enumerate(results):
        if i == 0:
            # show the cropped face of the first detection in the top-left corner
            img.draw_image(0, 0, img_std_first)
            # draw one score bar per emotion class for the first face
            for j in range(len(classifier.labels)):
                idx = res[j][0]
                score = res[j][1]
                img.draw_string(0, img_std_first.height() + idx * 16, classifier.labels[idx], image.COLOR_WHITE)
                img.draw_rect(max_labels_length, int(img_std_first.height() + idx * 16), int(score * max_score_length), 8, image.COLOR_GREEN if score >= emotion_conf_th else image.COLOR_RED, -1)
                img.draw_string(int(max_labels_length + score * max_score_length + 2), int(img_std_first.height() + idx * 16), f"{score:.1f}", image.COLOR_RED)
        # draw the face box and the top-1 emotion for every face
        color = image.COLOR_GREEN if res[0][1] >= emotion_conf_th else image.COLOR_RED
        obj = objs[idxes[i]]
        img.draw_rect(obj.x, obj.y, obj.w, obj.h, color, 1)
        img.draw_string(obj.x, obj.y, f"{classifier.labels[res[0][0]]}: {res[0][1]:.1f}", color)
    disp.show(img)
```

### Key Code

The core steps are as follows:
```python
objs = detector.detect(img, conf_th=detect_conf_th, iou_th=detect_iou_th, sort=1)
img_std = landmarks_detector.crop_image(...)
img_std_gray = img_std.to_format(image.Format.FMT_GRAYSCALE)
res = classifier.classify(img_std_gray, softmax=True)
```

These correspond to:
1. Detect the face.
2. Crop the face to the classifier's standard input.
3. Classify the face image, converting it to grayscale first because the model expects grayscale input.

## Improving Recognition Accuracy

The default MaixPy model offers basic classification and can be improved in several ways:
* **Using keypoints as model input:** Instead of cropped pixel images, feed the facial keypoint coordinates to the classifier. This removes background interference and can improve training accuracy (see the first sketch after this list).
* **Enhancing the dataset:** Increase the number and variety of samples.
* **Improving the cropping step:** Align faces more precisely, for example with the affine transformations commonly used in face recognition (see the second sketch after this list).
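
For the keypoint idea, here is a minimal sketch of the feature preparation, assuming the landmark detector returns a flat `[x0, y0, x1, y1, ...]` list (as `obj.points` does in the example above); the helper name and point layout are illustrative, not a MaixPy API:

```python
# Hypothetical helper: turn absolute keypoint coordinates into a
# face-box-relative feature vector for a small classifier.
def keypoints_to_features(points, x, y, w, h):
    """Normalize a flat [x0, y0, x1, y1, ...] list against the face box (x, y, w, h)."""
    feats = []
    for i in range(0, len(points), 2):
        feats.append((points[i] - x) / w)      # x relative to the box, ~[0, 1]
        feats.append((points[i + 1] - y) / h)  # y relative to the box, ~[0, 1]
    return feats

# Example: a 5-point face (eyes, nose, mouth corners) in a 100x100 box
print(keypoints_to_features([30, 40, 70, 40, 50, 60, 35, 80, 65, 80], 0, 0, 100, 100))
```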
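
For the cropping idea, a hedged sketch of eye-based alignment with OpenCV (not part of the MaixPy example above; run it on a PC with `opencv-python` and NumPy installed, and tune the crop margin for your data):

```python
import cv2
import numpy as np

def align_face(img, left_eye, right_eye, out_size=64):
    """Rotate the image so the eye line is horizontal, then crop around the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))          # tilt of the eye line
    center = ((left_eye[0] + right_eye[0]) / 2.0,   # midpoint between the eyes
              (left_eye[1] + right_eye[1]) / 2.0)
    m = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(img, m, (img.shape[1], img.shape[0]))
    # Crop a square around the eye midpoint; the margin is illustrative.
    half = out_size // 2
    cx, cy = int(center[0]), int(center[1])
    return rotated[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
```
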
## Training a Custom Classification Model

### Overview
1. **Define categories:** e.g., 7 emotions, gender, mask detection, etc.
2. **Choose a model:** Lightweight classification models such as MobileNetV2 work well.
3. **Select a training platform:**
   * Use [MaixHub](https://maixhub.com) for online training (**recommended**).
   * Alternatively, train locally with PyTorch or TensorFlow.
4. **Collect data:** Modify the code to save captured images, e.g., `img.save("/root/image0.jpg")` (see the collection sketch after this list).
5. **Clean data:** Organize samples into labeled folders.
6. **Train:**
   * On MaixHub, for a model that is easy to deploy.
   * Locally, then convert the model to ONNX and then to MUD format for MaixPy (see the export sketch after this list).
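
For step 4, a small data-collection sketch (the directory path is just an example): saving the cropped, classifier-sized face rather than the full frame keeps the training samples consistent with what the classifier sees at inference time.

```python
import os

def save_sample(img_std, save_dir="/root/dataset/unsorted"):
    """Save a cropped face with an incrementing filename for later labeling."""
    os.makedirs(save_dir, exist_ok=True)
    idx = len(os.listdir(save_dir))
    img_std.save(f"{save_dir}/face_{idx:05d}.jpg")

# In the detection loop of the full example, after crop_image() succeeds:
# save_sample(img_std)
```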
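
For the local-training path in step 6, a hedged export sketch, assuming a PyTorch MobileNetV2 (adjust `num_classes` and the input size to your setup); converting the resulting ONNX file to MUD then follows the MaixCAM model-conversion documentation:

```python
import torch
import torchvision

model = torchvision.models.mobilenet_v2(num_classes=7)  # e.g. the 7 emotions
model.eval()
dummy = torch.randn(1, 3, 64, 64)  # match your training input size
torch.onnx.export(model, dummy, "face_emotion.onnx",
                  input_names=["input"], output_names=["output"])
```
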
## Recognizing Other Facial Features (Gender, Mask, Age, etc.)

The same pipeline applies to features such as gender or mask-wearing: only the categories and the training data change. For numerical outputs such as age, use a regression model instead of a classifier, as sketched below; search online for more advanced techniques.
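
As an illustration, a hedged PyTorch sketch of an age regressor: keep the MobileNetV2 backbone, replace the softmax head with a single linear output, and train with an L1 loss (the input size and target are placeholders).

```python
import torch
import torchvision

model = torchvision.models.mobilenet_v2()
model.classifier[1] = torch.nn.Linear(model.last_channel, 1)  # scalar age output
loss_fn = torch.nn.L1Loss()  # mean absolute error, in years

pred = model(torch.randn(1, 3, 64, 64))                # dummy input batch
loss = loss_fn(pred.squeeze(1), torch.tensor([25.0]))  # dummy target age
```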
![image](../../assets/face_emotion.jpg)