
Commit 0e13e8f

Merge pull request #476 from PINTO0309/486_MWC
486_MWC
2 parents e72692e + 67750f4 commit 0e13e8f

6 files changed

Lines changed: 2209 additions & 0 deletions

File tree

486_MWC/LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Katsuya Hyodo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

486_MWC/README.md

Lines changed: 245 additions & 0 deletions
@@ -0,0 +1,245 @@
# 486_MWC

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19617672.svg)](https://doi.org/10.5281/zenodo.19617672) ![GitHub License](https://img.shields.io/github/license/pinto0309/MWC) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/PINTO0309/mwc)

Mask wearing classifier.

https://github.com/user-attachments/assets/a02290cd-b8cc-45b6-8e97-2144cc2628ae

|Variant|Size|F1|CPU<br>inference<br>latency|ONNX|
|:-:|:-:|:-:|:-:|:-:|
|P|115 KB|0.9981|0.23 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_p_48x48.onnx)|
|N|176 KB|0.9995|0.41 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_n_48x48.onnx)|
|T|280 KB|0.9996|0.52 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_t_48x48.onnx)|
|S|495 KB|0.9998|0.64 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_s_48x48.onnx)|
|L|6.4 MB|0.9998|1.03 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_l_48x48.onnx)|
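The released models can also be driven directly with `onnxruntime`. Below is a minimal sketch; the 1x3x48x48 RGB input scaled to `[0, 1]`, the single sigmoid-probability output, and the 0.5 threshold are assumptions here, so check `demo_mwc.py` for the exact preprocessing.

```python
# Minimal sketch, not the repo's demo: the input layout (1x3x48x48,
# RGB in [0, 1]) and single sigmoid-probability output are assumptions.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("mwc_l_48x48.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

face = cv2.imread("face_crop.jpg")                       # BGR face crop
face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
face = cv2.resize(face, (48, 48)).astype(np.float32) / 255.0
blob = face.transpose(2, 0, 1)[None]                     # HWC -> 1x3x48x48

prob = float(session.run(None, {input_name: blob})[0].squeeze())
print("masked" if prob >= 0.5 else "no_masked", prob)
```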
## Setup

```bash
git clone https://github.com/PINTO0309/MWC.git && cd MWC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
```
## Inference

```bash
uv run python demo_mwc.py \
  -hm mwc_l_48x48.onnx \
  -v 0 \
  -ep cuda \
  -dlr -dnm -dgm -dhm -dhd

uv run python demo_mwc.py \
  -hm mwc_l_48x48.onnx \
  -v 0 \
  -ep tensorrt \
  -dlr -dnm -dgm -dhm -dhd
```
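The `-ep` switch presumably selects an `onnxruntime` execution provider. The mapping below is a sketch; the provider lists are assumptions and `demo_mwc.py` holds the authoritative logic.

```python
# Hypothetical mapping from the -ep flag to onnxruntime providers;
# the actual lists used by demo_mwc.py may differ.
import onnxruntime as ort

EP_MAP = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "tensorrt": ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
}

session = ort.InferenceSession("mwc_l_48x48.onnx", providers=EP_MAP["cuda"])
print(session.get_providers())  # providers actually in use after fallback
```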
## Archive extraction

Extract images from the source archive into numbered folders under `data/`,
storing up to 2,000 images per folder:

```bash
python 00_extract_tar.py \
  --archive /path/to/train_aug_120x120_part_masked_clean.tar.gz \
  --output-dir data \
  --images-per-dir 2000
```
## Dataset parquet

Generate a parquet dataset with embedded resized image bytes:

```bash
SIZE=48x48 # HxW
python 01_build_mask_parquet.py \
  --root data \
  --output data/dataset_${SIZE}.parquet \
  --image-size ${SIZE}
```

Labels are derived from filenames:

- `*_mask_*` -> `masked` / `1`
- otherwise -> `no_masked` / `0`
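In code, this rule is just a substring test (a minimal sketch; `01_build_mask_parquet.py` is authoritative):

```python
# Minimal sketch of the filename -> label rule above.
def label_from_filename(name: str) -> int:
    """1 = `masked` when the name matches `*_mask_*`, else 0 = `no_masked`."""
    return 1 if "_mask_" in name else 0

assert label_from_filename("00001_mask_aug.jpg") == 1
assert label_from_filename("00002_plain.jpg") == 0
```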
<img width="600" alt="dataset_48x48_class_ratio" src="https://github.com/user-attachments/assets/8ce1e680-5f10-4f0c-bf25-4338da47ef40" />
71+
72+
## Data sample
73+
74+
|1|2|3|4|5|
75+
|:-:|:-:|:-:|:-:|:-:|
76+
|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/0c8dd9bd-eec3-44fa-9a15-ab0d92b0247c" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/5e4dccbd-5b54-4296-9f96-ffba6e3c0298" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/4cde7b4b-b162-49c0-b660-474688f66f50" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/5aa8d6b3-82e5-4430-9934-2f204e1ec51b" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/32aa0434-767a-4e39-a0cc-09aeb886881e" />|
77+
78+
## Training Pipeline

- The training loop relies on `BCEWithLogitsLoss` plus a class-balanced `pos_weight` to stabilise optimisation under class imbalance; inference produces sigmoid probabilities (see the sketch after this list). Use `--train_resampling weighted` to enable the legacy `WeightedRandomSampler` behaviour, or `--train_resampling balanced` to physically duplicate minority classes before shuffling.
- Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.
- Per-epoch checkpoints named like `mwc_epoch_0001.pt` are retained (latest 10), as are the best checkpoints, named like `mwc_best_epoch0004_f1_0.9321.pt` (also latest 10).
- The backbone can be switched with `--arch_variant`. Supported combinations with `--head_variant` are:

| `--arch_variant` | Default (`--head_variant auto`) | Explicitly selectable heads | Remarks |
|------------------|-----------------------------|---------------------------|------|
| `baseline` | `avg` | `avg`, `avgmax_mlp` | When using `transformer`/`mlp_mixer`, adjust the height and width of the feature map so that they are divisible by `--token_mixer_grid`; otherwise an exception occurs during ONNX conversion or inference. |
| `inverted_se` | `avgmax_mlp` | `avg`, `avgmax_mlp` | When using `transformer`/`mlp_mixer`, `--token_mixer_grid` must be adjusted as above. |
| `convnext` | `transformer` | `avg`, `avgmax_mlp`, `transformer`, `mlp_mixer` | For both token-mixer heads, the feature map must be divisible by the grid (the default `3x2` fits a 30x48 input). |

- The classification head is selected with `--head_variant` (`avg`, `avgmax_mlp`, `transformer`, `mlp_mixer`, or `auto`, which derives a sensible default from the backbone).
- Pass `--rgb_to_yuv_to_y` to convert RGB crops to YUV, keep only the Y (luma) channel inside the network, and train a single-channel stem without modifying the dataloader.
- Alternatively, use `--rgb_to_lab` or `--rgb_to_luv` to convert inputs to CIE Lab/Luv (3-channel) before the stem; these options are mutually exclusive with each other and with `--rgb_to_yuv_to_y`.
- Mixed precision can be enabled with `--use_amp` when CUDA is available.
- Resume training with `--resume path/to/mwc_epoch_XXXX.pt`; all optimiser/scheduler/AMP states and the training history are restored.
- Loss/accuracy/F1 metrics are logged to TensorBoard under `output_dir`, and `tqdm` progress bars expose per-epoch progress for the train/val/test loops.
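A minimal sketch of the class-balanced loss described in the first bullet, assuming `pos_weight` is simply the negative/positive count ratio (the repo may compute it differently):

```python
# Sketch only: class-balanced BCEWithLogitsLoss with pos_weight taken
# as the negative/positive ratio; mwc's exact computation may differ.
import torch
import torch.nn as nn

labels = torch.tensor([0., 0., 0., 1., 0., 1.])       # toy binary labels
num_pos = labels.sum()
pos_weight = (labels.numel() - num_pos) / num_pos      # up-weight positives

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.randn_like(labels)                      # raw model outputs
loss = criterion(logits, labels)                       # training objective
probs = torch.sigmoid(logits)                          # inference probabilities
```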
Baseline depthwise-separable CNN:

```bash
SIZE=48x48
uv run python -m mwc train \
  --data_root data/dataset.parquet \
  --output_dir runs/mwc_${SIZE} \
  --epochs 40 \
  --batch_size 256 \
  --train_resampling balanced \
  --image_size ${SIZE} \
  --base_channels 32 \
  --num_blocks 4 \
  --arch_variant baseline \
  --seed 42 \
  --device auto \
  --use_amp
```
Inverted residual + SE variant (recommended for higher capacity):

```bash
SIZE=48x48
VAR=s
uv run python -m mwc train \
  --data_root data/dataset.parquet \
  --output_dir runs/mwc_is_${VAR}_${SIZE} \
  --epochs 40 \
  --batch_size 256 \
  --train_resampling balanced \
  --image_size ${SIZE} \
  --base_channels 32 \
  --num_blocks 4 \
  --arch_variant inverted_se \
  --head_variant avgmax_mlp \
  --seed 42 \
  --device auto \
  --use_amp
```
ConvNeXt-style backbone with a transformer head over pooled tokens:

```bash
SIZE=48x48
uv run python -m mwc train \
  --data_root data/dataset.parquet \
  --output_dir runs/mwc_convnext_${SIZE} \
  --epochs 40 \
  --batch_size 256 \
  --train_resampling balanced \
  --image_size ${SIZE} \
  --base_channels 32 \
  --num_blocks 4 \
  --arch_variant convnext \
  --head_variant transformer \
  --token_mixer_grid 3x3 \
  --seed 42 \
  --device auto \
  --use_amp
```
- Outputs include the latest 10 `mwc_epoch_*.pt`, the latest 10 `mwc_best_epochXXXX_f1_YYYY.pt` (highest validation F1, or training F1 when there is no validation split), `history.json`, `summary.json`, an optional `test_predictions.csv`, and `train.log`.
- After every epoch, a confusion matrix and ROC curve are saved under `runs/mwc/diagnostics/<split>/` as `confusion_<split>_epochXXXX.png` and `roc_<split>_epochXXXX.png`.
- `--image_size` accepts either a single integer for square crops (e.g. `--image_size 48`) or `HEIGHTxWIDTH` for non-square frames (e.g. `--image_size 64x48`); see the parsing sketch after this list.
- Add `--resume <checkpoint>` to continue from an earlier epoch. Note that `--epochs` is the desired total epoch count (e.g. resuming with `--epochs 40` after training to epoch 30 runs 10 additional epochs).
- Launch TensorBoard with:

```bash
tensorboard --logdir runs/mwc
```
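A hypothetical helper mirroring the `--image_size` rule above (the CLI's actual parser may differ):

```python
# Hypothetical parser for the --image_size rule described above.
def parse_image_size(value: str) -> tuple[int, int]:
    """'48' -> (48, 48); '64x48' -> (64, 48) as (height, width)."""
    if "x" in value:
        h, w = value.split("x", 1)
        return int(h), int(w)
    return int(value), int(value)

assert parse_image_size("48") == (48, 48)
assert parse_image_size("64x48") == (64, 48)
```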
### ONNX Export

```bash
uv run python -m mwc exportonnx \
  --checkpoint runs/mwc_is_s_48x48/mwc_best_epoch0049_f1_0.9939.pt \
  --output mwc_s_48x48.onnx \
  --opset 17
```
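An exported file can be sanity-checked with the generic `onnx`/`onnxruntime` APIs (this is not part of the `mwc` CLI):

```python
# Generic post-export check; not an mwc subcommand.
import onnx
import onnxruntime as ort

model = onnx.load("mwc_s_48x48.onnx")
onnx.checker.check_model(model)                        # structural validation

session = ort.InferenceSession("mwc_s_48x48.onnx", providers=["CPUExecutionProvider"])
for tensor in session.get_inputs():
    print(tensor.name, tensor.shape, tensor.type)      # expect a 48x48 input
```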
## Arch

<img width="300" alt="mwc_p_48x48" src="https://github.com/user-attachments/assets/43a75836-b851-4941-80c1-82d24fa37487" />
## Ultra-lightweight classification model series

1. [VSDLM: Visual-only speech detection driven by lip movements](https://github.com/PINTO0309/VSDLM) - MIT License
2. [OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model](https://github.com/PINTO0309/OCEC) - MIT License
3. [PGC: Ultrafast pointing gesture classification](https://github.com/PINTO0309/PGC) - MIT License
4. [SC: Ultrafast sitting classification](https://github.com/PINTO0309/SC) - MIT License
5. [PUC: Phone Usage Classifier, a three-class image classification pipeline for understanding how people interact with smartphones](https://github.com/PINTO0309/PUC) - MIT License
6. [HSC: Happy smile classifier](https://github.com/PINTO0309/HSC) - MIT License
7. [WHC: Waving Hand Classification](https://github.com/PINTO0309/WHC) - MIT License
8. [UHD: Ultra-lightweight human detection](https://github.com/PINTO0309/UHD) - MIT License
9. [MWC: Mask wearing classifier](https://github.com/PINTO0309/MWC) - MIT License
## Citation

If you find this project useful, please consider citing:

```bibtex
@software{hyodo2026mwc,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/MWC},
  month     = {04},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19617672},
  url       = {https://github.com/PINTO0309/mwc},
  abstract  = {Mask wearing classifier.},
}
```
## Acknowledgments

- https://github.com/cleardusk/3DDFA: MIT License

```bibtex
@misc{3ddfa_cleardusk,
  author       = {Guo, Jianzhu and Zhu, Xiangyu and Lei, Zhen},
  title        = {3DDFA},
  howpublished = {\url{https://github.com/cleardusk/3DDFA}},
  year         = {2018}
}

@inproceedings{guo2020towards,
  title     = {Towards Fast, Accurate and Stable 3D Dense Face Alignment},
  author    = {Guo, Jianzhu and Zhu, Xiangyu and Yang, Yang and Yang, Fan and Lei, Zhen and Li, Stan Z},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020}
}

@article{zhu2017face,
  title     = {Face alignment in full pose range: A 3d total solution},
  author    = {Zhu, Xiangyu and Liu, Xiaoming and Lei, Zhen and Li, Stan Z},
  journal   = {IEEE transactions on pattern analysis and machine intelligence},
  year      = {2017},
  publisher = {IEEE}
}
```

- https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License

```bibtex
@software{DEIMv2-Wholebody34,
  author = {Katsuya Hyodo},
  title  = {Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 28 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
  url    = {https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
  year   = {2025},
  month  = {10},
  doi    = {10.5281/zenodo.17625710}
}
```
