# 486_MWC
[DOI](https://doi.org/10.5281/zenodo.19617672) · [DeepWiki](https://deepwiki.com/PINTO0309/mwc)

Mask wearing classifier.

https://github.com/user-attachments/assets/a02290cd-b8cc-45b6-8e97-2144cc2628ae

|Variant|Size|F1|CPU<br>inference<br>latency|ONNX|
|:-:|:-:|:-:|:-:|:-:|
|P|115 KB|0.9981|0.23 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_p_48x48.onnx)|
|N|176 KB|0.9995|0.41 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_n_48x48.onnx)|
|T|280 KB|0.9996|0.52 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_t_48x48.onnx)|
|S|495 KB|0.9998|0.64 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_s_48x48.onnx)|
|L|6.4 MB|0.9998|1.03 ms|[Download](https://github.com/PINTO0309/MWC/releases/download/onnx/mwc_l_48x48.onnx)|
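
For a quick standalone check of a downloaded model outside `demo_mwc.py`, a minimal ONNX Runtime sketch is shown below. The preprocessing (NCHW float32 in `[0, 1]`, RGB order) and the single sigmoid-probability output are assumptions inferred from the 48x48 model names and the binary task; verify them against the actual model I/O first.

```python
# Hypothetical standalone inference sketch for mwc_*_48x48.onnx.
# Assumptions: (1, 3, 48, 48) float32 RGB input scaled to [0, 1] and a
# single sigmoid probability output. Check the real model I/O before use.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("mwc_l_48x48.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

img = cv2.imread("face_crop.jpg")                       # BGR uint8
img = cv2.resize(img, (48, 48))                         # cv2.resize takes (W, H)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
blob = img.transpose(2, 0, 1)[None].astype(np.float32) / 255.0

prob = session.run(None, {input_name: blob})[0].squeeze()
print("masked" if prob >= 0.5 else "no_masked", float(prob))
```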

## Setup

```bash
git clone https://github.com/PINTO0309/MWC.git && cd MWC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate
```

## Inference

```bash
uv run python demo_mwc.py \
-hm mwc_l_48x48.onnx \
-v 0 \
-ep cuda \
-dlr -dnm -dgm -dhm -dhd

uv run python demo_mwc.py \
-hm mwc_l_48x48.onnx \
-v 0 \
-ep tensorrt \
-dlr -dnm -dgm -dhm -dhd
```
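
The `-ep` flag selects the inference backend. Assuming the demo runs on ONNX Runtime, you can check which execution providers your installed build actually exposes before choosing `cuda` or `tensorrt`:

```python
import onnxruntime as ort

# Lists providers compiled into this onnxruntime build, e.g.
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'].
print(ort.get_available_providers())
```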

## Archive extraction

Extract images from the source archive into numbered folders under `data/`,
storing up to 2,000 images per folder:

```bash
python 00_extract_tar.py \
--archive /path/to/train_aug_120x120_part_masked_clean.tar.gz \
--output-dir data \
--images-per-dir 2000
```
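
The sharding behaviour itself is straightforward; below is an illustrative sketch of the same idea. The zero-padded folder naming (`0000`, `0001`, ...) is an assumption for the example, not necessarily what `00_extract_tar.py` produces.

```python
# Sketch of tar extraction with folder sharding (illustrative only;
# the data/0000, data/0001, ... naming is an assumption).
import tarfile
from pathlib import Path

def extract_sharded(archive: str, output_dir: str, images_per_dir: int = 2000) -> None:
    out = Path(output_dir)
    count = 0
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            shard = out / f"{count // images_per_dir:04d}"   # new folder every N images
            shard.mkdir(parents=True, exist_ok=True)
            data = tar.extractfile(member).read()
            (shard / Path(member.name).name).write_bytes(data)
            count += 1
```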

## Dataset parquet

Generate a parquet dataset with embedded resized image bytes:

```bash
SIZE=48x48 # HxW
python 01_build_mask_parquet.py \
--root data \
--output data/dataset_${SIZE}.parquet \
--image-size ${SIZE}
```

Labels are derived from filenames:

- `*_mask_*` -> `masked` / `1`
- otherwise -> `no_masked` / `0`
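
A sketch of that filename convention as a hypothetical helper (not the actual code in `01_build_mask_parquet.py`):

```python
from pathlib import Path

def label_from_filename(path: str) -> int:
    # "*_mask_*" -> masked (1), anything else -> no_masked (0).
    return 1 if "_mask_" in Path(path).name else 0

assert label_from_filename("0001_mask_aug3.jpg") == 1
assert label_from_filename("0002_frontal.jpg") == 0
```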

<img width="600" alt="dataset_48x48_class_ratio" src="https://github.com/user-attachments/assets/8ce1e680-5f10-4f0c-bf25-4338da47ef40" />

## Data sample

|1|2|3|4|5|
|:-:|:-:|:-:|:-:|:-:|
|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/0c8dd9bd-eec3-44fa-9a15-ab0d92b0247c" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/5e4dccbd-5b54-4296-9f96-ffba6e3c0298" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/4cde7b4b-b162-49c0-b660-474688f66f50" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/5aa8d6b3-82e5-4430-9934-2f204e1ec51b" />|<img width="48" height="48" alt="image" src="https://github.com/user-attachments/assets/32aa0434-767a-4e39-a0cc-09aeb886881e" />|

## Training Pipeline

- The training loop relies on `BCEWithLogitsLoss` with a class-balanced `pos_weight` to stabilise optimisation under class imbalance; inference produces sigmoid probabilities (see the sketch after this list). Use `--train_resampling weighted` to switch back to the earlier `WeightedRandomSampler` behaviour, or `--train_resampling balanced` to physically duplicate minority-class samples before shuffling.
- Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.
- Per-epoch checkpoints named like `mwc_epoch_0001.pt` are retained (latest 10), as are best checkpoints named like `mwc_best_epoch0004_f1_0.9321.pt` (also latest 10).
- The backbone is switched with `--arch_variant`. Supported combinations with `--head_variant` are:

  | `--arch_variant` | Default (`--head_variant auto`) | Explicitly selectable heads | Remarks |
  |------------------|-----------------------------|---------------------------|------|
  | `baseline` | `avg` | `avg`, `avgmax_mlp` | If a `transformer`/`mlp_mixer` head is used, the feature-map height and width must be divisible by `--token_mixer_grid`; otherwise an exception is raised during ONNX export or inference. |
  | `inverted_se` | `avgmax_mlp` | `avg`, `avgmax_mlp` | If a `transformer`/`mlp_mixer` head is used, `--token_mixer_grid` must be adjusted as above. |
  | `convnext` | `transformer` | `avg`, `avgmax_mlp`, `transformer`, `mlp_mixer` | For the `transformer`/`mlp_mixer` heads, the feature map must be divisible by the grid (the default `3x2` fits a 30x48 input). |
- The classification head is selected with `--head_variant` (`avg`, `avgmax_mlp`, `transformer`, `mlp_mixer`, or `auto`, which derives a sensible default from the backbone).
- Pass `--rgb_to_yuv_to_y` to convert RGB crops to YUV, keep only the Y (luma) channel inside the network, and train a single-channel stem without modifying the dataloader.
- Alternatively, use `--rgb_to_lab` or `--rgb_to_luv` to convert inputs to CIE Lab/Luv (3-channel) before the stem; these options are mutually exclusive with each other and with `--rgb_to_yuv_to_y`.
- Mixed precision can be enabled with `--use_amp` when CUDA is available.
- Resume training with `--resume path/to/mwc_epoch_XXXX.pt`; all optimiser/scheduler/AMP states and history are restored.
- Loss/accuracy/F1 metrics are logged to TensorBoard under `output_dir`, and `tqdm` progress bars expose per-epoch progress for the train/val/test loops.
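
As referenced in the first bullet above, a minimal standalone sketch of the class-balanced `pos_weight` idea (illustrative only, not the project's actual training code):

```python
import torch
from torch import nn

labels = torch.tensor([0, 0, 0, 1, 0, 1], dtype=torch.float32)  # toy batch
num_pos = labels.sum()
num_neg = labels.numel() - num_pos

# pos_weight > 1 up-weights the minority positive class inside BCEWithLogitsLoss.
pos_weight = num_neg / num_pos.clamp(min=1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(6)          # raw model outputs
loss = criterion(logits, labels)
probs = torch.sigmoid(logits)    # inference-time probabilities
```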

Baseline depthwise-separable CNN:

```bash
SIZE=48x48
uv run python -m mwc train \
--data_root data/dataset_${SIZE}.parquet \
--output_dir runs/mwc_${SIZE} \
--epochs 40 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant baseline \
--seed 42 \
--device auto \
--use_amp
```

Inverted residual + SE variant (recommended for higher capacity):

```bash
SIZE=48x48
VAR=s
uv run python -m mwc train \
--data_root data/dataset_${SIZE}.parquet \
--output_dir runs/mwc_is_${VAR}_${SIZE} \
--epochs 40 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant inverted_se \
--head_variant avgmax_mlp \
--seed 42 \
--device auto \
--use_amp
```

ConvNeXt-style backbone with transformer head over pooled tokens:

```bash
SIZE=48x48
uv run python -m mwc train \
--data_root data/dataset_${SIZE}.parquet \
--output_dir runs/mwc_convnext_${SIZE} \
--epochs 40 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant convnext \
--head_variant transformer \
--token_mixer_grid 3x3 \
--seed 42 \
--device auto \
--use_amp
```

- Outputs include the latest 10 `mwc_epoch_*.pt`, the latest 10 `mwc_best_epochXXXX_f1_YYYY.pt` (highest validation F1, or training F1 when there is no validation split), `history.json`, `summary.json`, an optional `test_predictions.csv`, and `train.log`.
- After every epoch, a confusion matrix and a ROC curve are saved under `runs/mwc/diagnostics/<split>/` as `confusion_<split>_epochXXXX.png` and `roc_<split>_epochXXXX.png`.
- `--image_size` accepts either a single integer for square crops (e.g. `--image_size 48`) or `HEIGHTxWIDTH` for non-square frames (e.g. `--image_size 64x48`); a parsing sketch follows this list.
- Add `--resume <checkpoint>` to continue from an earlier epoch. Note that `--epochs` is the desired total epoch count (e.g. resuming with `--epochs 40` after training to epoch 30 runs 10 additional epochs).
- Launch TensorBoard with:
  ```bash
  tensorboard --logdir runs/mwc
  ```
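
As referenced above, a sketch of the two accepted `--image_size` forms (a hypothetical helper for illustration, not the project's actual parser):

```python
def parse_image_size(value: str) -> tuple[int, int]:
    """'48' -> (48, 48); '64x48' -> (64, 48) as (height, width)."""
    if "x" in value:
        h, w = value.split("x")
        return int(h), int(w)
    side = int(value)
    return side, side

assert parse_image_size("48") == (48, 48)
assert parse_image_size("64x48") == (64, 48)
```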

### ONNX Export

```bash
uv run python -m mwc exportonnx \
--checkpoint runs/mwc_is_s_48x48/mwc_best_epoch0049_f1_0.9939.pt \
--output mwc_s_48x48.onnx \
--opset 17
```
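
To sanity-check an exported graph, a quick structural validation with the `onnx` package (assumed to be available in the environment):

```python
import onnx

model = onnx.load("mwc_s_48x48.onnx")
onnx.checker.check_model(model)                 # raises on an invalid graph
print([i.name for i in model.graph.input])      # inspect input tensor names
print([o.name for o in model.graph.output])     # inspect output tensor names
```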

## Arch

<img width="300" alt="mwc_p_48x48" src="https://github.com/user-attachments/assets/43a75836-b851-4941-80c1-82d24fa37487" />

## Ultra-lightweight classification model series
1. [VSDLM: Visual-only speech detection driven by lip movements](https://github.com/PINTO0309/VSDLM) - MIT License
2. [OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model](https://github.com/PINTO0309/OCEC) - MIT License
3. [PGC: Ultrafast pointing gesture classification](https://github.com/PINTO0309/PGC) - MIT License
4. [SC: Ultrafast sitting classification](https://github.com/PINTO0309/SC) - MIT License
5. [PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones](https://github.com/PINTO0309/PUC) - MIT License
6. [HSC: Happy smile classifier](https://github.com/PINTO0309/HSC) - MIT License
7. [WHC: Waving Hand Classification](https://github.com/PINTO0309/WHC) - MIT License
8. [UHD: Ultra-lightweight human detection](https://github.com/PINTO0309/UHD) - MIT License
9. [MWC: Mask wearing classifier](https://github.com/PINTO0309/MWC) - MIT License

## Citation

If you find this project useful, please consider citing:

```bibtex
@software{hyodo2026mwc,
  author = {Katsuya Hyodo},
  title = {PINTO0309/MWC},
  month = {04},
  year = {2026},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.19617672},
  url = {https://github.com/PINTO0309/mwc},
  abstract = {Mask wearing classifier.},
}
```

## Acknowledgments

- https://github.com/cleardusk/3DDFA: MIT License
  ```bibtex
  @misc{3ddfa_cleardusk,
    author = {Guo, Jianzhu and Zhu, Xiangyu and Lei, Zhen},
    title = {3DDFA},
    howpublished = {\url{https://github.com/cleardusk/3DDFA}},
    year = {2018}
  }

  @inproceedings{guo2020towards,
    title = {Towards Fast, Accurate and Stable 3D Dense Face Alignment},
    author = {Guo, Jianzhu and Zhu, Xiangyu and Yang, Yang and Yang, Fan and Lei, Zhen and Li, Stan Z},
    booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
    year = {2020}
  }

  @article{zhu2017face,
    title = {Face alignment in full pose range: A 3D total solution},
    author = {Zhu, Xiangyu and Liu, Xiaoming and Lei, Zhen and Li, Stan Z},
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
    year = {2017},
    publisher = {IEEE}
  }
  ```
- https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License
  ```bibtex
  @software{DEIMv2-Wholebody34,
    author = {Katsuya Hyodo},
    title = {Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 34 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
    url = {https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
    year = {2025},
    month = {10},
    doi = {10.5281/zenodo.17625710}
  }
  ```