Commit de129d6

Merge pull request #27 from DefTruth/dev

Dev

2 parents ec2ead2 + 1d19f51

File tree: 9 files changed (+186 −100 lines changed)


README.md

Lines changed: 143 additions & 97 deletions
@@ -12,30 +12,27 @@

## 🤗 Introduction
-**torchlm** is a PyTorch landmarks-only library with **100+ data augmentations** that supports **training** and **inference**. **torchlm** aims to focus only on landmark detection, such as face landmarks, hand keypoints and body keypoints, etc. It provides **30+** native data augmentations and can **bind** with **80+** transforms from torchvision and albumentations; no matter whether the input is a np.ndarray or a torch Tensor, **torchlm** automatically adapts to the data type and wraps the result back to the original type through an **autodtype** wrapper. Further, **torchlm** will add modules for **training** and **inference** in the future.
+**torchlm** aims to build a high-level pipeline for face landmarks detection; it supports **100+ data augmentations**, **training** and **inference**, and can easily be installed with **pip**.
<div align='center'>
-<img src='docs/res/605.jpg' height="100px" width="100px">
-<img src='docs/res/802.jpg' height="100px" width="100px">
-<img src='docs/res/92.jpg' height="100px" width="100px">
-<img src='docs/res/234.jpg' height="100px" width="100px">
-<img src='docs/res/906.jpg' height="100px" width="100px">
-<img src='docs/res/825.jpg' height="100px" width="100px">
-<img src='docs/res/388.jpg' height="100px" width="100px">
-<br>
<img src='docs/res/2_wflw_44.jpg' height="100px" width="100px">
<img src='docs/res/2_wflw_67.jpg' height="100px" width="100px">
<img src='docs/res/2_wflw_76.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_162.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_229.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_440.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_478.jpg' height="100px" width="100px">
+<img src='docs/assets/pipnet0.jpg' height="100px" width="100px">
+<img src='docs/assets/pipnet_300W_CELEBA_model.gif' height="100px" width="100px">
+<img src='docs/assets/pipnet_shaolin_soccer.gif' height="100px" width="100px">
+<img src='docs/assets/pipnet_WFLW_model.gif' height="100px" width="100px">
</div>

<p align="center"> ❤️ Star 🌟👆🏻 this repo to support me if it helps you, thanks ~ </p>

+## 👋 Core Features
+* High-level pipeline for **training** and **inference**.
+* **30+** native landmarks data augmentations.
+* Can **bind 80+** transforms from torchvision and albumentations with one line of code.
+* Support awesome models for face landmarks detection, such as YOLOX, YOLOv5, ResNet, MobileNet, ShuffleNet and PIPNet, etc.

-# 🆕 What's New
+## 🆕 What's New
+* [2022/03/08]: Add **PIPNet**: [Towards Efficient Facial Landmark Detection in the Wild, CVPR2021](https://github.com/jhb86253817/PIPNet)
* [2022/02/13]: Add **30+** native data augmentations and **bind** **80+** transforms from torchvision and albumentations.

## 🛠️ Usage
@@ -44,53 +41,56 @@
* opencv-python-headless>=4.5.2
* numpy>=1.14.4
* torch>=1.6.0
-* torchvision>=0.9.0
+* torchvision>=0.8.0
* albumentations>=1.1.0
+* onnx>=1.8.0
+* onnxruntime>=1.7.0
+* tqdm>=4.10.0

### Installation
-you can install **torchlm** directly from [pypi](https://pypi.org/project/torchlm/).
+You can install **torchlm** directly from [pypi](https://pypi.org/project/torchlm/). See [NOTE](#torchlm-NOTE) before installation!
```shell
pip3 install torchlm
# install from a specific pypi mirror with '-i'
pip3 install torchlm -i https://pypi.org/simple/
```
-or install from source.
+Or install from source if you want the latest torchlm, installing it in editable mode with `-e`.
```shell
-# clone torchlm repository locally
+# clone the torchlm repository locally if you want the latest torchlm
git clone --depth=1 https://github.com/DefTruth/torchlm.git
cd torchlm
# install in editable mode
pip install -e .
```
+<div id="torchlm-NOTE"></div>
+
+**NOTE**: If you hit a conflict between different installed versions of opencv (opencv-python vs opencv-python-headless; `albumentations` needs opencv-python-headless), uninstall both opencv packages first and then reinstall torchlm. See [albumentations#1139](https://github.com/albumentations-team/albumentations/issues/1139) for more details.
+
+```shell
+# first, uninstall the conflicting opencv packages
+pip uninstall opencv-python
+pip uninstall opencv-python-headless
+pip uninstall torchlm  # if you have already installed torchlm
+# then reinstall torchlm
+pip install torchlm  # will also install deps, e.g. opencv
+```

-### Data Augmentation
+### 🌟🌟 Data Augmentation
**torchlm** provides **30+** native data augmentations for landmarks and can **bind** with **80+** transforms from torchvision and albumentations through the **torchlm.bind** method. Further, **torchlm.bind** provides a `prob` param at bind-level to force any transform or callable to behave as a random-style augmentation. The data augmentations in **torchlm** are `safe` and `simple`: any transform operation that would push landmarks outside the image at runtime is auto dropped, to keep the number of landmarks unchanged. The layout format of landmarks is `xy` with shape `(N, 2)`, where `N` denotes the number of input landmarks. No matter whether the input is a np.ndarray or a torch Tensor, **torchlm** automatically adapts to the data type and wraps the result back to the original type through an **autodtype** wrapper.

* use almost **30+** native transforms from **torchlm** directly
```python
import torchlm
transform = torchlm.LandmarksCompose([
-    # use native torchlm transforms
-    torchlm.LandmarksRandomScale(prob=0.5),
-    torchlm.LandmarksRandomTranslate(prob=0.5),
-    torchlm.LandmarksRandomShear(prob=0.5),
-    torchlm.LandmarksRandomMask(prob=0.5),
-    torchlm.LandmarksRandomBlur(kernel_range=(5, 25), prob=0.5),
-    torchlm.LandmarksRandomBrightness(prob=0.),
-    torchlm.LandmarksRandomRotate(40, prob=0.5, bins=8),
-    torchlm.LandmarksRandomCenterCrop((0.5, 1.0), (0.5, 1.0), prob=0.5),
-    # ...
-])
+    torchlm.LandmarksRandomScale(prob=0.5),
+    torchlm.LandmarksRandomMask(prob=0.5),
+    torchlm.LandmarksRandomBlur(kernel_range=(5, 25), prob=0.5),
+    torchlm.LandmarksRandomBrightness(prob=0.),
+    torchlm.LandmarksRandomRotate(40, prob=0.5, bins=8),
+    torchlm.LandmarksRandomCenterCrop((0.5, 1.0), (0.5, 1.0), prob=0.5)
+])
```
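The `safe` behavior described above — a transform whose output would push landmarks outside the image is skipped, so the landmark count never changes — can be sketched in plain Python. This is an illustrative toy, not torchlm's actual implementation; `SafeCompose`, `shift_right` and `shift_down` are made-up names.

```python
from typing import Callable, List, Tuple

Landmarks = List[Tuple[float, float]]  # xy layout, shape (N, 2)

class SafeCompose:
    """Toy landmarks-safe compose: skip any transform whose output
    would leave landmarks outside the image bounds."""
    def __init__(self, transforms: List[Callable], width: int, height: int):
        self.transforms = transforms
        self.width = width
        self.height = height

    def __call__(self, landmarks: Landmarks) -> Landmarks:
        for t in self.transforms:
            candidate = t(landmarks)
            # keep the result only if every landmark stays in-bounds,
            # so the number of landmarks is never reduced
            if all(0 <= x < self.width and 0 <= y < self.height
                   for x, y in candidate):
                landmarks = candidate
        return landmarks

shift_right = lambda lms: [(x + 90, y) for x, y in lms]  # would go out of bounds
shift_down = lambda lms: [(x, y + 10) for x, y in lms]   # stays in bounds

pipeline = SafeCompose([shift_right, shift_down], width=100, height=100)
out = pipeline([(50.0, 50.0), (20.0, 30.0)])
# → [(50.0, 60.0), (20.0, 40.0)]: shift_right is dropped, shift_down applies
```

torchlm's real transforms carry this logic internally per-operation; the sketch only illustrates the "drop unsafe results, keep N constant" contract.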
<div align='center'>
-<img src='docs/res/605.jpg' height="100px" width="100px">
-<img src='docs/res/802.jpg' height="100px" width="100px">
-<img src='docs/res/92.jpg' height="100px" width="100px">
-<img src='docs/res/234.jpg' height="100px" width="100px">
-<img src='docs/res/906.jpg' height="100px" width="100px">
-<img src='docs/res/825.jpg' height="100px" width="100px">
-<img src='docs/res/388.jpg' height="100px" width="100px">
-<br>
<img src='docs/res/2_wflw_44.jpg' height="100px" width="100px">
<img src='docs/res/2_wflw_67.jpg' height="100px" width="100px">
<img src='docs/res/2_wflw_76.jpg' height="100px" width="100px">
@@ -102,76 +102,45 @@ transform = torchlm.LandmarksCompose([

* **bind** **80+** torchvision and albumentations transforms through **torchlm.bind**
```python
-import torchvision
-import albumentations
-import torchlm
transform = torchlm.LandmarksCompose([
-    # use native torchlm transforms
-    torchlm.LandmarksRandomScale(prob=0.5),
-    # bind torchvision image-only transforms, with a given prob
-    torchlm.bind(torchvision.transforms.GaussianBlur(kernel_size=(5, 25)), prob=0.5),
-    torchlm.bind(torchvision.transforms.RandomAutocontrast(p=0.5)),
-    # bind albumentations image-only transforms
-    torchlm.bind(albumentations.ColorJitter(p=0.5)),
-    torchlm.bind(albumentations.GlassBlur(p=0.5)),
-    # bind albumentations dual transforms
-    torchlm.bind(albumentations.RandomCrop(height=200, width=200, p=0.5)),
-    torchlm.bind(albumentations.Rotate(p=0.5)),
-    # ...
-])
+    torchlm.bind(torchvision.transforms.GaussianBlur(kernel_size=(5, 25)), prob=0.5),
+    torchlm.bind(albumentations.ColorJitter(p=0.5))
+])
```
-* **bind** custom callable array or Tensor functions through **torchlm.bind**
+See [transforms.md](docs/api/transforms.md) for the supported transform sets; more examples can be found at [test/transforms.py](test/transforms.py).
+
+<details>
+<summary> bind custom callable array or Tensor functions through torchlm.bind </summary>

```python
# First, define your custom functions
-def callable_array_noop(img: np.ndarray, landmarks: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
-    # do some transform here ...
+def callable_array_noop(img: np.ndarray, landmarks: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:  # do some transform here ...
    return img.astype(np.uint32), landmarks.astype(np.float32)

-def callable_tensor_noop(img: Tensor, landmarks: Tensor) -> Tuple[Tensor, Tensor]:
-    # do some transform here ...
+def callable_tensor_noop(img: Tensor, landmarks: Tensor) -> Tuple[Tensor, Tensor]:  # do some transform here ...
    return img, landmarks
```

```python
# Then, bind your functions and put them into the transforms pipeline.
transform = torchlm.LandmarksCompose([
-    # use native torchlm transforms
-    torchlm.LandmarksRandomScale(prob=0.5),
-    # bind custom callable array functions
    torchlm.bind(callable_array_noop, bind_type=torchlm.BindEnum.Callable_Array),
-    # bind custom callable Tensor functions with a given prob
-    torchlm.bind(callable_tensor_noop, bind_type=torchlm.BindEnum.Callable_Tensor, prob=0.5),
-    # ...
-])
+    torchlm.bind(callable_tensor_noop, bind_type=torchlm.BindEnum.Callable_Tensor, prob=0.5)
+])
```
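The bind-level `prob` param mentioned earlier — forcing any transform or callable to behave as a random-style augmentation — boils down to a wrapper like the following. This is a hypothetical sketch (`bind_with_prob` is a made-up name), not torchlm's real `bind`.

```python
import random
from typing import Callable

def bind_with_prob(fn: Callable, prob: float = 1.0) -> Callable:
    """Wrap any (img, landmarks) callable so it only fires with
    probability `prob`; otherwise it acts as the identity."""
    def wrapper(img, landmarks):
        if random.random() < prob:
            return fn(img, landmarks)
        return img, landmarks
    return wrapper

# a toy transform that doubles every landmark coordinate
double = bind_with_prob(
    lambda img, lms: (img, [(2 * x, 2 * y) for x, y in lms]), prob=0.5)

random.seed(0)
# count how often the transform actually fired over 1000 calls
applied = sum(
    double(None, [(1.0, 1.0)])[1] == [(2.0, 2.0)] for _ in range(1000))
# with prob=0.5, roughly half of the calls apply the transform
```

With `prob=0.0` the wrapper always returns its inputs untouched, which is why binding a deterministic transform with a `prob` makes it random-style.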
-<div align='center'>
-<img src='docs/res/124.jpg' height="100px" width="100px">
-<img src='docs/res/158.jpg' height="100px" width="100px">
-<img src='docs/res/386.jpg' height="100px" width="100px">
-<img src='docs/res/478.jpg' height="100px" width="100px">
-<img src='docs/res/537.jpg' height="100px" width="100px">
-<img src='docs/res/605.jpg' height="100px" width="100px">
-<img src='docs/res/802.jpg' height="100px" width="100px">
-<br>
-<img src='docs/res/2_wflw_484.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_505.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_529.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_536.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_669.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_672.jpg' height="100px" width="100px">
-<img src='docs/res/2_wflw_741.jpg' height="100px" width="100px">
-</div>
+</details>

+<details>
+<summary> some global debug settings for torchlm's transforms </summary>

* setting the logging mode to `True` globally might help you figure out the runtime details
```python
-import torchlm
# some global settings
torchlm.set_transforms_debug(True)
torchlm.set_transforms_logging(True)
torchlm.set_autodtype_logging(True)
```
Some detailed information will be shown at runtime; the infos might look like
```shell
LandmarksRandomScale() AutoDtype Info: AutoDtypeEnum.Array_InOut
@@ -194,21 +163,98 @@ LandmarksRandomTranslate() Execution Flag: False

But it is ok if you pass a Tensor to a np.ndarray-like transform; **torchlm** automatically adapts to the data type and wraps the result back to the original type through the **autodtype** wrapper.

+</details>
+
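The **autodtype** idea — run a transform in its native type, then wrap the result back to whatever type the caller passed in — can be imitated with a small decorator. The sketch below is a simplified stand-in that converts between tuples and lists instead of torch Tensors and np.ndarrays; `autodtype` and `flip_x` here are illustrative names, not torchlm's actual code.

```python
from functools import wraps
from typing import Callable, List, Tuple

def autodtype(fn: Callable) -> Callable:
    """Run `fn` on a list; if the caller passed a tuple, convert it to a
    list first and convert the result back to a tuple afterwards."""
    @wraps(fn)
    def wrapper(landmarks):
        came_as_tuple = isinstance(landmarks, tuple)
        result = fn(list(landmarks))
        # wrap the output back to the caller's original container type
        return tuple(result) if came_as_tuple else result
    return wrapper

@autodtype
def flip_x(landmarks: List[Tuple[float, float]], width: float = 100.0):
    return [(width - x, y) for x, y in landmarks]

as_list = flip_x([(10.0, 20.0)])    # list in  -> list out
as_tuple = flip_x(((10.0, 20.0),))  # tuple in -> tuple out
# → [(90.0, 20.0)] and ((90.0, 20.0),)
```

torchlm's real wrapper does the same dance between np.ndarray and torch Tensor, which is why an ndarray-only transform still accepts Tensor inputs.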
+### 🎉🎉 Training
+In **torchlm**, each model has a high-level and user-friendly API named `training`. Here is an example for [PIPNet](https://github.com/jhb86253817/PIPNet).
+```python
+from torchlm.models import pipnet
+
+model = pipnet(
+    backbone="resnet18",
+    pretrained=False,
+    num_nb=10,
+    num_lms=98,
+    net_stride=32,
+    input_size=256,
+    meanface_type="wflw",
+    backbone_pretrained=True,
+    map_location="cuda",
+    checkpoint=None
+)
+
+# the signature of the training API, shown with its defaults:
+model.training(
+    self,
+    annotation_path: str,
+    criterion_cls: nn.Module = nn.MSELoss(),
+    criterion_reg: nn.Module = nn.L1Loss(),
+    learning_rate: float = 0.0001,
+    cls_loss_weight: float = 10.,
+    reg_loss_weight: float = 1.,
+    num_nb: int = 10,
+    num_epochs: int = 60,
+    save_dir: Optional[str] = "./save",
+    save_interval: Optional[int] = 10,
+    save_prefix: Optional[str] = "",
+    decay_steps: Optional[List[int]] = (30, 50),
+    decay_gamma: Optional[float] = 0.1,
+    device: Optional[Union[str, torch.device]] = "cuda",
+    transform: Optional[transforms.LandmarksCompose] = None,
+    coordinates_already_normalized: Optional[bool] = False,
+    **kwargs: Any  # params for DataLoader
+) -> nn.Module:
+```
+Please jump to the entry point of the function for the detailed documentation of the **training** API for each model defined in torchlm, e.g. [pipnet/_impls.py#L159](https://github.com/DefTruth/torchlm/blob/main/torchlm/models/pipnet/_impls.py#L159). Further, the model implementation plan is as follows:
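The `decay_steps`/`decay_gamma` pair in the training signature describes a step schedule: the learning rate is multiplied by `decay_gamma` each time training passes one of the `decay_steps` epochs. The helper below only illustrates that semantics (a MultiStepLR-style rule; `stepped_lr` is a made-up name, not torchlm code).

```python
from typing import Sequence

def stepped_lr(base_lr: float, epoch: int,
               decay_steps: Sequence[int] = (30, 50),
               decay_gamma: float = 0.1) -> float:
    """lr = base_lr * gamma^(number of decay steps already passed)."""
    passed = sum(1 for s in decay_steps if epoch >= s)
    return base_lr * (decay_gamma ** passed)

lrs = [stepped_lr(0.0001, e) for e in (0, 29, 30, 49, 50, 59)]
# → 1e-4 up to epoch 29, 1e-5 from epoch 30, 1e-6 from epoch 50
```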

-* Supported Transforms Sets, see [transforms.md](docs/api/transforms.md). A detailed example can be found at [test/transforms.py](test/transforms.py).
+❔ YOLOX ❔ YOLOv5 ❔ NanoDet ✅ [PIPNet](https://github.com/jhb86253817/PIPNet) ❔ ResNet ❔ MobileNet ❔ ShuffleNet ❔ ...

-### Training(TODO)
-* [ ] YOLOX
-* [ ] YOLOv5
-* [ ] NanoDet
-* [ ] PIPNet
-* [ ] ResNet
-* [ ] MobileNet
-* [ ] ShuffleNet
-* [ ] ...
+✅ = known to work and officially supported, ❔ = in my plan, but not coming soon.

-### Inference
+### 👀👇 Inference
+#### C++ API
The ONNXRuntime (CPU/GPU), MNN, NCNN and TNN C++ inference of **torchlm** will be released at [lite.ai.toolkit](https://github.com/DefTruth/lite.ai.toolkit).
+#### Python API
+In **torchlm**, we offer a high-level API named `runtime.bind` to bind any model in torchlm; you can then run the `runtime.forward` API to get the output landmarks and bboxes. Here is an example for [PIPNet](https://github.com/jhb86253817/PIPNet).
+```python
+import cv2
+import torchlm
+from torchlm.tools import faceboxesv2
+from torchlm.models import pipnet
+
+def test_pipnet_runtime():
+    img_path = "./1.jpg"
+    save_path = "./1.jpg"
+    checkpoint = "./pipnet_resnet18_10x98x32x256_wflw.pth"
+    image = cv2.imread(img_path)
+
+    torchlm.runtime.bind(faceboxesv2())
+    torchlm.runtime.bind(
+        pipnet(
+            backbone="resnet18",
+            pretrained=True,
+            num_nb=10,
+            num_lms=98,
+            net_stride=32,
+            input_size=256,
+            meanface_type="wflw",
+            backbone_pretrained=True,
+            map_location="cpu",
+            checkpoint=checkpoint
+        )
+    )
+    landmarks, bboxes = torchlm.runtime.forward(image)
+    image = torchlm.utils.draw_bboxes(image, bboxes=bboxes)
+    image = torchlm.utils.draw_landmarks(image, landmarks=landmarks)
+
+    cv2.imwrite(save_path, image)
+```
+<div align='center'>
+<img src='docs/assets/pipnet0.jpg' height="180px" width="180px">
+<img src='docs/assets/pipnet_300W_CELEBA_model.gif' height="180px" width="180px">
+<img src='docs/assets/pipnet_shaolin_soccer.gif' height="180px" width="180px">
+<img src='docs/assets/pipnet_WFLW_model.gif' height="180px" width="180px">
+</div>
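The `runtime.bind`/`runtime.forward` flow above is a two-stage pattern: a face detector produces bboxes, then a landmark model runs per box. Stripped of the real models, the control flow might look like the sketch below — `StubDetector`, `StubLandmarker` and `Runtime` are purely illustrative stubs, not torchlm's implementation.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2

class StubDetector:
    def detect(self, image) -> List[Box]:
        return [(10, 10, 50, 50)]  # pretend one face was found

class StubLandmarker:
    def predict(self, image, box: Box) -> List[Tuple[float, float]]:
        x1, y1, x2, y2 = box
        # pretend the only landmark is the box center, in image coordinates
        return [((x1 + x2) / 2.0, (y1 + y2) / 2.0)]

class Runtime:
    """Mimic the torchlm.runtime flow: bind a detector and a landmark
    model, then forward an image through both stages."""
    def __init__(self):
        self.detector = None
        self.landmarker = None

    def bind(self, model):
        if isinstance(model, StubDetector):
            self.detector = model
        else:
            self.landmarker = model

    def forward(self, image):
        bboxes = self.detector.detect(image)
        landmarks = [self.landmarker.predict(image, box) for box in bboxes]
        return landmarks, bboxes

rt = Runtime()
rt.bind(StubDetector())
rt.bind(StubLandmarker())
landmarks, bboxes = rt.forward(image=None)
# → one box, one landmark at its center (30.0, 30.0)
```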

## 📖 Documentations
* [x] [Data Augmentation's API](docs/api/transforms.md)

docs/assets/pipnet0.jpg

201 KB (binary file added)
18.8 MB (binary file added)

docs/assets/pipnet_WFLW_model.gif

19.3 MB (binary file added)
14.1 MB (binary file added)

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
# torchlm
-opencv-python-headless>=4.5.2
+opencv-python-headless>=4.3.0
numpy>=1.14.4
torch>=1.6.0
torchvision>=0.9.0

setup.py

Lines changed: 2 additions & 2 deletions
@@ -25,14 +25,14 @@ def get_long_description():
    url="https://github.com/DefTruth/torchlm",
    packages=setuptools.find_packages(),
    install_requires=[
-        "opencv-python-headless>=4.5.2",
+        "opencv-python-headless>=4.3.0",
        "numpy>=1.14.4",
        "torch>=1.6.0",
        "torchvision>=0.8.0",
        "albumentations>=1.1.0",
        "onnx>=1.8.0",
        "onnxruntime>=1.7.0",
-        "tqdm>=4.60.0"
+        "tqdm>=4.10.0"
    ],
    classifiers=[
        "Programming Language :: Python :: 3",

torchlm/data/_converters.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import os
+import cv2
+import numpy as np
+from abc import ABCMeta, abstractmethod
+from typing import Tuple, Optional, List
+
+
+class BaseConverter(object):
+    __metaclass__ = ABCMeta
+
+    @abstractmethod
+    def convert(self, *args, **kwargs):
+        raise NotImplementedError
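`BaseConverter` above uses the abstract-base pattern; note that in Python 3 the `__metaclass__` attribute is ignored, so the `metaclass=ABCMeta` spelling is what actually makes `@abstractmethod` block instantiation. A hypothetical concrete converter (the `ScaleConverter` name and behavior are invented for illustration) might look like:

```python
from abc import ABCMeta, abstractmethod

class BaseConverter(metaclass=ABCMeta):  # Python 3 spelling of the metaclass
    @abstractmethod
    def convert(self, *args, **kwargs):
        raise NotImplementedError

class ScaleConverter(BaseConverter):
    """Hypothetical converter: scale xy landmark coordinates by a factor."""
    def __init__(self, factor: float):
        self.factor = factor

    def convert(self, landmarks):
        return [(x * self.factor, y * self.factor) for x, y in landmarks]

converter = ScaleConverter(2.0)
out = converter.convert([(1.0, 2.0)])
# → [(2.0, 4.0)]; instantiating BaseConverter itself raises TypeError
```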

torchlm/models/pipnet/_impls.py

Lines changed: 27 additions & 0 deletions
@@ -157,6 +157,33 @@ def training(
    coordinates_already_normalized: Optional[bool] = False,
    **kwargs: Any  # params for DataLoader
) -> nn.Module:
+    """
+    :param annotation_path: the path to an annotation file; the format must be
+        "img0_path x0 y0 x1 y1 ... xn-1 yn-1"
+        "img1_path x0 y0 x1 y1 ... xn-1 yn-1"
+        ...
+    :param criterion_cls: loss criterion for PIPNet heatmap classification, default MSELoss
+    :param criterion_reg: loss criterion for PIPNet offsets regression, default L1Loss
+    :param learning_rate: learning rate, default 0.0001
+    :param cls_loss_weight: weight for the heatmap classification loss
+    :param reg_loss_weight: weight for the offsets regression loss
+    :param num_nb: the number of nearest-neighbor landmarks for NRM, default 10
+    :param num_epochs: the number of training epochs
+    :param save_dir: the dir in which to save checkpoints
+    :param save_interval: the interval at which to save checkpoints
+    :param save_prefix: the prefix for saved checkpoints; a saved name looks like
+        {save_prefix}-epoch{epoch}-loss{epoch_loss}.pth
+    :param decay_steps: decay steps for the learning rate
+    :param decay_gamma: decay gamma for the learning rate
+    :param device: training device, default cuda.
+    :param transform: user-specified transform. If None, torchlm builds a default transform;
+        more details can be found at `torchlm.transforms.build_default_transform`
+    :param coordinates_already_normalized: denotes whether the labels in annotation_path are
+        already normalized (by image size) or not
+    :param kwargs: params for DataLoader
+    :return: A trained model.
+    """
    device = device if torch.cuda.is_available() else "cpu"
    # prepare dataset
    default_dataset = _PIPTrainDataset(
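The annotation format documented above — an image path followed by flattened x y pairs per line — can be parsed in a few lines of plain Python. `parse_annotation_line` is a hypothetical helper written for illustration, not part of torchlm.

```python
from typing import List, Tuple

def parse_annotation_line(line: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Split 'img_path x0 y0 x1 y1 ...' into the path and (N, 2) xy pairs."""
    parts = line.split()
    img_path, coords = parts[0], [float(v) for v in parts[1:]]
    assert len(coords) % 2 == 0, "expected an even number of coordinates"
    # pair up even-indexed x values with odd-indexed y values
    landmarks = list(zip(coords[0::2], coords[1::2]))
    return img_path, landmarks

path, lms = parse_annotation_line("faces/img0.jpg 10 20 30 40 50 60")
# → path 'faces/img0.jpg', lms [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
```

Whether the coordinates are pixel values or already normalized by image size is exactly what the `coordinates_already_normalized` flag tells `training`.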
