|
1 | 1 | torchvision.models
|
2 |
| -================== |
| 2 | +################## |
| 3 | + |
| 4 | + |
| 5 | +The models subpackage contains definitions of models for addressing |
| 6 | +different tasks, including: image classification, pixelwise semantic |
| 7 | +segmentation, object detection, instance segmentation and person |
| 8 | +keypoint detection. |
| 9 | + |
| 10 | + |
| 11 | +Classification |
| 12 | +============== |
3 | 13 |
|
4 | 14 | The models subpackage contains definitions for the following model
|
5 |
| -architectures: |
| 15 | +architectures for image classification: |
6 | 16 |
|
7 | 17 | - `AlexNet`_
|
8 | 18 | - `VGG`_
|
@@ -182,8 +192,149 @@ MobileNet v2
|
182 | 192 | .. autofunction:: mobilenet_v2
|
183 | 193 |
|
184 | 194 | ResNext
|
185 |
| -------------- |
| 195 | +------- |
186 | 196 |
|
187 | 197 | .. autofunction:: resnext50_32x4d
|
188 | 198 | .. autofunction:: resnext101_32x8d
|
189 | 199 |
|
| 200 | + |
| 201 | +Semantic Segmentation |
| 202 | +===================== |
| 203 | + |
| 204 | +As with image classification models, all pre-trained models expect input images normalized in the same way. |
| 205 | +The images have to be loaded into the range ``[0, 1]`` and then normalized using |
| 206 | +``mean = [0.485, 0.456, 0.406]`` and ``std = [0.229, 0.224, 0.225]``. |
| 207 | +They have been trained on images resized such that their minimum size is 520. |
| 208 | + |
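The normalization described above can be illustrated on a single pixel. This is a minimal sketch in plain Python rather than the library's own preprocessing code; it assumes 8-bit input values and RGB channel order, and the helper name ``normalize_pixel`` is hypothetical.

```python
# Per-channel statistics quoted in the text above (assumed to be in RGB order).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]


# A pixel close to the dataset mean maps to values near zero.
normalized = normalize_pixel((124, 116, 104))
```

In practice the same arithmetic is applied to every pixel of a ``Tensor[C, H, W]`` at once; the per-pixel form is used here only to keep the example dependency-free.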
| 209 | +The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are |
| 210 | +present in the Pascal VOC dataset. You can see more information on how the subset has been selected in |
| 211 | +``references/segmentation/coco_utils.py``. The classes that the pre-trained model outputs are the following, |
| 212 | +in order: |
| 213 | + |
| 214 | + .. code-block:: python |
| 215 | +
|
| 216 | + ['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', |
| 217 | + 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', |
| 218 | + 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'] |
| 219 | +
|
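To make the class ordering concrete, the sketch below maps the per-class scores of one pixel to a class name by taking the index of the largest score. It assumes the scores are available as a plain Python list with one entry per class; the function name ``predicted_class`` is hypothetical and not part of torchvision.

```python
# Class names in the order listed above.
VOC_CLASSES = [
    '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]


def predicted_class(scores):
    """Return the class name whose score is highest for a single pixel."""
    if len(scores) != len(VOC_CLASSES):
        raise ValueError("expected one score per class")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return VOC_CLASSES[best_index]


# Example: a pixel whose highest score is at index 15.
example_scores = [0.0] * len(VOC_CLASSES)
example_scores[15] = 3.2
example_name = predicted_class(example_scores)
```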
| 220 | +The accuracies of the pre-trained models evaluated on COCO val2017 are as follows: |
| 221 | + |
| 222 | +================================ ============= ==================== |
| 223 | +Network                          mean IoU      global pixelwise acc |
| 224 | +================================ ============= ==================== |
| 225 | +FCN ResNet101                    63.7          91.9 |
| 226 | +DeepLabV3 ResNet101              67.4          92.4 |
| 227 | +================================ ============= ==================== |
| 228 | + |
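The text does not say how the two metrics in the table are computed. The sketch below uses the common definitions based on a confusion matrix, which is an assumption about the evaluation, not a description of the reference scripts; ``segmentation_metrics`` is a hypothetical helper.

```python
def segmentation_metrics(confusion):
    """Compute (mean IoU, global pixel accuracy) from a confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Classes that never occur are skipped in the mean.
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(num_classes))

    ious = []
    for i in range(num_classes):
        true_positive = confusion[i][i]
        actual = sum(confusion[i])                                # true class i
        predicted = sum(confusion[r][i] for r in range(num_classes))  # predicted i
        union = actual + predicted - true_positive
        if union > 0:
            ious.append(true_positive / union)

    return sum(ious) / len(ious), correct / total


# Two-class toy example with 20 pixels in total.
mean_iou, pixel_acc = segmentation_metrics([[8, 2], [1, 9]])
```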
| 229 | + |
| 230 | +Fully Convolutional Networks |
| 231 | +---------------------------- |
| 232 | + |
| 233 | +.. autofunction:: torchvision.models.segmentation.fcn_resnet50 |
| 234 | +.. autofunction:: torchvision.models.segmentation.fcn_resnet101 |
| 235 | + |
| 236 | + |
| 237 | +DeepLabV3 |
| 238 | +--------- |
| 239 | + |
| 240 | +.. autofunction:: torchvision.models.segmentation.deeplabv3_resnet50 |
| 241 | +.. autofunction:: torchvision.models.segmentation.deeplabv3_resnet101 |
| 242 | + |
| 243 | + |
| 244 | +Object Detection, Instance Segmentation and Person Keypoint Detection |
| 245 | +===================================================================== |
| 246 | + |
| 247 | +The pre-trained models for detection, instance segmentation and |
| 248 | +keypoint detection are initialized with the classification models |
| 249 | +in torchvision. |
| 250 | + |
| 251 | +The models expect a list of ``Tensor[C, H, W]``, in the range ``0-1``. |
| 252 | +The models internally resize the images so that they have a minimum size |
| 253 | +of ``800``. This value can be changed by passing the ``min_size`` argument |
| 254 | +to the constructor of the models. |
| 255 | + |
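The effect of ``min_size`` on the image shape can be sketched as follows. This assumes the resize preserves the aspect ratio and scales the shorter side to ``min_size``; it ignores any upper bound on the longer side and any rounding rules the models may apply internally. ``resized_shape`` is a hypothetical helper, not a torchvision function.

```python
def resized_shape(height, width, min_size=800):
    """Return the (height, width) after scaling the shorter side to min_size."""
    scale = min_size / min(height, width)
    return round(height * scale), round(width * scale)


# A 480x640 image is scaled up so that its shorter side becomes 800.
landscape = resized_shape(480, 640)
# A 1200x900 image is scaled down so that its shorter side becomes 800.
portrait = resized_shape(1200, 900)
```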
| 256 | + |
| 257 | +For object detection and instance segmentation, the pre-trained |
| 258 | +models return the predictions of the following classes: |
| 259 | + |
| 260 | + .. code-block:: python |
| 261 | +
|
| 262 | + COCO_INSTANCE_CATEGORY_NAMES = [ |
| 263 | + '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', |
| 264 | + 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign', |
| 265 | + 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', |
| 266 | + 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A', |
| 267 | + 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', |
| 268 | + 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', |
| 269 | + 'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', |
| 270 | + 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', |
| 271 | + 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', |
| 272 | + 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', |
| 273 | + 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book', |
| 274 | + 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush' |
| 275 | + ] |
| 276 | +
|
| 277 | +
|
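As an illustration of how the category list is used, the sketch below turns integer labels into names and drops low-confidence predictions. It uses only the first ten entries of the list above, and it assumes each prediction carries an integer label and a confidence score as plain Python values; the ``threshold`` parameter and the function name are hypothetical.

```python
# First ten entries of the category list above, indexed by predicted label.
COCO_NAMES_PREFIX = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
    'bus', 'train', 'truck', 'boat',
]


def name_detections(labels, scores, threshold=0.5):
    """Pair each sufficiently confident label with its category name."""
    return [
        (COCO_NAMES_PREFIX[label], score)
        for label, score in zip(labels, scores)
        if score >= threshold
    ]


# Three hypothetical detections; the second falls below the threshold.
named = name_detections([1, 3, 9], [0.9, 0.4, 0.75])
```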
| 278 | +Here is a summary of the accuracies for the models trained on |
| 279 | +the instances set of COCO train2017 and evaluated on COCO val2017. |
| 280 | + |
| 281 | +================================ ======= ======== =========== |
| 282 | +Network                          box AP  mask AP  keypoint AP |
| 283 | +================================ ======= ======== =========== |
| 284 | +Faster R-CNN ResNet-50 FPN       37.0    -        - |
| 285 | +Mask R-CNN ResNet-50 FPN         37.9    34.6     - |
| 286 | +================================ ======= ======== =========== |
| 287 | + |
| 288 | +For person keypoint detection, the accuracies for the pre-trained |
| 289 | +models are as follows: |
| 290 | + |
| 291 | +================================ ======= ======== =========== |
| 292 | +Network                          box AP  mask AP  keypoint AP |
| 293 | +================================ ======= ======== =========== |
| 294 | +Keypoint R-CNN ResNet-50 FPN     54.6    -        65.0 |
| 295 | +================================ ======= ======== =========== |
| 296 | + |
| 297 | +For person keypoint detection, the pre-trained model returns the |
| 298 | +keypoints in the following order: |
| 299 | + |
| 300 | + .. code-block:: python |
| 301 | +
|
| 302 | + COCO_PERSON_KEYPOINT_NAMES = [ |
| 303 | + 'nose', |
| 304 | + 'left_eye', |
| 305 | + 'right_eye', |
| 306 | + 'left_ear', |
| 307 | + 'right_ear', |
| 308 | + 'left_shoulder', |
| 309 | + 'right_shoulder', |
| 310 | + 'left_elbow', |
| 311 | + 'right_elbow', |
| 312 | + 'left_wrist', |
| 313 | + 'right_wrist', |
| 314 | + 'left_hip', |
| 315 | + 'right_hip', |
| 316 | + 'left_knee', |
| 317 | + 'right_knee', |
| 318 | + 'left_ankle', |
| 319 | + 'right_ankle' |
| 320 | + ] |
| 321 | +
|
| 322 | +
|
| 323 | +
|
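Because the keypoints are returned in a fixed order, they can be labeled by position. The sketch below assumes the keypoints of one detected person are available as a list of 17 ``(x, y)`` pairs in plain Python; ``label_keypoints`` is a hypothetical helper and the coordinates are made up for illustration.

```python
# Keypoint names in the order listed above.
COCO_PERSON_KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]


def label_keypoints(points):
    """Map one person's keypoint coordinates to their names by position."""
    if len(points) != len(COCO_PERSON_KEYPOINT_NAMES):
        raise ValueError("expected one point per keypoint name")
    return dict(zip(COCO_PERSON_KEYPOINT_NAMES, points))


# Made-up coordinates: point i is placed at (10 * i, 5 * i).
example_points = [(10 * i, 5 * i) for i in range(17)]
labeled = label_keypoints(example_points)
```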
| 324 | +Faster R-CNN |
| 325 | +------------ |
| 326 | + |
| 327 | +.. autofunction:: torchvision.models.detection.fasterrcnn_resnet50_fpn |
| 328 | + |
| 329 | + |
| 330 | +Mask R-CNN |
| 331 | +---------- |
| 332 | + |
| 333 | +.. autofunction:: torchvision.models.detection.maskrcnn_resnet50_fpn |
| 334 | + |
| 335 | + |
| 336 | +Keypoint R-CNN |
| 337 | +-------------- |
| 338 | + |
| 339 | +.. autofunction:: torchvision.models.detection.keypointrcnn_resnet50_fpn |
| 340 | + |