48 changes: 33 additions & 15 deletions README.md
@@ -1,4 +1,4 @@
# ByteTrack
# YOLOv5 Implementation of ByteTrack

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/bytetrack-multi-object-tracking-by-1/multi-object-tracking-on-mot17)](https://paperswithcode.com/sota/multi-object-tracking-on-mot17?p=bytetrack-multi-object-tracking-by-1)

@@ -40,10 +40,10 @@ Multi-object tracking (MOT) aims at estimating bounding boxes and identities of
### 1. Installing on the host machine
Step1. Install ByteTrack.
```shell
git clone https://github.com/ifzhang/ByteTrack.git
cd ByteTrack
git clone https://github.com/parthmalpathak/ByteTrack_YOLOv5_v6.0.git
cd ByteTrack_YOLOv5_v6.0
pip3 install -r requirements.txt
python3 setup.py develop
pip3 install -e .
```
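An optional quick check that the editable install is importable (a sketch, assuming the package installs under the `yolox` name):

```shell
python3 -c "import yolox; print(yolox.__file__)"
```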

Step2. Install [pycocotools](https://github.com/cocodataset/cocoapi).
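One common way to install it (a sketch, assuming pip can reach the cocoapi repository):

```shell
pip3 install cython
pip3 install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
```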
@@ -105,7 +105,7 @@ datasets
Then, you need to convert the datasets to COCO format and mix the different training data:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/convert_mot17_to_coco.py
python3 tools/convert_mot20_to_coco.py
python3 tools/convert_crowdhuman_to_coco.py
@@ -116,7 +116,7 @@ python3 tools/convert_ethz_to_coco.py
Before mixing the different datasets, you need to follow the operations in [mix_xxx.py](https://github.com/ifzhang/ByteTrack/blob/c116dfc746f9ebe07d419caa8acba9b3acfa79a6/tools/mix_data_ablation.py#L6) to create the data folder and symlinks. Finally, you can mix the training data:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/mix_data_ablation.py
python3 tools/mix_data_test_mot17.py
python3 tools/mix_data_test_mot20.py
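# The mix scripts above expect a data folder with symlinks to the individual datasets,
# created as described in the comments at the top of tools/mix_data_ablation.py.
# Illustrative only -- the exact folder and link names are listed in those comments:
#   cd datasets && mkdir -p mix_mot_ch/annotations
#   ln -s ../mot/train mix_mot_ch/mot_train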
@@ -165,21 +165,23 @@ Train on CrowdHuman and MOT20, evaluate on MOT20 train.
|bytetrack_x_mot20 [[google]](https://drive.google.com/file/d/1HX2_JpMOjOIj1Z9rJjoet9XNy_cCAs5U/view?usp=sharing), [[baidu(code:3apd)]](https://pan.baidu.com/s/1bowJJj0bAnbhEQ3_6_Am0A) | 93.4 | 89.3 | 1057 | 17.5 |


NOTE: ```Training``` and ```Tracking``` are not required for ```Demo```. Users can jump directly to the ```Demo``` section to test results with the YOLOX or YOLOv5 detectors.

## Training

The COCO pretrained YOLOX model can be downloaded from their [model zoo](https://github.com/Megvii-BaseDetection/YOLOX/tree/0.1.0). After downloading the pretrained models, you can put them under <ByteTrack_HOME>/pretrained.
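For example, after downloading `yolox_x.pth`, a minimal way to put it in place (the source path is a placeholder):

```shell
cd ByteTrack_YOLOv5_v6.0
mkdir -p pretrained
mv /path/to/yolox_x.pth pretrained/
```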

* **Train ablation model (MOT17 half train and CrowdHuman)**

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/train.py -f exps/example/mot/yolox_x_ablation.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```

* **Train MOT17 test model (MOT17 train, CrowdHuman, Cityperson and ETHZ)**

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/train.py -f exps/example/mot/yolox_x_mix_det.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```

@@ -190,7 +192,7 @@ For MOT20, you need to clip the bounding boxes inside the image.
Add clip operation in [line 134-135 in data_augment.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/data/data_augment.py#L134), [line 122-125 in mosaicdetection.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/data/datasets/mosaicdetection.py#L122), [line 217-225 in mosaicdetection.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/data/datasets/mosaicdetection.py#L217), [line 115-118 in boxes.py](https://github.com/ifzhang/ByteTrack/blob/72cd6dd24083c337a9177e484b12bb2b5b3069a6/yolox/utils/boxes.py#L115).
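A minimal sketch of the clip operation described above (the helper name is illustrative; the linked lines show exactly where the clipping goes):

```python
import numpy as np

def clip_boxes(boxes, img_w, img_h):
    # boxes: (N, 4) array of [x1, y1, x2, y2] in pixels; constrain them to the image canvas
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, img_w)  # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, img_h)  # y1, y2
    return boxes
```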

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/train.py -f exps/example/mot/yolox_x_mix_mot20_ch.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```

@@ -199,7 +201,7 @@ python3 tools/train.py -f exps/example/mot/yolox_x_mix_mot20_ch.py -d 8 -b 48 --
First, you need to prepare your dataset in COCO format. You can refer to [MOT-to-COCO](https://github.com/ifzhang/ByteTrack/blob/main/tools/convert_mot17_to_coco.py) or [CrowdHuman-to-COCO](https://github.com/ifzhang/ByteTrack/blob/main/tools/convert_crowdhuman_to_coco.py). Then, you need to create an Exp file for your dataset; you can refer to the [CrowdHuman](https://github.com/ifzhang/ByteTrack/blob/main/exps/example/mot/yolox_x_ch.py) training Exp file. Don't forget to modify get_data_loader() and get_eval_loader() in your Exp file. Finally, you can train ByteTrack on your dataset by running:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/train.py -f exps/example/mot/your_exp_file.py -d 8 -b 48 --fp16 -o -c pretrained/yolox_x.pth
```
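A minimal sketch of such an Exp file (the file name, annotation file names, input size, and data directory are placeholders; the repository's existing exps, e.g. yolox_x_ch.py, show the full get_data_loader()/get_eval_loader() overrides):

```python
# exps/example/mot/your_exp_file.py -- illustrative only
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
        self.num_classes = 1                  # single "person" class for MOT-style data
        self.train_ann = "your_train.json"    # COCO-format annotations (placeholder names)
        self.val_ann = "your_val.json"
        self.input_size = (800, 1440)
        self.test_size = (800, 1440)
        self.max_epoch = 80

    # Override get_data_loader() and get_eval_loader() so MOTDataset points at your
    # data_dir, following exps/example/mot/yolox_x_ch.py.
```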

@@ -211,7 +213,7 @@ python3 tools/train.py -f exps/example/mot/your_exp_file.py -d 8 -b 48 --fp16 -o
Run ByteTrack:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/track.py -f exps/example/mot/yolox_x_ablation.py -c pretrained/bytetrack_ablation.pth.tar -b 1 -d 1 --fp16 --fuse
```
You can get 76.6 MOTA using our pretrained model.
@@ -228,7 +230,7 @@ python3 tools/track_motdt.py -f exps/example/mot/yolox_x_ablation.py -c pretrain
Run ByteTrack:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/track.py -f exps/example/mot/yolox_x_mix_det.py -c pretrained/bytetrack_x_mot17.pth.tar -b 1 -d 1 --fp16 --fuse
python3 tools/interpolation.py
```
@@ -241,7 +243,7 @@ We use the input size 1600 x 896 for MOT20-04, MOT20-07 and 1920 x 736 for MOT20
Run ByteTrack:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/track.py -f exps/example/mot/yolox_x_mix_mot20_ch.py -c pretrained/bytetrack_x_mot20.pth.tar -b 1 -d 1 --fp16 --fuse --match_thresh 0.7 --mot20
python3 tools/interpolation.py
```
@@ -269,10 +271,22 @@ You can get the tracking results in each frame from 'online_targets'. You can re

<img src="assets/palace_demo.gif" width="600"/>
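For reference, a minimal sketch of consuming `online_targets` per frame (class and attribute names follow the YOLOX-based tracker in this repository; the exact `update()` signature is an assumption, so check the demo script):

```python
from yolox.tracker.byte_tracker import BYTETracker

def print_online_targets(args, dets, img_info, img_size):
    # args carries track_thresh, match_thresh, track_buffer, mot20, ...
    tracker = BYTETracker(args, frame_rate=30)
    # dets: detector output for one frame; img_info / img_size as prepared in the demo script
    online_targets = tracker.update(dets, img_info, img_size)
    for t in online_targets:
        x, y, w, h = t.tlwh                  # top-left x, top-left y, width, height
        print(f"id={t.track_id} score={t.score:.2f} box=({x:.0f}, {y:.0f}, {w:.0f}, {h:.0f})")
```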

In order to test ByteTrack with YOLOX:

```shell
cd <ByteTrack_HOME>
cd ByteTrack_YOLOv5_v6.0
python3 tools/demo_track.py video -f exps/example/mot/yolox_x_mix_det.py -c pretrained/bytetrack_x_mot17.pth.tar --fp16 --fuse --save_result
```
In order to test ByteTrack with YOLOv5 version 6.0:

```shell
cd ByteTrack_YOLOv5_v6.0
python3 tools_yolov5/demo_track_yolov5_v6.py webcam -f exps/example/mot/yolov5_mix_det.py --save_result --ckpt pretrained/yolov5n6.pt
```
Users can pass ```video``` or ```image``` instead of ```webcam``` and set the corresponding input path inside ```demo_track_yolov5_v6.py```.
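For example (a sketch; the input video path itself is set inside `demo_track_yolov5_v6.py` rather than passed as a flag):

```shell
cd ByteTrack_YOLOv5_v6.0
python3 tools_yolov5/demo_track_yolov5_v6.py video -f exps/example/mot/yolov5_mix_det.py --save_result --ckpt pretrained/yolov5n6.pt
```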

YOLOv5 version 6 pretrained models can be downloaded from the [YOLOv5 Model Zoo](https://github.com/ultralytics/yolov5/releases), under the ```Assets``` section of the ```v6.1``` and ```v6.0``` releases.
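For example, fetching one of the v6.0 checkpoints from the command line (the release-asset URL pattern is an assumption; verify the file name on the releases page):

```shell
mkdir -p pretrained
wget -P pretrained https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5n6.pt
```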


## Deploy

@@ -295,4 +309,8 @@ python3 tools/demo_track.py video -f exps/example/mot/yolox_x_mix_det.py -c pret

## Acknowledgement

A large part of the code is borrowed from [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX), [FairMOT](https://github.com/ifzhang/FairMOT), [TransTrack](https://github.com/PeizeSun/TransTrack) and [JDE-Cpp](https://github.com/samylee/Towards-Realtime-MOT-Cpp). Many thanks for their wonderful works.
1. A large part of the code is borrowed from [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX), [FairMOT](https://github.com/ifzhang/FairMOT), [TransTrack](https://github.com/PeizeSun/TransTrack) and [JDE-Cpp](https://github.com/samylee/Towards-Realtime-MOT-Cpp). Many thanks for their wonderful work.
2. Many thanks to [@ifzhang](https://github.com/ifzhang/ByteTrack) and [@gmt710](https://github.com/gmt710/yolov5ByteTrack) for their contributions to ByteTrack.

## Copyright
The YOLOv5 v6.0 and v6.1 functionality added to ByteTrack was authored by [Parth Malpathak](https://github.com/parthmalpathak). Please include proper citations when using this repository.
138 changes: 138 additions & 0 deletions exps/example/mot/yolov5_mix_det.py
@@ -0,0 +1,138 @@
# encoding: utf-8
import os
import random
import torch
import torch.nn as nn
import torch.distributed as dist

from yolox.exp import Exp as MyExp
from yolox.data import get_yolox_datadir

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.num_classes = 1
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
        self.train_ann = "train.json"
        self.val_ann = "test.json"  # change to "train.json" for training
        self.input_size = (640, 640)  # input size changed compared to the original code
        self.test_size = (640, 640)
        self.random_size = (12, 26)
        self.max_epoch = 80
        self.print_interval = 20
        self.eval_interval = 5
        self.test_conf = 0.001
        self.nmsthre = 0.7
        self.no_aug_epochs = 10
        self.basic_lr_per_img = 0.001 / 64.0
        self.warmup_epochs = 1

    def get_data_loader(self, batch_size, is_distributed, no_aug=False):
        from yolox.data import (
            MOTDataset,
            TrainTransform,
            YoloBatchSampler,
            DataLoader,
            InfiniteSampler,
            MosaicDetection,
        )

        dataset = MOTDataset(
            data_dir=os.path.join(get_yolox_datadir(), "mix_det"),
            json_file=self.train_ann,
            name='',
            img_size=self.input_size,
            preproc=TrainTransform(
                rgb_means=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
                max_labels=500,
            ),
        )

        dataset = MosaicDetection(
            dataset,
            mosaic=not no_aug,
            img_size=self.input_size,
            preproc=TrainTransform(
                rgb_means=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
                max_labels=1000,
            ),
            degrees=self.degrees,
            translate=self.translate,
            scale=self.scale,
            shear=self.shear,
            perspective=self.perspective,
            enable_mixup=self.enable_mixup,
        )

        self.dataset = dataset

        if is_distributed:
            batch_size = batch_size // dist.get_world_size()

        sampler = InfiniteSampler(
            len(self.dataset), seed=self.seed if self.seed else 0
        )

        batch_sampler = YoloBatchSampler(
            sampler=sampler,
            batch_size=batch_size,
            drop_last=False,
            input_dimension=self.input_size,
            mosaic=not no_aug,
        )

        dataloader_kwargs = {"num_workers": self.data_num_workers, "pin_memory": True}
        dataloader_kwargs["batch_sampler"] = batch_sampler
        train_loader = DataLoader(self.dataset, **dataloader_kwargs)

        return train_loader

    def get_eval_loader(self, batch_size, is_distributed, testdev=False):
        from yolox.data import MOTDataset, ValTransform

        valdataset = MOTDataset(
            data_dir=os.path.join(get_yolox_datadir(), "mot"),
            json_file=self.val_ann,
            img_size=self.test_size,
            name='train',
            preproc=ValTransform(
                rgb_means=(0.485, 0.456, 0.406),
                std=(0.229, 0.224, 0.225),
            ),
        )

        if is_distributed:
            batch_size = batch_size // dist.get_world_size()
            sampler = torch.utils.data.distributed.DistributedSampler(
                valdataset, shuffle=False
            )
        else:
            sampler = torch.utils.data.SequentialSampler(valdataset)

        dataloader_kwargs = {
            "num_workers": self.data_num_workers,
            "pin_memory": True,
            "sampler": sampler,
        }
        dataloader_kwargs["batch_size"] = batch_size
        val_loader = torch.utils.data.DataLoader(valdataset, **dataloader_kwargs)

        return val_loader

    def get_evaluator(self, batch_size, is_distributed, testdev=False):
        from yolox.evaluators import COCOEvaluator

        val_loader = self.get_eval_loader(batch_size, is_distributed, testdev=testdev)
        evaluator = COCOEvaluator(
            dataloader=val_loader,
            img_size=self.test_size,
            confthre=self.test_conf,
            nmsthre=self.nmsthre,
            num_classes=self.num_classes,
            testdev=testdev,
        )
        return evaluator