NVIDIA-AI-IOT
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 1 addition & 0 deletions
diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
Lines changed: 2 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 14 additions & 1 deletion
diff --git a/examples/contrib/quantization_aware_training/README.md b/examples/contrib/quantization_aware_training/README.md
Lines changed: 93 additions & 0 deletions
diff --git a/examples/contrib/quantization_aware_training/__init__.py b/examples/contrib/quantization_aware_training/__init__.py
Lines changed: 1 addition & 0 deletions
diff --git a/examples/contrib/quantization_aware_training/datasets/__init__.py b/examples/contrib/quantization_aware_training/datasets/__init__.py
diff --git a/examples/contrib/quantization_aware_training/datasets/cifar10.py b/examples/contrib/quantization_aware_training/datasets/cifar10.py
Lines changed: 38 additions & 0 deletions
diff --git a/examples/contrib/quantization_aware_training/infer.py b/examples/contrib/quantization_aware_training/infer.py
Lines changed: 81 additions & 0 deletions
diff --git a/examples/contrib/quantization_aware_training/models/__init__.py b/examples/contrib/quantization_aware_training/models/__init__.py
diff --git a/examples/contrib/quantization_aware_training/models/models.py b/examples/contrib/quantization_aware_training/models/models.py
Lines changed: 36 additions & 0 deletions
@@ -2,6 +2,7 @@
 
 ## [Master]
 
+- Added Quantization Aware Training (QAT) workflow to contrib
 - Added converter for ``torch.roll``
 - Added converter for ``torch.nn.functional.layer_norm``
 - Added converter for ``torch.nn.functional.gelu``
 
@@ -3,6 +3,7 @@
 Below is a list of developers who have contributed to torch2trt.  This is also used to track contributors
 who have agreed to torch2trt's Contributor License Agreement.
 
+- [John Welsh](https://github.com/jaybdub) (CLA)
 - John Welsh
 
 ## Becoming a Contributor
@@ -42,6 +43,6 @@ In some instances, you may be requested to sign torch2trt's Contributor License
 4. Make a signed commit with the following text
 
    ```md
-   git commit -S -m "I have read and agree to the Contributor License Agreement as written in the file CLA.pdf of this project.  Signed, <Full Name>"
+   git commit -S -m "I have read and agree to the Contributor License Agreement as written in the file CLA.md of this project.  Signed, <Full Name>"
    ```
 
@@ -115,7 +115,7 @@ cd torch2trt
 python setup.py install
 ```
 
-### Option 2 - With plugins (experimental)
+### Option 2 - With plugins 
 
 To install with plugins to support some operations in PyTorch that are not natively supported with TensorRT, call the following
 
@@ -127,6 +127,19 @@ cd torch2trt
 sudo python setup.py install --plugins
 ```
 
+### Option 3 - With support for experimental community-contributed features
+
+To install torch2trt with experimental community-contributed features under ``torch2trt.contrib``, such as Quantization Aware Training (QAT, which requires `TensorRT>=7.0`), run the following:
+
+```bash
+git clone https://github.com/NVIDIA-AI-IOT/torch2trt
+cd torch2trt/scripts    
+bash build_contrib.sh   
+```
+  
+This enables you to run the QAT example located [here](examples/contrib/quantization_aware_training).   
+    
+
 ## How does it work?
 
 This converter works by attaching conversion functions (like ``convert_ReLU``) to the original 
 
@@ -0,0 +1,93 @@
+## QAT working example
+
+This example uses the QAT library open-sourced by NVIDIA. [GitHub link](https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization)
+
+## Directory overview
+
+1. This directory contains
+   1. `datasets`: code for the CIFAR-10 dataset
+   2. `layers`: implementation used for inference. More details are in `layers/README.md`
+   3. `models`: two models, `resnet18` and `vanilla_cnn`
+   4. `utils`: utility functions for loading state dicts, custom wrappers for training and inference, and accuracy calculation during training
+   5. `train.py` and `infer.py`: code for training and inference (including TRT conversion)
+
+2. The NVIDIA quantization library does not usually provide per-layer control over quantization. The custom wrapper in `utils/utilities.py` lets us quantize selected layers of the model.
+
+## Environment
+
+**Filename** : pytorch_ngc_container_20.09     
+
+```
+FROM nvcr.io/nvidia/pytorch:20.09-py3
+RUN apt-get update && apt-get install -y software-properties-common && apt-get update
+RUN add-apt-repository ppa:git-core/ppa && \
+    apt install -y git    
+
+RUN pip install termcolor graphviz
+
+## If you followed the instructions in the main README.md to install torch2trt using scripts/build_contrib.sh,
+## you don't need the rest of the steps.
+
+RUN git clone https://github.com/NVIDIA/TensorRT.git /sw/TensorRT/
+
+## Make sure the patch file is in the same folder from which the Dockerfile is built
+
+ADD pytorch_nvidia_quantization.patch /sw/TensorRT
+
+RUN cd /sw/TensorRT/ && \
+    git sparse-checkout init --cone && \
+    git sparse-checkout set /tools/pytorch-quantization/ && \
+    git apply --reject --whitespace=fix pytorch_nvidia_quantization.patch && \
+    cd tools/pytorch-quantization/ && \
+    python setup.py install 
+
+RUN git clone https://github.com/NVIDIA-AI-IOT/torch2trt.git /sw/torch2trt/ && \
+    cd /sw/torch2trt/ && \
+    git fetch origin pull/514/head:PR514 && \
+    git checkout PR514 && \
+    python setup.py install --plugins
+
+```
+
+Docker build: `docker build -f pytorch_ngc_container_20.09 -t pytorch_ngc_container_20.09 .`
+
+`docker_image=pytorch_ngc_container_20.09`
+
+Docker run : `docker run -e NVIDIA_VISIBLE_DEVICES=0 --gpus 0 -it --shm-size=1g --ulimit memlock=-1  --rm  -v $PWD:/workspace/work $docker_image` 
+
+**Important Notes** : 
+
+- Sparse checkout lets us check out only part of the GitHub repo.
+- The patch file can be found under `examples/contrib/quantization_aware_training/utils`
+
+## Workflow
+
+The workflow consists of three parts.
+1. Train without quantization:
+
+Pretrained ImageNet weights are used here.
+
+`python train.py --m resnet34-tl / resnet18-tl --num_epochs 45 --test_trt --FP16 --INT8PTC`
+
+2. Train with quantization (weights are mapped using a custom function to make sure that each weight is loaded correctly)
+
+`python train.py --m resnet34/ resnet18 --netqat --partial_ckpt --tl --load_ckpt /tmp/pytorch_exp/{} --num_epochs 25 --lr 1e-4 --lrdt 10`
+
+3. Infer with and without TRT
+
+`python infer.py --m resnet34/resnet18 --load_ckpt /tmp/pytorch_exp_1/ckpt_{} --netqat --INT8QAT`
+
+
+## Accuracy Results 
+
+| Model | FP32 | FP16 | INT8 (QAT) | INT8 (PTC) |
+|-------|------|------|------------|----------|
+| Resnet18 | 83.08 | 83.12 | 83.12 | 83.06 |
+| Resnet34 | 84.65 | 84.65 | 83.26 | 84.5 |  
+
+
+**Please note that the idea behind these experiments is to see if TRT conversion is working properly rather than achieving industry standard accuracy results**
+
+## Future Work
+
+- Add results for Resnet50, EfficientNet and Mobilenet
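Step 2 of the workflow above loads the weights from the non-quantized run into the QAT model through a custom mapping function. That function is not part of this diff, so the following is only a minimal sketch of the general idea, using plain dictionaries in place of real state dicts. The helper name `remap_state_dict_keys` and the key names are hypothetical and are not taken from `train.py` or `utils/utilities.py`.

```python
def remap_state_dict_keys(state_dict, rename_rules):
    """Return a copy of state_dict with key substrings renamed.

    rename_rules is a list of (old, new) pairs applied in order.
    Hypothetical helper for illustration only.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in rename_rules:
            new_key = new_key.replace(old, new)
        # Refuse to silently overwrite a weight if two keys collide.
        if new_key in remapped:
            raise KeyError("two source keys map to {!r}".format(new_key))
        remapped[new_key] = value
    return remapped


# Hypothetical key names: a plain conv layer versus one nested in a wrapper module.
fp32_weights = {"layer1.weight": [0.1, 0.2], "layer1.bias": [0.0]}
qat_weights = remap_state_dict_keys(fp32_weights, [("layer1.", "layer1.conv.")])
print(sorted(qat_weights))
```

Loading the remapped dictionary with `strict=True`, as `infer.py` does, is a useful check that no key was missed.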
@@ -0,0 +1 @@
+from .layers import *
@@ -0,0 +1,38 @@
+import torch
+import torchvision
+import torchvision.transforms as transforms
+
+class Cifar10Loaders:
+    """
+    Data loaders for cifar 10 dataset
+    """
+    def __init__(self, data_dir='/tmp/cifar10', download=True, batch_size=128, pin_memory=True, num_workers=4):
+        self.data_dir = data_dir
+        self.download = download
+        self.batch_size = batch_size
+        self.pin_memory = pin_memory
+        self.num_workers = num_workers
+        self.train_transform = transforms.Compose([
+            transforms.RandomCrop(32, padding=4),
+            transforms.RandomHorizontalFlip(),
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
+        ])
+        self.test_transform = transforms.Compose([
+            transforms.ToTensor(),
+            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
+        ])
+    
+    def train_loader(self, shuffle=True):
+        # Honor the download flag passed to the constructor instead of always downloading.
+        trainset = torchvision.datasets.CIFAR10(root=self.data_dir, train=True, download=self.download, transform=self.train_transform)
+        trainloader = torch.utils.data.DataLoader(trainset, batch_size=self.batch_size, shuffle=shuffle, num_workers=self.num_workers, pin_memory=self.pin_memory)
+        return trainloader
+    
+    def test_loader(self, shuffle=False):
+        testset = torchvision.datasets.CIFAR10(root=self.data_dir, train=False, download=self.download, transform=self.test_transform)
+        testloader = torch.utils.data.DataLoader(testset, batch_size=self.batch_size, shuffle=shuffle, num_workers=self.num_workers, pin_memory=self.pin_memory)
+        return testloader
+    
+    
+
+
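Both transform pipelines in `cifar10.py` end with `transforms.Normalize`, which computes `(value - mean) / std` independently for each channel. As a rough illustration of that arithmetic, here is a pure-Python sketch on a single RGB pixel. The function names are made up for this example; only the mean and standard deviation values come from the file.

```python
# Per-channel statistics copied from the transforms in cifar10.py.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=MEAN, std=STD):
    """Invert normalize_pixel, e.g. to display a normalized image again."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))


mean_pixel = normalize_pixel(MEAN)  # a pixel equal to the mean maps to zero
white_pixel = normalize_pixel((1.0, 1.0, 1.0))
restored = denormalize_pixel(white_pixel)
print(mean_pixel, white_pixel, restored)
```

The same statistics are applied to the training and test splits, so the model sees inputs on the same scale at training and evaluation time.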
@@ -0,0 +1,81 @@
+import timeit
+import torch 
+import torch.nn as nn
+import numpy as np 
+import torchvision
+import argparse
+import os,sys 
+from datasets.cifar10 import Cifar10Loaders
+from utils.utilities import calculate_accuracy, timeGraph,printStats
+from models.resnet import resnet18,resnet34
+from parser import parse_args
+from torch2trt import torch2trt
+import tensorrt as trt
+torch.set_printoptions(precision=5)
+
+def main():
+    args = parse_args()
+
+    args.cuda = not args.no_cuda and torch.cuda.is_available()
+    torch.manual_seed(78543)
+
+    if args.cuda:
+        torch.backends.cudnn.benchmark = True
+        torch.cuda.manual_seed(args.seed)
+    
+    loaders = Cifar10Loaders()
+    train_loader = loaders.train_loader()
+    test_loader = loaders.test_loader()
+
+    if args.m == "resnet18":
+        if args.netqat:
+            model=resnet18(qat_mode=True,infer=True)
+        else:
+            model=resnet18()
+    elif args.m == "resnet34":
+        if args.netqat:
+            model=resnet34(qat_mode=True,infer=True)
+        else:
+            model=resnet34()
+    else:
+        raise NotImplementedError("{} model not found".format(args.m))
+
+
+    model = model.cuda().eval()
+
+    if args.load_ckpt:
+        checkpoint = torch.load(args.load_ckpt)
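+        if not args.netqat:
+            # NOTE: mapping_names_resnets is not imported in this file; import it from
+            # wherever it is defined, otherwise this branch raises NameError.
+            checkpoint = mapping_names_resnets(checkpoint)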
+        model.load_state_dict(checkpoint['model_state_dict'],strict=True)
+        print("===>>> Checkpoint loaded successfully from {} ".format(args.load_ckpt))
+    
+    test_accuracy = calculate_accuracy(model,test_loader)
+    print(" Test accuracy for Pytorch model: {0} ".format(test_accuracy))
+    rand_in = torch.randn([128,3,32,32],dtype=torch.float32).cuda()
+    
+    #Converting the model to TRT
+    if args.FP16:
+        trt_model_fp16 = torch2trt(model,[rand_in],log_level=trt.Logger.INFO,fp16_mode=True,max_batch_size=128)
+        test_accuracy = calculate_accuracy(trt_model_fp16,test_loader)
+        print(" TRT test accuracy at FP16: {0}".format(test_accuracy))
+    
+    if args.INT8QAT:
+        trt_model_int8 = torch2trt(model,[rand_in],log_level=trt.Logger.INFO,fp16_mode=True,int8_mode=True,max_batch_size=128,qat_mode=True)
+        test_accuracy = calculate_accuracy(trt_model_int8,test_loader)
+        print(" TRT test accuracy at INT8 QAT: {0}".format(test_accuracy))
+    
+    if args.INT8PTC:
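+        # Build the calibration dataset from the first six test batches.
+        calib_dataset = list()
+        for i, sam in enumerate(test_loader):
+            # sam[0] holds the input images; the labels in sam[1] are not needed.
+            calib_dataset.extend(sam[0])
+            if i == 5:
+                break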
+
+        trt_model_calib_int8 = torch2trt(model,[rand_in],log_level=trt.Logger.INFO,fp16_mode=True,int8_calib_dataset=calib_dataset,int8_mode=True,max_batch_size=128)
+        test_accuracy = calculate_accuracy(trt_model_calib_int8,test_loader)
+        print(" TRT test accuracy at INT8 PTC: {0}".format(test_accuracy))
+
+if __name__ == "__main__":
+    main()
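The `INT8PTC` branch in `infer.py` builds its calibration set by flattening the inputs of the first few test batches into one list; the loop stops once `i == 5`, that is, after six batches. The same pattern can be sketched with `itertools.islice`, using plain lists as a stand-in for the `DataLoader`. The helper name `collect_calibration_samples` is an assumption made for this example and does not appear in the PR.

```python
from itertools import islice


def collect_calibration_samples(loader, num_batches=6):
    """Flatten the inputs of the first num_batches batches into one list.

    loader yields (inputs, labels) pairs, where inputs is an iterable of
    individual samples. Labels are ignored.
    """
    samples = []
    for inputs, _labels in islice(loader, num_batches):
        samples.extend(inputs)
    return samples


# Stand-in for a DataLoader: 10 batches of 4 samples each, with dummy labels.
fake_loader = [([(b, i) for i in range(4)], [0] * 4) for b in range(10)]
calib = collect_calibration_samples(fake_loader)
print(len(calib))  # 6 batches * 4 samples = 24
```

With the batch size of 128 used by `Cifar10Loaders`, six full batches would give 768 calibration images.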
@@ -0,0 +1,36 @@
+'''
+Contains basic model definitions 
+'''
+
+import torch 
+import torch.nn as nn
+from utils.utilities import qrelu,qconv2d
+
+class vanilla_cnn(nn.Module):
+    def __init__(self,qat_mode=False,infer=False):
+        super().__init__()
+        self.qat = qat_mode
+        self.layer1=qconv2d(3,32,padding=1,qat=qat_mode,infer=infer)
+        self.layer2=qconv2d(32,64,padding=1,qat=qat_mode,infer=infer)
+        self.layer3=qconv2d(64,128,padding=1,qat=qat_mode,infer=infer)
+        self.layer4=qconv2d(128,256,padding=1,qat=qat_mode,infer=infer)
+        self.layer5 = nn.MaxPool2d(kernel_size=2,stride=8)
+        self.fcs = nn.Sequential(
+                nn.Linear(4096,1024),
+                nn.ReLU(),
+                nn.Linear(1024,512),
+                nn.ReLU(),
+                nn.Linear(512,10))
+
+    def forward(self,x):
+        x = self.layer1(x)
+        x = self.layer2(x)
+        x = self.layer3(x)
+        x = self.layer4(x)
+        x = self.layer5(x)
+        x = x.view(x.size(0),-1)
+        x = self.fcs(x)
+        return x
+
+
+
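The first fully connected layer of `vanilla_cnn` expects 4096 input features, and that number has to match the flattened output of `layer5`. A quick way to check this is the standard output-size formula `floor((n + 2p - k) / s) + 1`. The sketch below assumes that each `qconv2d` uses a 3x3 kernel with stride 1; that is not visible in this diff, since `qconv2d` is defined in `utils/utilities.py`. The `padding=1` and the pooling parameters are taken from `models.py`.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # CIFAR-10 images are 32x32
# Assumption: each qconv2d is a 3x3, stride-1 convolution (padding=1 is from the source).
for _ in range(4):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)
# nn.MaxPool2d(kernel_size=2, stride=8) as in models.py.
size = conv_output_size(size, kernel=2, stride=8)

flattened = 256 * size * size  # layer4 outputs 256 channels
print(size, flattened)
```

Under these assumptions the pooled feature map is 4x4, and 256 channels times 16 positions gives the 4096 features that `nn.Linear(4096, 1024)` expects.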