# Installation

There are four ways to use Triton Model Analyzer:

## Triton SDK Container

The recommended way to use Model Analyzer is with the Triton SDK docker
container available on the [NVIDIA GPU Cloud
Catalog](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver). You can
pull and run the SDK container with the following commands:

```
$ docker pull nvcr.io/nvidia/tritonserver:21.08-py3-sdk
```

If you are not planning to run Model Analyzer with
`--triton-launch-mode=docker`, you can run the SDK container with the following
command:

```
$ docker run -it --gpus all --net=host nvcr.io/nvidia/tritonserver:21.08-py3-sdk
```

You will need to build and install the Triton server binary inside the SDK
container. See the Triton [Installation
docs](https://github.com/triton-inference-server/server/blob/main/docs/build.md)
for more details.

If you intend to use `--triton-launch-mode=docker`, which is recommended with
this method of using Model Analyzer, you will need to mount the following:
* `-v /var/run/docker.sock:/var/run/docker.sock` allows running docker
  containers as sibling containers from inside the Triton SDK container.
  Model Analyzer will require this if run with
  `--triton-launch-mode=docker`.
* `-v <path-to-output-model-repo>:<path-to-output-model-repo>` The
  ***absolute*** path to the directory where the output model repository
  will be located (i.e. the parent directory of the output model repository).
  This is so that the launched Triton container has access to the model
  config variants that Model Analyzer creates.

```
$ docker run -it --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v <path-to-output-model-repo>:<path-to-output-model-repo> \
    --net=host nvcr.io/nvidia/tritonserver:21.08-py3-sdk
```
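Once inside the container, you can point Model Analyzer at your model repository and let it launch sibling Triton containers for you. The sketch below is illustrative only: the `profile` subcommand and the `--model-repository`/`--profile-models` flags are assumptions about the CLI, so confirm the exact options with `model-analyzer --help` in your version.

```
# Hedged sketch of a docker-launch-mode run from inside the SDK container.
# The subcommand and flag names are assumptions; verify them with
# `model-analyzer --help` before use.
$ model-analyzer profile \
    --model-repository <path-to-triton-model-repository> \
    --profile-models <model-name> \
    --triton-launch-mode=docker
```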
Model Analyzer uses `pdfkit` for report generation. If you are running Model
Analyzer inside the Triton SDK container, then you will need to install
`wkhtmltopdf`:

```
$ sudo apt-get update && sudo apt-get install wkhtmltopdf
```

Once you do this, Model Analyzer will be able to use `pdfkit` to generate reports.
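To confirm that the PDF toolchain is in place before generating any reports, a quick check like the following should be enough (assuming the command above installed `wkhtmltopdf` into your `PATH`):

```
# Sanity check: pdfkit needs the wkhtmltopdf binary to be discoverable.
$ which wkhtmltopdf
$ wkhtmltopdf --version
```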
## Building the Dockerfile

You can also build the Model Analyzer Dockerfile yourself. First, clone the
Model Analyzer git repository, then build the docker image. This is the
recommended installation method if you mainly intend to use
`--triton-launch-mode=local`, as all the dependencies will be available.

```
$ git clone https://github.com/triton-inference-server/model_analyzer
$ docker build --pull -t model-analyzer .
```

The above command will pull all the containers that Model Analyzer needs to run.
The Model Analyzer Dockerfile bases the container on the latest `tritonserver`
container from NGC. Now you can run the container with:

```
$ docker run -it --rm --gpus all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v <path-to-triton-model-repository>:<path-to-triton-model-repository> \
    -v <path-to-output-model-repo>:<path-to-output-model-repo> \
    --net=host model-analyzer

root@hostname:/opt/triton-model-analyzer#
```
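Inside this container, `--triton-launch-mode=local` lets Model Analyzer start the bundled `tritonserver` binary directly. As with the earlier sketch, the subcommand and flag names below are assumptions for illustration rather than a definitive invocation; check `model-analyzer --help` for the options your version supports.

```
# Hedged sketch of a local-launch-mode run inside the built container.
# Subcommand and flag names are assumptions; confirm with `model-analyzer --help`.
$ model-analyzer profile \
    --model-repository <path-to-triton-model-repository> \
    --profile-models <model-name> \
    --triton-launch-mode=local
```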
## Using `pip3`

You can install pip using:
```
$ sudo apt-get update && sudo apt-get install python3-pip
```

Model Analyzer can be installed with:
```
$ pip3 install triton-model-analyzer
```

If you encounter any errors installing dependencies like `numba`, make sure that
you have the latest version of `pip` using:

```
$ pip3 install --upgrade pip
```

You can then try installing Model Analyzer again.

If you are using this approach, you need to install DCGM on your machine.

To install DCGM on Ubuntu 20.04, you can use the following commands:
```
$ export DCGM_VERSION=2.0.13
$ wget -q https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
    sudo dpkg -i datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb
```
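Once the package is installed, you can optionally verify that the DCGM host engine starts and can see your GPUs. `nv-hostengine` and `dcgmi` ship with the DCGM package; the exact service setup can differ between DCGM versions, so treat this as a rough check rather than a required step:

```
# Optional sanity check that DCGM is working.
$ sudo nv-hostengine       # start the DCGM host engine if it is not already running
$ dcgmi discovery -l       # list the GPUs DCGM can see
```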
## Building from source

To build Model Analyzer from source, you'll need to install the same
dependencies (tritonclient and DCGM) mentioned in the "Using `pip3`" section
above. After that, you can use the following commands:

```
$ git clone https://github.com/triton-inference-server/model_analyzer
$ cd model_analyzer
$ ./build_wheel.sh <path to perf_analyzer> true
```

The final command above builds the triton-model-analyzer wheel. You will need to
provide the `build_wheel.sh` script with two arguments. The first is the path to
the `perf_analyzer` binary that you would like Model Analyzer to use. The second
is whether you want this wheel to be Linux-specific. Currently, this argument
must be set to `true`, as `perf_analyzer` is supported only on Linux. This will
create a wheel file in the `wheels` directory named
`triton-model-analyzer-<version>-py3-none-manylinux1_x86_64.whl`. You can then
install it with:

```
$ pip3 install wheels/triton-model-analyzer-*.whl
```

After these steps, the `model-analyzer` executable should be available in `$PATH`.
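A quick way to confirm the installation is to check that the executable resolves and prints its usage information:

```
# Verify the install: the executable should resolve on PATH and print help text.
$ which model-analyzer
$ model-analyzer --help
```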
**Notes:**
* Triton Model Analyzer supports all the GPUs supported by the DCGM library. See