content/blog/production-ready-object-detection-model-training-workflow-with-hpe-machine-learning-development-environment.md
Lines changed: 9 additions & 15 deletions
@@ -11,7 +11,7 @@ tags:
- machine-learning
- data-ml-engineer
---
This in-depth blog tutorial is divided into five sections, in which I walk through the seamless user experience of working with [HPE Machine Learning Development Environment](https://www.hpe.com/us/en/solutions/artificial-intelligence/machine-learning-development-environment.html) and point out how easy it is to achieve machine learning at scale with HPE.
Over the five parts of this blog, we’re going to walk through end-to-end training of an object detection model using NVIDIA’s PyTorch container from [NVIDIA's NGC Catalog](https://www.nvidia.com/en-us/gpu-cloud/), a Jupyter Notebook, the open-source training platform from [Determined AI](http://www.determined.ai/), and [KServe](https://www.kubeflow.org/docs/external-add-ons/kserve/kserve/) to deploy the model into production.
@@ -184,10 +184,7 @@ Here we are using the SAHI library to slice our large satellite images. Satellit
## 4. Upload to an S3 bucket to support distributed training
We will now upload our exported data to a publicly accessible S3 bucket. This enables a large-scale distributed experiment to access the dataset without storing a copy of it on each device.
See the [Determined documentation](https://docs.determined.ai/latest/training/load-model-data.html#streaming-from-object-storage) and [these instructions for the AWS CLI](https://codingsight.com/upload-files-to-aws-s3-with-the-aws-cli/) to learn how to upload your dataset to an S3 bucket, and review the `S3Backend` class in `data.py`.
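For readers who prefer to script the upload, here is a minimal Python sketch of the same step. It assumes `boto3` is installed and AWS credentials are configured. The helper names `plan_uploads` and `upload_directory`, as well as the bucket, prefix, and local path in the usage line, are hypothetical and not taken from this tutorial's code. The S3 client is passed in as an argument so that the key-mapping logic can be checked without network access.

```python
from pathlib import Path
from typing import List, Tuple


def plan_uploads(local_dir: str, prefix: str = "") -> List[Tuple[str, str]]:
    """Map every file under local_dir to an S3 object key.

    Keys mirror the directory layout below local_dir and always use "/"
    as the separator, so they are valid S3 keys on any operating system.
    """
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            key = f"{prefix.rstrip('/')}/{relative}" if prefix else relative
            pairs.append((str(path), key))
    return pairs


def upload_directory(client, local_dir: str, bucket: str, prefix: str = "") -> int:
    """Upload every file under local_dir to s3://<bucket>/<prefix>/...

    `client` is assumed to be a boto3 S3 client (boto3.client("s3"));
    only its upload_file(Filename, Bucket, Key) method is called.
    Returns the number of files uploaded.
    """
    uploaded = 0
    for local_file, key in plan_uploads(local_dir, prefix):
        client.upload_file(local_file, bucket, key)
        uploaded += 1
    return uploaded
```

For example, `upload_directory(boto3.client("s3"), "/home/ubuntu/dataset", "my-public-bucket", "xview")` would upload every file under the (hypothetical) local dataset directory to keys beginning with `xview/`. Making the objects publicly readable is still controlled by the bucket's own access settings.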
Once you have created a publicly accessible S3 bucket, here are example commands to upload the preprocessed dataset to S3:
@@ -215,7 +212,7 @@ Let's get started!
## Execute `docker run` to create an NGC environment for Data Prep
Make sure to map the host directory to a directory inside the Docker container; we will use the host directory again to
`docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v /home/ubuntu:/home/ubuntu -p 8008:8888 -it nvcr.io/nvidia/pytorch:21.11-py3 /bin/bash`
*Note that completing this tutorial requires you to upload your dataset from Step 2 to a publicly accessible S3 bucket. This enables a large-scale distributed experiment to access the dataset without storing a copy of it on each device. See the [Determined documentation](https://docs.determined.ai/latest/training/load-model-data.html#streaming-from-object-storage) and [these instructions for the AWS CLI](https://codingsight.com/upload-files-to-aws-s3-with-the-aws-cli/) to learn how to upload your dataset to an S3 bucket, and review the `S3Backend` class in `data.py`.*
Once you have defined your S3 bucket and uploaded your dataset, make sure to set `TARIN_DATA_DIR` in `build_training_data_loader` to the dataset's path in the S3 bucket.
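If you write that path as a full `s3://` URI, a small helper like the one below can split it into the bucket name and key prefix and reject malformed values before a long distributed run starts. This is a hypothetical sketch. Whether `build_training_data_loader` and `S3Backend` expect a full URI, a bucket plus prefix, or something else depends on the code in `data.py`, which is not shown here, and the name `parse_s3_uri` is illustrative only.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/some/prefix" into ("bucket", "some/prefix").

    Raises ValueError for anything that is not an s3:// URI with a bucket,
    e.g. a local path that was left in place by mistake.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # urlparse keeps the leading "/" on the path; S3 keys do not use it.
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `parse_s3_uri("s3://my-bucket/xview/train")` returns `("my-bucket", "xview/train")`, while a leftover local path such as `/home/ubuntu/data` raises `ValueError`.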
@@ -559,7 +553,7 @@ Run the below commands in a terminal, and complete logging into the determined c
## Define Determined Experiment
In [Determined](https://www.determined.ai/), a *trial* is a training task that consists of a dataset, a deep learning model, and values for all of the model’s hyperparameters. An *experiment* is a collection of one or more trials: an experiment can either train a single model (with a single trial) or train multiple models via a hyperparameter search over a user-defined hyperparameter space.
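To make the trial/experiment distinction concrete, here is a small Python model of it. It is an illustration only, not Determined's API. `Trial` and `grid_experiment` are hypothetical names, and the expansion shown is a plain grid search. In practice, Determined's own searchers decide which hyperparameter values each trial receives.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Trial:
    """One training task: a dataset, a model, and a value for every hyperparameter."""
    dataset: str
    model: str
    hparams: Dict[str, Any]


def grid_experiment(dataset: str, model: str,
                    space: Dict[str, List[Any]]) -> List[Trial]:
    """Expand a user-defined hyperparameter space into an experiment (a list of trials).

    Each trial gets one combination of values. If every hyperparameter has
    exactly one candidate value, the experiment contains a single trial,
    i.e. it trains a single model.
    """
    names = sorted(space)
    return [
        Trial(dataset, model, dict(zip(names, values)))
        for values in itertools.product(*(space[name] for name in names))
    ]
```

For example, a space with two learning rates and three batch sizes yields six trials, while fixing every hyperparameter to one value yields a single-trial experiment.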
Here is what a configuration file looks like for a distributed training experiment.
@@ -752,7 +746,7 @@ Let's get started!
Run the commands below to set up a Python virtual environment and install all the Python packages needed for this tutorial.
Run the command below to export the PyTorch checkpoint into a `.mar` file, which TorchServe requires for inference. Our KServe InferenceService will automatically deploy a Pod with a Docker image that supports TorchServe inference.
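This Python helper is a sketch of what the export step involves: it assembles a `torch-model-archiver` command line. The flags it uses (`--model-name`, `--version`, `--serialized-file`, `--handler`, `--export-path`, `--model-file`, `--extra-files`) are standard `torch-model-archiver` options. However, the helper name `archiver_command`, the file names, and the choice of the built-in `object_detector` handler are hypothetical. Use the command provided with this step for the actual export.

```python
import shlex
from typing import List, Optional


def archiver_command(model_name: str, serialized_file: str, handler: str,
                     export_path: str, version: str = "1.0",
                     model_file: Optional[str] = None,
                     extra_files: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for a torch-model-archiver call that writes <model_name>.mar."""
    cmd = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,  # the exported PyTorch weights
        "--handler", handler,                  # built-in handler name or path to a custom one
        "--export-path", export_path,          # directory where the .mar file is written
    ]
    if model_file:
        # Needed for eager-mode models whose weights are saved as a state_dict.
        cmd += ["--model-file", model_file]
    if extra_files:
        # torch-model-archiver expects a comma-separated list here.
        cmd += ["--extra-files", ",".join(extra_files)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; substitute the files from your own export.
    cmd = archiver_command("detector", "model.pth", "object_detector", "model-store",
                           model_file="model.py")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Because the command is kept as a list rather than a single shell string, it can be passed directly to `subprocess.run` without shell-quoting issues when paths contain spaces.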