Commit 426d2fa

Update Blog “production-ready-object-detection-model-training-workflow-with-hpe-machine-learning-development-environment”
1 parent 299c207 commit 426d2fa

1 file changed

+13
-13
lines changed

content/blog/production-ready-object-detection-model-training-workflow-with-hpe-machine-learning-development-environment.md

Lines changed: 13 additions & 13 deletions
@@ -236,9 +236,9 @@ The below cell will run a multi-gpu training job. This job will train an object
236236
--lr-steps 16 22 --aspect-ratio-group-factor 3
237237
```
238238

239-
### 1. Object Detection on Satellite Imagery with PyTorch (Single GPU)
239+
### 1. Object detection on satellite imagery with PyTorch (single GPU)
240240

241-
Follow and Run the code to train a Faster RCNN FPN (Resnet50 backbone) that classifies images of clothing.
241+
Follow and run the code to train a Faster RCNN FPN (Resnet50 backbone) that classifies images of clothing.
242242

243243
```python
244244
import sys
@@ -476,17 +476,17 @@ _=model.eval()
476476
_=predict(model,images_t_list,targets_t_list)
477477
```
478478

479-
In the next part of our blog, we scale our model training using distributed training within HPE Machine Learning Development Environment & System.
479+
In the next part of this blog post, I will show you how to scale your model training using distributed training within HPE Machine Learning Development Environment & System.
480480

481481
# Part 4: Training on HPE Machine Learning Development & System
482482

483-
[HPE Machine Learning Development Environment](https://www.hpe.com/us/en/solutions/artificial-intelligence/machine-learning-development-environment.html) is a training platform software that reduces complexity for ML researchers and helps research teams collaborate. HPE combines this incredibly powerful training platform with best-of-breed hardware and interconnect in [HPE Machine Learning Development System](https://www.hpe.com/us/en/hpe-machine-learning-development-system.html), an AI turnkey solution that we'll be using for the duration of the tutorial.
483+
[HPE Machine Learning Development Environment](https://www.hpe.com/us/en/solutions/artificial-intelligence/machine-learning-development-environment.html) is a training platform software that reduces complexity for ML researchers and helps research teams collaborate. HPE combines this incredibly powerful training platform with best-of-breed hardware and interconnect in [HPE Machine Learning Development System](https://www.hpe.com/us/en/hpe-machine-learning-development-system.html), an AI turnkey solution that will be used for the duration of the tutorial.
484484

485-
This notebook walks you the commands to run the same training as Step 3, but using the HPE Machine Learning Development Environment together with the PyTorchTrial API.
485+
This notebook walks you through the commands to run the same training you did in Step 3, but using the HPE Machine Learning Development Environment together with the PyTorchTrial API.
486486
All the code is configured to run out of the box. The main change is defining a `class ObjectDetectionTrial(PyTorchTrial)` to incorporate the model, optimizer, dataset, and other training loop essentials.
487-
You can view implementation details looking at `determined_files/model_def.py`
487+
You can view implementation details by looking at `determined_files/model_def.py`
488488

489-
We will show you how to:
489+
Here, I will show you how to:
490490

491491
* Run a distributed training experiment
492492
* Run a distributed hyperparameter search
@@ -511,7 +511,7 @@ mkdir /tmp/val_sliced_no_neg
511511
mv val_300_02.json /tmp/val_sliced_no_neg/val_300_02.json
512512
```
513513

514-
*Note that completing this tutorial requires you to upload your dataset from Step 2 into a publically accessible S3 bucket. This will enable for a large scale distributed experiment to have access to the dataset without installing the dataset on device. View [Determined Documentation](<* https://docs.determined.ai/latest/training/load-model-data.html#streaming-from-object-storage>) and [AWS instructions](<* https://codingsight.com/upload-files-to-aws-s3-with-the-aws-cli/>) to learn how to upload your dataset to an S3 bucket. Review the* `S3Backend` class in `data.py`
514+
*Note that completing this tutorial requires you to upload your dataset from Step 2 into a publicly accessible S3 bucket. This enables a large-scale distributed experiment to access the dataset without installing it on the device. View [Determined Documentation](<* https://docs.determined.ai/latest/training/load-model-data.html#streaming-from-object-storage>) and [AWS instructions](<* https://codingsight.com/upload-files-to-aws-s3-with-the-aws-cli/>) to learn how to upload your dataset to an S3 bucket. Review the* `S3Backend` class in `data.py`
515515

516516
Once you have defined your S3 bucket and uploaded your dataset, make sure to change the `TARIN_DATA_DIR` in `build_training_data_loader` to the defined path in the S3 bucket.
517517

@@ -544,14 +544,14 @@ def build_training_data_loader(self) -> DataLoader:
544544

545545
## Define environment variable DET_MASTER and login in terminal
546546

547-
Run the below commands in a terminal, and complete logging into the determined cluster by chaning <username> to your username.
547+
Run the below commands in a terminal, and complete logging into the Determined cluster by changing <username> to your username.
548548

549549
* `export DET_MASTER=10.182.1.43`
550550
* `det user login <username>`
551551

552-
## Define Determined Experiment
552+
## Define Determined experiment
553553

554-
In [Determined](https://www.determined.ai/), a *trial* is a training task that consists of a dataset, a deep learning model, and values for all of the model’s hyperparameters. An *experiment* is a collection of one or more trials: an experiment can either train a single model (with a single trial), or can train multiple models via. a hyperparameter sweep a user-defined hyperparameter space.
554+
In [Determined](https://www.determined.ai/), a *trial* is a training task that consists of a dataset, a deep learning model, and values for all of the model’s hyperparameters. An *experiment* is a collection of one or more trials: an experiment can either train a single model (with a single trial), or can train multiple models via a hyperparameter sweep over a user-defined hyperparameter space.
555555

556556
Here is what a configuration file looks like for a distributed training experiment.
557557

@@ -627,9 +627,9 @@ Preparing files to send to master... 237.5KB and 36 files
627627
Created experiment 77
628628
```
629629

630-
## Launching a Distributed Hyperparameter Search Experiment
630+
## Launching a distributed hyperparameter search experiment
631631

632-
To implement an automatic hyperparameter tuning experiment, we need to define the hyperparameter space, e.g., by listing the decisions that may impact model performance. We can specify a range of possible values in the experiment configuration for each hyperparameter in the search space.
632+
To implement an automatic hyperparameter tuning experiment, define the hyperparameter space, e.g. by listing the decisions that may impact model performance. You can specify a range of possible values in the experiment configuration for each hyperparameter in the search space.
633633

634634
View the `x.yaml` file that defines a hyperparameter search where we find the model architecture that achieves the best performance on the dataset.
635635
