
Commit 0f207b6

Merge branch 'GoogleCloudPlatform:master' into master
2 parents: 9387cab + 40c1505

File tree

25 files changed (+488, -86 lines)


continuous_training/kubeflow/labs/multiple_frameworks_lab.ipynb

Lines changed: 37 additions & 0 deletions
@@ -1022,6 +1022,38 @@
     "!head census_training_pipeline.yaml"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Set the command fields in the pipeline YAML"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!sed -i 's/\\\"command\\\": \\[\\]/\\\"command\\\": \\[python, -u, -m, kfp_component.launcher\\]/g' census_training_pipeline.yaml"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cat census_training_pipeline.yaml | grep \"component.launcher\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You should see 6 lines in the output that were modified by the sed command."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1064,6 +1096,11 @@
     "* For **Run Type** select **One-Off**\n",
     "* Enter your **Project ID** and hit **Start**. You can now monitor your pipeline run in the UI (it will take about **10 minutes** to complete the run).\n",
     "\n",
+    "\n",
+    "\n",
+    "**NOTE that your pipeline run may fail due to a bug in the BigQuery component that does not handle certain race conditions. If you observe a pipeline failure, submit another run from the KFP UI using the steps above.**\n",
+    "\n",
+    "\n",
     "Now let's set up a recurring run:\n",
     "* Select **Pipelines** then **census_trainer_multiple_models** and click **Create Run**\n",
     "* In the **Experiment** field select **Default**\n",

continuous_training/kubeflow/solutions/multiple_frameworks_kubeflow.ipynb

Lines changed: 36 additions & 0 deletions
@@ -1020,6 +1020,38 @@
     "!head census_training_pipeline.yaml"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Set the command fields in the pipeline YAML"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!sed -i 's/\\\"command\\\": \\[\\]/\\\"command\\\": \\[python, -u, -m, kfp_component.launcher\\]/g' census_training_pipeline.yaml"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cat census_training_pipeline.yaml | grep \"component.launcher\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You should see 6 lines in the output that were modified by the sed command."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1062,6 +1094,10 @@
     "* For **Run Type** select **One-Off**\n",
     "* Enter your **Project ID** and hit **Start**. You can now monitor your pipeline run in the UI (it will take about **10 minutes** to complete the run).\n",
     "\n",
+    "\n",
+    "**NOTE that your pipeline run may fail due to a bug in the BigQuery component that does not handle certain race conditions. If you observe a pipeline failure, submit another run from the KFP UI using the steps above.**\n",
+    "\n",
+    "\n",
     "Now let's set up a recurring run:\n",
     "* Select **Pipelines** then census_trainer_multiple_models and click **Create Run**\n",
     "* In the **Experiment** field select **Default**\n",

model_serving/caip-load-testing/01-prepare-and-deploy.ipynb

Lines changed: 19 additions & 11 deletions
@@ -22,7 +22,7 @@
    "source": [
     "## Setup\n",
     "\n",
-    "This Notebook was tested on **AI Platform Notebooks** using the standard TF 2.2 image."
+    "This Notebook was tested on **AI Platform Notebooks** using the standard TF 2.8 image."
    ]
   },
   {
@@ -72,9 +72,7 @@
     "GCS_MODEL_LOCATION = 'gs://{}/models/{}/{}'.format(BUCKET, MODEL_NAME, MODEL_VERSION)\n",
     "THUB_MODEL_HANDLE = 'https://tfhub.dev/google/imagenet/resnet_v2_101/classification/4'\n",
     "IMAGENET_LABELS_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt'\n",
-    "IMAGES_FOLDER = 'test_images'\n",
-    "\n",
-    "!gcloud config set project $PROJECT_ID"
+    "IMAGES_FOLDER = 'test_images'"
    ]
   },
   {
@@ -550,7 +548,7 @@
    "source": [
     "!gcloud ai-platform models create {MODEL_NAME} \\\n",
     "    --project {PROJECT_ID} \\\n",
-    "    --regions {REGION}"
+    "    --region {REGION}"
    ]
   },
   {
@@ -559,7 +557,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!gcloud ai-platform models list --project {PROJECT_ID} "
+    "!gcloud ai-platform models list \\\n",
+    "    --project {PROJECT_ID} \\\n",
+    "    --region {REGION}"
    ]
   },
   {
@@ -581,12 +581,13 @@
     "!gcloud beta ai-platform versions create {MODEL_VERSION} \\\n",
     "    --model={MODEL_NAME} \\\n",
     "    --origin={GCS_MODEL_LOCATION} \\\n",
-    "    --runtime-version=2.1 \\\n",
+    "    --runtime-version=2.8 \\\n",
     "    --framework=TENSORFLOW \\\n",
     "    --python-version=3.7 \\\n",
     "    --machine-type={MACHINE_TYPE} \\\n",
     "    --accelerator={ACCELERATOR} \\\n",
-    "    --project={PROJECT_ID}"
+    "    --project={PROJECT_ID} \\\n",
+    "    --region={REGION}"
    ]
   },
   {
@@ -595,7 +596,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!gcloud ai-platform versions list --model={MODEL_NAME} --project={PROJECT_ID}"
+    "!gcloud ai-platform versions list \\\n",
+    "    --model={MODEL_NAME} --project={PROJECT_ID} --region={REGION}"
    ]
   },
   {
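
The --region flags added throughout these cells are required because the model is deployed to a regional AI Platform endpoint. Programmatic clients must route to the same regional endpoint, which the next hunk wires up; as a hedged sketch (assuming PROJECT_ID and REGION are defined as in the notebook), the deployed models can also be listed through the regional REST API:

    import googleapiclient.discovery
    from google.api_core.client_options import ClientOptions

    # Regional deployments are served from '<region>-ml.googleapis.com'.
    options = ClientOptions(
        api_endpoint='https://{}-ml.googleapis.com'.format(REGION))
    ml = googleapiclient.discovery.build(
        'ml', 'v1', cache_discovery=False, client_options=options)

    response = ml.projects().models().list(
        parent='projects/{}'.format(PROJECT_ID)).execute()
    for model in response.get('models', []):
        print(model['name'])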
@@ -612,8 +614,14 @@
    "outputs": [],
    "source": [
     "import googleapiclient.discovery\n",
-    "\n",
-    "service = googleapiclient.discovery.build('ml', 'v1')\n",
+    "from google.api_core.client_options import ClientOptions\n",
+    "\n",
+    "prefix = '{}-ml'.format(REGION) if REGION else 'ml'\n",
+    "api_endpoint = 'https://{}.googleapis.com'.format(prefix)\n",
+    "client_options = ClientOptions(api_endpoint=api_endpoint)\n",
+    "service = googleapiclient.discovery.build('ml', 'v1',\n",
+    "                                          cache_discovery=False,\n",
+    "                                          client_options=client_options)\n",
     "name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID, MODEL_NAME, MODEL_VERSION)\n",
     "print(\"Service name: {}\".format(name))\n",
     "\n",

model_serving/caip-load-testing/02-perf-testing.ipynb

Lines changed: 34 additions & 20 deletions
@@ -19,14 +19,16 @@
    "metadata": {},
    "source": [
     "## Setup\n",
-    "This notebook was tested on **AI Platform Notebooks** using the standard TF 2.2 image."
+    "This notebook was tested on **AI Platform Notebooks** using the standard TF 2.8 image."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Install required packages"
+    "### Install required packages\n",
+    "\n",
+    "You can safely ignore the dependency errors. Confirm that the last message starts with \"Successfully installed...\""
    ]
   },
   {
@@ -35,7 +37,10 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "%pip install -q -U locust google-cloud-monitoring google-cloud-logging google-cloud-monitoring-dashboards"
+    "!pip install --user locust==2.11.1\\\n",
+    "    google-cloud-monitoring==2.11.1\\\n",
+    "    google-cloud-logging==3.2.2\\\n",
+    "    google-cloud-monitoring-dashboards==2.7.2"
    ]
   },
   {
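
Since the unconstrained install was replaced with pinned versions, a quick check confirms the environment matches what the notebook was tested against. This sanity cell is a suggestion, not part of the original notebook:

    from importlib.metadata import version

    # Expect 2.11.1, 2.11.1, 3.2.2 and 2.7.2 respectively.
    for pkg in ('locust', 'google-cloud-monitoring',
                'google-cloud-logging', 'google-cloud-monitoring-dashboards'):
        print(pkg, version(pkg))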
@@ -80,11 +85,11 @@
     "from google.api_core.exceptions import GoogleAPICallError\n",
     "\n",
     "from google.cloud import logging_v2\n",
-    "from google.cloud.logging_v2 import MetricsServiceV2Client\n",
-    "from google.cloud.logging_v2 import LoggingServiceV2Client\n",
+    "from google.cloud.logging_v2.services.metrics_service_v2 import MetricsServiceV2Client\n",
+    "from google.cloud.logging_v2.services.logging_service_v2 import LoggingServiceV2Client\n",
     "\n",
-    "from google.cloud.monitoring_dashboard.v1.types import Dashboard\n",
-    "from google.cloud.monitoring_dashboard.v1 import DashboardsServiceClient\n",
+    "from google.cloud.monitoring_dashboard_v1.types import Dashboard\n",
+    "from google.cloud.monitoring_dashboard_v1 import DashboardsServiceClient\n",
     "from google.cloud.monitoring_v3 import MetricServiceClient\n",
     "from google.cloud.monitoring_v3.query import Query\n",
     "from google.cloud.monitoring_v3.types import TimeInterval\n",
@@ -160,7 +165,7 @@
     "                          value_field:str, \n",
     "                          bucket_bounds:List[int]):\n",
     "    \n",
-    "    metric_path = logging_client.metric_path(PROJECT_ID, metric_name)\n",
+    "    metric_path = logging_client.log_metric_path(PROJECT_ID, metric_name)\n",
     "    log_entry_filter = 'resource.type=global AND logName={}'.format(log_path)\n",
     "    \n",
     "    metric_descriptor = {\n",
@@ -203,7 +208,11 @@
     "        logging_client.get_log_metric(metric_path)\n",
     "        print('Metric: {} already exists'.format(metric_path))\n",
     "    except:\n",
-    "        logging_client.create_log_metric(parent, metric)\n",
+    "        request = logging_v2.types.logging_metrics.CreateLogMetricRequest(\n",
+    "            parent=parent,\n",
+    "            metric=metric,\n",
+    "        )\n",
+    "        logging_client.create_log_metric(request)\n",
     "        print('Created metric {}'.format(metric_path))"
    ]
   },
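
Both of these changes track the newer google-cloud-logging client surface: path helpers were renamed (log_metric_path, common_project_path) and mutating calls take explicit request objects. A self-contained sketch of the migrated pattern, with a hypothetical project, metric name, and filter:

    import google.auth
    from google.cloud import logging_v2
    from google.cloud.logging_v2.services.metrics_service_v2 import MetricsServiceV2Client

    creds, _ = google.auth.default()
    client = MetricsServiceV2Client(credentials=creds)

    parent = client.common_project_path('my-project')  # 'projects/my-project'
    metric = logging_v2.types.LogMetric(
        name='locust_latency',  # hypothetical counter metric
        filter='resource.type=global AND logName=projects/my-project/logs/locust',
    )
    request = logging_v2.types.CreateLogMetricRequest(parent=parent, metric=metric)
    client.create_log_metric(request=request)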
@@ -225,7 +234,7 @@
     "creds, _ = google.auth.default()\n",
     "logging_client = MetricsServiceV2Client(credentials=creds)\n",
     "\n",
-    "parent = logging_client.project_path(PROJECT_ID)\n",
+    "parent = logging_client.common_project_path(PROJECT_ID)\n",
     "log_path = LoggingServiceV2Client.log_path(PROJECT_ID, log_name)"
    ]
   },
@@ -284,12 +293,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "metrics = logging_client.list_log_metrics(parent)\n",
+    "request = {'parent': parent}\n",
+    "metrics = logging_client.list_log_metrics(request)\n",
     "\n",
     "if not list(metrics):\n",
     "    print(\"There are not any log based metrics defined in the the project\")\n",
     "else:\n",
-    "    for element in logging_client.list_log_metrics(parent):\n",
+    "    for element in logging_client.list_log_metrics(request):\n",
     "        print(element.metric_descriptor.name)"
    ]
   },
@@ -337,8 +347,12 @@
    "outputs": [],
    "source": [
     "dashboard_proto = Dashboard()\n",
-    "dashboard_proto = ParseDict(dashboard_template, dashboard_proto)\n",
-    "dashboard = dashboard_service_client.create_dashboard(parent, dashboard_proto)"
+    "request = {\n",
+    "    'parent': parent,\n",
+    "    'dashboard': dashboard_proto,\n",
+    "}\n",
+    "dashboard_proto = ParseDict(dashboard_template, dashboard_proto._pb)\n",
+    "dashboard = dashboard_service_client.create_dashboard(request)"
    ]
   },
   {
@@ -347,7 +361,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "for dashboard in dashboard_service_client.list_dashboards(parent):\n",
+    "for dashboard in dashboard_service_client.list_dashboards({'parent': parent}):\n",
     "    print('Dashboard name: {}, Dashboard ID: {}'.format(dashboard.display_name, dashboard.name))"
    ]
   },
@@ -357,7 +371,7 @@
    "source": [
     "## 3. Deploying Locust to a GKE cluster\n",
     "\n",
-    "Before proceeding, you need access to a GKE cluster. The described deployment process can deploy Locust to any GKE cluster as long as there are enough compute resources to support your Locust configuration. The default configuration follows the Locust's best practices and requests one processor core and 4Gi of memory for the Locust master and one processor core and 2Gi of memory for each Locust worker. As you run your tests, it is important to monitor the the master and the workers for resource utilization and fine tune the allocated resources as required.\n",
+    "Before proceeding, you need access to a GKE cluster. You can find a command to create a GKE cluster in the [Environment setup](https://github.com/GoogleCloudPlatform/mlops-on-gcp/blob/master/model_serving/caip-load-testing/README.md#environment-setup) section of [README.md](https://github.com/GoogleCloudPlatform/mlops-on-gcp/blob/master/model_serving/caip-load-testing/README.md). The described deployment process can deploy Locust to any GKE cluster as long as there are enough compute resources to support your Locust configuration. The default configuration follows Locust's best practices and requests one processor core and 4Gi of memory for the Locust master, and one processor core and 2Gi of memory for each Locust worker. As you run your tests, it is important to monitor the master and the workers for resource utilization and fine-tune the allocated resources as required.\n",
     "\n",
     "The deployment process has been streamlined using [Kustomize](https://kustomize.io/). As described in the following steps, you can fine tune the baseline configuration by modifying the default `kustomization.yaml` and `patch.yaml` files in the `locust/manifests` folder.\n",
     "\n"
@@ -623,10 +637,10 @@
    "source": [
     "You can try using the following parameter configurations:\n",
     "1. Number of total users to simulate: 152\n",
-    "2. Hatch rate: 1\n",
-    "3. Host: http://ml.googleapis.com\n",
-    "4. Number of users to increase by step: 8\n",
-    "5. Step duration: 1m "
+    "2. Spawn rate: 1\n",
+    "3. Host: `http://[your-region]-ml.googleapis.com`\n",
+    "\n",
+    "**NOTE**: `[your-region]` is the region where the model is deployed, which you configured as `REGION` in the first notebook."
    ]
   },
   {
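
The renamed "Spawn rate" field and the removal of the step-load parameters reflect the Locust 2.x UI. If you would rather not type the regional host into the web UI for every run, a minimal, hypothetical locustfile fragment can pin it (the region below is illustrative and must match the model deployment):

    from locust import HttpUser, constant, task

    REGION = 'us-central1'  # must match the REGION used to deploy the model

    class CaipPredictUser(HttpUser):
        host = 'http://{}-ml.googleapis.com'.format(REGION)
        wait_time = constant(1)

        @task
        def predict(self):
            # The real test in this solution issues authenticated predict
            # requests; this stub only shows where the regional host is set.
            pass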
