Commit e30e042

Minwoo Park authored and copybara-github committed
Use TensorBoard notebook extension
PiperOrigin-RevId: 683778406
1 parent df55634 commit e30e042

File tree: 4 files changed, +76 −47 lines


notebooks/community/model_garden/model_garden_gemma2_finetuning_on_vertex.ipynb

Lines changed: 19 additions & 12 deletions
@@ -132,10 +132,8 @@
 "# @markdown 3. For serving, **[click here](https://console.cloud.google.com/iam-admin/quotas?location=us-central1&metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_l4_gpus)** to check if your project already has the required 1 L4 GPU in the us-central1 region. If yes, then run this notebook in the us-central1 region. If you need more L4 GPUs for your project, then you can follow [these instructions](https://cloud.google.com/docs/quotas/view-manage#viewing_your_quota_console) to request more. Alternatively, if you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus).\n",
 "\n",
 "# @markdown > | Machine Type | Accelerator Type | Recommended Regions |\n",
-"# @markdown | ----------- | ----------- | ----------- | \n",
+"# @markdown | ----------- | ----------- | ----------- |\n",
 "# @markdown | a2-ultragpu-1g | 1 NVIDIA_A100_80GB | us-central1, us-east4, europe-west4, asia-southeast1, us-east4 |\n",
-"# @markdown | a3-highgpu-2g | 2 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
-"# @markdown | a3-highgpu-4g | 4 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
 "# @markdown | a3-highgpu-8g | 8 NVIDIA_H100_80GB | us-central1, us-west1, europe-west4, asia-southeast1 |\n",
 "\n",
 "# @markdown 4. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
@@ -180,7 +178,7 @@
 "# Cloud Storage bucket for storing the experiment artifacts.\n",
 "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
 "# prefer using your own GCS bucket, change the value yourself below.\n",
-"now = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
+"now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
 "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
 "\n",
 "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
@@ -582,14 +580,22 @@
 "outputs": [],
 "source": [
 "# @title Run TensorBoard\n",
-"# @markdown This section shows how to launch TensorBoard in a [Cloud Shell](https://cloud.google.com/shell/docs).\n",
-"# @markdown 1. Click the Cloud Shell icon(![terminal](https://github.com/google/material-design-icons/blob/master/png/action/terminal/materialicons/24dp/1x/baseline_terminal_black_24dp.png?raw=true)) on the top right to open the Cloud Shell.\n",
-"# @markdown 2. Copy the `tensorboard` command shown below by running this cell.\n",
-"# @markdown 3. Paste and run the command in the Cloud Shell to launch TensorBoard.\n",
-"# @markdown 4. Once the command runs (You may have to click `Authorize` if prompted), click the link starting with `http://localhost`.\n",
-"\n",
+"# @markdown This section launches TensorBoard and displays it. You can re-run the cell to display updated information about the training job.\n",
+"# @markdown See the link to the training job in the above cell to see the status of the Custom Training Job.\n",
 "# @markdown Note: You may need to wait around 10 minutes after the job starts in order for the TensorBoard logs to be written to the GCS bucket.\n",
-"print(f\"Command to copy: tensorboard --logdir {base_output_dir}/logs\")\n"
+"\n",
+"now = datetime.datetime.now(tz=datetime.timezone.utc)\n",
+"\n",
+"if train_job.end_time is not None:\n",
+"    min_since_end = int((now - train_job.end_time).total_seconds() // 60)\n",
+"    print(f\"Training Job finished {min_since_end} minutes ago.\")\n",
+"\n",
+"if train_job.has_failed:\n",
+"    print(\n",
+"        \"The job has failed. See the link to the training job in the above cell to see the logs.\"\n",
+"    )\n",
+"\n",
+"%tensorboard --logdir {base_output_dir}/logs"
 ]
 },
 {
@@ -819,11 +825,12 @@
 "# endpoint = aiplatform.Endpoint(aip_endpoint_name)\n",
 "\n",
 "prompt = \"What is a car?\" # @param {type: \"string\"}\n",
-"# @markdown If you encounter the issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens, by lowering `max_tokens`.\n",
+"# @markdown If you encounter an issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens by lowering `max_tokens`.\n",
 "max_tokens = 50 # @param {type:\"integer\"}\n",
 "temperature = 1.0 # @param {type:\"number\"}\n",
 "top_p = 1.0 # @param {type:\"number\"}\n",
 "top_k = 1 # @param {type:\"integer\"}\n",
+"# @markdown Set `raw_response` to `True` to obtain the raw model output. Set `raw_response` to `False` to apply additional formatting in the structure of `\"Prompt:\\n{prompt.strip()}\\nOutput:\\n{output}\"`.\n",
 "raw_response = False # @param {type:\"boolean\"}\n",
 "\n",
 "# Overrides parameters for inferences.\n",
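The `datetime.now()` → `datetime.datetime.now()` change above matters because these notebooks evidently use `import datetime` (the new code also references `datetime.timezone.utc`), which binds the *module*, not the class. A minimal standalone sketch of the failure mode and the corrected call:

```python
import datetime

# With `import datetime`, `now` is not a module attribute; it lives on the
# `datetime.datetime` class, so the unqualified call raises AttributeError.
try:
    datetime.now()
except AttributeError as err:
    print(err)  # module 'datetime' has no attribute 'now'

# The corrected call, as used for the bucket-name timestamp:
now = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
print(now)  # a 14-character timestamp such as 20241009120000
```

The alternative spelling `from datetime import datetime` would have made the old call work, but would break the new `datetime.timezone.utc` usage, so qualifying the class is the consistent fix here.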

notebooks/community/model_garden/model_garden_pytorch_llama3_1_finetuning.ipynb

Lines changed: 19 additions & 11 deletions
@@ -129,10 +129,8 @@
 "# @markdown 3. For serving, **[click here](https://console.cloud.google.com/iam-admin/quotas?location=us-central1&metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_l4_gpus)** to check if your project already has the required 1 L4 GPU in the us-central1 region. If yes, then run this notebook in the us-central1 region. If you need more L4 GPUs for your project, then you can follow [these instructions](https://cloud.google.com/docs/quotas/view-manage#viewing_your_quota_console) to request more. Alternatively, if you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus).\n",
 "\n",
 "# @markdown > | Machine Type | Accelerator Type | Recommended Regions |\n",
-"# @markdown | ----------- | ----------- | ----------- | \n",
+"# @markdown | ----------- | ----------- | ----------- |\n",
 "# @markdown | a2-ultragpu-1g | 1 NVIDIA_A100_80GB | us-central1, us-east4, europe-west4, asia-southeast1, us-east4 |\n",
-"# @markdown | a3-highgpu-2g | 2 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
-"# @markdown | a3-highgpu-4g | 4 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
 "# @markdown | a3-highgpu-8g | 8 NVIDIA_H100_80GB | us-central1, us-west1, europe-west4, asia-southeast1 |\n",
 "\n",
 "# @markdown 4. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
@@ -177,7 +175,7 @@
 "# Cloud Storage bucket for storing the experiment artifacts.\n",
 "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
 "# prefer using your own GCS bucket, change the value yourself below.\n",
-"now = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
+"now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
 "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
 "\n",
 "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
@@ -642,19 +640,28 @@
 "outputs": [],
 "source": [
 "# @title Run TensorBoard\n",
-"# @markdown This section shows how to launch TensorBoard in a [Cloud Shell](https://cloud.google.com/shell/docs).\n",
-"# @markdown 1. Click the Cloud Shell icon(![terminal](https://github.com/google/material-design-icons/blob/master/png/action/terminal/materialicons/24dp/1x/baseline_terminal_black_24dp.png?raw=true)) on the top right to open the Cloud Shell.\n",
-"# @markdown 2. Copy the `tensorboard` command shown below by running this cell.\n",
-"# @markdown 3. Paste and run the command in the Cloud Shell to launch TensorBoard.\n",
-"# @markdown 4. Once the command runs (You may have to click `Authorize` if prompted), click the link starting with `http://localhost`.\n",
-"\n",
+"# @markdown This section launches TensorBoard and displays it. You can re-run the cell to display updated information about the training job.\n",
+"# @markdown See the link to the training job in the above cell to see the status of the Custom Training Job.\n",
 "# @markdown Note: You may need to wait around 10 minutes after the job starts in order for the TensorBoard logs to be written to the GCS bucket.\n",
-"print(f\"Command to copy: tensorboard --logdir {base_output_dir}/logs\")\n"
+"\n",
+"now = datetime.datetime.now(tz=datetime.timezone.utc)\n",
+"\n",
+"if train_job.end_time is not None:\n",
+"    min_since_end = int((now - train_job.end_time).total_seconds() // 60)\n",
+"    print(f\"Training Job finished {min_since_end} minutes ago.\")\n",
+"\n",
+"if train_job.has_failed:\n",
+"    print(\n",
+"        \"The job has failed. See the link to the training job in the above cell to see the logs.\"\n",
+"    )\n",
+"\n",
+"%tensorboard --logdir {base_output_dir}/logs"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
+"language": "python",
 "metadata": {
 "cellView": "form",
 "id": "qmHW6m8xG_4U"
@@ -860,6 +867,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
+"language": "python",
 "metadata": {
 "cellView": "form",
 "id": "2UYUNn60G_4U"
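The elapsed-time logic the new TensorBoard cell adds can be sketched in isolation. In the notebooks, `train_job` and its `end_time`/`has_failed` come from the Vertex AI SDK; here a hypothetical timezone-aware `end_time` stands in for it:

```python
import datetime


def minutes_since(end_time):
    """Whole minutes elapsed since a timezone-aware end time, or None if
    the job has not finished (mirrors the diff's elapsed-time arithmetic)."""
    if end_time is None:  # job still running
        return None
    now = datetime.datetime.now(tz=datetime.timezone.utc)
    return int((now - end_time).total_seconds() // 60)


# Hypothetical job that ended 90.5 minutes ago:
end = datetime.datetime.now(tz=datetime.timezone.utc) - datetime.timedelta(
    minutes=90, seconds=30
)
print(minutes_since(end))   # 90
print(minutes_since(None))  # None
```

Using `datetime.datetime.now(tz=datetime.timezone.utc)` matters: subtracting a naive `now` from an aware `end_time` raises `TypeError`, which is presumably why the cell constructs `now` as timezone-aware.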

notebooks/community/model_garden/model_garden_pytorch_mistral_peft_tuning.ipynb

Lines changed: 19 additions & 12 deletions
@@ -126,10 +126,8 @@
 "# @markdown 3. For serving, **[click here](https://console.cloud.google.com/iam-admin/quotas?location=us-central1&metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_l4_gpus)** to check if your project already has the required 1 L4 GPU in the us-central1 region. If yes, then run this notebook in the us-central1 region. If you need more L4 GPUs for your project, then you can follow [these instructions](https://cloud.google.com/docs/quotas/view-manage#viewing_your_quota_console) to request more. Alternatively, if you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus).\n",
 "\n",
 "# @markdown > | Machine Type | Accelerator Type | Recommended Regions |\n",
-"# @markdown | ----------- | ----------- | ----------- | \n",
+"# @markdown | ----------- | ----------- | ----------- |\n",
 "# @markdown | a2-ultragpu-1g | 1 NVIDIA_A100_80GB | us-central1, us-east4, europe-west4, asia-southeast1, us-east4 |\n",
-"# @markdown | a3-highgpu-2g | 2 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
-"# @markdown | a3-highgpu-4g | 4 NVIDIA_H100_80GB | us-west1, asia-southeast1 |\n",
 "# @markdown | a3-highgpu-8g | 8 NVIDIA_H100_80GB | us-central1, us-west1, europe-west4, asia-southeast1 |\n",
 "\n",
 "# @markdown 4. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
@@ -174,7 +172,7 @@
 "# Cloud Storage bucket for storing the experiment artifacts.\n",
 "# A unique GCS bucket will be created for the purpose of this notebook. If you\n",
 "# prefer using your own GCS bucket, change the value yourself below.\n",
-"now = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
+"now = datetime.datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
 "BUCKET_NAME = \"/\".join(BUCKET_URI.split(\"/\")[:3])\n",
 "\n",
 "if BUCKET_URI is None or BUCKET_URI.strip() == \"\" or BUCKET_URI == \"gs://\":\n",
@@ -559,14 +557,22 @@
 "outputs": [],
 "source": [
 "# @title Run TensorBoard\n",
-"# @markdown This section shows how to launch TensorBoard in a [Cloud Shell](https://cloud.google.com/shell/docs).\n",
-"# @markdown 1. Click the Cloud Shell icon(![terminal](https://github.com/google/material-design-icons/blob/master/png/action/terminal/materialicons/24dp/1x/baseline_terminal_black_24dp.png?raw=true)) on the top right to open the Cloud Shell.\n",
-"# @markdown 2. Copy the `tensorboard` command shown below by running this cell.\n",
-"# @markdown 3. Paste and run the command in the Cloud Shell to launch TensorBoard.\n",
-"# @markdown 4. Once the command runs (You may have to click `Authorize` if prompted), click the link starting with `http://localhost`.\n",
-"\n",
+"# @markdown This section launches TensorBoard and displays it. You can re-run the cell to display updated information about the training job.\n",
+"# @markdown See the link to the training job in the above cell to see the status of the Custom Training Job.\n",
 "# @markdown Note: You may need to wait around 10 minutes after the job starts in order for the TensorBoard logs to be written to the GCS bucket.\n",
-"print(f\"Command to copy: tensorboard --logdir {base_output_dir}/logs\")\n"
+"\n",
+"now = datetime.datetime.now(tz=datetime.timezone.utc)\n",
+"\n",
+"if train_job.end_time is not None:\n",
+"    min_since_end = int((now - train_job.end_time).total_seconds() // 60)\n",
+"    print(f\"Training Job finished {min_since_end} minutes ago.\")\n",
+"\n",
+"if train_job.has_failed:\n",
+"    print(\n",
+"        \"The job has failed. See the link to the training job in the above cell to see the logs.\"\n",
+"    )\n",
+"\n",
+"%tensorboard --logdir {base_output_dir}/logs"
 ]
 },
 {
@@ -777,11 +783,12 @@
 "# endpoint = aiplatform.Endpoint(aip_endpoint_name)\n",
 "\n",
 "prompt = \"What is a car?\" # @param {type: \"string\"}\n",
-"# @markdown If you encounter the issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens, by lowering `max_tokens`.\n",
+"# @markdown If you encounter an issue like `ServiceUnavailable: 503 Took too long to respond when processing`, you can reduce the maximum number of output tokens by lowering `max_tokens`.\n",
 "max_tokens = 50 # @param {type:\"integer\"}\n",
 "temperature = 1.0 # @param {type:\"number\"}\n",
 "top_p = 1.0 # @param {type:\"number\"}\n",
 "top_k = 1 # @param {type:\"integer\"}\n",
+"# @markdown Set `raw_response` to `True` to obtain the raw model output. Set `raw_response` to `False` to apply additional formatting in the structure of `\"Prompt:\\n{prompt.strip()}\\nOutput:\\n{output}\"`.\n",
 "raw_response = False # @param {type:\"boolean\"}\n",
 "\n",
 "# Overrides parameters for inferences.\n",
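The new `raw_response` @markdown text describes a small formatting contract for prediction output. A sketch of what that shaping amounts to (the helper name is mine, not from the notebooks):

```python
def format_prediction(prompt, output, raw_response):
    """Return the raw model output, or wrap it in the Prompt:/Output:
    structure the @markdown text describes."""
    if raw_response:
        return output
    # The prompt is stripped before being echoed back with the output.
    return f"Prompt:\n{prompt.strip()}\nOutput:\n{output}"


text = format_prediction("What is a car? ", "A car is a vehicle.", raw_response=False)
print(text)
# Prompt:
# What is a car?
# Output:
# A car is a vehicle.
```

With `raw_response=True` the same call would return just `"A car is a vehicle."`, which is useful when post-processing the output programmatically.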
