If you are building your own services Docker image, this is necessary. If you are only reusing the image provided by Data Commons with no customizations, you can skip this step.
`website/deploy/terraform-custom-datacommons/create_artifact_repository.sh` is a convenience script to create a repository in the [Google Artifact Registry](https://cloud.google.com/artifact-registry/docs/overview){: target="_blank"}. The script creates a repository called <code><var>PROJECT_ID</var>-artifacts</code>, where you store the Docker images that you build and upload. You will upload a custom image in subsequent steps.
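The script's effect is roughly equivalent to the following gcloud command (a sketch only; the actual script may set additional options, and the location shown is an assumed example):

```shell
# Sketch: create a Docker-format repository named PROJECT_ID-artifacts.
# The location is an assumption; use the region of your deployment.
gcloud artifacts repositories create my-project-artifacts \
  --repository-format=docker \
  --location=us-central1 \
  --description="Custom Data Commons images"
```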
The project ID may be the same project you are using for all other resources, or it may be a separate one you use for pushing releases.
To verify that the repository has been created, go to [https://console.cloud.google.com/artifacts](https://console.cloud.google.com/artifacts){: target="_blank"} for your project. You should see the repository in the list.
## Configure and run a Terraform deployment {#terraform}
We recommend using the Data Commons Terraform scripts to greatly simplify and automate the deployment of all the required GCP services. The scripts are located at [website/deploy/terraform-custom-datacommons](https://github.com/datacommonsorg/website/tree/master/deploy/terraform-custom-datacommons/){: target="_blank"}.
Terraform provisions and runs all the necessary Google Cloud services:
- Creates a Cloud Storage bucket and top-level folder, which will store your data files. You will upload your input data in the subsequent steps.
- Creates a Cloud SQL MySQL instance, with basic resources, a default database user and a random password.
- Creates the Data Commons data management container as a Cloud Run job, with basic resources.
- Creates a single instance of the Data Commons services container as a Cloud Run service, with basic resources. By default, this uses the prebuilt image provided by the Data Commons team; you will change this to your custom image in subsequent steps.
- Stores all secrets (API keys and database passwords) in the [Cloud Secret Manager](https://cloud.google.com/secret-manager/docs/overview){: target="_blank"}.
- Creates a URL for accessing your service in the browser.
Follow the steps below to create and run a Terraform deployment.
### Configure the Terraform deployment
1. From the root directory of the `website` repo, using your favorite editor, copy `deploy/terraform-custom-datacommons/modules/terraform.tfvars.sample` and save it as a new file `deploy/terraform-custom-datacommons/modules/terraform.tfvars`.
1. Edit the required variables to specify the relevant values. The `namespace` variable allows you to uniquely identify the Data Commons deployment, in case you decide to set up [multiple instances](#multiple), e.g. development, staging, testing, or production. Since this is a development environment, you may want to use a suffix such as `-dev`.
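For example, a development `terraform.tfvars` might look like the following sketch. Variable names other than `namespace` and `region` are assumptions; check `variables.tf` for the authoritative list:

```terraform
# Hypothetical values for a development instance.
project_id = "my-gcp-project"   # illustrative project ID
namespace  = "dc-dev"           # unique per deployment; "-dev" suffix for development
region     = "us-east1"
```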
{:.no_toc}
### Run the Terraform deployment {#run-terraform}
1. Open a terminal and navigate to the `website/deploy/terraform-custom-datacommons/modules` directory.
1. Initialize Terraform and validate the configuration:
```shell
terraform init
terraform plan
```
1. Review the plan for any possible configuration errors and fix them if needed.
1. Deploy the instance:
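Deployment is the standard Terraform workflow (the document later refers to rerunning `terraform apply` for updates), for example:

```shell
# Apply the plan; review the proposed changes and type "yes" to confirm.
terraform apply
```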
### Upload data files to Google Cloud Storage
By default, the Terraform scripts create a Cloud Storage bucket called <code><var>NAMESPACE</var>-datacommons-data-<var>PROJECT_ID</var></code>, with a top-level folder `input`. You upload your CSV, JSON, and MCF files to this folder. You can create subfolders of `input`, but remember to set `"includeInputSubdirs": true` in `config.json`.
As you iterate on changes to the files, you can re-upload them at any time, either overwriting existing files or creating new folders for versioned snapshots. A simple strategy is to move older versions to other folders and keep the latest versions in `input`, to avoid having to update configuration variables. If you prefer to update incrementally, simply overwrite the existing files. Creating new versions of files is slower but safer; overwriting files is faster but riskier.
To upload data files:
> **Note:** Do not upload the local `datacommons` subdirectory or its files.
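One way to upload from the command line, assuming the default bucket name created by the Terraform scripts and hypothetical file names (substitute your own namespace, project ID, and files):

```shell
# Copy input files into the bucket's top-level input folder.
# Bucket name follows the default NAMESPACE-datacommons-data-PROJECT_ID pattern.
gcloud storage cp my_data.csv config.json \
  gs://dc-dev-datacommons-data-my-gcp-project/input/
```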
Once you have uploaded the new data, you must [rerun the data management Cloud Run job](#run-job) and [restart the services Cloud Run service](#start-service).
### Run the data management container {#run-job}
By default, the Terraform scripts create and run a Cloud Run job called <code><var>NAMESPACE</var>-datacommons-data-job</code>. When you run the data management job, it converts CSV (and MCF) data into tables in the Cloud SQL database and generates embeddings in the `output` folder of the Cloud Storage bucket.
Every time you upload new input files to Google Cloud Storage, you will need to rerun the job. You can simply run `terraform apply` again, or use any of the other methods described below.
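Besides `terraform apply`, the job can also be executed directly with gcloud; a sketch assuming the default job name and an assumed region:

```shell
# Execute the data management job and wait for it to complete.
gcloud run jobs execute dc-dev-datacommons-data-job \
  --region=us-central1 --wait
```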
</div>
</div>
When it completes, to verify that the data has been loaded correctly, see [Inspect the Cloud SQL database](#inspect-sql). Then [restart the services Cloud Run service](#start-service).
{:.no_toc}
#### (Optional) Run the data management Cloud Run job in schema update mode {#schema-update-mode}
By default, the Terraform scripts create a Cloud SQL instance called <code><var>PROJECT_ID</var>:us-central1:<var>NAMESPACE</var>-datacommons-mysql-instance</code>, with a database named `datacommons`, and a default user with admin permissions called `datacommons`.
Before you can inspect the database, you need to retrieve the password created by the Terraform scripts:
1. Go to <https://console.cloud.google.com/security/secret-manager>{: target="_blank"} for your project and in the list of secrets, select <code><var>NAMESPACE</var>-datacommons-mysql-password</code>.
1. Click the **Versions** tab, and select **Actions > View secret value**. Record the password.
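The same password can also be fetched from the command line (a sketch assuming the default secret name):

```shell
# Print the latest version of the MySQL password secret.
gcloud secrets versions access latest \
  --secret=dc-dev-datacommons-mysql-password
```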
To view the tables:
1. Go to [https://console.cloud.google.com/sql/instances](https://console.cloud.google.com/sql/instances){: target="_blank"} for your project.
1. Select the instance created by the Terraform script.
1. In the left panel, select **Cloud SQL Studio**.
1. In the **Sign in to SQL Studio** page, from the **Database** field, select the database created by the Terraform script.
1. In the **User** field, select the user created by the Terraform script.
256
256
1. In the **Password** field, enter the password you retrieved from the Cloud Secret Manager.
1. In the left Explorer pane that appears, expand the **Databases** icon, your database name, and **Tables**. The table of interest is **observations**. You can see column names and other metadata.
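For a quick sanity check in SQL Studio, a query such as the following can confirm that data was imported (the `observations` table name is from the text above; column names are not assumed):

```sql
-- A non-zero count indicates that observations were loaded.
SELECT COUNT(*) FROM observations;
```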
If you don't see any data, go to <a href="https://console.cloud.google.com/run/jobs" target="_blank">https://console.cloud.google.com/run/jobs</a> for your project, select the job you ran in the previous step, and click the **Logs** tab to look for errors.
## Manage your service
### Upload a custom Docker image to the Artifact Registry {#upload}
1. When prompted to confirm creating the credentials file, enter `Y` to accept.
1. Create a package from the source image you created in step 1:
<pre>docker tag <var>SOURCE_IMAGE_NAME</var>:<var>SOURCE_IMAGE_TAG</var> \
### Start/restart the services container {#start-service}
By default, the Terraform scripts point the service at the prebuilt Data Commons services image, `gcr.io/datcom-ci/datacommons-services:stable`. If you just want to see the running default website in action with your data, run `terraform apply` again.
If you are using a custom image, which is normally the case, you first need to repoint the service to your own image and then restart the service:
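One possible way to repoint and restart with gcloud is sketched below; all names follow the defaults described in the text and are illustrative. Note that the document recommends making such updates through Terraform variables where possible, so treat this as a manual alternative, not the canonical path:

```shell
# Point the service at a custom image in Artifact Registry and
# roll out a new revision. Names are hypothetical examples.
gcloud run services update dc-dev-datacommons-web-service \
  --image=us-central1-docker.pkg.dev/my-gcp-project/my-gcp-project-artifacts/datacommons-services:latest \
  --region=us-central1
```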
<li>To view the startup status, run the following command:
<pre>gcloud beta run services logs tail <var>SERVICE_NAME</var></pre>
</li>
The service name is <code><var>NAMESPACE</var>-datacommons-web-service</code>.
The container image URL is the name of the package you created in the previous step.
</ol>
</div>
The URL for your service is in the form <code>https://<var>NAMESPACE</var>-datacommons-web-service-<var>XXXXX</var>.<var>REGION</var>.run.app</code>. To get the exact URL:
1. Go to the <a href="https://console.cloud.google.com/run/services" target="_blank">https://console.cloud.google.com/run/services</a> page for your project.
1. From the list of services, click the link for the service created by the Terraform script. The app URL appears at the top of the page. If the service is running, the URL is a clickable link; when you click it, it should open in another browser window or tab.
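Alternatively, the URL can be retrieved from the command line (a sketch assuming the default service name and region):

```shell
# Print just the service URL.
gcloud run services describe dc-dev-datacommons-web-service \
  --region=us-central1 --format='value(status.url)'
```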
If the link is not clickable and the service is not running, go back to the Cloud Run page in the Cloud Console, click the **Logs** tab, and look for errors. Also check the output of your `terraform apply` run.
If you want to continue to use Terraform to deploy changes to your service, do the following:
1. Add your updated variables in the `terraform.tfvars` file.
1. [Authenticate to GCP](#gen-creds).
1. Run all the Terraform commands as listed in [Run the Terraform deployment](#run-terraform).
> **Note:** Whenever you make future updates to your deployments, we recommend always using Terraform to do so. If you use the Cloud Console or gcloud to make updates and try to run Terraform again, it will override any changes you have made outside of Terraform. For options that are available as variables in the Data Commons `variables.tf`, you must sync your `terraform.tfvars` options to the same values you have set outside Terraform before running Terraform commands again. If you use the Cloud Console or gcloud to configure options that are not available as Data Commons variables, you _must not_ run Terraform again.
If you intend to deploy several Google Cloud instances, see the next section.
If you would like to create multiple Terraform deployments, for example, development, staging, and production, you can easily do so using Terraform Workspaces and multiple `tfvars` configuration files. You can run the deployments in different projects, or run them in the same project using namespaces to keep them separate.
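A sketch of the workspace flow, assuming per-environment `tfvars` files that you create yourself (the file name here is hypothetical):

```shell
# Create a workspace per environment; each keeps its own Terraform state.
terraform workspace new staging
terraform workspace select staging
# Apply with an environment-specific variables file.
terraform apply -var-file=staging.tfvars
```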