
Commit 6cbcdc5

missing syntax type
1 parent 3c43f65 commit 6cbcdc5

1 file changed (+23, -23 lines)
  • tutorials/create-serverless-scraping

tutorials/create-serverless-scraping/index.mdx

Lines changed: 23 additions & 23 deletions
@@ -47,7 +47,7 @@ We start by creating the scraper program, or the "data producer".
 
 SQS credentials and queue URL are read by the function from environment variables. Those variables are set by Terraform as explained in [one of the next sections](#create-a-terraform-file-to-provision-the-necessary-scaleway-resources). *If you choose another deployment method, such as the [console](https://console.scaleway.com/), do not forget to set them.*
 ```python
-queue_url = os.getenv('QUEUE_URL')
+queue_url = os.getenv('QUEUE_URL')
 sqs_access_key = os.getenv('SQS_ACCESS_KEY')
 sqs_secret_access_key = os.getenv('SQS_SECRET_ACCESS_KEY')
 ```
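`os.getenv` returns `None` when a variable is unset, so a missing setting only surfaces later as an opaque `boto3` error. A minimal fail-fast sketch, assuming the three variable names in the hunk above are the only SQS settings the scraper needs (the helper name `load_sqs_settings` is hypothetical):

```python
import os

# Variable names taken from the scraper snippet; the helper itself is illustrative.
REQUIRED_VARS = ("QUEUE_URL", "SQS_ACCESS_KEY", "SQS_SECRET_ACCESS_KEY")


def load_sqs_settings(environ=os.environ) -> dict:
    """Return the SQS settings, raising early if any variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise locally.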
@@ -65,10 +65,10 @@ We start by creating the scraper program, or the "data producer".
 Using the AWS python sdk `boto3`, connect to the SQS queue and push the `title` and `url` of articles published less than 15 minutes ago.
 ```python
 sqs = boto3.client(
-    'sqs',
-    endpoint_url=SCW_SQS_URL,
-    aws_access_key_id=sqs_access_key,
-    aws_secret_access_key=sqs_secret_access_key,
+    'sqs',
+    endpoint_url=SCW_SQS_URL,
+    aws_access_key_id=sqs_access_key,
+    aws_secret_access_key=sqs_secret_access_key,
     region_name='fr-par')
 
 for age, titleline in zip(ages, titlelines):
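The hunk stops before the loop body. A minimal sketch of the filter-then-send step, assuming each article has already been parsed into a `(title, url, published_at)` tuple with a timezone-aware timestamp and that the message body is JSON; both the tuple shape and the JSON format are assumptions, not the tutorial's code. `send_message(QueueUrl=..., MessageBody=...)` is the standard `boto3` SQS client call.

```python
import json
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=15)  # "published less than 15 minutes ago"


def push_recent_articles(sqs, queue_url, articles, now=None):
    """Send one SQS message per article younger than MAX_AGE.

    `articles` is assumed to be an iterable of (title, url, published_at)
    tuples with timezone-aware datetimes. Returns the number of messages sent.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    sent = 0
    for title, url, published_at in articles:
        if now - published_at < MAX_AGE:
            sqs.send_message(
                QueueUrl=queue_url,
                MessageBody=json.dumps({"title": title, "url": url}),
            )
            sent += 1
    return sent
```

In the function itself, `sqs` would be the `boto3` client created above and `queue_url` the value read from `QUEUE_URL`; any object with a compatible `send_message` method works for local testing.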
@@ -117,7 +117,7 @@ Next, let's create our consumer function. When receiving a message containing th
 Lastly, we write the information into the database. *To keep the whole process completely automatic the* `CREATE_TABLE_IF_NOT_EXISTS` *query is run each time. If you integrate the functions into an existing database, there is no need for it.*
 ```python
 conn = None
-try:
+try:
     conn = pg8000.native.Connection(host=db_host, database=db_name, port=db_port, user=db_user, password=db_password, timeout=15)
 
     conn.run(CREATE_TABLE_IF_NOT_EXISTS)
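Running `CREATE TABLE IF NOT EXISTS` on every invocation is safe because the statement is a no-op once the table exists. The sketch below shows the pattern with Python's built-in `sqlite3` as a local stand-in for PostgreSQL, so it runs without a database server; the `articles` table and its columns are hypothetical, since the tutorial's actual query is outside this hunk. `pg8000.native` uses the same `:name` placeholder style, with values passed as keyword arguments to `Connection.run`.

```python
import sqlite3

# Hypothetical schema standing in for the tutorial's CREATE_TABLE_IF_NOT_EXISTS query.
CREATE_TABLE_IF_NOT_EXISTS = """
CREATE TABLE IF NOT EXISTS articles (
    title TEXT NOT NULL,
    url   TEXT NOT NULL
)
"""
INSERT_ARTICLE = "INSERT INTO articles (title, url) VALUES (:title, :url)"


def store_article(conn, title, url):
    """Ensure the table exists, then insert one article in a single transaction."""
    with conn:  # sqlite3: commits on success, rolls back on exception
        conn.execute(CREATE_TABLE_IF_NOT_EXISTS)
        conn.execute(INSERT_ARTICLE, {"title": title, "url": url})
```

Calling `store_article` repeatedly on the same connection creates the table once and then only inserts rows.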
@@ -136,7 +136,7 @@ As explained in the [Scaleway Functions documentation](/serverless/functions/how
 
 ## Create a Terraform file to provision the necessary Scaleway resources
 
-For the purposes of this tutorial, we show how to provision all resources via Terraform.
+For the purposes of this tutorial, we show how to provision all resources via Terraform.
 
 <Message type="tip">
 If you do not want to use Terraform, you can also create the required resources via the [console](https://console.scaleway.com/), the [Scaleway API](https://www.scaleway.com/en/developers/api/), or any other [developer tool](https://www.scaleway.com/en/developers/). Remember that if you do so, you will need to set up environment variables for functions as previously specified. The following documentation may help create the required resources:
@@ -149,7 +149,7 @@ If you do not want to use Terraform, you can also create the required resources
 1. Create a directory called `terraform` (at the same level as the `scraper` and `consumer` directories created in the previous steps).
 2. Inside it, create a file called `main.tf`.
 3. In the file you just created, add the code below to set up the [Scaleway Terraform provider](https://registry.terraform.io/providers/scaleway/scaleway/latest/docs) and your Project:
-```
+```hcl
 terraform {
   required_providers {
     scaleway = {
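This commit adds a language tag to opening code fences that lacked one; the command blocks in the last hunk still open with a bare fence. A small checker can list the remaining cases. The sketch is simplified (backtick fences only, no tilde fences) and follows the CommonMark rule that a closing fence carries no info string and is at least as long as the opening one; the function name is hypothetical.

```python
import re
from pathlib import Path

# Optional indentation, a run of 3+ backticks, then the (possibly empty) info string.
FENCE = re.compile(r"^\s*(`{3,})(.*)$")


def untagged_fences(text: str) -> list:
    """Return 1-based line numbers of opening code fences with no language tag."""
    missing = []
    open_fence = None  # backtick run of the currently open fence, if any
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        ticks, info = match.group(1), match.group(2).strip()
        if open_fence is None:
            open_fence = ticks
            if not info:
                missing.append(lineno)
        elif len(ticks) >= len(open_fence) and not info:
            open_fence = None  # closing fence


    return missing


if __name__ == "__main__":
    for path in sorted(Path("tutorials").rglob("*.mdx")):
        for lineno in untagged_fences(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: code fence without a language tag")
```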
@@ -167,7 +167,7 @@ If you do not want to use Terraform, you can also create the required resources
 }
 ```
 4. Still in the same file, add the code below to provision the SQS resources: SQS activation for the project, separate credentials with appropriate permissions for producer and consumer, and an SQS queue:
-```
+```hcl
 resource "scaleway_mnq_sqs" "main" {
   project_id = scaleway_account_project.mnq_tutorial.id
 }
@@ -202,7 +202,7 @@ If you do not want to use Terraform, you can also create the required resources
 }
 ```
 5. Add the code below to provision the Managed Database for PostgreSQL resources. Note that here we are creating a random password and using it for the default and worker user:
-```
+```hcl
 resource "random_password" "dev_mnq_pg_exporter_password" {
   length = 16
   special = true
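For local experiments, such as creating a throwaway test database by hand to try the consumer, a rough Python analogue of a 16-character password that may include special characters can be drawn with the `secrets` module. This is not how the Terraform `random` provider generates values, and the special-character set below is an assumption:

```python
import secrets
import string

# Assumed special-character set; Terraform's random_password has its own default.
SPECIALS = "!#$%&*()-_=+[]{}<>:?"
ALPHABET = string.ascii_letters + string.digits + SPECIALS


def random_password(length: int = 16) -> str:
    """Draw `length` characters uniformly from ALPHABET using a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```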
@@ -219,7 +219,7 @@ If you do not want to use Terraform, you can also create the required resources
   node_type = "db-dev-s"
   engine = "PostgreSQL-15"
   is_ha_cluster = false
-  disable_backup = true
+  disable_backup = true
   user_name = "mnq_initial_user"
   password = random_password.dev_mnq_pg_exporter_password.result
 }
@@ -240,7 +240,7 @@ If you do not want to use Terraform, you can also create the required resources
 }
 
 resource "scaleway_rdb_database" "main" {
-  instance_id = scaleway_rdb_instance.main.id
+  instance_id = scaleway_rdb_instance.main.id
   name = "hn-database"
 }
 
@@ -252,14 +252,14 @@ If you do not want to use Terraform, you can also create the required resources
 }
 
 resource "scaleway_rdb_privilege" "mnq_user_role" {
-  instance_id = scaleway_rdb_instance.main.id
+  instance_id = scaleway_rdb_instance.main.id
   user_name = scaleway_rdb_user.worker.name
   database_name = scaleway_rdb_database.main.name
   permission = "all"
 }
 ```
 6. Add the code below to provision the functions resources. First, activate the namespace, then locally zip the code and create the functions in the cloud. Note that we are referencing variables from other resources, to completely automate the deployment process:
-```
+```hcl
 locals {
   scraper_folder_path = "../scraper"
   consumer_folder_path = "../consumer"
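The Terraform configuration zips each code folder itself; the exact mechanism is in the part of `main.tf` not shown in this diff, and the tutorial notes that an `archives` folder must exist. When reproducing an archive by hand, for example to inspect what gets uploaded, a sketch like the following creates the folder if needed and zips the folder's contents at the archive root. The helper name and the root-level layout are assumptions; check them against the handler path configured in `main.tf`.

```python
import shutil
from pathlib import Path


def zip_function(source_dir: str, archives_dir: str = "archives") -> Path:
    """Zip the contents of source_dir into archives_dir/<folder name>.zip."""
    src = Path(source_dir).resolve()
    out_dir = Path(archives_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create "archives" if it is missing
    # make_archive appends ".zip" to base_name; root_dir puts files at the archive root.
    archive = shutil.make_archive(str(out_dir / src.name), "zip", root_dir=src)
    return Path(archive)
```

Run from the `terraform` directory, `zip_function("../scraper")` would write `archives/scraper.zip`.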
@@ -354,17 +354,17 @@ If you do not want to use Terraform, you can also create the required resources
   }
 }
 ```
-Note that a folder `archives` needs to be created manually if you started from scratch. It is included in the git repository.
-7. Add the code below to provision the triggers resources. The cron trigger activates at the minutes `[0, 15, 30, 45]` of every hour. No arguments are passed, but we could do so by specifying them in JSON format in the `args` parameter.
-```
+Note that a folder `archives` needs to be created manually if you started from scratch. It is included in the git repository.
+7. Add the code below to provision the triggers resources. The cron trigger activates at the minutes `[0, 15, 30, 45]` of every hour. No arguments are passed, but we could do so by specifying them in JSON format in the `args` parameter.
+```hcl
 resource "scaleway_function_cron" "scraper_cron" {
-  function_id = scaleway_function.scraper.id
+  function_id = scaleway_function.scraper.id
   schedule = "0,15,30,45 * * * *"
   args = jsonencode({})
 }
 
 resource "scaleway_function_trigger" "consumer_sqs_trigger" {
-  function_id = scaleway_function.consumer.id
+  function_id = scaleway_function.consumer.id
   name = "hn-sqs-trigger"
   sqs {
     project_id = scaleway_mnq_sqs.main.project_id
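The schedule `0,15,30,45 * * * *` fires every 15 minutes, which lines up with the scraper's "published less than 15 minutes ago" filter: each run roughly covers the window since the previous one. Real executions may start slightly after the scheduled minute, so windows can overlap or leave small gaps. A small sketch of how the next fire time follows from the minute list (the helper `next_run` is hypothetical):

```python
from datetime import datetime, timedelta

CRON_MINUTES = (0, 15, 30, 45)  # from schedule = "0,15,30,45 * * * *"


def next_run(after: datetime, minutes=CRON_MINUTES) -> datetime:
    """Return the first scheduled time strictly after `after`."""
    base = after.replace(second=0, microsecond=0)
    for minute in sorted(minutes):
        candidate = base.replace(minute=minute)
        if candidate > after:
            return candidate
    # No slot left in this hour: first slot of the next hour (handles day rollover).
    return base.replace(minute=min(minutes)) + timedelta(hours=1)
```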
@@ -378,22 +378,22 @@ Terraform makes this very straightforward. To provision all the resources and ge
 ```
 cd terraform
 terraform init
-terraform plan
+terraform plan
 terraform apply
 ```
 
 ### How to check that everything is working correctly
 
 Go to the [Scaleway console](https://console.scaleway.com/), and check the logs and metrics for Serverless Functions' execution and Messaging and Queuing SQS queue statistics.
 
-To make sure the data is correctly stored in the database, you can [connect to it directly](/managed-databases/postgresql-and-mysql/how-to/connect-database-instance/) via a CLI tool such as `psql`.
+To make sure the data is correctly stored in the database, you can [connect to it directly](/managed-databases/postgresql-and-mysql/how-to/connect-database-instance/) via a CLI tool such as `psql`.
 Retrieve the instance IP and port of your Managed Database from the console, under the [Managed Database section](https://console.scaleway.com/rdb/instances).
 Use the following command to connect to your database. When prompted for a password, you can find it by running `terraform output -json`.
 ```
 psql -h <DB_INSTANCE_IP> --port <DB_INSTANCE_PORT> -d hn-database -U worker
 ```
 
-When you are done testing, don't forget to clean up! To do so, run:
+When you are done testing, don't forget to clean up! To do so, run:
 ```
 cd terraform
 terraform destroy
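`terraform output -json` prints every output declared in the configuration as a JSON object whose entries carry `sensitive`, `type` and `value` keys, and it shows sensitive values in clear text, so avoid pasting its result into logs. The outputs themselves are not part of this diff, so the name `db_password` used below is hypothetical; the sketch only flattens the JSON into `{name: value}`.

```python
import json
import subprocess


def parse_outputs(raw: str) -> dict:
    """Flatten `terraform output -json` text into {output_name: value}."""
    return {name: meta["value"] for name, meta in json.loads(raw).items()}


def terraform_outputs(workdir: str = "terraform") -> dict:
    """Run `terraform output -json` in workdir and return the flattened outputs."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_outputs(result.stdout)
```

Usage, with a hypothetical output name: `password = terraform_outputs()["db_password"]`.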
@@ -405,7 +405,7 @@ We have shown how to asynchronously decouple the producer and the consumer using
 While the volume of data processed in this example is quite small, thanks to the Messaging and Queuing SQS queue's robustness and the auto-scaling capabilities of the Serverless Functions, you can adapt this example to manage larger workloads.
 
 Here are some possible extensions to this basic example:
-- Replace the simple proposed logic with your own. What about counting how many times some keywords (e.g: copilot, serverless, microservice) appear in Hacker News articles?
+- Replace the simple proposed logic with your own. What about counting how many times some keywords (e.g: copilot, serverless, microservice) appear in Hacker News articles?
 - Define multiple cron triggers for different websites and pass the website as an argument to the function. Or, create multiple functions that feed the same queue.
 - Use a [Serverless Container](/serverless/containers/quickstart/) instead of the consumer function, and use a command line tool such as `htmldoc` or `pandoc` to convert the scraped articles to PDF and upload the result to a [Scaleway Object Storage](/storage/object/quickstart/) S3 bucket.
 - Replace the Managed Database for PostgreSQL with a [Scaleway Serverless Database](/serverless/sql-databases/quickstart/), so that all the infrastructure lives in the serverless ecosystem! *Note that at the moment there is no Terraform support for Serverless Database, hence the choice here to use Managed Database for PostgreSQL*.
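The keyword-counting extension can be sketched as a case-insensitive, whole-word count over scraped titles. The function name and the whole-word matching rule are assumptions; with word boundaries, plural forms such as "microservices" are not counted for "microservice".

```python
import re
from collections import Counter


def count_keywords(titles, keywords=("copilot", "serverless", "microservice")):
    """Count case-insensitive whole-word occurrences of each keyword across titles."""
    patterns = {kw: re.compile(rf"\b{re.escape(kw)}\b", re.IGNORECASE) for kw in keywords}
    counts = Counter({kw: 0 for kw in keywords})  # keep zero counts visible
    for title in titles:
        for kw, pattern in patterns.items():
            counts[kw] += len(pattern.findall(title))
    return counts
```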
