`sources/academy/webscraping/scraping_basics_python/13_platform.md`
The file contains a single asynchronous function, `main()`. At the beginning, it…
Every program that runs on the Apify platform first needs to be packaged as a so-called Actor—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code.
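As a minimal sketch of the input side, using the Apify Python SDK (the logged message is just for illustration):

```py
from apify import Actor

async def main():
    async with Actor:
        # Output is wired up automatically, but input has to be read explicitly
        actor_input = await Actor.get_input() or {}
        Actor.log.info(f"Received input: {actor_input}")
```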
We'll now adjust the template so that it runs our program for watching prices. As the first step, we'll create a new empty file, `crawler.py`, inside the `warehouse-watchdog/src` directory. Then, we'll fill this file with the final, unchanged code from the previous lesson:
```py title=warehouse-watchdog/src/crawler.py
import asyncio
...

async def main():
    ...
    context.log.info("Looking for product detail pages")
    ...
```
Now, let's replace the contents of `warehouse-watchdog/src/main.py` with this:
```py title=warehouse-watchdog/src/main.py
from apify import Actor

from .crawler import main as crawl

async def main():
    async with Actor:
        await crawl()
```
We import our scraper as a function and await the result inside the Actor block. Unlike the sample scraper, the one we made in the previous lesson doesn't expect any input data, so we can omit the code that handles that part.
Next, we'll change to the `warehouse-watchdog` directory in our terminal and verify that everything works locally before deploying the project to the cloud:
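With the Apify CLI, that can look like this (a minimal sketch):

```text
$ cd warehouse-watchdog
$ apify run
```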
```text
...
Actor build detail https://console.apify.com/actors/a123bCDefghiJkLMN#/builds/0.
? Do you want to open the Actor detail in your browser? (Y/n)
```
After opening the link in our browser, assuming we're logged in, we'll see the **Source** screen on the Actor's detail page. We'll go to the **Input** tab of that screen. We won't change anything—just hit **Start**, and we should see logs similar to what we see locally, but this time our scraper will be running in the cloud.
When the run finishes, the interface will turn green. On the **Output** tab, we can preview the results as a table or JSON. We can even export the data to formats like CSV, XML, Excel, RSS, and more.
You don't need to click buttons to download the data. You can also retrieve it using [Apify's API](https://docs.apify.com/api/v2/dataset-items-get), the [`apify datasets`](https://docs.apify.com/cli/docs/reference#datasets) CLI command, or the [`apify`](https://docs.apify.com/api/client/python/docs/examples/retrieve-actor-data) Python SDK.
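For instance, with the `apify-client` Python package, downloading the results could look like this (the token and dataset ID are placeholders, and the field names assume our scraper's output):

```py
from apify_client import ApifyClient

# Placeholders: supply your own API token and the ID of the run's dataset
client = ApifyClient(token="MY-APIFY-TOKEN")
dataset = client.dataset("MY-DATASET-ID")

# list_items() returns one page of results; .items is the list of records
for item in dataset.list_items().items:
    print(item["title"], item["price"])
```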
Now that our scraper is deployed, let's automate its execution. In the Apify web…
From now on, the Actor will execute daily. We can inspect each run, view logs, check collected data, see stats, monitor charts, and even set up alerts.
If monitoring shows that our scraper frequently fails to reach the Warehouse Shop website, it's likely being blocked. To avoid this, we can configure proxies so our requests come from different locations, reducing the chances of detection and blocking.
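In code, that can look roughly like this, assuming the Apify SDK together with Crawlee for Python (the `proxyConfig` input field name and the exact import path are assumptions that may differ between versions):

```py
from apify import Actor
from crawlee.crawlers import BeautifulSoupCrawler

async def main():
    async with Actor:
        actor_input = await Actor.get_input() or {}

        # Build a proxy configuration backed by the platform's proxy pool
        proxy_config = await Actor.create_proxy_configuration(
            actor_proxy_input=actor_input.get("proxyConfig"),
        )

        # The crawler then routes its requests through those proxies
        crawler = BeautifulSoupCrawler(proxy_configuration=proxy_config)
        ...
```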
```text
...
Run: Building Actor warehouse-watchdog
...
? Do you want to open the Actor detail in your browser? (Y/n)
```
Back in the Apify console, go to the **Source** screen and switch to the **Input** tab. You'll see the new **Proxy config** option, which defaults to **Datacenter - Automatic**.
Leave it as is and click **Start**. This time, the logs should show `Using proxy: yes`, as the scraper uses proxies provided by the platform:
```text
(timestamp) ACTOR: Pulling Docker image of build o6vHvr5KwA1sGNxP0 from repository.
...
```