content/academy/web_scraping_for_beginners/challenge.md (2 additions, 2 deletions)

@@ -10,7 +10,7 @@ paths:
Before moving onto the other courses in the academy, we recommend following along with this section, as it combines everything you've learned in the previous lessons into one cohesive project that helps you prove to yourself that you've thoroughly understood the material.
-We recommended that you make sure you've gone through both the [data collection]({{@link web_scraping_for_beginners/data_collection.md}}) [crawing]({{@link web_scraping_for_beginners/crawling.md}}) sections of this course to ensure the smoothest development process.
+We recommend that you make sure you've gone through both the [data collection]({{@link web_scraping_for_beginners/data_collection.md}}) and [crawling]({{@link web_scraping_for_beginners/crawling.md}}) sections of this course to ensure the smoothest development process.

## [](#learning) Learning 🧠
@@ -29,7 +29,7 @@ On Amazon, we can use this link to get to the results page of any product we wan
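The hunk above refers to a link that gets to the results page of any product we want. As a standalone sketch of how such a link can be built (the `k` query parameter is an assumption about Amazon's search URL format, not something stated in this diff):

```javascript
// Build an Amazon search-results URL for an arbitrary keyword.
// The `k` query parameter is assumed to be the search keyword field;
// `encodeURIComponent` makes the keyword safe to embed in the URL.
const searchUrl = (keyword) =>
    `https://www.amazon.com/s?k=${encodeURIComponent(keyword)}`;

console.log(searchUrl('iphone charger'));
// https://www.amazon.com/s?k=iphone%20charger
```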
content/academy/web_scraping_for_beginners/challenge/initializing_and_setting_up.md (12 additions, 19 deletions)

@@ -11,43 +11,36 @@ paths:
The Crawlee CLI makes it extremely easy for us to set up a project in Crawlee and hit the ground running. Navigate to the directory in which you'd like your project's folder to live, then open up a terminal instance and run the following command:
 ```shell
-npx crawlee create demo-actor
+npx crawlee create amazon-crawler
 ```
-
-> You don't have to call it **demo-actor**, but that's what we'll be calling it in this tutorial.

Once you run this command, you'll get prompted into a menu which you can navigate using your arrow keys. Each of these options will generate different boilerplate code when selected. We're going to work with CheerioCrawler today, so we'll select the **CheerioCrawler template project** template, then press **Enter**.

-Once it's completed, open up the **demo-actor** folder that was generated by the `npx crawlee create` command. We're going to modify the **main.js** boilerplate to fit our needs:
+Once it's completed, open up the **amazon-crawler** folder that was generated by the `npx crawlee create` command. We're going to modify the **main.js** boilerplate to fit our needs:
```JavaScript
// main.js

import { CheerioCrawler, KeyValueStore, log } from 'crawlee';
// ...
```
> If you sometimes get an error along the lines of **RequestError: Proxy responded with 407**, don't worry; this is totally normal. The request will retry and succeed.
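Crawlee handles these retries for you automatically. The underlying idea is simple retry-with-backoff, which can be sketched as a standalone helper (a minimal illustration, not Crawlee's actual internals; `withRetries` is a hypothetical name):

```javascript
// Retry a failing async task with exponential backoff.
async function withRetries(fn, maxRetries = 3) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt >= maxRetries) throw err;
            // Wait 100ms, 200ms, 400ms, ... before the next attempt.
            await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
        }
    }
}

// Usage: a flaky task that fails twice (like a 407 from a proxy), then succeeds.
(async () => {
    let calls = 0;
    const result = await withRetries(async () => {
        calls += 1;
        if (calls < 3) throw new Error('Proxy responded with 407');
        return 'ok';
    });
    console.log(result); // prints "ok" after two retried failures
})();
```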
Great! But wait, where do we go from here? We need to go to the offers page next and scrape each offer, but how can we do that? Let's take a small break from writing the scraper and open up [Proxyman]({{@link tools/proxyman.md}}) to analyze requests which might be difficult to find in the network tab, then we'll click the button on the product page that loads up all of the product offers:
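Once Proxyman reveals which request the offers button fires, the scraper only needs to rebuild that URL per product. As a hedged sketch (the `/gp/aod/ajax` path and `asin` parameter are assumptions about what the intercepted request looks like; verify the real URL in Proxyman's request list):

```javascript
// Build the offers-page URL for a product, given its ASIN.
// The endpoint path here is an assumption to be confirmed in Proxyman.
const offersUrl = (asin) => `https://www.amazon.com/gp/aod/ajax?asin=${asin}`;

console.log(offersUrl('B07FZ8S74R'));
// https://www.amazon.com/gp/aod/ajax?asin=B07FZ8S74R
```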