
Commit 9c702c9

Merge branch 'revamp-beginners-course' of https://github.com/apify/apify-docs into revamp-beginners-course
2 parents: b33401e + 14b6080

3 files changed: +14 −22 lines

content/academy/web_scraping_for_beginners/challenge.md

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ paths:
 
 Before moving onto the other courses in the academy, we recommend following along with this section, as it combines everything you've learned in the previous lessons into one cohesive project that helps you prove to yourself that you've thoroughly understood the material.
 
-We recommended that you make sure you've gone through both the [data collection]({{@link web_scraping_for_beginners/data_collection.md}}) [crawing]({{@link web_scraping_for_beginners/crawling.md}}) sections of this course to ensure the smoothest development process.
+We recommend that you make sure you've gone through both the [data collection]({{@link web_scraping_for_beginners/data_collection.md}}) and [crawling]({{@link web_scraping_for_beginners/crawling.md}}) sections of this course to ensure the smoothest development process.
 
 ## [](#learning) Learning 🧠
 

@@ -29,7 +29,7 @@ On Amazon, we can use this link to get to the results page of any product we wan
 https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=KEYWORD
 ```
 
-Our actor's input will look like this:
+Our crawler's input will look like this:
 
 ```JSON
 {

content/academy/web_scraping_for_beginners/challenge/initializing_and_setting_up.md

Lines changed: 12 additions & 19 deletions
@@ -11,43 +11,36 @@ paths:
 The Crawlee CLI makes it extremely easy for us to set up a project in Crawlee and hit the ground running. Navigate to the directory you'd like your project's folder to live, then open up a terminal instance and run the following command:
 
 ```shell
-npx crawlee create demo-actor
-```
-
-> You don't have to call it **demo-actor**, but that's what we'll be calling it in this tutorial.
+npx crawlee create amazon-crawler
 
 Once you run this command, you'll get prompted into a menu which you can navigate using your arrow keys. Each of these options will generate different boilerplate code when selected. We're going to work with CheerioCrawler today, so we'll select the **CheerioCrawler template project** template, then press **Enter**.
 
 ![Crawlee CLI "create" command]({{@asset web_scraping_for_beginners/challenge/images/crawlee-create.webp}})
 
-Once it's completed, open up the **demo-actor** folder that was generated by the `npx crawlee create` command. We're going to modify the **main.js** boilerplate to fit our needs:
+Once it's completed, open up the **amazon-crawler** folder that was generated by the `npx crawlee create` command. We're going to modify the **main.js** boilerplate to fit our needs:
 
 ```JavaScript
 // main.js
 import { CheerioCrawler, KeyValueStore, log } from 'crawlee';
 import { router } from './routes.js';
 
 // Grab our keyword from the input
-const { keyword = 'iphone' } = (await KeyValueStore.getInput()) ?? {};
+const { keyword } = await KeyValueStore.getInput();
 
 const crawler = new CheerioCrawler({
     requestHandler: router,
 });
 
-// Add our initial requests
-await crawler.addRequests([
-    {
-        // Turn the inputted keyword into a link we can make a request with
-        url: `https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=${keyword}`,
-        label: 'START',
-        userData: {
-            keyword,
-        },
-    },
-]);
 
 log.info('Starting the crawl.');
-await crawler.run();
+await crawler.run([{
+    // Turn the keyword into a link we can make a request with
+    url: `https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=${keyword}`,
+    label: 'START',
+    userData: {
+        keyword,
+    },
+}]);
 log.info('Crawl finished.');
 ```
 
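The **main.js** above imports `router` from **./routes.js**, but this commit only touches the tail end of that file (shown in the next hunk). For orientation, here is a minimal sketch of what a Crawlee CheerioCrawler router file generally looks like; the handler bodies are illustrative assumptions, not code from this commit:

```JavaScript
// routes.js — illustrative sketch only, not part of this commit
import { createCheerioRouter } from 'crawlee';

export const router = createCheerioRouter();

// Handles any request without a matching labelled handler
router.addDefaultHandler(async ({ log }) => {
    log.info('Default handler reached.');
});

// Handles requests enqueued with label: 'START' (as in main.js above)
router.addHandler('START', async ({ $, log }) => {
    log.info(`Search results page title: ${$('title').text()}`);
});
```

Note that the change in **main.js** passes the initial request directly to `crawler.run()` instead of calling `crawler.addRequests()` first; Crawlee supports both approaches, the former just being more compact.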
@@ -62,7 +55,7 @@ router.addDefaultHandler(({ log }) => {
 });
 ```
 
-Finally, we'll modify our input file in **storage/key_value_stores/default/INPUT.json** to look like this:
+Finally, we'll add the following input file to **INPUT.json** in the project's root directory (next to `package.json`, `node_modules` and others)
 
 ```JSON
 {
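The hunk cuts off before the JSON contents are shown. Judging by the `keyword` field that **main.js** reads from the input (and the `'iphone'` default that was removed above), the input file presumably contains something along these lines; the exact value here is an assumption:

```JSON
{
    "keyword": "iphone"
}
```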

content/academy/web_scraping_for_beginners/challenge/scraping_amazon.md

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ router.addHandler(labels.PRODUCT, async ({ $, crawler, request }) => {
 });
 ```
 
-> If you are sometimes getting an error along the lines of **RequestError: Proxy responded with 407**, don't worry, this is totally normal. The request will retry and succeed.
 
 Great! But wait, where do we go from here? We need to go to the offers page next and scrape each offer, but how can we do that? Let's take a small break from writing the scraper and open up [Proxyman]({{@link tools/proxyman.md}}) to analyze requests which we might be difficult to find in the network tab, then we'll click the button on the product page that loads up all of the product offers:
 