
Commit e925303

Merge pull request #428 from apify/general-improvements

fix(academy): inconsistent filename style

2 parents: 3bf64a9 + 928765d

8 files changed: +14 lines, −11 lines

content/academy/web_scraping_for_beginners.md

Lines changed: 4 additions & 1 deletion

```diff
@@ -30,6 +30,7 @@ This is what you'll learn in the **Web scraping for beginners** course:
 * [Web scraping for beginners]({{@link web_scraping_for_beginners.md}})
 * [Basics of data collection]({{@link web_scraping_for_beginners/data_collection.md}})
 * [Basics of crawling]({{@link web_scraping_for_beginners/crawling.md}})
+* [Best practices]({{@link web_scraping_for_beginners/best_practices.md}})
 
 <!-- Other courses and lessons (coming soon) in the Academy:
@@ -56,7 +57,9 @@ This is what you'll learn in the **Web scraping for beginners** course:
 ## [](#requirements) Requirements
 
-You don't need to be a developer or a software engineer to complete this course, but basic programming knowledge is recommended. Don't be afraid, though. We explain everything in great detail in the Web scraping for beginners course and provide external references that can help you level up your web scraping and development skills. If you're new to programming, pay very close attention to the instructions and examples. A seemingly insignificant thing like using `[]` instead of `()` can make a lot of difference.
+You don't need to be a developer or a software engineer to complete this course, but basic programming knowledge is recommended. Don't be afraid, though. We explain everything in great detail in the **Web scraping for beginners** course and provide external references that can help you level up your web scraping and development skills. If you're new to programming, pay very close attention to the instructions and examples. A seemingly insignificant thing like using `[]` instead of `()` can make a lot of difference.
+
+> If you don't already have basic programming knowledge and would like to be well-prepared for this course, we recommend taking a [JavaScript course](https://www.codecademy.com/learn/introduction-to-javascript) and learning about [CSS Selectors](https://www.w3schools.com/css/css_selectors.asp).
 
 As you progress to the Advanced and Pro courses, the coding will get more challenging, but still manageable to a person with an intermediate level of programming skills.
```

content/academy/web_scraping_for_beginners/crawling/dealing_with_dynamic_pages.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -18,7 +18,7 @@ From our adored and beloved [Fakestore](https://demo-webstore.apify.org/), we ha
 
 ![New arrival products in Fakestore]({{@asset web_scraping_for_beginners/crawling/images/new-arrivals.webp}})
 
-In your project from the previous lessons, or in a new project, create a file called `dynamic.js` and copy-paste the following boiler plate code into it:
+In your project from the previous lessons, or in a new project, create a file called **dynamic.js** and copy-paste the following boiler plate code into it:
 
 ```JavaScript
 import { CheerioCrawler } from 'crawlee';
````

content/academy/web_scraping_for_beginners/crawling/finding_links.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -40,7 +40,7 @@ for (const link of links) {
 
 ## [](#collecting-links-in-node) Collecting links in Node.js
 
-DevTools is a fun playground, but Node.js is way more useful. Let's create a new file in our project called `crawler.js` and start adding some basic crawling code. We'll start with the same boilerplate as with our original scraper, but this time, we'll download the HTML of [the demo site's main page](https://demo-webstore.apify.org/).
+DevTools is a fun playground, but Node.js is way more useful. Let's create a new file in our project called **crawler.js** and start adding some basic crawling code. We'll start with the same boilerplate as with our original scraper, but this time, we'll download the HTML of [the demo site's main page](https://demo-webstore.apify.org/).
 
 ```JavaScript
 // crawler.js
````

content/academy/web_scraping_for_beginners/crawling/headless_browser.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -19,7 +19,7 @@ Building a Playwright scraper with Crawlee is extremely easy. To show you how ea
 First, we must not forget to install Playwright into our project.
 
 ```shell
-npm install --save playwright
+npm install playwright
 ```
 
 After Playwright installs, we can proceed with updating the scraper code. As always, the comments describe changes in the code. Everything else is the same as before.
````

content/academy/web_scraping_for_beginners/crawling/processing_data.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -17,7 +17,7 @@ But when we look inside the folder, we see that there's A LOT of files, and we d
 
 ## [](#loading-data) Loading dataset data
 
-To access the default dataset, we can use the [`Dataset`](https://crawlee.dev/api/types/interface/Dataset) class exported from `crawlee`. We can then easily work with all the items in the dataset. Let's put the processing into a separate file in our project called `dataset.js`.
+To access the default dataset, we can use the [`Dataset`](https://crawlee.dev/api/types/interface/Dataset) class exported from `crawlee`. We can then easily work with all the items in the dataset. Let's put the processing into a separate file in our project called **dataset.js**.
 
 ```JavaScript
 // dataset.js
````
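Once **dataset.js** has loaded the items, the processing itself is plain JavaScript. Here is a hedged sketch of that step, using a hard-coded array in place of items loaded with Crawlee's `Dataset` class; the field names (`title`, `price`) are illustrative assumptions, not the lesson's actual schema:

```javascript
// Illustrative stand-in for items loaded from the default dataset.
// In dataset.js these would come from crawlee's Dataset class instead.
const items = [
    { title: 'Sunglasses', price: 19.99 },
    { title: 'T-shirt', price: 9.99 },
    { title: 'Sneakers', price: 79.99 },
];

// Example processing: find the cheapest product among the scraped items.
const cheapest = items.reduce(
    (min, item) => (item.price < min.price ? item : min),
);

console.log(cheapest.title); // T-shirt
```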

content/academy/web_scraping_for_beginners/crawling/scraping_the_data.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -90,7 +90,7 @@ Using this flow as guidance, we should be able to connect the pieces of code tog
 
 ## [](#building-scraper) Building the scraper
 
-Let's create a brand new file called `final.js` and write our scraper there. Then, we'll put our imports at the top of the file:
+Let's create a brand new file called **final.js** and write our scraper there. Then, we'll put our imports at the top of the file:
 
 ```JavaScript
 // final.js
@@ -107,7 +107,7 @@ const response = await gotScraping(`${BASE_URL}/search/on-sale`);
 const $ = cheerio.load(response.body);
 ```
 
-Next, we need to **collect the next URLs** we want to visit (the product URLs). So far, the code is nearly exactly the same as the `crawler.js` code.
+Next, we need to **collect the next URLs** we want to visit (the product URLs). So far, the code is nearly exactly the same as the **crawler.js** code.
 
 ```JavaScript
 const BASE_URL = 'https://demo-webstore.apify.org';
````
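Collecting the next URLs usually means resolving relative hrefs against `BASE_URL` before visiting them. A minimal sketch of that step with Node's built-in `URL` class; `BASE_URL` matches the lesson's code, while the href paths below are made-up examples:

```javascript
// Resolve relative product hrefs to absolute URLs with the WHATWG URL class.
const BASE_URL = 'https://demo-webstore.apify.org';

// Illustrative hrefs, as they might be collected from anchor elements.
const hrefs = ['/product/123', '/product/456'];

const productUrls = hrefs.map((href) => new URL(href, BASE_URL).href);

console.log(productUrls);
// [ 'https://demo-webstore.apify.org/product/123',
//   'https://demo-webstore.apify.org/product/456' ]
```

Using `new URL(href, BASE_URL)` also handles absolute hrefs gracefully: if `href` is already a full URL, the base is ignored.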

content/academy/web_scraping_for_beginners/data_collection/node_js_scraper.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -23,7 +23,7 @@ const html = response.body;
 console.log(html);
 ```
 
-Now run the script (using `node main.js`). After a brief moment, you should see the page's HTML printed to your terminal. If you get an error that says something along the lines of **urlToHttpOptions is not a function**, you need to update Node.js to version 15.10 or higher. If you followed the installation instructions earlier, you don't need to worry about this, because you have the correct version installed.
+Now run the script (using the `node main.js` command). After a brief moment, you should see the page's HTML printed to your terminal. If you get an error that says something along the lines of **urlToHttpOptions is not a function**, you need to update Node.js to version 15.10 or higher. If you followed the installation instructions earlier, you don't need to worry about this, because you have the correct version installed.
 
 > `gotScraping` is an `async` function and the `await` keyword is used to pause execution of the script until it returns the `response`. [Learn more about `async` and `await`](https://javascript.info/async-await)
````
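The Node.js 15.10 requirement mentioned in this hunk can be checked programmatically. A small sketch, where the 15.10 threshold comes from the lesson and the `meetsMinimum` helper is our own illustration:

```javascript
// Check whether a Node.js version string satisfies a minimum major.minor,
// e.g. the 15.10 minimum the lesson requires for gotScraping to work
// (older versions fail with "urlToHttpOptions is not a function").
const meetsMinimum = (version, majorMin, minorMin) => {
    const [major, minor] = version.split('.').map(Number);
    return major > majorMin || (major === majorMin && minor >= minorMin);
};

console.log(meetsMinimum(process.versions.node, 15, 10)); // true on a supported install
console.log(meetsMinimum('14.17.0', 15, 10)); // false
```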

content/academy/web_scraping_for_beginners/introduction.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -11,19 +11,19 @@ paths:
 
 Web scraping or crawling? Data collection, mining, or extraction? You can find various definitions on the web. Let's agree on simple explanations that we will use throughout this beginner course on web scraping.
 
-## [](#data-collection) What is data collection?
+## [](#what-is-data-collection) What is data collection?
 
 For us, data collection is a process that takes a web page, like an Amazon product page, and collects useful information from the page, such as the product's name and price. Web pages are an unstructured data source and the goal of data collection is to make the information structured and readable to computers. The main sources of data on a web page are HTML documents and API calls, but also images, PDFs, and so on.
 
 ![product data collection from Amazon]({{@asset web_scraping_for_beginners/images/beginners-data-collection.webp}})
 
-## [](#crawling) What is crawling?
+## [](#what-is-crawling) What is crawling?
 
 Where data collection focuses on a single page, web crawling (sometimes called spidering 🕷) is all about movement between pages or websites. The purpose of crawling is to travel across the website to find pages with the information we want. Crawling and collection can happen simultaneously, while moving from page to page, or separately, where one scraper focuses solely on finding pages with data and another scraper collects the data. The main purpose of crawling is to collect URLs or other identifiers that can be used to move around.
 
 <!-- TODO: An illustration of moving between pages -->
 
-## [](#web-scraping) What is web scraping?
+## [](#what-is-web-scraping) What is web scraping?
 
 We use web scraping as a general term for crawling, collection and all other activities that have the purpose of converting unstructured data from the web to a structured format. In the advanced courses, you'll learn that modern web scraping is about much more than just HTML and URLs.
```
