
Commit e925303

Merge pull request #428 from apify/general-improvements

fix(academy): inconsistent filename style

2 parents: 3bf64a9 + 928765d

8 files changed: +14 lines, −11 lines

content/academy/web_scraping_for_beginners.md

Lines changed: 4 additions & 1 deletion

```diff
@@ -30,6 +30,7 @@ This is what you'll learn in the **Web scraping for beginners** course:
 * [Web scraping for beginners]({{@link web_scraping_for_beginners.md}})
 * [Basics of data collection]({{@link web_scraping_for_beginners/data_collection.md}})
 * [Basics of crawling]({{@link web_scraping_for_beginners/crawling.md}})
+* [Best practices]({{@link web_scraping_for_beginners/best_practices.md}})
 
 <!-- Other courses and lessons (coming soon) in the Academy:
@@ -56,7 +57,9 @@ This is what you'll learn in the **Web scraping for beginners** course:
 ## [](#requirements) Requirements
 
-You don't need to be a developer or a software engineer to complete this course, but basic programming knowledge is recommended. Don't be afraid, though. We explain everything in great detail in the Web scraping for beginners course and provide external references that can help you level up your web scraping and development skills. If you're new to programming, pay very close attention to the instructions and examples. A seemingly insignificant thing like using `[]` instead of `()` can make a lot of difference.
+You don't need to be a developer or a software engineer to complete this course, but basic programming knowledge is recommended. Don't be afraid, though. We explain everything in great detail in the **Web scraping for beginners** course and provide external references that can help you level up your web scraping and development skills. If you're new to programming, pay very close attention to the instructions and examples. A seemingly insignificant thing like using `[]` instead of `()` can make a lot of difference.
+
+> If you don't already have basic programming knowledge and would like to be well-prepared for this course, we recommend taking a [JavaScript course](https://www.codecademy.com/learn/introduction-to-javascript) and learning about [CSS Selectors](https://www.w3schools.com/css/css_selectors.asp).
 
 As you progress to the Advanced and Pro courses, the coding will get more challenging, but still manageable to a person with an intermediate level of programming skills.
```

content/academy/web_scraping_for_beginners/crawling/dealing_with_dynamic_pages.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -18,7 +18,7 @@ From our adored and beloved [Fakestore](https://demo-webstore.apify.org/), we ha
 
 ![New arrival products in Fakestore]({{@asset web_scraping_for_beginners/crawling/images/new-arrivals.webp}})
 
-In your project from the previous lessons, or in a new project, create a file called `dynamic.js` and copy-paste the following boiler plate code into it:
+In your project from the previous lessons, or in a new project, create a file called **dynamic.js** and copy-paste the following boiler plate code into it:
 
 ```JavaScript
 import { CheerioCrawler } from 'crawlee';
````

content/academy/web_scraping_for_beginners/crawling/finding_links.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -40,7 +40,7 @@ for (const link of links) {
 
 ## [](#collecting-links-in-node) Collecting links in Node.js
 
-DevTools is a fun playground, but Node.js is way more useful. Let's create a new file in our project called `crawler.js` and start adding some basic crawling code. We'll start with the same boilerplate as with our original scraper, but this time, we'll download the HTML of [the demo site's main page](https://demo-webstore.apify.org/).
+DevTools is a fun playground, but Node.js is way more useful. Let's create a new file in our project called **crawler.js** and start adding some basic crawling code. We'll start with the same boilerplate as with our original scraper, but this time, we'll download the HTML of [the demo site's main page](https://demo-webstore.apify.org/).
 
 ```JavaScript
 // crawler.js
````

content/academy/web_scraping_for_beginners/crawling/headless_browser.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -19,7 +19,7 @@ Building a Playwright scraper with Crawlee is extremely easy. To show you how ea
 First, we must not forget to install Playwright into our project.
 
 ```shell
-npm install --save playwright
+npm install playwright
 ```
 
 After Playwright installs, we can proceed with updating the scraper code. As always, the comments describe changes in the code. Everything else is the same as before.
````

content/academy/web_scraping_for_beginners/crawling/processing_data.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -17,7 +17,7 @@ But when we look inside the folder, we see that there's A LOT of files, and we d
 
 ## [](#loading-data) Loading dataset data
 
-To access the default dataset, we can use the [`Dataset`](https://crawlee.dev/api/types/interface/Dataset) class exported from `crawlee`. We can then easily work with all the items in the dataset. Let's put the processing into a separate file in our project called `dataset.js`.
+To access the default dataset, we can use the [`Dataset`](https://crawlee.dev/api/types/interface/Dataset) class exported from `crawlee`. We can then easily work with all the items in the dataset. Let's put the processing into a separate file in our project called **dataset.js**.
 
 ```JavaScript
 // dataset.js
````
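Once **dataset.js** has loaded the items, the processing itself is plain JavaScript. Here is a hedged sketch of that step, using a hard-coded array in place of items loaded with Crawlee's `Dataset` class; the field names (`title`, `price`) are illustrative assumptions, not the lesson's actual schema:

```javascript
// Illustrative stand-in for items loaded from the default dataset.
// In dataset.js these would come from crawlee's Dataset class instead.
const items = [
    { title: 'Sunglasses', price: 19.99 },
    { title: 'T-shirt', price: 9.99 },
    { title: 'Sneakers', price: 79.99 },
];

// Example processing: find the cheapest product among the scraped items.
const cheapest = items.reduce(
    (min, item) => (item.price < min.price ? item : min),
);

console.log(cheapest.title); // T-shirt
```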

content/academy/web_scraping_for_beginners/crawling/scraping_the_data.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -90,7 +90,7 @@ Using this flow as guidance, we should be able to connect the pieces of code tog
 
 ## [](#building-scraper) Building the scraper
 
-Let's create a brand new file called `final.js` and write our scraper there. Then, we'll put our imports at the top of the file:
+Let's create a brand new file called **final.js** and write our scraper there. Then, we'll put our imports at the top of the file:
 
 ```JavaScript
 // final.js
@@ -107,7 +107,7 @@ const response = await gotScraping(`${BASE_URL}/search/on-sale`);
 const $ = cheerio.load(response.body);
 ```
 
-Next, we need to **collect the next URLs** we want to visit (the product URLs). So far, the code is nearly exactly the same as the `crawler.js` code.
+Next, we need to **collect the next URLs** we want to visit (the product URLs). So far, the code is nearly exactly the same as the **crawler.js** code.
 
 ```JavaScript
 const BASE_URL = 'https://demo-webstore.apify.org';
````
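Collecting the next URLs usually means resolving relative hrefs against `BASE_URL` before visiting them. A minimal sketch of that step with Node's built-in `URL` class; `BASE_URL` matches the lesson's code, while the href paths below are made-up examples:

```javascript
// Resolve relative product hrefs to absolute URLs with the WHATWG URL class.
const BASE_URL = 'https://demo-webstore.apify.org';

// Illustrative hrefs, as they might be collected from anchor elements.
const hrefs = ['/product/123', '/product/456'];

const productUrls = hrefs.map((href) => new URL(href, BASE_URL).href);

console.log(productUrls);
// [ 'https://demo-webstore.apify.org/product/123',
//   'https://demo-webstore.apify.org/product/456' ]
```

Using `new URL(href, BASE_URL)` also handles absolute hrefs gracefully: if `href` is already a full URL, the base is ignored.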

content/academy/web_scraping_for_beginners/data_collection/node_js_scraper.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -23,7 +23,7 @@ const html = response.body;
 console.log(html);
 ```
 
-Now run the script (using `node main.js`). After a brief moment, you should see the page's HTML printed to your terminal. If you get an error that says something along the lines of **urlToHttpOptions is not a function**, you need to update Node.js to version 15.10 or higher. If you followed the installation instructions earlier, you don't need to worry about this, because you have the correct version installed.
+Now run the script (using the `node main.js` command). After a brief moment, you should see the page's HTML printed to your terminal. If you get an error that says something along the lines of **urlToHttpOptions is not a function**, you need to update Node.js to version 15.10 or higher. If you followed the installation instructions earlier, you don't need to worry about this, because you have the correct version installed.
 
 > `gotScraping` is an `async` function and the `await` keyword is used to pause execution of the script until it returns the `response`. [Learn more about `async` and `await`](https://javascript.info/async-await)
````
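The Node.js 15.10 requirement mentioned in this hunk can be checked programmatically. A small sketch, where the 15.10 threshold comes from the lesson and the `meetsMinimum` helper is our own illustration:

```javascript
// Check whether a Node.js version string satisfies a minimum major.minor,
// e.g. the 15.10 minimum the lesson requires for gotScraping to work
// (older versions fail with "urlToHttpOptions is not a function").
const meetsMinimum = (version, majorMin, minorMin) => {
    const [major, minor] = version.split('.').map(Number);
    return major > majorMin || (major === majorMin && minor >= minorMin);
};

console.log(meetsMinimum(process.versions.node, 15, 10)); // true on a supported install
console.log(meetsMinimum('14.17.0', 15, 10)); // false
```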

content/academy/web_scraping_for_beginners/introduction.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -11,19 +11,19 @@ paths:
 
 Web scraping or crawling? Data collection, mining, or extraction? You can find various definitions on the web. Let's agree on simple explanations that we will use throughout this beginner course on web scraping.
 
-## [](#data-collection) What is data collection?
+## [](#what-is-data-collection) What is data collection?
 
 For us, data collection is a process that takes a web page, like an Amazon product page, and collects useful information from the page, such as the product's name and price. Web pages are an unstructured data source and the goal of data collection is to make the information structured and readable to computers. The main sources of data on a web page are HTML documents and API calls, but also images, PDFs, and so on.
 
 ![product data collection from Amazon]({{@asset web_scraping_for_beginners/images/beginners-data-collection.webp}})
 
-## [](#crawling) What is crawling?
+## [](#what-is-crawling) What is crawling?
 
 Where data collection focuses on a single page, web crawling (sometimes called spidering 🕷) is all about movement between pages or websites. The purpose of crawling is to travel across the website to find pages with the information we want. Crawling and collection can happen simultaneously, while moving from page to page, or separately, where one scraper focuses solely on finding pages with data and another scraper collects the data. The main purpose of crawling is to collect URLs or other identifiers that can be used to move around.
 
 <!-- TODO: An illustration of moving between pages -->
 
-## [](#web-scraping) What is web scraping?
+## [](#what-is-web-scraping) What is web scraping?
 
 We use web scraping as a general term for crawling, collection and all other activities that have the purpose of converting unstructured data from the web to a structured format. In the advanced courses, you'll learn that modern web scraping is about much more than just HTML and URLs.
```
