Commit ce3b417

Merge pull request #452 from apify/advanced-scraping-course
docs: advanced-web-scraping course
2 parents 5f941f0 + d0f7942 commit ce3b417

144 files changed (+1273, -126 lines changed)

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+---
+title: Advanced web scraping
+description: Take your scrapers to the next level by learning various advanced concepts and techniques that will help you build highly scalable and reliable crawlers.
+menuWeight: 6
+category: web scraping & automation
+paths:
+- advanced-web-scraping
+---
+
+# Advanced web scraping
+
+In this course, we'll be tackling some of the most challenging and advanced web-scraping cases, such as mobile-app scraping, scraping sites with limited pagination, and handling large-scale cases where millions of items are scraped. Are **you** ready to take your scrapers to the next level?
+
+If you've managed to follow along with all of the courses prior to this one, then you're more than ready to take these upcoming lessons on 😎
+
+<!-- Just like the [**Web scraping for beginners**]({{@link web_scraping_for_beginners.md}}) course, this course is divided into two main sections: **Data collection** and **Crawling**. -->
+
+## [](#first-up) First up
+
+This course's [first lesson]({{@link advanced_web_scraping/scraping_paginated_sites.md}}) dives head-first into one of the most valuable skills you can have as a scraper developer: **Scraping paginated sites**.
Binary file not shown (17.4 KB).
Binary file not shown (1.78 KB).

content/academy/advanced_web_scraping/scraping_paginated_sites.md

Lines changed: 286 additions & 0 deletions
Large diffs are not rendered by default.

content/academy/anti_scraping.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 ---
 title: Anti-scraping protections
 description: Understand the various anti-scraping measures different sites use to prevent bots from accessing them, and how to appear more human to fix these issues.
-menuWeight: 3
-category: courses
+menuWeight: 4
+category: web scraping & automation
 paths:
 - anti-scraping
 ---
@@ -73,7 +73,7 @@ Solely based on the way how the bots operate. It comperes data-rich pages visits

 By definition, this is not an anti-scraping method, but it can heavily affect the reliability of a scraper. If your target website drastically changes its CSS selectors, and your scraper is heavily reliant on selectors, it could break. In principle, websites using this method change their HTML structure or CSS selectors randomly and frequently, making the parsing of the data harder, and requiring more maintenance of the bot.

-One of the best ways of avoiding the possible breaking of your scraper due to website structure changes is to limit your reliance on data from HTML elements as much as possible (see [API Scraping]({{@link api_scraping.md}}) and [JavaScript objects within HTML]({{@link js_in_html.md}}))
+One of the best ways of avoiding the possible breaking of your scraper due to website structure changes is to limit your reliance on data from HTML elements as much as possible (see [API Scraping]({{@link api_scraping.md}}) and [JavaScript objects within HTML]({{@link node_js/js_in_html.md}}))

 ### IP session consistency

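The changed paragraph above recommends limiting reliance on HTML elements by reading JavaScript objects embedded in the page instead. As a rough illustration (not code from the Academy lesson itself), here is a minimal Python sketch using only the standard library. It assumes the target page ships its data as JSON inside a `<script id="__NEXT_DATA__">` tag, a convention common on Next.js sites; the tag id, the `extract_embedded_json` helper, and the sample markup are all hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class ScriptJsonExtractor(HTMLParser):
    """Collects the raw text of the <script> tag whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self._inside_target = False
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; match the script by its id.
        if tag == "script" and dict(attrs).get("id") == self.target_id:
            self._inside_target = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside_target = False

    def handle_data(self, data):
        # HTMLParser hands <script> bodies over as raw, unparsed text.
        if self._inside_target:
            self._chunks.append(data)

    def text(self) -> str:
        return "".join(self._chunks).strip()


def extract_embedded_json(html: str, script_id: str = "__NEXT_DATA__") -> Optional[Any]:
    """Return the parsed JSON from the matching <script> tag, or None if absent."""
    parser = ScriptJsonExtractor(script_id)
    parser.feed(html)
    parser.close()
    raw = parser.text()
    return json.loads(raw) if raw else None


# Hypothetical page: the visible price sits behind an auto-generated class name,
# but the same data is also embedded as JSON.
page = """
<html><body>
  <span class="css-1x9k2">$19.99</span>
  <script id="__NEXT_DATA__" type="application/json">
    {"product": {"title": "Mug", "price": 19.99}}
  </script>
</body></html>
"""
product = extract_embedded_json(page)["product"]
print(product["title"], product["price"])  # Mug 19.99
```

Because the values are read from the embedded JSON rather than from a class name like `css-1x9k2`, a redesign that only renames CSS classes would not break this extraction, although a change to the JSON's shape still would.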
content/academy/api_scraping.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
 title: API scraping
 description: Learn all about how the professionals scrape various types of APIs with various configurations, parameters, and requirements.
-menuWeight: 4
-category: courses
+menuWeight: 3
+category: web scraping & automation
 paths:
 - api-scraping
 ---

content/academy/apify_platform.md

Lines changed: 8 additions & 8 deletions
@@ -1,20 +1,20 @@
 ---
-title: Apify platform
+title: About this category
 description: Learn all about the Apify platform, all of the tools it offers, and how it can improve your overall development experience.
-menuWeight: 5
-category: courses
+menuWeight: 7
+category: apify platform
 paths:
 - apify-platform
 ---

-# [](#about-the-platform) Apify platform
+# [](#about-the-platform) About this category

-The [Apify platform](https://apify.com) was built to serve large-scale and high-performance web scraping and automation needs. It provides easy access to compute instances ([actors]({{@link apify_platform/getting_started/actors.md}})), convenient request and result storages, proxies, scheduling, webhooks and more - all accessible through the **Console** web interface, [Apify's API](https://docs.apify.com/api/v2), or our [JavaScript](https://docs.apify.com/apify-client-js) and [Python](https://docs.apify.com/apify-client-python) API clients.
+The [Apify platform](https://apify.com) was built to serve large-scale and high-performance web scraping and automation needs. It provides easy access to compute instances ([actors]({{@link getting_started/actors.md}})), convenient request and result storages, proxies, scheduling, webhooks and more - all accessible through the **Console** web interface, [Apify's API](https://docs.apify.com/api/v2), or our [JavaScript](https://docs.apify.com/apify-client-js) and [Python](https://docs.apify.com/apify-client-python) API clients.

-## [](#this-course) Course outline
+## [](#this-category) Category outline

-In this course, you'll be learning how to become an Apify platform developer from the ground up. From creating your first account, to developing actors, this is your one-stop-shop for understanding how the platform works, and how to work with it.
+In this category, you'll be learning how to become an Apify platform developer from the ground up. From creating your first account, to developing actors, this is your one-stop-shop for understanding how the platform works, and how to work with it.

 ## [](#first) First up

-We'll start off this course light, by showing you how to create an Apify account and get everything ready for development with the platform. [Let's go!]({{@link apify_platform/getting_started.md}})
+We'll start off this category light, by showing you how to create an Apify account and get everything ready for development with the platform. [Let's go!]({{@link getting_started.md}})

content/academy/concepts.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: Concepts
 description: Learn about some common yet tricky concepts and terms that are used frequently within the academy, as well as in the world of scraper development.
-menuWeight: 11
+menuWeight: 18
 category: glossary
 paths:
 - concepts
