Draft
2 changes: 1 addition & 1 deletion .github/styles/config/vocabularies/Docs/accept.txt
@@ -45,7 +45,7 @@ idempotency
backoff

Authy
reCaptcha
reCAPTCHA?
OAuth
untrusted
unencrypted
1 change: 1 addition & 0 deletions .vale.ini
@@ -26,6 +26,7 @@ Microsoft.Foreign = NO
Microsoft.We = NO
Microsoft.Quotes = NO
Microsoft.Auto = NO
Microsoft.Units = NO
Microsoft.URLFormat = NO
Microsoft.GeneralURL = NO

4 changes: 1 addition & 3 deletions sources/academy/glossary/concepts/dynamic_pages.md
@@ -1,12 +1,10 @@
---
title: Dynamic pages
title: Dynamic pages and single-page applications
description: Understand what makes a page dynamic, and how a page being dynamic might change your approach when writing a scraper for it.
sidebar_position: 8.3
slug: /concepts/dynamic-pages
---

# Dynamic pages and single-page applications (SPAs) {#dynamic-pages}

**Understand what makes a page dynamic, and how a page being dynamic might change your approach when writing a scraper for it.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/concepts/http_cookies.md
@@ -5,8 +5,6 @@ sidebar_position: 8.2
slug: /concepts/http-cookies
---

# HTTP cookies {#cookies}

**Learn a bit about what cookies are, and how they are utilized in scrapers to appear logged-in, view specific data, or even avoid blocking.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/concepts/http_headers.md
@@ -5,8 +5,6 @@ sidebar_position: 8.1
slug: /concepts/http-headers
---

# HTTP headers {#headers}

**Understand what HTTP headers are, what they're used for, and three of the biggest differences between HTTP/1.1 and HTTP/2 headers.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/concepts/index.md
@@ -6,8 +6,6 @@ category: glossary
slug: /concepts
---

# Concepts 🤔 {#concepts}

**Learn about some common yet tricky concepts and terms that are used frequently within the academy, as well as in the world of scraper development.**

---
@@ -5,8 +5,6 @@ sidebar_position: 8.7
slug: /concepts/robotic-process-automation
---

# What is robotic process automation (RPA)? {#what-is-robotic-process-automation-rpa}

**Learn the basics of robotic process automation. Make your processes on the web and other software more efficient by automating repetitive tasks.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/glossary.md
@@ -6,8 +6,6 @@ category: glossary
slug: /glossary
---

# Why a glossary? {#why-a-glossary}

**Browse important web scraping concepts, tools and topics in succinct articles explaining common web development terms in a web scraping and automation context.**

---
4 changes: 1 addition & 3 deletions sources/academy/glossary/tools/apify_cli.md
@@ -5,8 +5,6 @@ sidebar_position: 9.1
slug: /tools/apify-cli
---

# The Apify CLI {#the-apify-cli}

**Learn about, install, and log into the Apify CLI - your best friend for interacting with the Apify platform via your terminal.**

---
@@ -15,7 +13,7 @@ The [Apify CLI](/cli) helps you create, develop, build and run Apify Actors, and

## Installing {#installing}

To install the Apfiy CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, learn how to do that [here](../../webscraping/scraping_basics_javascript/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.
To install the Apfiy CLI, you'll first need npm, which comes preinstalled with Node.js. If you haven't yet installed Node, [learn how to do that](../../webscraping/scraping_basics_javascript/data_extraction/computer_preparation.md). Additionally, make sure you've got an Apify account, as you will need to log in to the CLI to gain access to its full potential.

Open up a terminal instance and run the following command:

2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/edit_this_cookie.md
@@ -5,8 +5,6 @@ sidebar_position: 9.7
slug: /tools/edit-this-cookie
---

# What's EditThisCookie? {#what-is-it}

**Learn how to add, delete, and modify different cookies in your browser for testing purposes using the EditThisCookie Chrome extension.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/index.md
@@ -6,8 +6,6 @@ category: glossary
slug: /tools
---

# Tools 🔧 {#tools}

**Discover a variety of tools that can be used to enhance the scraper development process, or even unlock doors to new scraping possibilities.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/insomnia.md
@@ -5,8 +5,6 @@ sidebar_position: 9.2
slug: /tools/insomnia
---

# What is Insomnia {#what-is-insomnia}

**Learn about Insomnia, a valuable tool for testing requests and proxies when building scalable web scrapers.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/modheader.md
@@ -5,8 +5,6 @@ sidebar_position: 9.5
slug: /tools/modheader
---

# What is ModHeader? {#what-is-modheader}

**Discover a super useful Chrome extension called ModHeader, which allows you to modify your browser's HTTP request headers.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/postman.md
@@ -5,8 +5,6 @@ sidebar_position: 9.3
slug: /tools/postman
---

# What is Postman? {#what-is-postman}

**Learn about Postman, a valuable tool for testing requests and proxies when building scalable web scrapers.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/proxyman.md
@@ -5,8 +5,6 @@ sidebar_position: 9.4
slug: /tools/proxyman
---

# What's Proxyman? {#what-is-proxyman}

**Learn about Proxyman, a tool for viewing all network requests that are coming through your system. Filter by response type, by a keyword, or by application.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/quick_javascript_switcher.md
@@ -5,8 +5,6 @@ sidebar_position: 9.9
slug: /tools/quick-javascript-switcher
---

# Quick JavaScript Switcher

**Discover a handy tool for disabling JavaScript on a certain page to determine how it should be scraped. Great for detecting SPAs.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/switchyomega.md
@@ -5,8 +5,6 @@ sidebar_position: 9.6
slug: /tools/switchyomega
---

# What is SwitchyOmega? {#what-is-switchyomega}

**Discover SwitchyOmega, a Chrome extension to manage and switch between proxies, which is extremely useful when testing proxies for a scraper.**

---
2 changes: 0 additions & 2 deletions sources/academy/glossary/tools/user_agent_switcher.md
@@ -5,8 +5,6 @@ sidebar_position: 9.8
slug: /tools/user-agent-switcher
---

# User-Agent Switcher

**Learn how to switch your User-Agent header to different values in order to monitor how a certain site responds to the changes.**

---
2 changes: 1 addition & 1 deletion sources/academy/platform/apify_platform.md
@@ -1,5 +1,5 @@
---
title: Introduction to Apify platform
title: Introduction to the Apify platform
description: Learn all about the Apify platform, all of the tools it offers, and how it can improve your overall development experience.
sidebar_position: 7
category: apify platform
@@ -5,8 +5,6 @@ sidebar_position: 6.1
slug: /expert-scraping-with-apify/actors-webhooks
---

# Webhooks & advanced Actor overview {#webhooks-and-advanced-actors}

**Learn more advanced details about Actors, how they work, and the default configurations they can take. Also, learn how to integrate your Actor with webhooks.**

---
@@ -15,7 +13,7 @@ Thus far, you've run Actors on the platform and written an Actor of your own, wh

## Advanced Actor overview {#advanced-actors}

In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in three short lessons [here](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.
In this course, we'll be working out of the Amazon scraper project from the **Web scraping basics for JavaScript devs** course. If you haven't already built that project, you can do it in [three short lessons](../../webscraping/scraping_basics_javascript/challenge/index.md). We've made a few small modifications to the project with the Apify SDK, but 99% of the code is still the same.

Take another look at the files within your Amazon scraper project. You'll notice that there is a **Dockerfile**. Every single Actor has a Dockerfile (the Actor's **Image**) which tells Docker how to spin up a container on the Apify platform which can successfully run the Actor's code. "Apify Actors" is a serverless platform that runs multiple Docker containers. For a deeper understanding of Actor Dockerfiles, refer to the [Apify Actor Dockerfile docs](/sdk/js/docs/guides/docker-images#example-dockerfile).

@@ -5,8 +5,6 @@ sidebar_position: 6.4
slug: /expert-scraping-with-apify/apify-api-and-client
---

# Apify API & client {#api-and-client}

**Gain an in-depth understanding of the two main ways of programmatically interacting with the Apify platform - through the API, and through a client.**

---
@@ -5,8 +5,6 @@ sidebar_position: 6.6
slug: /expert-scraping-with-apify/bypassing-anti-scraping
---

# Bypassing anti-scraping methods {#bypassing-anti-scraping-methods}

**Learn about bypassing anti-scraping methods using proxies and proxy/session rotation together with Crawlee and the Apify SDK.**

---
@@ -6,8 +6,6 @@ category: apify platform
slug: /expert-scraping-with-apify
---

# Expert scraping with Apify {#expert-scraping}

**After learning the basics of Actors and Apify, learn to develop pro-level scrapers on the Apify platform with this advanced course.**

---
@@ -5,8 +5,6 @@ sidebar_position: 6.2
slug: /expert-scraping-with-apify/managing-source-code
---

# Managing source code {#managing-source-code}

**Learn how to manage your Actor's source code more efficiently by integrating it with a GitHub repository. This is standard on the Apify platform.**

---
@@ -5,8 +5,6 @@ sidebar_position: 6.5
slug: /expert-scraping-with-apify/migrations-maintaining-state
---

# Migrations & maintaining state {#migrations-maintaining-state}

**Learn about what Actor migrations are and how to handle them properly so that the state is not lost and runs can safely be resurrected.**

---
@@ -5,15 +5,13 @@ sidebar_position: 6.7
slug: /expert-scraping-with-apify/saving-useful-stats
---

# Saving useful run statistics {#savings-useful-run-statistics}

**Understand how to save statistics about an Actor's run, what types of statistics you can save, and why you might want to save them for a large-scale scraper.**

---

Using Crawlee and the Apify SDK, we are now able to collect and format data coming directly from websites and save it into a Key-Value store or Dataset. This is great, but sometimes, we want to store some extra data about the run itself, or about each request. We might want to store some extra general run information separately from our results or potentially include statistics about each request within its corresponding dataset item.

The types of values that are saved are totally up to you, but the most common are error scores, number of total saved items, number of request retries, number of captchas hit, etc. Storing these values is not always necessary, but can be valuable when debugging and maintaining an Actor. As your projects scale, this will become more and more useful and important.
The types of values that are saved are totally up to you, but the most common are error scores, number of total saved items, number of request retries, number of CAPTCHAs hit, etc. Storing these values is not always necessary, but can be valuable when debugging and maintaining an Actor. As your projects scale, this will become more and more useful and important.

## Learning 🧠 {#learning}

@@ -5,8 +5,6 @@ sidebar_position: 5
slug: /expert-scraping-with-apify/solutions/handling-migrations
---

# Handling migrations {#handling-migrations}

**Get real-world experience of maintaining a stateful object stored in memory, which will be persisted through migrations and even graceful aborts.**

---
@@ -5,8 +5,6 @@ sidebar_position: 6.7
slug: /expert-scraping-with-apify/solutions
---

# Solutions

**View all of the solutions for all of the activities and tasks of this course. Please try to complete each task on your own before reading the solution!**

---
@@ -5,8 +5,6 @@ sidebar_position: 1
slug: /expert-scraping-with-apify/solutions/integrating-webhooks
---

# Integrating webhooks {#integrating-webhooks}

**Learn how to integrate webhooks into your Actors. Webhooks are a super powerful tool, and can be used to do almost anything!**

---
@@ -5,8 +5,6 @@ sidebar_position: 2
slug: /expert-scraping-with-apify/solutions/managing-source
---

# Managing source

**View in-depth answers for all three of the quiz questions that were provided in the corresponding lesson about managing source code.**

---
@@ -5,8 +5,6 @@ sidebar_position: 6
slug: /expert-scraping-with-apify/solutions/rotating-proxies
---

# Rotating proxies/sessions {#rotating-proxy-sessions}

**Learn firsthand how to rotate proxies and sessions in order to avoid the majority of the most common anti-scraping protections.**

---
@@ -5,8 +5,6 @@ sidebar_position: 7
slug: /expert-scraping-with-apify/solutions/saving-stats
---

# Saving run stats {#saving-stats}

**Implement the saving of general statistics about an Actor's run, as well as adding request-specific statistics to dataset items.**

---
@@ -5,8 +5,6 @@ sidebar_position: 4
slug: /expert-scraping-with-apify/solutions/using-api-and-client
---

# Using the Apify API & JavaScript client {#using-api-and-client}

**Learn how to interact with the Apify API directly through the well-documented RESTful routes, or by using the proprietary Apify JavaScript client.**

---
@@ -5,8 +5,6 @@ sidebar_position: 3
slug: /expert-scraping-with-apify/solutions/using-storage-creating-tasks
---

# Using storage & creating tasks {#using-storage-creating-tasks}

## Quiz answers 📝 {#quiz-answers}

**Q: What is the relationship between Actors and tasks?**
@@ -5,8 +5,6 @@ sidebar_position: 6.3
slug: /expert-scraping-with-apify/tasks-and-storage
---

# Tasks & storage {#tasks-and-storage}

**Understand how to save the configurations for Actors with Actor tasks. Also, learn about storage and the different types Apify offers.**

---
2 changes: 1 addition & 1 deletion sources/academy/platform/getting_started/apify_api.md
@@ -1,5 +1,5 @@
---
title: Apify API
title: The Apify API
description: Learn how to use the Apify API to programmatically call your Actors, retrieve data stored on the platform, view Actor logs, and more!
sidebar_position: 4
slug: /getting-started/apify-api
2 changes: 0 additions & 2 deletions sources/academy/tutorials/api/index.md
@@ -6,8 +6,6 @@ category: tutorials
slug: /api
---

# Using Apify API

**A collection of various tutorials explaining how to interact with the Apify platform programmatically using its API.**

---
2 changes: 0 additions & 2 deletions sources/academy/tutorials/api/using_apify_from_php.md
@@ -4,8 +4,6 @@ description: Learn how to access Apify's REST API endpoints from your PHP projec
slug: /php/use-apify-from-php
---

# How to use Apify from PHP

Apify's [RESTful API](https://docs.apify.com/api/v2#) allows you to use the platform from basically anywhere. Many projects are and will continue to be built using [PHP](https://www.php.net/). This tutorial enables you to use Apify in these projects in PHP and frameworks built on it.

Apify does not have an official PHP client (yet), so we are going to use [guzzle](https://github.com/guzzle/guzzle), a great library for HTTP requests. By covering a few fundamental endpoints, this tutorial will show you the principles you can use for all Apify API endpoints.
2 changes: 0 additions & 2 deletions sources/academy/tutorials/apify_scrapers/index.md
@@ -5,8 +5,6 @@ sidebar_position: 5
slug: /apify-scrapers
---

# Using ready-made Apify scrapers

**Discover Apify's ready-made web scraping and automation tools. Compare Web Scraper, Cheerio Scraper and Puppeteer Scraper to decide which is right for you.**

---
@@ -5,8 +5,6 @@ sidebar_position: 14.1
slug: /node-js/analyzing-pages-and-fixing-errors
---

# How to analyze and fix errors when scraping a website {#scraping-with-sitemaps}

**Learn how to deal with random crashes in your web-scraping and automation jobs. Find out the essentials of debugging and fixing problems in your crawlers.**

---