diff --git a/docusaurus.config.js b/docusaurus.config.js
index 2f8fa964e4..125205c4f5 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -51,6 +51,8 @@ module.exports = {
/** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
onBrokenMarkdownLinks:
/** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
+ onBrokenAnchors:
+ /** @type {import('@docusaurus/types').ReportingSeverity} */ ('warn'),
themes: [
[
require.resolve('./apify-docs-theme'),
diff --git a/sources/academy/platform/deploying_your_code/input_schema.md b/sources/academy/platform/deploying_your_code/input_schema.md
index d438278752..f337203a76 100644
--- a/sources/academy/platform/deploying_your_code/input_schema.md
+++ b/sources/academy/platform/deploying_your_code/input_schema.md
@@ -53,7 +53,7 @@ Each property's key corresponds to the name we're expecting within our code, whi
## Property types & editor types {#property-types}
-Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
+Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema/specification/v1#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
```json
{
diff --git a/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md b/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md
index 4e4fa6f226..ccc9c62f3e 100644
--- a/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md
+++ b/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md
@@ -20,7 +20,7 @@ You might have already noticed that we've been using the **RESIDENTIAL** proxy g
## Learning 🧠{#learning}
- Skim [this page](https://apify.com/proxy) for a general idea of Apify Proxy.
-- Give the [proxy documentation](/platform/proxy#our-proxies) a solid readover (feel free to skip most of the examples).
+- Give the [proxy documentation](/platform/proxy) a solid readover (feel free to skip most of the examples).
- Check out the [anti-scraping guide](../../webscraping/anti_scraping/index.md).
- Gain a solid understanding of the [SessionPool](https://crawlee.dev/api/core/class/SessionPool).
- Look at a few Actors on the [Apify store](https://apify.com/store). How are they utilizing proxies?
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md b/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md
index fe23b28fc4..635971ff65 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md
@@ -231,7 +231,7 @@ That's everything! Now, even if the Actor migrates (or is gracefully aborted and
**A:** It's best not to use this option by default. If it fails, there must be a reason, which needs to be thought through first, meaning that the edge case of failing should be handled when resurrecting the Actor. The state should be persisted beforehand.
-**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
+**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
**A:** After aborting or throwing an error mid-process, it manages to start back from where it was upon resurrection.
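As a sketch of the persistence pattern these answers describe, using the JavaScript Apify SDK (the `STATE` key and its shape are illustrative, not prescribed by the lesson):

```js
import { Actor } from 'apify';

await Actor.init();

// Restore state from a previous run, or start fresh.
const state = (await Actor.getValue('STATE')) ?? { processedUrls: [] };

// Persist the state when the platform announces a migration,
// and also periodically, so a hard crash loses as little as possible.
Actor.on('migrating', () => Actor.setValue('STATE', state));
const interval = setInterval(() => Actor.setValue('STATE', state), 60_000);

// ... scraping work that keeps updating `state` goes here ...

clearInterval(interval);
await Actor.setValue('STATE', state);
await Actor.exit();
```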
diff --git a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
index 8d42fc6d18..16889c7085 100644
--- a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
+++ b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
@@ -24,7 +24,7 @@ Storage allows us to save persistent data for further processing. As you'll lear
## Learning 🧠{#learning}
- Check out [the docs about Actor tasks](/platform/actors/running/tasks).
-- Read about the [two main storage options](/platform/storage#dataset) on the Apify platform.
+- Read about the [two main storage options](/platform/storage/dataset) on the Apify platform.
- Understand the [crucial differences between named and unnamed storages](/platform/storage/usage#named-and-unnamed-storages).
- Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK.
diff --git a/sources/academy/platform/getting_started/inputs_outputs.md b/sources/academy/platform/getting_started/inputs_outputs.md
index 88a87c5810..564e31f4d8 100644
--- a/sources/academy/platform/getting_started/inputs_outputs.md
+++ b/sources/academy/platform/getting_started/inputs_outputs.md
@@ -65,7 +65,7 @@ Then, replace everything in **INPUT_SCHEMA.json** with this:
}
```
-> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema#integer).
+> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema/specification/v1#integer).
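If a concrete sketch helps, this is roughly what accepting input and pushing a result to the dataset looks like with the JavaScript Apify SDK. The input field names below are illustrative, not necessarily the ones defined by this lesson's schema:

```js
import { Actor } from 'apify';

await Actor.init();

// Read the input that the platform validated against INPUT_SCHEMA.json.
const { num1, num2 } = await Actor.getInput();

// Store the result in the Actor's default dataset.
await Actor.pushData({ solution: num1 + num2 });

await Actor.exit();
```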
Finally, **Save** and **Build** the Actor just as you did in the previous lesson.
@@ -89,7 +89,7 @@ On the results tab, there are a whole lot of options for which format to view/do
There's our solution! Did it work for you as well? Now, we can download the data right from the results tab to be used elsewhere, or even programmatically retrieve it by using [Apify's API](/api/v2) (we'll be discussing how to do this in the next lesson).
-It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage#data-retention).
+It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage/usage#data-retention).
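If you want to try it, a named dataset can be opened with a single call (the dataset name below is illustrative):

```js
import { Actor } from 'apify';

await Actor.init();

// Named storages are kept indefinitely, unlike the default (unnamed) dataset.
const dataset = await Actor.openDataset('my-permanent-results');
await dataset.pushData({ message: 'Hello Apify!' });

await Actor.exit();
```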
## Next up {#next}
diff --git a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
index dd126c6d64..b901b6c46c 100644
--- a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
+++ b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
@@ -28,7 +28,7 @@ If the Actor being run via API takes 5 minutes or less to complete a typical run
> If you are unsure about the differences between an Actor and a task, you can read about them in the [tasks](/platform/actors/running/tasks) documentation. In brief, tasks are pre-configured inputs for Actors.
-The API endpoints and usage (for both sync and async) for [Actors](/api/v2#/reference/actors/run-collection/run-actor) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.
+The API endpoints and usage (for both sync and async) for [Actors](/api/v2#tag/ActorsRun-collection/operation/act_runs_post) and [tasks](/api/v2#/reference/actor-tasks/run-collection/run-task) are essentially the same.
To run, or **call**, an Actor/task, you will need a few things:
diff --git a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
index 8ba3c521da..586a835b40 100644
--- a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
+++ b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
@@ -17,7 +17,7 @@ tutorial, great! You are ready to continue where we left off. If you haven't see
check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial,
because this one builds on topics and code examples discussed there.
-## [](#getting-to-know-our-tools) Getting to know our tools
+## Getting to know our tools
In the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, we've confirmed that the scraper works as expected,
so now it's time to add more data to the results.
@@ -36,7 +36,7 @@ Now that's out of the way, let's open one of the Actor detail pages in the Store
> If you're wondering why we're using Web Scraper as an example instead of Cheerio Scraper,
it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!
-## [](#building-our-page-function) Building our Page function
+## Building our Page function
Before we start, let's do a quick recap of the data we chose to scrape:
@@ -52,7 +52,7 @@ Before we start, let's do a quick recap of the data we chose to scrape:
We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
tutorial, so let's get to the next one on the list: title.
-### [](#title) Title
+### Title

@@ -79,7 +79,7 @@ async function pageFunction(context) {
}
```
-### [](#description) Description
+### Description
Getting the Actor's description is a little more involved, but still pretty straightforward. We cannot search for a `
` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
the `` element too, same as the title. Moreover, the actual description is nested inside a `` tag with a class `actor-description`.
@@ -97,7 +97,7 @@ async function pageFunction(context) {
}
```
-### [](#modified-date) Modified date
+### Modified date
The DevTools tell us that the `modifiedDate` can be found in a `` element.
@@ -125,7 +125,7 @@ But we would much rather see a readable date in our results, not a unix timestam
constructor will not accept a `string`, so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`.
Phew!
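For illustration, the conversion described above boils down to this (the timestamp value is made up):

```js
const modifiedTimestamp = '1562674800000'; // scraped from the page as a string
const modifiedDate = new Date(Number(modifiedTimestamp)); // a readable Date, not a unix timestamp
```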
-### [](#run-count) Run count
+### Run count
And so we're finishing up with the `runCount`. There's no specific element like ``, so we need to create
a complex selector and then do a transformation on the result.
@@ -164,7 +164,7 @@ using a regular expression, but its type is still a `string`, so we finally conv
>
> This will give us a string (e.g. `'1234567'`) that can be converted via `Number` function.
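A rough sketch of that conversion, with an illustrative input string rather than the page's real markup:

```js
const runCountText = 'Used 1,234,567 times'; // illustrative text
const digits = runCountText.match(/[\d,]+/)[0].replace(/,/g, ''); // '1234567'
const runCount = Number(digits); // 1234567
```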
-### [](#wrapping-it-up) Wrapping it up
+### Wrapping it up
And there we have it! All the data we needed in a single object. For the sake of completeness, let's add
the properties we parsed from the URL earlier and we're good to go.
@@ -242,13 +242,13 @@ async function pageFunction(context) {
}
```
-### [](#test-run) Test run
+### Test run
As always, try hitting that **Save & Run** button and visit
the **Dataset** preview of clean items. You should see a nice table of all the attributes correctly scraped.
You nailed it!
-## [](#pagination) Pagination
+## Pagination
Pagination is a term that represents "going to the next page of results". You may have noticed that we did not
actually scrape all the Actors, just the first page of results. That's because to load the rest of the Actors,
@@ -264,7 +264,7 @@ with Cheerio? We don't have a browser to do it and we only have the HTML of the
answer is that we can't click a button. Does that mean that we cannot get the data at all? Usually not,
but it requires some clever DevTools-Fu.
-### [](#analyzing-the-page) Analyzing the page
+### Analyzing the page
While with Web Scraper and **Puppeteer Scraper** ([apify/puppeteer-scraper](https://apify.com/apify/puppeteer-scraper)), we could get away with clicking a button,
with Cheerio Scraper we need to dig a little deeper into the page's architecture. For this, we will use
@@ -280,7 +280,7 @@ Then we click the **Show more** button and wait for incoming requests to appear
Now, this is interesting. It seems that we've only received two images after clicking the button and no additional
data. This means that the data about Actors must already be available in the page and the **Show more** button only displays it. This is good news.
-### [](#finding-the-actors) Finding the Actors
+### Finding the Actors
Now that we know the information we seek is already in the page, we just need to find it. The first Actor in the store
is Web Scraper, so let's try using the search tool in the **Elements** tab to find some reference to it. The first
@@ -309,7 +309,7 @@ so you might already be wondering, can I make one request to the store to get th
and then parse it out and be done with it in a single request? Yes you can! And that's the power
of clever page analysis.
-### [](#using-the-data-to-enqueue-all-actor-details) Using the data to enqueue all Actor details
+### Using the data to enqueue all Actor details
We don't really need to go to all the Actor details now, but for the sake of practice, let's imagine we only found
Actor names such as `cheerio-scraper` and their owners, such as `apify` in the data. We will use this information
@@ -342,7 +342,7 @@ how to route those requests.
>If you're wondering how we know the structure of the URL, see the [Getting started
with Apify Scrapers](./getting_started.md) tutorial again.
-### [](#plugging-it-into-the-page-function) Plugging it into the Page function
+### Plugging it into the Page function
We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`.
Remember the `// Do some stuff later` comment? Let's replace it.
@@ -411,13 +411,13 @@ to get all results with Cheerio only and other times it takes hours of research.
the right scraper for your job. But don't get discouraged. Oftentimes, the only thing you will ever need is to
define a correct Pseudo URL. Do your research first before giving up on Cheerio Scraper.
-## [](#downloading-our-scraped-data) Downloading the scraped data
+## Downloading the scraped data
You already know the **Dataset** tab of the run console since this is where we've always previewed our data. Notice the row of data formats such as JSON, CSV, and Excel. Below it are options for viewing and downloading the data. Go ahead and try it.
> If you prefer working with an API, you can find the example endpoint under the API tab: **Get dataset items**.
-### [](#clean-items) Clean items
+### Clean items
You can view and download your data without modifications, or you can choose to only get **clean** items. Data that aren't cleaned include a record
for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields
@@ -427,7 +427,7 @@ Clean items, on the other hand, include only the data you returned from the `pag
To control this, open the **Advanced options** view on the **Dataset** tab.
-## [](#bonus-making-your-code-neater) Bonus: Making your code neater
+## Bonus: Making your code neater
You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier
time maintaining or extending your task, feel free to define other functions inside the `pageFunction`
@@ -495,11 +495,11 @@ async function pageFunction(context) {
> If you're confused by the functions being declared below their executions, it's called hoisting and it's a feature
of JavaScript. It helps you put what matters on top, if you so desire.
-## [](#final-word) Final word
+## Final word
Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [join us on Discord](https://discord.gg/jyEM2PRvMU)!
-## [](#whats-next) What's next
+## What's next
* Check out the [Apify SDK](https://docs.apify.com/sdk) and its [Getting started](https://docs.apify.com/sdk/js/docs/guides/apify-platform) tutorial if you'd like to try building your own Actors. It's a bit more complex and involved than writing a `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking.
* [Take a deep dive into Actors](/platform/actors), from how they work to [publishing](/platform/actors/publishing) them in Apify Store, and even [making money](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) on Actors.
diff --git a/sources/academy/tutorials/apify_scrapers/getting_started.md b/sources/academy/tutorials/apify_scrapers/getting_started.md
index f8460e173d..9b05130eba 100644
--- a/sources/academy/tutorials/apify_scrapers/getting_started.md
+++ b/sources/academy/tutorials/apify_scrapers/getting_started.md
@@ -13,7 +13,7 @@ slug: /apify-scrapers/getting-started
Welcome to the getting started tutorial! It will walk you through creating your first scraping task step by step. You will learn how to set up all the different configuration options, code a **Page function** (`pageFunction`), and finally download the scraped data either as an Excel sheet or in another format, such as JSON or CSV. But first, let's give you a brief introduction to web scraping with Apify.
-## [](#what-is-an-apify-scraper) What is an Apify scraper
+## What is an Apify scraper
It doesn't matter whether you arrived here from **Web Scraper** ([apify/web-scraper](https://apify.com/apify/web-scraper)), **Puppeteer Scraper** ([apify/puppeteer-scraper](https://apify.com/apify/puppeteer-scraper)) or **Cheerio Scraper** ([apify/cheerio-scraper](https://apify.com/apify/cheerio-scraper)). All of them are **Actors** and for now, let's think of an **Actor** as an application that you can use with your own configuration. **apify/web-scraper** is therefore an application called **web-scraper**, built by **apify**, that you can configure to scrape any webpage. We call these configurations **tasks**.
@@ -21,7 +21,7 @@ It doesn't matter whether you arrived here from **Web Scraper** ([apify/web-scra
You can create 10 different **tasks** for 10 different websites, with very different options, but there will always be just one **Actor**, the `apify/*-scraper` you chose. This is the essence of tasks. They are nothing but **saved configurations** of the Actor that you can run repeatedly.
-## [](#trying-it-out) Trying it out
+## Trying it out
Depending on how you arrived at this tutorial, you may already have your first task created for the scraper of your choice. If not, the easiest way is to go to [Apify Store](https://console.apify.com/actors#/store/) and select the Actor you want to base your task on. Then, click the **Create a new task** button in the top-right corner.
@@ -29,7 +29,7 @@ Depending on how you arrived at this tutorial, you may already have your first t

-### [](#running-a-task) Running a task
+### Running a task
This takes you to the **Input and options** tab of the task configuration. Before we delve into the details, let's see how the example works. You can see that there are already some pre-configured input values. It says that the task should visit **https://apify.com** and all its subpages, such as **https://apify.com/contact** and scrape some data using the provided `pageFunction`, specifically the `` of the page and its URL.
@@ -39,7 +39,7 @@ Scroll down to the **Performance and limits** section and set the **Max pages pe
Now click **Save & Run**! *(in the bottom-left part of your screen)*
-### [](#the-run-detail) The run detail
+### The run detail
After clicking **Save & Run**, the window will change to the run detail. Here, you will see the run's log. If it seems that nothing is happening, don't worry, it takes a few seconds for the run to fully boot up. In under a minute, you should have the 10 pages scraped. You will know that the run successfully completed when the `RUNNING` card in top-left corner changes to `SUCCEEDED`.
@@ -51,13 +51,13 @@ Now that the run has `SUCCEEDED`, click on the glowing **Results** card to see t
Good job! We've run our first task and got some results. Let's learn how to change the default configuration to scrape something more interesting than the page's ``.
-## [](#creating-your-own-task) Creating your own task
+## Creating your own task
Before we jump into the scraping itself, let's have a quick look at the user interface that's available to us. Click on the task's name in the top-left corner to visit the task's configuration.

-### [](#input) Input and options
+### Input and options
The **Input** tab is where we started and it's the place where you create your scraping configuration. The Actor's creator prepares the **Input** form so that you can tell the Actor what to do. Feel free to check the tooltips of the various options to get a better idea of what they do. To display the tooltip, click the question mark next to each input field's name.
@@ -67,33 +67,33 @@ Below the input fields are the Build, Timeout and Memory options. Let's keep the
> Timeouts are there to prevent tasks from running forever. Always set a reasonable timeout to prevent a rogue task from eating up all your compute units.
-### [](#settings) Settings
+### Settings
In the settings tab, you can set options that are common to all tasks and not directly related to the Actor's purpose. Unless you've already changed the task's name, it's set to **my-task**, so why not try changing it to **my-first-scraper** and clicking **Save**.
-### [](#runs) Runs
+### Runs
You can find all the task runs and their detail pages here. Every time you start a task, it will appear here in the list. Apify securely stores your ten most recent runs indefinitely, ensuring your records are always accessible. All of your task's runs and their outcomes, beyond the latest ten, will be stored here for the data retention period, [which you can find under your plan](https://apify.com/pricing).
-### [](#webhooks) Webhooks
+### Webhooks
Webhooks are a feature that help keep you aware of what's happening with your tasks. You can set them up to inform you when a task starts, finishes, fails etc., or you can even use them to run more tasks, depending on the outcome of the original one. [See webhooks documentation](/platform/integrations/webhooks).
-### [](#readme) Information
+### Information
Since tasks are configurations for Actors, this tab shows you all the information about the underlying Actor, the Apify scraper of your choice. You can see the available versions and their READMEs - it's always a good idea to read an Actor's README first before creating a task for it.
-### [](#api) API
+### API
The API tab gives you a quick overview of all the available API calls in case you would like to use your task programmatically. It also includes links to detailed API documentation. You can even try it out immediately using the **Test endpoint** button.
> Never share a URL containing the authentication token (`?token=...` parameter in the URLs), as this will compromise your account's security.
-## [](#scraping-theory) Scraping theory
+## Scraping theory
Since this is a tutorial, we'll be scraping our own website. [Apify Store](https://apify.com/store) is a great candidate for some scraping practice. It's a page built on popular technologies, which displays a lot of different items in various categories, just like an online store, a typical scraping target, would.
-### [](#the-goal) The goal
+### The goal
We want to create a scraper that scrapes all the Actors in the store and collects the following attributes for each Actor:
@@ -106,9 +106,9 @@ We want to create a scraper that scrapes all the Actors in the store and collect
Some of this information may be scraped directly from the listing pages, but for the rest, we will need to visit the detail pages of all the Actors.
-### [](#the-start-url) The start URL
+### The start URL
-In the **Input** tab of the task we have, we'll change the **Start URL** from **https://apify.com**. This will tell the scraper to start by opening a different URL. You can add more **Start URL**s or even [use a file with a list of thousands of them](#-crawling-the-website-with-pseudo-urls), but in this case, we'll be good with just one.
+In the **Input** tab of the task we have, we'll change the **Start URL** from **https://apify.com**. This will tell the scraper to start by opening a different URL. You can add more **Start URL**s or even [use a file with a list of thousands of them](#crawling-the-website-with-pseudo-urls), but in this case, we'll be good with just one.
How do we choose the new **Start URL**? The goal is to scrape all Actors in the store, which is available at [apify.com/store](https://apify.com/store), so we choose this URL as our **Start URL**.
@@ -124,7 +124,7 @@ We also need to somehow distinguish the **Start URL** from all the other URLs th
}
```
-### [](#filtering-with-a-link-selector) Filtering with a Link selector
+### Filtering with a Link selector
The **Link selector**, together with **Pseudo URL**s, are your URL matching arsenal. The Link selector is a CSS selector and its purpose is to select the HTML elements where the scraper should look for URLs. And by looking for URLs, we mean finding the elements' `href` attributes. For example, to enqueue URLs from `` tags, we would enter `'div.my-class'`.
@@ -138,7 +138,7 @@ div.item > a
Save it as your **Link selector**. If you're wondering how we figured this out, follow along with the tutorial. By the time we finish, you'll know why we used this selector, too.
-### [](#crawling-the-website-with-pseudo-url) Crawling the website with pseudo URLs
+### Crawling the website with pseudo URLs
What is a **Pseudo URL**? Let us explain. Before we can start scraping the Actor details, we need to find all the links to the details. If the links follow a set structure, we can use a certain pattern to describe this structure. And that's what a **Pseudo URL** is. A pattern that describes a URL structure. By setting a **Pseudo URL**, all links that follow the given structure will automatically be added to the crawling queue.
@@ -188,15 +188,15 @@ Let's use the above **Pseudo URL** in our task. We should also add a label as we
}
```
-### [](#test-run) Test run
+### Test run
Now that we've added some configuration, it's time to test it. Run the task, keeping the **Max pages per run** set to `10` and the `pageFunction` as it is. You should see in the log that the scraper first visits the **Start URL** and then several of the Actor details matching the **Pseudo URL**.
-## [](#the-page-function) The page function
+## The page function
The `pageFunction` is a JavaScript function that gets executed for each page the scraper visits. To figure out how to create it, you must first inspect the page's structure to get an idea of its inner workings. The best tools for that are a browser's inbuilt developer tools - DevTools.
-### [](#using-devtools) Using DevTools
+### Using DevTools
Open [Apify Store](https://apify.com/store) in the Chrome browser (or use any other browser, just note that the DevTools may differ slightly) and open the DevTools, either by right-clicking on the page and selecting **Inspect** or by pressing **F12**.
@@ -208,11 +208,11 @@ You'll see that the Element tab jumps to the first `` element of the curr
> For the sake of brevity, we won't go into the details of using the DevTools in this tutorial. If you're just starting out with DevTools, this [Google tutorial](https://developer.chrome.com/docs/devtools/) is a good place to begin.
-### [](#understanding-context) Understanding `context`
+### Understanding `context`
The `pageFunction` has access to global variables such as `window` and `document`, which are provided by the browser, as well as to `context`, which is the `pageFunction`'s single argument. `context` carries a lot of useful information and helpful functions, which are described in the Actor's README.
-### [](#new-page-function-boilerplate) New page function boilerplate
+### New page function boilerplate
We know that we'll visit two kinds of pages, the list page (**Start URL**) and the detail pages (enqueued using the **Pseudo URL**). We want to enqueue links on the list page and scrape data on the detail page.
@@ -238,19 +238,19 @@ async function pageFunction(context) {
This may seem like a lot of new information, but it's all connected to our earlier configuration.
-### [](#context-request) `context.request`
+### `context.request`
The `request` is an instance of the [`Request`](https://sdk.apify.com/docs/api/request) class and holds information about the currently processed page, such as its `url`. Each `request` also has the `request.userData` property of type `Object`. While configuring the **Start URL** and the **Pseudo URL**, we gave them a `label`. We're now using them in the `pageFunction` to distinguish between the store page and the detail pages.
-### [](#context-skip-links) `context.skipLinks()`
+### `context.skipLinks()`
When a **Pseudo URL** is set, the scraper attempts to enqueue matching links on each page it visits. `skipLinks()` is used to tell the scraper that we don't want this to happen on the current page.
-### [](#context-log) `context.log`
+### `context.log`
`log` is used for printing messages to the console. You may be tempted to use `console.log()`, but this will not work unless you turn on the **Browser log** option. `log.info()` should be used for general messages, but you can also use `log.debug()` for messages that will only be shown when you turn on the **Debug log** option. [See the docs for more info](https://sdk.apify.com/docs/api/log).
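Putting the three pieces above together, a condensed sketch of a `pageFunction` might look like this (the label strings are illustrative):

```js
async function pageFunction(context) {
    const { request, skipLinks, log } = context;

    if (request.userData.label === 'START') {
        log.info('Store page opened, matching links will be enqueued.');
        return null; // nothing to save from the list page itself
    }

    // Detail pages: don't enqueue further links, scrape and return the data.
    await skipLinks();
    log.debug(`Scraping ${request.url}`);
    return { url: request.url };
}
```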
-### [](#the-page-functions-return-value) The page function's return value
+### The page function's return value
The `pageFunction` may only return nothing, `null`, `Object` or `Object[]`. If an `Object` is returned, it will be saved as a single result. Returning an `Array` of `Objects` will save each item in the array as a result.
@@ -272,7 +272,7 @@ will produce the following table:
| ----- | --- |
| Web Scraping, Data Extraction and Automation - Apify | https://apify.com |
-## [](#scraper-lifecycle) Scraper lifecycle
+## Scraper lifecycle
Now that we're familiar with all the pieces in the puzzle, we'll quickly take a look at the scraper lifecycle,
or in other words, what the scraper actually does when it scrapes. It's quite straightforward.
@@ -288,7 +288,7 @@ The scraper:
> When you're not using the request queue, the scraper repeats steps 1 and 2. You would not use the request queue when you already know all the URLs you want to visit. For example, when you have a pre-existing list of a thousand URLs that you uploaded as a text file. Or when scraping a single URL.
-## [](#scraping-practice) Scraping practice
+## Scraping practice
We've covered all the concepts that we need to understand to successfully scrape the data in our goal, so let's get to it. We will only output data that are already available to us in the page's URL. Remember from [our goal](#the-goal) that we also want to include the **URL** and a **Unique identifier** in our results. To get those, we need the `request.url`, because it is the URL and includes the Unique identifier.
@@ -297,7 +297,7 @@ const { url } = request;
const uniqueIdentifier = url.split('/').slice(-2).join('/');
```
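To make the transformation concrete, here is how the expression above plays out for one of the store URLs (results shown in the comments):

```js
const url = 'https://apify.com/apify/web-scraper';

url.split('/');                     // ['https:', '', 'apify.com', 'apify', 'web-scraper']
url.split('/').slice(-2);           // ['apify', 'web-scraper']
url.split('/').slice(-2).join('/'); // 'apify/web-scraper'
```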
-### [](#test-run-2) Test run 2
+### Test run 2
We'll add our first data to the `pageFunction` and carry out a test run to see that everything works as expected.
@@ -329,7 +329,7 @@ async function pageFunction(context) {
Now **Save & Run** the task and once it finishes, check the dataset by clicking on the **Results** card. Click **Preview** and you should see the URLs and unique identifiers scraped. Great job!
-## [](#choosing-sides) Choosing sides
+## Choosing sides
Up until now, everything has been the same for all the Apify scrapers. Whether you're using Web Scraper,
Puppeteer Scraper or Cheerio Scraper, what you've learned now will always be the same.
diff --git a/sources/academy/tutorials/apify_scrapers/index.md b/sources/academy/tutorials/apify_scrapers/index.md
index c146e2fd71..9345092e12 100644
--- a/sources/academy/tutorials/apify_scrapers/index.md
+++ b/sources/academy/tutorials/apify_scrapers/index.md
@@ -17,7 +17,7 @@ Don't let the number of options confuse you. Unless you're really sure you need
[Visit the Scraper introduction tutorial to get started!](./getting_started.md)
-## [](#web-scraper)Web Scraper
+## Web Scraper
Web Scraper is a ready-made solution for scraping the web using the Chrome browser. It takes away all the work necessary to set up a browser for crawling, controls the browser automatically and produces machine-readable results in several common formats.
@@ -25,7 +25,7 @@ Underneath, it uses the Puppeteer library to control the browser, but you don't
[Visit the Web Scraper tutorial to get started!](./web_scraper.md)
-## [](#cheerio-scraper)Cheerio Scraper
+## Cheerio Scraper
Cheerio Scraper is a ready-made solution for crawling the web using plain HTTP requests to retrieve HTML pages and then parsing and inspecting the HTML using the [cheerio](https://www.npmjs.com/package/cheerio) library. It's blazing fast.
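Under the hood, the approach is roughly this (a standalone sketch using the cheerio package directly; the URL is illustrative):

```js
import * as cheerio from 'cheerio';

const response = await fetch('https://apify.com/store'); // plain HTTP request, no browser
const $ = cheerio.load(await response.text());            // parse the returned HTML
console.log($('title').text());                           // query it with a jQuery-like API
```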
@@ -35,7 +35,7 @@ Cheerio Scraper is ideal for scraping websites that do not rely on client-side J
[Visit the Cheerio Scraper tutorial to get started!](./cheerio_scraper.md)
-## [](#puppeteer-scraper)Puppeteer Scraper
+## Puppeteer Scraper
Puppeteer Scraper is the most powerful scraper tool in our arsenal (aside from developing your own Actors). It uses the Puppeteer library to programmatically control a headless Chrome browser, and it can make it do almost anything. If using Web Scraper does not cut it, Puppeteer Scraper is what you need.
diff --git a/sources/academy/tutorials/apify_scrapers/puppeteer_scraper.md b/sources/academy/tutorials/apify_scrapers/puppeteer_scraper.md
index 1ac1a1f5db..130713691a 100644
--- a/sources/academy/tutorials/apify_scrapers/puppeteer_scraper.md
+++ b/sources/academy/tutorials/apify_scrapers/puppeteer_scraper.md
@@ -17,7 +17,7 @@ tutorial, great! You are ready to continue where we left off. If you haven't see
check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial,
because this one builds on topics and code examples discussed there.
-## [](#getting-to-know-our-tools) Getting to know our tools
+## Getting to know our tools
In the [Getting started with Apify scrapers](https://docs.apify.com/academy/apify-scrapers/getting-started) tutorial, we've confirmed that the scraper works as expected,
so now it's time to add more data to the results.
@@ -33,7 +33,7 @@ you'll need to visit its [documentation](https://pptr.dev/) and really dive deep
it in a nice, manageable UI. It provides almost all of its features in a format that is much easier to grasp
when first trying to scrape using Puppeteer.
-### [](#web-scraper-differences) Web Scraper differences
+### Web Scraper differences
At first glance, it may seem like **Web Scraper** ([apify/web-scraper](https://apify.com/apify/web-scraper)) and Puppeteer Scraper are almost the same. Well, they are.
In fact, Web Scraper uses Puppeteer underneath. The difference is the amount of control they give you.
@@ -51,7 +51,7 @@ Now that's out of the way, let's open one of the Actor detail pages in the Store
> If you're wondering why we're using Web Scraper as an example instead of Puppeteer Scraper,
it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!
-## [](#building-our-page-function) Building our Page function
+## Building our Page function
Before we start, let's do a quick recap of the data we chose to scrape:
@@ -67,7 +67,7 @@ Before we start, let's do a quick recap of the data we chose to scrape:
We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
tutorial, so let's get to the next one on the list: title.
-### [](#title) Title
+### Title

@@ -103,7 +103,7 @@ function allows you to run a function in the browser, with the selected element
Here we use it to extract the text content of a `h1` element that's in the page. The return value of the function
is automatically passed back to the Node.js context, so we receive an actual `string` with the element's text.
-### [](#description) Description
+### Description
Getting the Actor's description is a little more involved, but still pretty straightforward. We cannot search for a `` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
the `` element too, same as the title. Moreover, the actual description is nested inside a `` tag with a class `actor-description`.
@@ -129,7 +129,7 @@ async function pageFunction(context) {
}
```
-### [](#modified-date) Modified date
+### Modified date
The DevTools tell us that the `modifiedDate` can be found in a `` element.
@@ -172,7 +172,7 @@ But we would much rather see a readable date in our results, not a unix timestam
constructor will not accept a `string`, so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`.
Phew!
-### [](#run-count) Run count
+### Run count
And so we're finishing up with the `runCount`. There's no specific element like ``, so we need to create
a complex selector and then do a transformation on the result.
@@ -222,7 +222,7 @@ using a regular expression, but its type is still a `string`, so we finally conv
>
> This will give us a string (e.g. `'1234567'`) that can be converted via `Number` function.
-### [](#wrapping-it-up) Wrapping it up
+### Wrapping it up
And there we have it! All the data we needed in a single object. For the sake of completeness, let's add
the properties we parsed from the URL earlier and we're good to go.
@@ -344,13 +344,13 @@ all the functions to start at the same time and only wait for all of them to fin
concurrency or parallelism. Unless the functions need to be executed in a specific order, it's often a good idea
to run them concurrently to speed things up.
-### [](#test-run) Test run
+### Test run
As always, try hitting that **Save & Run** button and visit
the **Dataset** preview of clean items. You should see a nice table of all the attributes correctly scraped.
You nailed it!
-## [](#pagination) Pagination
+## Pagination
Pagination is a term that represents "going to the next page of results". You may have noticed that we did not
actually scrape all the Actors, just the first page of results. That's because to load the rest of the Actors,
@@ -360,7 +360,7 @@ one needs to click the **Show more** button at the very bottom of the list. This
that take you to the next page. If you encounter those, make a **Pseudo URL** for those links and they will
be automatically enqueued to the request queue. Use a label to let the scraper know what kind of URL it's processing.
-### [](#waiting-for-dynamic-content) Waiting for dynamic content
+### Waiting for dynamic content
Before we talk about paginating, we need to have a quick look at dynamic content. Since Apify Store is a JavaScript
application (a popular approach), the button might not exist in the page when the scraper runs the `pageFunction`.
@@ -404,7 +404,7 @@ await page.waitFor('.bad-class', { timeout: 5000 });
With those tools, you should be able to handle any dynamic content the website throws at you.
-### [](#how-to-paginate) How to paginate
+### How to paginate
After going through the theory, let's design the algorithm:
@@ -488,7 +488,7 @@ already loaded and we're waiting for the page to re-render so waiting for `2` se
that the button is not there. We don't want to stall the scraper for `30` seconds just to make sure that there's
no button.
-### [](#pagination-page-function) Plugging it into the Page function
+### Plugging it into the Page function
We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`.
Remember the `// Do some stuff later` comment? Let's replace it.
@@ -581,13 +581,13 @@ it's probably just a typo.

-## [](#downloading-our-scraped-data) Downloading the scraped data
+## Downloading the scraped data
You already know the **Dataset** tab of the run console since this is where we've always previewed our data. Notice the row of data formats such as JSON, CSV, and Excel. Below it are options for viewing and downloading the data. Go ahead and try it.
> If you prefer working with an API, you can find the example endpoint under the API tab: **Get dataset items**.
-### [](#clean-items) Clean items
+### Clean items
You can view and download your data without modifications, or you can choose to only get **clean** items. Data that aren't cleaned include a record
for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields
@@ -597,7 +597,7 @@ Clean items, on the other hand, include only the data you returned from the `pag
To control this, open the **Advanced options** view on the **Dataset** tab.
-## [](#bonus-making-your-code-neater) Bonus: Making your code neater
+## Bonus: Making your code neater
You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier
time maintaining or extending your task, feel free to define other functions inside the `pageFunction`
@@ -697,13 +697,13 @@ async function pageFunction(context) {
> If you're confused by the functions being declared below their executions, it's called hoisting and it's a feature
of JavaScript. It helps you put what matters on top, if you so desire.
-## [](#bonus-2-using-jquery-with-puppeteer-scraper) Bonus 2: Using jQuery with Puppeteer Scraper
+## Bonus 2: Using jQuery with Puppeteer Scraper
If you're familiar with the [jQuery library](https://jquery.com/), you may have looked at the scraping code and thought
that it's unnecessarily complicated. That's probably up to everyone to decide on their own, but the good news is,
you can use jQuery with Puppeteer Scraper too.
-### [](#injecting-jquery) Injecting jQuery
+### Injecting jQuery
To be able to use jQuery, we first need to introduce it to the browser. The [`Apify.utils.puppeteer.injectJQuery`](https://sdk.apify.com/docs/api/puppeteer#puppeteerinjectjquerypage) function will help us with the task.
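A rough sketch of the pattern, assuming the `Apify` object is in scope inside the `pageFunction` (not shown in this excerpt) and using an illustrative selector:

```js
async function pageFunction(context) {
    const { page } = context;
    // Inject jQuery into the page before evaluating any code that uses $.
    await Apify.utils.puppeteer.injectJQuery(page);

    return page.evaluate(() => ({
        title: $('header h1').text(), // $ now works inside the browser context
    }));
}
```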
@@ -815,11 +815,11 @@ async function pageFunction(context) {
injecting it outside of the browser. We're using the [`page.evaluate()`](https://pptr.dev/#?product=Puppeteer&show=api-pageevaluatepagefunction-args)
function to run the script in the context of the browser and the return value is passed back to Node.js. Keep this in mind.
-## [](#final-word) Final word
+## Final word
Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [join us on Discord](https://discord.gg/jyEM2PRvMU)!
-## [](#whats-next) What's next?
+## What's next
- Check out the [Apify SDK](https://docs.apify.com/sdk) and its [Getting started](https://docs.apify.com/sdk/js/docs/guides/apify-platform) tutorial if you'd like to try building your own Actors. It's a bit more complex and involved than writing a `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking.
- [Take a deep dive into Actors](/platform/actors), from how they work to [publishing](/platform/actors/publishing) them in Apify Store, and even [making money](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) on Actors.
diff --git a/sources/academy/tutorials/apify_scrapers/web_scraper.md b/sources/academy/tutorials/apify_scrapers/web_scraper.md
index 4610619fc2..3b468e198f 100644
--- a/sources/academy/tutorials/apify_scrapers/web_scraper.md
+++ b/sources/academy/tutorials/apify_scrapers/web_scraper.md
@@ -18,7 +18,7 @@ tutorial, great! You are ready to continue where we left off. If you haven't see
check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial,
because this one builds on topics and code examples discussed there.
-## [](#getting-to-know-our-tools) Getting to know our tools
+## Getting to know our tools
In the [Getting started with Apify scrapers](https://docs.apify.com/academy/apify-scrapers/getting-started) tutorial,
we've confirmed that the scraper works as expected, so now it's time to add more data to the results.
@@ -34,7 +34,7 @@ This will add a `context.jQuery` function that you can use.
Now that's out of the way, let's open one of the Actor detail pages in the Store, for example
the [Web Scraper](https://apify.com/apify/web-scraper) page and use our DevTools-Fu to scrape some data.
-## [](#building-our-page-function) Building our Page function
+## Building our Page function
Before we start, let's do a quick recap of the data we chose to scrape:
@@ -50,7 +50,7 @@ Before we start, let's do a quick recap of the data we chose to scrape:
We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started)
tutorial, so let's get to the next one on the list: title.
-### [](#title) Title
+### Title

@@ -78,7 +78,7 @@ async function pageFunction(context) {
}
```
-### [](#description) Description
+### Description
Getting the Actor's description is a little more involved, but still pretty straightforward. We cannot search for a `` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
the `` element too, same as the title. Moreover, the actual description is nested inside a `` tag with a class `actor-description`.
@@ -97,7 +97,7 @@ async function pageFunction(context) {
}
```
-### [](#modified-date) Modified date
+### Modified date
The DevTools tell us that the `modifiedDate` can be found in a `` element.
@@ -126,7 +126,7 @@ But we would much rather see a readable date in our results, not a unix timestam
constructor will not accept a `string`, so we cast the `string` to a `number` using the `Number()` function before actually calling `new Date()`.
Phew!
-### [](#run-count) Run count
+### Run count
And so we're finishing up with the `runCount`. There's no specific element like ``, so we need to create
a complex selector and then do a transformation on the result.
@@ -166,7 +166,7 @@ using a regular expression, but its type is still a `string`, so we finally conv
>
> This will give us a string (e.g. `'1234567'`) that can be converted via `Number` function.
-### [](#wrapping-it-up) Wrapping it up
+### Wrapping it up
And there we have it! All the data we needed in a single object. For the sake of completeness, let's add
the properties we parsed from the URL earlier and we're good to go.
@@ -243,13 +243,13 @@ async function pageFunction(context) {
}
```
-### [](#test-run) Test run
+### Test run
As always, try hitting that **Save & Run** button and visit
the **Dataset** preview of clean items. You should see a nice table of all the attributes correctly scraped.
You nailed it!
-## [](#pagination) Pagination
+## Pagination
Pagination is a term that represents "going to the next page of results". You may have noticed that we did not
actually scrape all the Actors, just the first page of results. That's because to load the rest of the Actors,
@@ -259,7 +259,7 @@ one needs to click the **Show more** button at the very bottom of the list. This
that take you to the next page. If you encounter those, make a **Pseudo URL** for those links and they will
be automatically enqueued to the request queue. Use a label to let the scraper know what kind of URL it's processing.
-### [](#waiting-for-dynamic-content) Waiting for dynamic content
+### Waiting for dynamic content
Before we talk about paginating, we need to have a quick look at dynamic content. Since Apify Store is a JavaScript
application (a popular approach), the button might not exist in the page when the scraper runs the `pageFunction`.
@@ -300,7 +300,7 @@ await waitFor('.bad-class', { timeoutMillis: 5000 });
With those tools, you should be able to handle any dynamic content the website throws at you.
-### [](#how-to-paginate) How to paginate
+### How to paginate
After going through the theory, let's design the algorithm:
@@ -382,7 +382,7 @@ already loaded and we're waiting for the page to re-render so waiting for `2` se
that the button is not there. We don't want to stall the scraper for `20` seconds just to make sure that there's
no button.
-### [](#plugging-it-into-the-page-function) Plugging it into the pageFunction
+### Plugging it into the pageFunction
We've got the general algorithm ready, so all that's left is to integrate it into our earlier `pageFunction`.
Remember the `// Do some stuff later` comment? Let's replace it. And don't forget to destructure the `waitFor()`
@@ -457,13 +457,13 @@ it's probably just a typo.

-## [](#downloading-our-scraped-data) Downloading the scraped data
+## Downloading the scraped data
You already know the **Dataset** tab of the run console since this is where we've always previewed our data. Notice the row of data formats such as JSON, CSV, and Excel. Below it are options for viewing and downloading the data. Go ahead and try it.
> If you prefer working with an API, you can find the example endpoint under the API tab: **Get dataset items**.
-### [](#clean-items) Clean items
+### Clean items
You can view and download your data without modifications, or you can choose to only get **clean** items. Data that aren't cleaned include a record
for each `pageFunction` invocation, even if you did not return any results. The record also includes hidden fields
@@ -473,7 +473,7 @@ Clean items, on the other hand, include only the data you returned from the `pag
To control this, open the **Advanced options** view on the **Dataset** tab.
-## [](#bonus-making-your-code-neater) Bonus: Making your code neater
+## Bonus: Making your code neater
You may have noticed that the `pageFunction` gets quite bulky. To make better sense of your code and have an easier
time maintaining or extending your task, feel free to define other functions inside the `pageFunction`
@@ -549,11 +549,11 @@ async function pageFunction(context) {
> If you're confused by the functions being declared below their executions, it's called hoisting and it's a feature
of JavaScript. It helps you put what matters on top, if you so desire.
-## [](#final-word) Final word
+## Final word
Thank you for reading this whole tutorial! Really! It's important to us that our users have the best information available to them so that they can use Apify effectively. We're glad that you made it all the way here and congratulations on creating your first scraping task. We hope that you liked the tutorial and if there's anything you'd like to ask, [join us on Discord](https://discord.gg/jyEM2PRvMU)!
-## [](#whats-next) What's next?
+## What's next
- Check out the [Apify SDK](https://docs.apify.com/sdk) and its [Getting started](https://docs.apify.com/sdk/js/docs/guides/apify-platform) tutorial if you'd like to try building your own Actors. It's a bit more complex and involved than writing a `pageFunction`, but it allows you to fine-tune all the details of your scraper to your liking.
- [Take a deep dive into Actors](/platform/actors), from how they work to [publishing](/platform/actors/publishing) them in Apify Store, and even [making money](https://blog.apify.com/make-regular-passive-income-developing-web-automation-actors-b0392278d085/) on Actors.
diff --git a/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md b/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md
index ed1d2fcd44..892a3dd59b 100644
--- a/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md
+++ b/sources/academy/tutorials/node_js/analyzing_pages_and_fixing_errors.md
@@ -127,7 +127,7 @@ Logging and snapshotting are great tools but once you reach a certain run size,
## With the Apify SDK {#with-the-apify-sdk}
-This example extends our snapshot solution above by creating a [named dataset](/platform/storage#named-and-unnamed-storages) (named datasets have infinite retention), where we will accumulate error reports. Those reports will explain what happened and will link to a saved snapshot, so we can do a quick visual check.
+This example extends our snapshot solution above by creating a [named dataset](/platform/storage/usage#named-and-unnamed-storages) (named datasets have infinite retention), where we will accumulate error reports. Those reports will explain what happened and will link to a saved snapshot, so we can do a quick visual check.
```js
import { Actor } from 'apify';
diff --git a/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md b/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md
index 56a82f1865..bf0fc4b30a 100644
--- a/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md
+++ b/sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md
@@ -21,13 +21,13 @@ You want to crawl a website with a proxy pool, but most of your proxies are bloc
5. The proxies actually got banned before anyone used them to crawl the website because they use anti-bot protection that bans proxies across websites (e.g. Cloudflare).
-Nobody can make sure that a proxy will work infinitely. The only real solution to this problem is to use [residential proxies](/platform/proxy#residential-proxy), but they can sometimes be too costly.
+Nobody can make sure that a proxy will work infinitely. The only real solution to this problem is to use [residential proxies](/platform/proxy/residential-proxy), but they can sometimes be too costly.
However, usually, at least some of our proxies work. To crawl successfully, it is therefore imperative to handle blocked requests properly. You first need to discover that you are blocked, which usually means that either your request returned a status code greater than or equal to 400 (it didn't return the proper response) or that the page displayed a captcha. To ensure that this bad request is retried, you usually throw an error and it gets automatically retried later (our [SDK](/sdk/js/) handles this for you). Check out [this article](https://docs.apify.com/academy/node-js/handle-blocked-requests-puppeteer) as inspiration for how to handle this situation with the `PuppeteerCrawler` class.
### Solution
-Now we are able to retry bad requests and eventually unless all of our proxies get banned, we should be able to successfully crawl what we want. The problem is that it takes too long and our log is full of errors. Fortunately, we can overcome this with [proxy sessions](/platform/proxy#datacenter-proxy--username-params) (look at the proxy and SDK documentation for how to use them in your Actors.)
+Now we are able to retry bad requests, and unless all of our proxies get banned, we should eventually be able to crawl what we want. The problem is that it takes too long and our log is full of errors. Fortunately, we can overcome this with [proxy sessions](/platform/proxy/datacenter-proxy#username-parameters) (look at the proxy and SDK documentation for how to use them in your Actors).
First, we define a `sessions` object at the top of our code (in global scope) to hold the state of our working sessions.
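A minimal sketch of what that object and its bookkeeping could look like (the helper names here are hypothetical, not part of the original article):

```js
// Global state: one entry per proxy session, tracking whether it still works.
const sessions = {};

// Pick an existing, unblocked session at random, or create a new one (hypothetical helper).
const pickSession = () => {
    const names = Object.keys(sessions).filter((name) => !sessions[name].blocked);
    if (names.length > 0) return names[Math.floor(Math.random() * names.length)];
    const name = `session_${Date.now()}`;
    sessions[name] = { blocked: false };
    return name;
};

// When a request is detected as blocked, retire its session so it is not reused (hypothetical helper).
const markBlocked = (name) => {
    if (sessions[name]) sessions[name].blocked = true;
};
```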
diff --git a/sources/academy/tutorials/php/using_apify_from_php.md b/sources/academy/tutorials/php/using_apify_from_php.md
index b6ea6425ca..82809ee73f 100644
--- a/sources/academy/tutorials/php/using_apify_from_php.md
+++ b/sources/academy/tutorials/php/using_apify_from_php.md
@@ -84,7 +84,7 @@ echo \json_encode($data, JSON_PRETTY_PRINT);
You should see information about the run, including its ID and the ID of its default [dataset](/platform/storage/dataset). Take note of these; we will need them later.
-## [](#getting-dataset) Getting the results from dataset
+## Getting the results from dataset
Actors usually store their output in a default dataset. The [Actor runs endpoint](/api/v2#/reference/actor-runs) lets you get overall info about an Actor run's default dataset.
@@ -126,7 +126,7 @@ echo \json_encode($parsedResponse, JSON_PRETTY_PRINT);
All the available parameters are described in [our API reference](/api/v2#/reference/datasets/item-collection/get-items) and work for all datasets.
-## [](#getting-key-value-store) Getting the results from key-value stores
+## Getting the results from key-value stores
Datasets are great for structured data, but are not suited for binary files like images or PDFs. In these cases, Actors store their output in [key-value stores](/platform/storage/key-value-store). One such Actor is the **HTML String To PDF** ([mhamas/html-string-to-pdf](https://apify.com/mhamas/html-string-to-pdf)) converter. Let's run it.
@@ -179,7 +179,7 @@ If you open the generated `hello-world.pdf` file, you should see... well, "Hello
If the Actor stored the data in a key-value store other than the default, we can use the standalone endpoints, `key-value-stores/`, `key-value-stores//keys`, and `key-value-stores//records/`. They behave the same way as the default endpoints. [See the full docs](https://docs.apify.com/api/v2#/reference/key-value-stores/store-object).
-## When are the data ready?
+## When are the data ready?
It takes some time for an Actor to generate its output. Some Actors even run for days! In the previous examples, we chose Actors whose runs only take a few seconds. This meant the runs had enough time to finish before we ran the code to retrieve their dataset or key-value store (so the Actor had time to produce some output). If we ran the code immediately after starting a longer-running Actor, the dataset would probably still be empty.
diff --git a/sources/academy/webscraping/scraping_basics_javascript/data_extraction/using_devtools.md b/sources/academy/webscraping/scraping_basics_javascript/data_extraction/using_devtools.md
index 85e3d719cd..d486638243 100644
--- a/sources/academy/webscraping/scraping_basics_javascript/data_extraction/using_devtools.md
+++ b/sources/academy/webscraping/scraping_basics_javascript/data_extraction/using_devtools.md
@@ -114,7 +114,7 @@ As you can see, we were able to extract information about the subwoofer, but the
### Finding child elements {#finding-child-elements}
-In the [Getting structured data from HTML](#getting-structured-data-from-html) section, we were browsing the elements in the **Elements** tab to find the element that contains all the data. We can use the same approach to find the individual data points as well.
+In the [Getting structured data from HTML](#getting-structured-data) section, we were browsing the elements in the **Elements** tab to find the element that contains all the data. We can use the same approach to find the individual data points as well.
Start from the element that contains all data: `` Then inspect all the elements nested within this element. You'll discover that:
diff --git a/sources/platform/actors/development/builds_and_runs/builds.md b/sources/platform/actors/development/builds_and_runs/builds.md
index 5d90bcb939..e488feb8bd 100644
--- a/sources/platform/actors/development/builds_and_runs/builds.md
+++ b/sources/platform/actors/development/builds_and_runs/builds.md
@@ -5,7 +5,7 @@ sidebar_position: 7
slug: /actors/development/builds-and-runs/builds
---
-# [](#builds)Builds
+# Builds
**Learn Apify's conventions for Actor build numbering and how to use a specific Actor version in a run. Understand an Actor's lifecycle and manage its cache.**
@@ -17,18 +17,18 @@ Each build is assigned a unique build number of the form **MAJOR\.MINOR\.BUILD**
By default, the build has a timeout of 300 seconds and consumes 4096 MB (2048 MB on the free plan) of memory from the user's memory limit. See the [Resource limits](../../running/index.md) section for more details.
-## [](#versioning)Versioning
+## Versioning
In order to enable active development, the Actor can have multiple versions of the source code and associated settings, such as the **Base image** and **Environment**. Each version is denoted by a version number of the form `MAJOR.MINOR`; the version numbers should adhere to the [Semantic Versioning](https://semver.org/) logic.
For example, the Actor can have a production version **1.1**, a beta version **1.2** that contains new features but is still backward compatible, and a development version **2.0** that contains breaking changes.
-## [](#tags)Tags
+## Tags
When running the Actor, the caller needs to specify which build should actually be used. To simplify this process, the builds can be associated with a tag such as **latest** or **beta**, which can be used instead of the version number when running the Actor. The tags are unique - only one build can be associated with a specific tag.
To set a tag for builds of a specific Actor version, set the **Build tag** property. Whenever a new build of the version is successfully finished, it is automatically assigned the tag. By default, the builds are set to the **latest** tag.
-## [](#cache)Cache
+## Cache
By default, the build process pulls the latest copies of all necessary Docker images and builds each new layer of Docker images from scratch. To speed up the builds triggered via API, you can add the **useCache=1** parameter. See the API reference for more details.
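For example, with the JavaScript API client a cached build could be triggered roughly like this (a sketch; the Actor name is a placeholder, and the exact option names should be checked against the API reference):

```js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start a build of version 0.1, reusing cached Docker layers where possible.
const build = await client.actor('username/my-actor').build('0.1', { useCache: true });
console.log(`Build ${build.buildNumber ?? build.id}: ${build.status}`);
```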
diff --git a/sources/platform/actors/development/builds_and_runs/index.md b/sources/platform/actors/development/builds_and_runs/index.md
index c9ebf72652..1c4cfb5a1c 100644
--- a/sources/platform/actors/development/builds_and_runs/index.md
+++ b/sources/platform/actors/development/builds_and_runs/index.md
@@ -36,7 +36,7 @@ flowchart LR
AD -- "start Actor" --> Run
```
-## [](#lifecycle)Lifecycle
+## Lifecycle
Actor builds and runs share their lifecycle. Each build and run starts with the initial status **READY** and goes through one or more transitional statuses to one of the terminal statuses.
diff --git a/sources/platform/actors/development/builds_and_runs/state_persistence.md b/sources/platform/actors/development/builds_and_runs/state_persistence.md
index 54baab5986..a57c98c335 100644
--- a/sources/platform/actors/development/builds_and_runs/state_persistence.md
+++ b/sources/platform/actors/development/builds_and_runs/state_persistence.md
@@ -4,7 +4,7 @@ description: Maintain a long-running Actor's state to prevent unexpected restart
slug: /actors/development/builds-and-runs/state-persistence
---
-# [](#state-persistence)State persistence
+# State persistence
**Maintain a long-running Actor's state to prevent unexpected restarts. See a code example of how to persist a run's state in the case of a server shutdown.**
@@ -19,31 +19,31 @@ To avoid this, long-running Actors should save (persist) their state periodicall
For short-running Actors, the chance of a restart and the cost of repeated runs are low, so restarts can be ignored.
-## [](#what-is-a-migration)What is a migration?
+## What is a migration?
A migration is when a process running on a server has to stop and move to another. All in-progress processes on the current server are stopped. Unless you have saved your state, the Actor run will restart on the new server. For example, if a request in your [request queue](../../../storage/request_queue.md) has not been updated as **crawled** before the migration, it will be crawled again.
**When a migration event occurs, you only have a few seconds to save your work.**
-## [](#why-do-migrations-happen)Why do migrations happen?
+## Why do migrations happen?
- To optimize server workloads.
- When a server crashes (unlikely).
- When we release new features and fix bugs.
-## [](#how-often-do-migrations-occur)How often do migrations occur?
+## How often do migrations occur?
Migrations have no specific interval at which they happen. They are caused by the [above events](#why-do-migrations-happen), so they can happen at any time.
-## [](#why-is-state-lost-during-migration)Why is state lost during migration?
+## Why is state lost during migration?
Unless instructed to save its output or state to a [storage](../../../storage/index.md), an Actor keeps them in the server's memory. When it switches servers, the run loses access to the previous server's memory. Even if data were saved on the server's disk, we would also lose access to that.
-## [](#how-to-persist-state)How to persist state
+## How to persist state
The [Apify SDKs](/sdk) persist their state automatically. In JavaScript, this is done using the `migrating` and `persistState` events in the [PlatformEventManager](/sdk/js/api/apify/class/PlatformEventManager). The `persistState` event notifies SDK components to persist their state at regular intervals in case a migration happens. The `migrating` event is emitted just before a migration.
-### [](#code-examples)Code examples
+### Code examples
To persist state manually, you can use the `Actor.on` method in the Apify SDK.
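A minimal JavaScript sketch of this approach (the key name `STATE` and the state shape are arbitrary):

```js
import { Actor } from 'apify';

await Actor.init();

// Restore previously persisted state, or start fresh.
const state = (await Actor.getValue('STATE')) ?? { processedItems: 0 };

// Persist the state just before the run is migrated to another server.
Actor.on('migrating', async () => {
    await Actor.setValue('STATE', state);
});

// Persist at regular intervals as well, in case of an abrupt shutdown.
Actor.on('persistState', async () => {
    await Actor.setValue('STATE', state);
});

// ... the actual work goes here, updating `state.processedItems` as it progresses ...

await Actor.exit();
```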
diff --git a/sources/platform/actors/development/programming_interface/basic_commands.md b/sources/platform/actors/development/programming_interface/basic_commands.md
index 3c6b8956cc..dcc89a7292 100644
--- a/sources/platform/actors/development/programming_interface/basic_commands.md
+++ b/sources/platform/actors/development/programming_interface/basic_commands.md
@@ -63,7 +63,7 @@ async def main():
## Get input
-Access the Actor's input object, which is stored as a JSON file in the Actor's default key-value store. The input is an object with properties. If the Actor defines the input schema, the input object is guaranteed to conform to it. For details, check out [Input and output](#input-and-output).
+Access the Actor's input object, which is stored as a JSON file in the Actor's default key-value store. The input is an object with properties. If the Actor defines the input schema, the input object is guaranteed to conform to it.
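In JavaScript, for instance, this is a single call (a minimal sketch):

```js
import { Actor } from 'apify';

await Actor.init();

// Reads and parses the INPUT record from the Actor's default key-value store.
const input = await Actor.getInput();
console.log(input);

await Actor.exit();
```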
diff --git a/sources/platform/integrations/index.mdx b/sources/platform/integrations/index.mdx
index 191019fd5c..061024e7e4 100644
--- a/sources/platform/integrations/index.mdx
+++ b/sources/platform/integrations/index.mdx
@@ -77,12 +77,6 @@ Apify offers easy-to-set-up solutions for common scenarios, like uploading your
imageUrlDarkTheme="/img/platform/integrations/github-white.svg"
smallImage
/>
-
{/* Only show Asana once we have the videos ready for it
-## Session persistence {#session-persistence}
+## Session persistence
When you use a datacenter proxy with the `session` [parameter](./usage.md#sessions) set in the `username` [field](#username-parameters), a single IP address is assigned to the provided session ID after you make the first request.
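As an illustration, the session is set directly in the proxy username; the session name, target URL, and password placeholder below are examples only:

```js
import { gotScraping } from 'got-scraping';

// The same session ID in the username keeps the same IP address for subsequent requests.
// Replace <PROXY_PASSWORD> with the password from the Proxy page in Apify Console.
const proxyUrl = 'http://session-my_session_1:<PROXY_PASSWORD>@proxy.apify.com:8000';

const response = await gotScraping({ url: 'https://api.apify.com/v2/browser-info', proxyUrl });
console.log(response.body);
```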
diff --git a/sources/platform/proxy/index.md b/sources/platform/proxy/index.md
index 8305179e84..bfede81346 100644
--- a/sources/platform/proxy/index.md
+++ b/sources/platform/proxy/index.md
@@ -11,7 +11,7 @@ import TabItem from '@theme/TabItem';
import Card from "@site/src/components/Card";
import CardGrid from "@site/src/components/CardGrid";
-# [](./proxy) Proxy
+# Proxy
**Learn to anonymously access websites in scraping/automation jobs. Improve data outputs and efficiency of bots, and access websites from various geographies.**
@@ -19,7 +19,7 @@ import CardGrid from "@site/src/components/CardGrid";
> [Apify Proxy](https://apify.com/proxy) allows you to change your IP address when web scraping to reduce the chance of being [blocked](/academy/anti-scraping/techniques) because of your geographical location.
-You can use proxies in your [Actors](../actors/index.mdx) or any other application that supports HTTP proxies. Apify Proxy monitors the health of your IP pool and intelligently [rotates addresses](#ip-address-rotation) to prevent IP address-based blocking.
+You can use proxies in your [Actors](../actors/index.mdx) or any other application that supports HTTP proxies. Apify Proxy monitors the health of your IP pool and intelligently rotates addresses to prevent IP address-based blocking.
You can view your proxy settings and password on the [Proxy](https://console.apify.com/proxy) page in Apify Console. For pricing information, visit [apify.com/pricing](https://apify.com/pricing).
diff --git a/sources/platform/proxy/usage.md b/sources/platform/proxy/usage.md
index de97b49d39..d39304a07f 100644
--- a/sources/platform/proxy/usage.md
+++ b/sources/platform/proxy/usage.md
@@ -143,13 +143,13 @@ Depending on whether you use a [browser](https://apify.com/apify/web-scraper) or
* Browser—a different IP address is used for each browser.
* HTTP request—a different IP address is used for each request.
-Use [sessions](#sessions) to control how you rotate and [persist](#session-persistence) IP addresses. See our guide [Anti-scraping techniques](/academy/anti-scraping/techniques) to learn more about IP address rotation and our findings on how blocking works.
+Use [sessions](#sessions) to control how you rotate IP addresses. See our guide [Anti-scraping techniques](/academy/anti-scraping/techniques) to learn more about IP address rotation and our findings on how blocking works.
## Sessions {#sessions}
Sessions allow you to use the same IP address for multiple connections. In cases where you need to keep the same session (e.g. when you need to log in to a website), it is best to keep the same proxy, and thus the same IP address. On the other hand, by switching the IP address, you can avoid being blocked by the website.
-To set a new session, pass the `session` parameter in your [username](./usage.md#username-parameters) field when connecting to a proxy. This will serve as the session's ID and an IP address will be assigned to it. To [use that IP address in other requests](./datacenter_proxy.md#multiple-requests-with-the-same-ip-address), pass that same session ID in the username field.
+To set a new session, pass the `session` parameter in your [username](./usage.md#username-parameters) field when connecting to a proxy. This will serve as the session's ID and an IP address will be assigned to it. To [use that IP address in other requests](/platform/proxy/datacenter-proxy#connecting-to-datacenter-proxies), pass that same session ID in the username field.
We recommend using the [SessionPool](https://crawlee.dev/api/core/class/SessionPool) abstraction when managing sessions. The created session will then store information such as cookies and can be used to generate [browser fingerprints](/academy/anti-scraping/mitigation/generating-fingerprints). You can also assign custom user data such as authorization tokens and specific headers.
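For example, with the Apify SDK a session can be tied to a proxy URL roughly like this (a sketch; the session name is arbitrary):

```js
import { Actor } from 'apify';

await Actor.init();

const proxyConfiguration = await Actor.createProxyConfiguration();

// The same session ID yields a proxy URL that keeps the same IP address
// for as long as that IP remains available.
const proxyUrl = await proxyConfiguration.newUrl('my_session_1');
console.log(proxyUrl);

await Actor.exit();
```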
diff --git a/sources/platform/storage/dataset.md b/sources/platform/storage/dataset.md
index f871a87a67..bf79cf44b7 100644
--- a/sources/platform/storage/dataset.md
+++ b/sources/platform/storage/dataset.md
@@ -20,7 +20,7 @@ Dataset storage enables you to sequentially save and retrieve data. A unique dat
Typically, datasets comprise results from web scraping, crawling, and data processing jobs. You can visualize this data in a table, where each object forms a row and its attributes are represented as columns. You have the option to export data in various formats, including JSON, CSV, XML, Excel, HTML Table, RSS or JSONL.
> Named datasets are retained indefinitely.
-> Unnamed datasets expire after 7 days unless otherwise specified. > [Learn more](usage.md#named-and-unnamed-storages)
+> Unnamed datasets expire after 7 days unless otherwise specified. [Learn more](/platform/storage/usage#named-and-unnamed-storages)
Dataset storage is _append-only_ - data can only be added and cannot be modified or deleted once stored.
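For instance, appending one result from an Actor run looks like this (a minimal JavaScript sketch):

```js
import { Actor } from 'apify';

await Actor.init();

// Each call appends one object (one row) to the run's default dataset.
await Actor.pushData({ url: 'https://example.com', title: 'Example Domain' });

await Actor.exit();
```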
@@ -45,7 +45,7 @@ To view or download a dataset:
2. Select the format & configure other options if desired in the **Export dataset** section.
3. Click **Download**.
-Utilize the **Actions** menu to modify the dataset's name, which also affects its [retention period](./usage.md#data-retention-data-retention), and to adjust [access rights](../collaboration/index.md). The **API** button allows you to explore and test the dataset's [API endpoints](/api/v2#/reference/datasets).
+Utilize the **Actions** menu to modify the dataset's name, which also affects its [retention period](/platform/storage/usage#data-retention), and to adjust [access rights](../collaboration/index.md). The **API** button allows you to explore and test the dataset's [API endpoints](/api/v2#/reference/datasets).

@@ -440,7 +440,7 @@ other_dataset_client = apify_client.dataset('jane-doe/old-dataset')
The same applies for the [Apify API](#apify-api) - you can use [the same endpoints](#apify-api) as you would normally do.
-See the [Storage overview](/platform/storage#sharing-storages-between-runs) for details on sharing storages between runs.
+See the [Storage overview](/platform/storage/usage#sharing-storages-between-runs) for details on sharing storages between runs.
## Limits
diff --git a/sources/platform/storage/index.md b/sources/platform/storage/index.md
index f3a9b0c9c1..226875ec79 100644
--- a/sources/platform/storage/index.md
+++ b/sources/platform/storage/index.md
@@ -15,7 +15,7 @@ import CardGrid from "@site/src/components/CardGrid";
---
-The Apify platform provides three types of storage accessible both within our [Apify Console](https://console.apify.com/storage) and externally through our [REST API](/api/v2#/) [Apify API Clients](/api) or [SDKs](/sdk).
+The Apify platform provides three types of storage accessible both within our [Apify Console](https://console.apify.com/storage) and externally through our [REST API](/api/v2), [Apify API Clients](/api), or [SDKs](/sdk).
> Named key-value stores are retained indefinitely.
-> Unnamed key-value stores expire after 7 days unless otherwise specified. > [Learn more](./index.md#named-and-unnamed-storages)
+> Unnamed key-value stores expire after 7 days unless otherwise specified. [Learn more](/platform/storage/usage#named-and-unnamed-storages)
## Basic usage
@@ -40,7 +40,7 @@ In [Apify Console](https://console.apify.com), you can view your key-value store

To view a key-value store's content, click on its **Store ID**.
-Under the **Actions** menu, you can rename your store (and, in turn extend its [retention period](./usage#named-and-unnamed-storages)) and grant [access rights](../collaboration/index.md) using the **Share** button.
+Under the **Actions** menu, you can rename your store (and, in turn, extend its [retention period](/platform/storage/usage#named-and-unnamed-storages)) and grant [access rights](../collaboration/index.md) using the **Share** button.
Click on the **API** button to view and test a store's [API endpoints](/api/v2#/reference/key-value-stores).

@@ -314,7 +314,7 @@ other_store_client = apify_client.key_value_store('jane-doe/old-store')
The same applies for the [Apify API](#apify-api) - you can use [the same endpoints](#apify-api) as you would normally do.
-Check out the [Storage overview](/platform/storage#sharing-storages-between-runs) for details on sharing storages between runs.
+Check out the [Storage overview](/platform/storage/usage#sharing-storages-between-runs) for details on sharing storages between runs.
## Data consistency
diff --git a/sources/platform/storage/request_queue.md b/sources/platform/storage/request_queue.md
index 3b8726fce6..4113e92a41 100644
--- a/sources/platform/storage/request_queue.md
+++ b/sources/platform/storage/request_queue.md
@@ -18,7 +18,7 @@ Request queues enable you to enqueue and retrieve requests such as URLs with an
The storage system for request queues accommodates both breadth-first and depth-first crawling strategies, along with the inclusion of custom data attributes. This system enables you to check if certain URLs have already been encountered, add new URLs to the queue, and retrieve the next set of URLs for processing.
> Named request queues are retained indefinitely.
-> Unnamed request queues expire after 7 days unless otherwise specified. > [Learn more](./index.md#named-and-unnamed-storages)
+> Unnamed request queues expire after 7 days unless otherwise specified. [Learn more](/platform/storage/usage#named-and-unnamed-storages)
## Basic usage
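In code, the basic cycle could look like this (a JavaScript sketch using the run's default request queue):

```js
import { Actor } from 'apify';

await Actor.init();

// Open the default request queue and enqueue a URL (duplicates are skipped automatically).
const requestQueue = await Actor.openRequestQueue();
await requestQueue.addRequest({ url: 'https://example.com' });

// Fetch the next request to process and mark it as handled once done.
const request = await requestQueue.fetchNextRequest();
if (request) {
    console.log(`Processing ${request.url}`);
    await requestQueue.markRequestHandled(request);
}

await Actor.exit();
```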
@@ -36,8 +36,8 @@ In the [Apify Console](https://console.apify.com), you can view your request que

To view a request queue, click on its **Queue ID**.
-Under the **Actions** menu, you can rename your queue (and, in turn, its
-[retention period](./usage#named-and-unnamed-storages)) and [access rights](../collaboration/index.md) using the **Share** button.
+Under the **Actions** menu, you can rename your queue (and, in turn, extend its
+[retention period](/platform/storage/usage#named-and-unnamed-storages)) and grant [access rights](../collaboration/index.md) using the **Share** button.
Click on the **API** button to view and test a queue's [API endpoints](/api/v2#/reference/request-queues).

@@ -563,7 +563,7 @@ other_queue_client = apify_client.request_queue('jane-doe/old-queue')
The same applies for the [Apify API](#apify-api) - you can use [the same endpoints](#apify-api) as you would normally do.
-Check out the [Storage overview](/platform/storage#sharing-storages-between-runs) for details on sharing storages between runs.
+Check out the [Storage overview](/platform/storage/usage#sharing-storages-between-runs) for details on sharing storages between runs.
## Limits