From 5061f24ea3c277c1dc42c84793510b60b7aceba0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jind=C5=99ich=20B=C3=A4r?=
Date: Mon, 9 Sep 2024 08:31:36 +0200
Subject: [PATCH 1/9] chore: throw on build with broken anchors

---
 docusaurus.config.js | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docusaurus.config.js b/docusaurus.config.js
index be899e25d0..c84dc08493 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -51,6 +51,8 @@ module.exports = {
         /** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
     onBrokenMarkdownLinks:
         /** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
+    onBrokenAnchors:
+        /** @type {import('@docusaurus/types').ReportingSeverity} */ ('throw'),
     themes: [
         [
             require.resolve('./apify-docs-theme'),

From cf5f49297ca51e36af6a9e88511004696706f044 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Olender?= <92638966+TC-MO@users.noreply.github.com>
Date: Mon, 7 Oct 2024 14:31:16 +0200
Subject: [PATCH 2/9] fix broken anchor links

---
 .../deploying_your_code/input_schema.md | 10 +-
 .../bypassing_anti_scraping.md | 2 +-
 .../solutions/handling_migrations.md | 2 +-
 .../tasks_and_storage.md | 2 +-
 .../getting_started/inputs_outputs.md | 4 +-
 .../apify_scrapers/cheerio_scraper.md | 34 +-
 .../apify_scrapers/getting_started.md | 58 +-
 .../apify_scrapers/puppeteer_scraper.md | 40 +-
 .../tutorials/apify_scrapers/web_scraper.md | 34 +-
 .../analyzing_pages_and_fixing_errors.md | 2 +-
 .../filter_blocked_requests_using_sessions.md | 4 +-
 .../tutorials/php/using_apify_from_php.md | 4 +-
 .../data_extraction/using_devtools.md | 2 +-
 sources/platform/storage/dataset.md | 459 --------------
 .../platform/storage/images/datasets-app.png | Bin 73072 -> 0 bytes
 .../storage/images/datasets-detail.png | Bin 114389 -> 0 bytes
 .../platform/storage/images/find-store-id.png | Bin 40396 -> 0 bytes
 .../storage/images/key-value-stores-app.png | Bin 73815 -> 0 bytes
 .../images/key-value-stores-detail.png | Bin 53129 -> 0 bytes
 .../platform/storage/images/overview-api.png | Bin 100286 -> 0 bytes
 .../storage/images/request-queue-app.png | Bin 83407 -> 0 bytes
 .../storage/images/request-queue-detail.png | Bin 82521 -> 0 bytes
 sources/platform/storage/index.md | 37 --
 sources/platform/storage/key_value_store.md | 325 ----------
 sources/platform/storage/request_queue.md | 583 ------------------
 sources/platform/storage/usage.md | 181 ------
 26 files changed, 99 insertions(+), 1684 deletions(-)
 delete mode 100644 sources/platform/storage/dataset.md
 delete mode 100644 sources/platform/storage/images/datasets-app.png
 delete mode 100644 sources/platform/storage/images/datasets-detail.png
 delete mode 100644 sources/platform/storage/images/find-store-id.png
 delete mode 100644 sources/platform/storage/images/key-value-stores-app.png
 delete mode 100644 sources/platform/storage/images/key-value-stores-detail.png
 delete mode 100644 sources/platform/storage/images/overview-api.png
 delete mode 100644 sources/platform/storage/images/request-queue-app.png
 delete mode 100644 sources/platform/storage/images/request-queue-detail.png
 delete mode 100644 sources/platform/storage/index.md
 delete mode 100644 sources/platform/storage/key_value_store.md
 delete mode 100644 sources/platform/storage/request_queue.md
 delete mode 100644 sources/platform/storage/usage.md

diff --git a/sources/academy/platform/deploying_your_code/input_schema.md b/sources/academy/platform/deploying_your_code/input_schema.md
index fb0ba9b564..c684d5b9cd 100644
--- a/sources/academy/platform/deploying_your_code/input_schema.md
+++ b/sources/academy/platform/deploying_your_code/input_schema.md
@@ -5,7 +5,7 @@ sidebar_position: 2
 slug: /deploying-your-code/input-schema
 ---
 
-# Input schema {#input-schema}
+# Input schema
 
 **Learn how to generate a user interface on the platform for your Actor's input with a single file - the INPUT_SCHEMA.json file.**
 
@@ -30,7 +30,7 @@ In the root of our project, we'll create a file named **INPUT_SCHEMA.json** and
 
 The **title** and **description** simply describe what the input schema is for, and a bit about what the Actor itself does.
 
-## Properties {#properties}
+## Properties
 
 In order to define all of the properties our Actor is expecting, we must include them within an object with a key of **properties**.
 
@@ -53,7 +53,7 @@ Each property's key corresponds to the name we're expecting within our code, whi
 
 ## Property types & editor types {#property-types}
 
-Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
+Within our new **numbers** property, there are two more fields we must specify. Firstly, we must let the platform know that we're expecting an array of numbers with the **type** field. Then, we should also instruct Apify on which UI component to render for this input property. In our case, we have an array of numbers, which means we should use the **json** editor type that we discovered in the ["array" section](/platform/actors/development/actor-definition/input-schema/specification/v1#array) of the input schema documentation. We could also use **stringList**, but then we'd have to parse out the numbers from the strings.
 
 ```json
 {
@@ -72,7 +72,7 @@ Within our new **numbers** property, there are two more fields we must specify.
 }
 ```
 
-## Required fields {#required-fields}
+## Required fields
 
 The great thing about building an input schema is that it will automatically validate your inputs based on their type, maximum value, minimum value, etc. Sometimes, you want to ensure that the user will always provide input for certain fields, as they are crucial to the Actor's run. This can be done by using the **required** field and passing in the names of the fields you'd like to require.
 
@@ -96,7 +96,7 @@ The great thing about building an input schema is that it will automatically val
 
 For our case, we've made the **numbers** field required, as it is crucial to our Actor's run.
 
-## Final thoughts {#final-thoughts}
+## Final thoughts
 
 Here is what the input schema we wrote will render on the platform:
 
diff --git a/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md b/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md
index 4e4fa6f226..ccc9c62f3e 100644
--- a/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md
+++ b/sources/academy/platform/expert_scraping_with_apify/bypassing_anti_scraping.md
@@ -20,7 +20,7 @@ You might have already noticed that we've been using the **RESIDENTIAL** proxy g
 ## Learning 🧠 {#learning}
 
 - Skim [this page](https://apify.com/proxy) for a general idea of Apify Proxy.
-- Give the [proxy documentation](/platform/proxy#our-proxies) a solid readover (feel free to skip most of the examples).
+- Give the [proxy documentation](/platform/proxy) a solid readover (feel free to skip most of the examples).
 - Check out the [anti-scraping guide](../../webscraping/anti_scraping/index.md).
 - Gain a solid understanding of the [SessionPool](https://crawlee.dev/api/core/class/SessionPool).
 - Look at a few Actors on the [Apify store](https://apify.com/store). How are they utilizing proxies?
diff --git a/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md b/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md
index 2cfba52a4d..87c5450f60 100644
--- a/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md
+++ b/sources/academy/platform/expert_scraping_with_apify/solutions/handling_migrations.md
@@ -231,7 +231,7 @@ That's everything! Now, even if the Actor migrates (or is gracefully aborted and
 
 **A:** It's not best to use this option by default. If it fails, there must be a reason, which would need to be thought through first - meaning that the edge case of failing should be handled when resurrecting the Actor. The state should be persisted beforehand.
 
-**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
+**Q: Migrations happen randomly, but by [aborting gracefully](/platform/actors/running/runs-and-builds#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted Actor's run?**
 
 **A:** After aborting or throwing an error mid-process, it manages to start back from where it was upon resurrection.
 
diff --git a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
index 8d42fc6d18..98449a5e38 100644
--- a/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
+++ b/sources/academy/platform/expert_scraping_with_apify/tasks_and_storage.md
@@ -24,7 +24,7 @@ Storage allows us to save persistent data for further processing. As you'll lear
 ## Learning 🧠 {#learning}
 
 - Check out [the docs about Actor tasks](/platform/actors/running/tasks).
-- Read about the [two main storage options](/platform/storage#dataset) on the Apify platform.
+- Read about the [three main storage options](/platform/storage) on the Apify platform.
 - Understand the [crucial differences between named and unnamed storages](/platform/storage/usage#named-and-unnamed-storages).
 - Learn about the [`Dataset`](/sdk/js/reference/class/Dataset) and [`KeyValueStore`](/sdk/js/reference/class/KeyValueStore) objects in the Apify SDK.
 
diff --git a/sources/academy/platform/getting_started/inputs_outputs.md b/sources/academy/platform/getting_started/inputs_outputs.md
index b7a95a8e66..10bda52d93 100644
--- a/sources/academy/platform/getting_started/inputs_outputs.md
+++ b/sources/academy/platform/getting_started/inputs_outputs.md
@@ -65,7 +65,7 @@ Then, replace everything in **INPUT_SCHEMA.json** with this:
 }
 ```
 
-> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema#integer).
+> If you're interested in learning more about how the code works, and what the **INPUT_SCHEMA.json** means, read about [inputs](/sdk/js/docs/examples/accept-user-input) and [adding data to a dataset](/sdk/js/docs/examples/add-data-to-dataset) in the Apify SDK documentation, and refer to the [input schema docs](/platform/actors/development/actor-definition/input-schema/specification/v1).
 
 Finally, **Save** and **Build** the Actor just as you did in the previous lesson.
 
@@ -89,7 +89,7 @@ On the results tab, there are a whole lot of options for which format to view/do
 
 There's our solution! Did it work for you as well? Now, we can download the data right from the results tab to be used elsewhere, or even programmatically retrieve it by using [Apify's API](/api/v2) (we'll be discussing how to do this in the next lesson).
 
-It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage#data-retention).
+It's important to note that the default dataset of the Actor, which we pushed our solution to, will be retained for 7 days. If we wanted the data to be retained for an indefinite period of time, we'd have to use a named dataset. For more information about named storages vs unnamed storages, read a bit about [data retention on the Apify platform](/platform/storage/usage#data-retention).
 
 ## Next up {#next}
 
diff --git a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
index a1ae5c7c1a..b5fef6d886 100644
--- a/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
+++ b/sources/academy/tutorials/apify_scrapers/cheerio_scraper.md
@@ -17,7 +17,7 @@ tutorial, great! You are ready to continue where we left off. If you haven't see
 check it out, it will help you learn about Apify and scraping in general and set you up for this tutorial, because this one builds on topics and code examples discussed there.
 
-## [](#getting-to-know-our-tools) Getting to know our tools
+## Getting to know our tools
 
 In the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, we've confirmed that the scraper works as expected, so now it's time to add more data to the results.
@@ -36,7 +36,7 @@ Now that's out of the way, let's open one of the Actor detail pages in the Store
 > If you're wondering why we're using Web Scraper as an example instead of Cheerio Scraper, it's only because we didn't want to triple the number of screenshots we needed to make. Lazy developers!
 
-## [](#building-our-page-function) Building our Page function
+## Building our Page function
 
 Before we start, let's do a quick recap of the data we chose to scrape:
 
@@ -52,7 +52,7 @@ Before we start, let's do a quick recap of the data we chose to scrape:
 We've already scraped numbers 1 and 2 in the [Getting started with Apify scrapers](/academy/apify-scrapers/getting-started) tutorial, so let's get to the next one on the list: title.
 
-### [](#title) Title
+### Title
 
 ![$1](https://raw.githubusercontent.com/apifytech/actor-scraper/master/docs/img/title.webp)
 
@@ -79,7 +79,7 @@ async function pageFunction(context) {
 }
 ```
 
-### [](#description) Description
+### Description
 
 Getting the Actor's description is a little more involved, but still pretty straightforward. We can't just simply search for a `<p>` tag, because there's a lot of them in the page. We need to narrow our search down a little. Using the DevTools we find that the Actor description is nested within
@@ -98,7 +98,7 @@ async function pageFunction(context) {
 }
 ```
 
-### [](#modified-date) Modified date
+### Modified date
 
 The DevTools tell us that the `modifiedDate` can be found in a `