sources/academy/glossary/tools/switchyomega.md (1 addition, 1 deletion)

@@ -13,7 +13,7 @@ slug: /tools/switchyomega
SwitchyOmega is a Chrome extension for managing and switching between proxies which can be added in the [Chrome Webstore](https://chrome.google.com/webstore/detail/padekgcemlokbadohgkifijomclgjgif).

-After adding it to Chrome, you can see the SwitchyOmega icon somewhere amongst all your other Chrome extension icons. Clicking on it will display a menu, where you can select various differnt connection profiles, as well as open the extension's options.
+After adding it to Chrome, you can see the SwitchyOmega icon somewhere amongst all your other Chrome extension icons. Clicking on it will display a menu, where you can select various different connection profiles, as well as open the extension's options.
sources/academy/webscraping/anti_scraping/index.md (1 addition, 1 deletion)

@@ -66,7 +66,7 @@ Anti-scraping protections can work on many different layers and use a large amou
1. **Where you are coming from** - The IP address of the incoming traffic is always available to the website. Proxies are used to emulate different IP addresses, but their quality matters a lot.
2. **How you look** - With each request, the website can analyze its HTTP headers, TLS version, ciphers, and other information. Moreover, if you use a browser, the website can also analyze the whole browser fingerprint and run challenges to classify your hardware (like graphics hardware acceleration).
-3. **What you are scraping** - The same data can be extracted in many ways from a website. You can just get the inital HTML or you can use a browser to render the full page or you can reverse engineer internal APIs. Each of those endpoints can be protected differently.
+3. **What you are scraping** - The same data can be extracted in many ways from a website. You can just get the initial HTML or you can use a browser to render the full page or you can reverse engineer internal APIs. Each of those endpoints can be protected differently.
4. **How you behave** - The website can see patterns in how you are ordering your requests, how fast you are scraping, etc. It can also analyze browser behavior like mouse movement, clicks, or key presses.

These are the four main principles that anti-scraping protections are based on.
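The first two layers can be illustrated with a minimal, hypothetical sketch using only the Python standard library (the URL, proxy address, and header values are placeholders, not a working configuration):

```python
import urllib.request

# Hypothetical target URL and proxy address, for illustration only.
URL = "https://example.com"
PROXY = "http://203.0.113.7:8000"

# Layer 1 - "where you are coming from": route traffic through a proxy
# so the website sees the proxy's IP address instead of yours.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
)

# Layer 2 - "how you look": send browser-like headers instead of the
# default Python-urllib User-Agent, which is trivial to block.
request = urllib.request.Request(URL, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
})

# opener.open(request) would send the request; it is omitted here so
# the sketch stays self-contained and offline.
```

Real scrapers combine this with fingerprint management and behavioral randomization to cover the remaining two layers.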
sources/platform/actors/development/actor_definition/docker.md (2 additions, 2 deletions)

@@ -9,7 +9,7 @@ sidebar_position: 4
---

-When developing an [Actor](/sources/platform/actors/index.mdx) on the Apify platform, you can choose from a variety of pre-built Docker iamges to serve as the base for your Actor. These base images come with pre-installed dependencies and tools, making it easier to set up your development envrionment and ensuring consistent behavior across different environments.
+When developing an [Actor](/sources/platform/actors/index.mdx) on the Apify platform, you can choose from a variety of pre-built Docker images to serve as the base for your Actor. These base images come with pre-installed dependencies and tools, making it easier to set up your development environment and ensuring consistent behavior across different environments.

## Base Docker images

@@ -105,7 +105,7 @@ By default, Apify base Docker images with the Apify SDK and Crawlee start your N
}
```

-This means the system expects the source code to be in `main.js` by default. If you want to override this behavior, ues a custom `package.json` and/or `Dockerfile`.
+This means the system expects the source code to be in `main.js` by default. If you want to override this behavior, use a custom `package.json` and/or `Dockerfile`.
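As a sketch of that override, a custom `package.json` could point the start script at a different entry file (the `src/server.js` filename below is purely illustrative):

```json
{
    "name": "my-actor",
    "version": "0.1.0",
    "main": "src/server.js",
    "scripts": {
        "start": "node src/server.js"
    }
}
```

Since the base images launch the Actor via the standard npm start mechanism, changing the `start` script is usually enough without touching the `Dockerfile`.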
sources/platform/actors/development/actor_definition/output_schema.md (2 additions, 2 deletions)

@@ -111,7 +111,7 @@ To set up the Actor's output tab UI using a single configuration file, use the f
}
```

-The template above defines the configuration for the default dataset output view. Under the `views` property, there is one view titled _Overview_. The view configuartion consists of two main steps:
+The template above defines the configuration for the default dataset output view. Under the `views` property, there is one view titled _Overview_. The view configuration consists of two main steps:

1. `transformation` - set up how to fetch the data.
2. `display` - set up how to visually present the fetched data.
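A minimal sketch of a view combining both steps might look like the following (the field names and display properties here are illustrative assumptions, not the full schema):

```json
{
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": ["imageUrl", "productName", "price"]
            },
            "display": {
                "component": "table",
                "properties": {
                    "price": { "label": "Price" }
                }
            }
        }
    }
}
```

The `transformation` step selects which dataset fields to fetch, and the `display` step maps them onto the table component shown in the Output tab.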
@@ -124,7 +124,7 @@ The default behavior of the Output tab UI table is to display all fields from `t

Output configuration files need to be located in the `.actor` folder within the Actor's root directory.

-You have two choices of how to organize files withing the `.actor` folder.
+You have two choices of how to organize files within the `.actor` folder.
sources/platform/api_v2/api_v2_reference.apib (10 additions, 10 deletions)

@@ -989,7 +989,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
otherwise it will have a transitional status (e.g. `RUNNING`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
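The Base64-encoded value for the `webhooks` parameter can be produced with a few lines of Python (the webhook object below is a simplified, hypothetical example; see the webhooks documentation for the full object shape):

```python
import base64
import json

# Hypothetical webhook definition - a JSON array of objects.
webhooks = [
    {
        "eventTypes": ["ACTOR.RUN.SUCCEEDED", "ACTOR.RUN.FAILED"],
        "requestUrl": "https://example.com/my-webhook-handler",
    }
]

# Serialize the array to JSON, then Base64-encode it for use as the
# `webhooks` query parameter.
encoded = base64.b64encode(json.dumps(webhooks).encode("utf-8")).decode("ascii")
```

Decoding the parameter on the receiving side is the same process in reverse.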
@@ -1023,7 +1023,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
+ build: `0.1.234` (string, optional) - Specifies the actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the actor (typically `latest`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
@@ -1141,7 +1141,7 @@ To run the actor asynchronously, use the [Run actor](#reference/actors/run-colle
+ build: `0.1.234` (string, optional) - Specifies the actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the actor (typically `latest`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
+ format: `json` (string, optional) - Format of the results, possible values are: `json`, `jsonl`, `csv`, `html`, `xlsx`, `xml` and `rss`. The default value is `json`.
+ clean: `false` (boolean, optional) - If `true` or `1` then the API endpoint returns only non-empty items and skips hidden fields
(i.e. fields starting with the # character).
@@ -1758,7 +1758,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks.
**Note**: if you already have a webhook set up for the actor or task, you do not have to add it again here.
@@ -1792,7 +1792,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
in the response. By default, it is `OUTPUT`.
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
@@ -1898,7 +1898,7 @@ To run the Task asynchronously, use the [Run task asynchronously](#reference/act
+ build: `0.1.234` (string, optional) - Specifies the actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the task settings (typically `latest`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
+ format: `json` (string, optional) - Format of the results, possible values are: `json`, `jsonl`, `csv`, `html`, `xlsx`, `xml` and `rss`. The default value is `json`.
+ clean: `false` (boolean, optional) - If `true` or `1` then the API endpoint returns only non-empty items and skips hidden fields
(i.e. fields starting with the # character).
@@ -3005,7 +3005,7 @@ The pagination is always performed with the granularity of a single item, regard
By default, the **Items** in the response are sorted by the time they were stored to the database, therefore you can use
pagination to incrementally fetch the items as they are being added.
The maximum number of items that will be returned in a single API call is limited to 250,000. <!-- GET_ITEMS_LIMIT -->
-If you specify `desc=1` query paremeter, the results are returned in the reverse order
+If you specify `desc=1` query parameter, the results are returned in the reverse order
than they were stored (i.e. from newest to oldest items).
Note that only the order of **Items** is reversed, but not the order of the `unwind` array elements.
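Incremental fetching with `offset`, `limit`, and `desc` can be sketched as follows (the dataset ID and page size are placeholder values):

```python
from urllib.parse import urlencode

# Placeholder dataset ID and page size, for illustration only.
DATASET_ID = "someDatasetId"
PAGE_SIZE = 1000

def items_url(offset, desc=False):
    """Build a Get items URL for one page of dataset results."""
    params = {"offset": offset, "limit": PAGE_SIZE}
    if desc:
        # desc=1 returns items newest-first; the order of `unwind`
        # array elements is not reversed.
        params["desc"] = 1
    return f"https://api.apify.com/v2/datasets/{DATASET_ID}/items?{urlencode(params)}"

# Page through the dataset by advancing the offset by the page size.
first_page = items_url(0)
second_page = items_url(PAGE_SIZE)
```

A client keeps requesting pages until a response contains fewer than `limit` items.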
@@ -3081,7 +3081,7 @@ The POST payload is a JSON object or a JSON array of objects to save into the da
**IMPORTANT:** The limit of request payload size for the dataset is 5 MB. If the array exceeds the size,
you'll need to split it into a number of smaller arrays.
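One client-side way to stay under that limit is to split the array by serialized size before pushing. A greedy sketch (it assumes no single item exceeds the limit on its own):

```python
import json

def split_for_push(items, max_bytes=5 * 1024 * 1024):
    """Split a list of items into chunks whose JSON payload stays
    under the request size limit (5 MB by default)."""
    chunks, current, current_size = [], [], 2  # 2 bytes for "[]"
    for item in items:
        # Conservative per-item estimate: serialized size plus
        # separator overhead.
        item_size = len(json.dumps(item).encode("utf-8")) + 2
        if current and current_size + item_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 2
        current.append(item)
        current_size += item_size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate POST request to the Put items endpoint.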
-If the dataset has fields schema defined, the push request can potentialy fail with `400 Bad Request` if any item does not match the schema.
+If the dataset has fields schema defined, the push request can potentially fail with `400 Bad Request` if any item does not match the schema.
In such a case, nothing will be inserted into the dataset and the response will contain an error message with a list of invalid items and their validation errors.