sources/academy/glossary/tools/switchyomega.md (1 addition, 1 deletion)

@@ -13,7 +13,7 @@ slug: /tools/switchyomega
SwitchyOmega is a Chrome extension for managing and switching between proxies which can be added in the [Chrome Webstore](https://chrome.google.com/webstore/detail/padekgcemlokbadohgkifijomclgjgif).

-After adding it to Chrome, you can see the SwitchyOmega icon somewhere amongst all your other Chrome extension icons. Clicking on it will display a menu, where you can select various differnt connection profiles, as well as open the extension's options.
+After adding it to Chrome, you can see the SwitchyOmega icon somewhere amongst all your other Chrome extension icons. Clicking on it will display a menu, where you can select various different connection profiles, as well as open the extension's options.
sources/academy/webscraping/anti_scraping/index.md (1 addition, 1 deletion)

@@ -66,7 +66,7 @@ Anti-scraping protections can work on many different layers and use a large amou
1. **Where you are coming from** - The IP address of the incoming traffic is always available to the website. Proxies are used to emulate different IP addresses, but their quality matters a lot.
2. **How you look** - With each request, the website can analyze its HTTP headers, TLS version, ciphers, and other information. Moreover, if you use a browser, the website can also analyze the whole browser fingerprint and run challenges to classify your hardware (like graphics hardware acceleration).
-3. **What you are scraping** - The same data can be extracted in many ways from a website. You can just get the inital HTML or you can use a browser to render the full page or you can reverse engineer internal APIs. Each of those endpoints can be protected differently.
+3. **What you are scraping** - The same data can be extracted in many ways from a website. You can just get the initial HTML or you can use a browser to render the full page or you can reverse engineer internal APIs. Each of those endpoints can be protected differently.
4. **How you behave** - The website can see patterns in how you are ordering your requests, how fast you are scraping, etc. It can also analyze browser behavior like mouse movement, clicks, or key presses.

These are the four main principles that anti-scraping protections are based on.
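The first two layers can be illustrated with a minimal, hypothetical sketch using only the Python standard library (the URL, proxy address, and header values are placeholders, not a working configuration):

```python
import urllib.request

# Hypothetical target URL and proxy address, for illustration only.
URL = "https://example.com"
PROXY = "http://203.0.113.7:8000"

# Layer 1 - "where you are coming from": route traffic through a proxy
# so the website sees the proxy's IP address instead of yours.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
)

# Layer 2 - "how you look": send browser-like headers instead of the
# default Python-urllib User-Agent, which is trivial to block.
request = urllib.request.Request(URL, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
})

# opener.open(request) would send the request; it is omitted here so
# the sketch stays self-contained and offline.
```

Real scrapers combine this with fingerprint management and behavioral randomization to cover the remaining two layers.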
sources/platform/actors/development/actor_definition/docker.md (2 additions, 2 deletions)

@@ -9,7 +9,7 @@ sidebar_position: 4
---

-When developing an [Actor](/sources/platform/actors/index.mdx) on the Apify platform, you can choose from a variety of pre-built Docker iamges to serve as the base for your Actor. These base images come with pre-installed dependencies and tools, making it easier to set up your development envrionment and ensuring consistent behavior across different environments.
+When developing an [Actor](/sources/platform/actors/index.mdx) on the Apify platform, you can choose from a variety of pre-built Docker images to serve as the base for your Actor. These base images come with pre-installed dependencies and tools, making it easier to set up your development environment and ensuring consistent behavior across different environments.

## Base Docker images

@@ -105,7 +105,7 @@ By default, Apify base Docker images with the Apify SDK and Crawlee start your N
}
```

-This means the system expects the source code to be in `main.js` by default. If you want to override this behavior, ues a custom `package.json` and/or `Dockerfile`.
+This means the system expects the source code to be in `main.js` by default. If you want to override this behavior, use a custom `package.json` and/or `Dockerfile`.
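As a sketch of that override, a custom `package.json` could point the start script at a different entry file (the `src/server.js` filename below is purely illustrative):

```json
{
    "name": "my-actor",
    "version": "0.1.0",
    "main": "src/server.js",
    "scripts": {
        "start": "node src/server.js"
    }
}
```

Since the base images launch the Actor via the standard npm start mechanism, changing the `start` script is usually enough without touching the `Dockerfile`.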
sources/platform/actors/development/actor_definition/output_schema.md (2 additions, 2 deletions)

@@ -111,7 +111,7 @@ To set up the Actor's output tab UI using a single configuration file, use the f
}
```

-The template above defines the configuration for the default dataset output view. Under the `views` property, there is one view titled _Overview_. The view configuartion consists of two main steps:
+The template above defines the configuration for the default dataset output view. Under the `views` property, there is one view titled _Overview_. The view configuration consists of two main steps:

1. `transformation` - set up how to fetch the data.
2. `display` - set up how to visually present the fetched data.
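A minimal sketch of a view combining both steps might look like the following (the field names and display properties here are illustrative assumptions, not the full schema):

```json
{
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": ["imageUrl", "productName", "price"]
            },
            "display": {
                "component": "table",
                "properties": {
                    "price": { "label": "Price" }
                }
            }
        }
    }
}
```

The `transformation` step selects which dataset fields to fetch, and the `display` step maps them onto the table component shown in the Output tab.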
@@ -124,7 +124,7 @@ The default behavior of the Output tab UI table is to display all fields from `t

Output configuration files need to be located in the `.actor` folder within the Actor's root directory.

-You have two choices of how to organize files withing the `.actor` folder.
+You have two choices of how to organize files within the `.actor` folder.
sources/platform/api_v2/api_v2_reference.apib (10 additions, 10 deletions)

@@ -989,7 +989,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
otherwise it will have a transitional status (e.g. `RUNNING`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
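The Base64-encoded value for the `webhooks` parameter can be produced with a few lines of Python (the webhook object below is a simplified, hypothetical example; see the webhooks documentation for the full object shape):

```python
import base64
import json

# Hypothetical webhook definition - a JSON array of objects.
webhooks = [
    {
        "eventTypes": ["ACTOR.RUN.SUCCEEDED", "ACTOR.RUN.FAILED"],
        "requestUrl": "https://example.com/my-webhook-handler",
    }
]

# Serialize the array to JSON, then Base64-encode it for use as the
# `webhooks` query parameter.
encoded = base64.b64encode(json.dumps(webhooks).encode("utf-8")).decode("ascii")
```

Decoding the parameter on the receiving side is the same process in reverse.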
@@ -1023,7 +1023,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
+ build: `0.1.234` (string, optional) - Specifies the actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the actor (typically `latest`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
@@ -1141,7 +1141,7 @@ To run the actor asynchronously, use the [Run actor](#reference/actors/run-colle
+ build: `0.1.234` (string, optional) - Specifies the actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the actor (typically `latest`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
+ format: `json` (string, optional) - Format of the results, possible values are: `json`, `jsonl`, `csv`, `html`, `xlsx`, `xml` and `rss`. The default value is `json`.
+ clean: `false` (boolean, optional) - If `true` or `1` then the API endpoint returns only non-empty items and skips hidden fields
(i.e. fields starting with the # character).
@@ -1758,7 +1758,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks.
**Note**: if you already have a webhook set up for the actor or task, you do not have to add it again here.
@@ -1792,7 +1792,7 @@ received in the response JSON to the [Get items](#reference/datasets/item-collec
in the response. By default, it is `OUTPUT`.
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
@@ -1898,7 +1898,7 @@ To run the Task asynchronously, use the [Run task asynchronously](#reference/act
+ build: `0.1.234` (string, optional) - Specifies the actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the task settings (typically `latest`).
+ webhooks: `dGhpcyBpcyBqdXN0IGV4YW1wbGUK...` (string, optional) - Specifies optional webhooks associated with the actor run, which can be used to receive a notification
e.g. when the actor finished or failed. The value is a Base64-encoded JSON array of objects defining the webhooks. For more information, see
+ format: `json` (string, optional) - Format of the results, possible values are: `json`, `jsonl`, `csv`, `html`, `xlsx`, `xml` and `rss`. The default value is `json`.
+ clean: `false` (boolean, optional) - If `true` or `1` then the API endpoint returns only non-empty items and skips hidden fields
(i.e. fields starting with the # character).
@@ -3005,7 +3005,7 @@ The pagination is always performed with the granularity of a single item, regard
By default, the **Items** in the response are sorted by the time they were stored to the database, therefore you can use
pagination to incrementally fetch the items as they are being added.
The maximum number of items that will be returned in a single API call is limited to 250,000. <!-- GET_ITEMS_LIMIT -->
-If you specify `desc=1` query paremeter, the results are returned in the reverse order
+If you specify `desc=1` query parameter, the results are returned in the reverse order
than they were stored (i.e. from newest to oldest items).
Note that only the order of **Items** is reversed, but not the order of the `unwind` array elements.
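Incremental fetching with `offset`, `limit`, and `desc` can be sketched as follows (the dataset ID and page size are placeholder values):

```python
from urllib.parse import urlencode

# Placeholder dataset ID and page size, for illustration only.
DATASET_ID = "someDatasetId"
PAGE_SIZE = 1000

def items_url(offset, desc=False):
    """Build a Get items URL for one page of dataset results."""
    params = {"offset": offset, "limit": PAGE_SIZE}
    if desc:
        # desc=1 returns items newest-first; the order of `unwind`
        # array elements is not reversed.
        params["desc"] = 1
    return f"https://api.apify.com/v2/datasets/{DATASET_ID}/items?{urlencode(params)}"

# Page through the dataset by advancing the offset by the page size.
first_page = items_url(0)
second_page = items_url(PAGE_SIZE)
```

A client keeps requesting pages until a response contains fewer than `limit` items.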
@@ -3081,7 +3081,7 @@ The POST payload is a JSON object or a JSON array of objects to save into the da
**IMPORTANT:** The limit of request payload size for the dataset is 5 MB. If the array exceeds the size,
you'll need to split it into a number of smaller arrays.
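One client-side way to stay under that limit is to split the array by serialized size before pushing. A greedy sketch (it assumes no single item exceeds the limit on its own):

```python
import json

def split_for_push(items, max_bytes=5 * 1024 * 1024):
    """Split a list of items into chunks whose JSON payload stays
    under the request size limit (5 MB by default)."""
    chunks, current, current_size = [], [], 2  # 2 bytes for "[]"
    for item in items:
        # Conservative per-item estimate: serialized size plus
        # separator overhead.
        item_size = len(json.dumps(item).encode("utf-8")) + 2
        if current and current_size + item_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 2
        current.append(item)
        current_size += item_size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate POST request to the Put items endpoint.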
-If the dataset has fields schema defined, the push request can potentialy fail with `400 Bad Request` if any item does not match the schema.
+If the dataset has fields schema defined, the push request can potentially fail with `400 Bad Request` if any item does not match the schema.
In such a case, nothing will be inserted into the dataset and the response will contain an error message with a list of invalid items and their validation errors.