sources/academy/glossary/concepts/http_headers.md
Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ For some websites, you won't need to worry about modifying headers at all, as th
Some websites will require certain default browser headers to work properly, such as **User-Agent** (though this header is becoming less important, as there are now more sophisticated ways to detect and block suspicious users).
-
Another example of such a "default" header is **Referer**. Some e-commerce websites might share the same platform, and data is loaded through XMLHttpRequests to that platform, which simply would not know which data to return without knowing which exact website is requesting it.
+
Another example of such a "default" header is **Referer**. Some e-commerce websites might share the same platform, and data is loaded through XMLHttpRequests to that platform, which would not know which data to return without knowing which exact website is requesting it.
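As a rough illustration of the point above, the following Python sketch attaches a **Referer** header to a request object using the standard library. Both URLs and the function name are hypothetical placeholders, and the request is only built, never sent.

```python
from urllib.request import Request


def build_platform_request(api_url: str, shop_url: str) -> Request:
    """Build (but do not send) a request that tells the shared platform
    which storefront is asking for the data."""
    return Request(
        api_url,
        headers={
            # The storefront URL lets the platform decide which data to return.
            "Referer": shop_url,
            "Accept": "application/json",
        },
    )


request = build_platform_request(
    "https://platform.example.com/api/products",  # hypothetical endpoint
    "https://shop.example.com/",  # hypothetical storefront
)
```

Without the **Referer** value, a shared backend of this kind would have no way to tell one storefront from another.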
@@ -44,7 +44,7 @@ You could use Chrome DevTools to inspect request headers, and [Insomnia](../tool
HTTP/1.1 and HTTP/2 headers have several differences. Here are the three key differences that you should be aware of:
1. HTTP/2 headers do not include status messages. They only contain status codes.
-
2. Certain headers are no longer used in HTTP/2 (such as **Connection** along with a few others related to it like **Keep-Alive**). In HTTP/2, connection-specific headers are prohibited. While some browsers will simply ignore them, Safari and other Webkit-based browsers will outright reject any response that contains them. Easy to do by accident, and a big problem.
+
2. Certain headers are no longer used in HTTP/2 (such as **Connection** along with a few others related to it like **Keep-Alive**). In HTTP/2, connection-specific headers are prohibited. While some browsers will ignore them, Safari and other Webkit-based browsers will outright reject any response that contains them. Easy to do by accident, and a big problem.
3. While HTTP/1.1 header names are case-insensitive and can be sent by browsers with capitalized letters (e.g. **Accept-Encoding**, **Cache-Control**, **User-Agent**), HTTP/2 header names must be lower-cased (e.g. **accept-encoding**, **cache-control**, **user-agent**).
> To learn more about the difference between HTTP/1.1 and HTTP/2 headers, check out [this article](https://httptoolkit.tech/blog/translating-http-2-into-http-1/).
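The three differences listed above can be sketched as a small translation step. This is a minimal Python illustration, not a full protocol implementation: the function names are made up for the example, and the set of connection-specific headers beyond **Connection** and **Keep-Alive** is an assumption based on the HTTP/2 specification.

```python
# Connection-specific headers that HTTP/2 prohibits. The text above names only
# Connection and Keep-Alive; the other entries are an assumption taken from
# the HTTP/2 specification.
CONNECTION_SPECIFIC = {
    "connection",
    "keep-alive",
    "proxy-connection",
    "transfer-encoding",
    "upgrade",
}


def status_code_from_http1_line(status_line: str) -> int:
    """Keep only the numeric code from an HTTP/1.1 status line,
    since HTTP/2 carries no status message."""
    # "HTTP/1.1 200 OK" splits into ["HTTP/1.1", "200", "OK"]
    return int(status_line.split(" ", 2)[1])


def to_http2_headers(headers: dict) -> dict:
    """Lower-case header names and drop connection-specific headers."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() not in CONNECTION_SPECIFIC
    }
```

Filtering before sending is the safer choice here, because a single leftover connection-specific header is enough for a strict client to reject the whole response.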
sources/academy/glossary/tools/edit_this_cookie.md
Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ At the top of the popup, there is a row of buttons. From left to right, here is
### Delete all cookies
-
Clicking this button will simply remove all cookies associated with the current domain. For example, if you're logged into your Apify account and delete all the cookies, the website will ask you to log in again.
+
Clicking this button will remove all cookies associated with the current domain. For example, if you're logged into your Apify account and delete all the cookies, the website will ask you to log in again.
**Q: What do you need to do to rotate a proxy (one proxy usually has one IP)? How does this differ for CheerioCrawler and PuppeteerCrawler?**
-
**A:**Simply making a new request with the proxy endpoint above will automatically rotate it. Sessions can also be used to automatically do this. While proxy rotation is fairly straightforward for Cheerio, it's more complex in Puppeteer, as you have to retire the browser each time a new proxy is rotated in. The SessionPool will automatically retire a browser when a session is retired. Sessions can be manually retired with `session.retire()`.
+
**A:**Making a new request with the proxy endpoint above will automatically rotate it. Sessions can also be used to automatically do this. While proxy rotation is fairly straightforward for Cheerio, it's more complex in Puppeteer, as you have to retire the browser each time a new proxy is rotated in. The SessionPool will automatically retire a browser when a session is retired. Sessions can be manually retired with `session.retire()`.
**Q: Name a few different ways how a website can prevent you from scraping it.**
sources/academy/platform/get_most_of_actors/actor_readme.md
Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ Aim for sections 1–6 below and try to include at least 300 words. You can move
- Add a video tutorial or GIF from an ideal Actor run.
-
> Tip: For better user experience, Apify Console automatically renders every YouTube URL as an embedded video player. Simply add a separate line with the URL of your YouTube video.
+
> Tip: For better user experience, Apify Console automatically renders every YouTube URL as an embedded video player. Add a separate line with the URL of your YouTube video.
- Consider adding a short numbered tutorial as Google will sometimes pick these up as rich snippets. Remember that this might be in search results, so you can repeat the name of the Actor and give a link, e.g.
sources/academy/platform/getting_started/actors.md
Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ After you've followed the **Getting started** lesson, you're almost ready to sta
## What's an Actor? {#what-is-an-actor}
-
When you deploy your script to the Apify platform, it is then called an **Actor**, which is simply a [serverless microservice](https://www.datadoghq.com/knowledge-center/serverless-architecture/serverless-microservices/#:~:text=Serverless%20microservices%20are%20cloud-based,suited%20for%20microservice-based%20architectures.) that accepts an input and produces an output. Actors can run for a few seconds, hours or even infinitely. An Actor can perform anything from a simple action such as filling out a web form or sending an email, to complex operations such as crawling an entire website and removing duplicates from a large dataset.
+
When you deploy your script to the Apify platform, it is then called an **Actor**, which is a [serverless microservice](https://www.datadoghq.com/knowledge-center/serverless-architecture/serverless-microservices/#:~:text=Serverless%20microservices%20are%20cloud-based,suited%20for%20microservice-based%20architectures.) that accepts an input and produces an output. Actors can run for a few seconds, hours or even infinitely. An Actor can perform anything from a simple action such as filling out a web form or sending an email, to complex operations such as crawling an entire website and removing duplicates from a large dataset.
Once an Actor has been pushed to the Apify platform, it can be shared with the world through the [Apify Store](https://apify.com/store), and even monetized after going public.
sources/academy/platform/getting_started/apify_api.md
Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ Our **adding-actor** takes in two input values (`num1` and `num2`). When using t
## Parameters {#parameters}
-
Let's say we want to run our **adding-actor** via API and view its results in CSV format at the end. We'll achieve this by simply passing the **format** parameter with a value of **csv** to change the output format:
+
Let's say we want to run our **adding-actor** via API and view its results in CSV format at the end. We'll achieve this by passing the **format** parameter with a value of **csv** to change the output format:
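One way to picture this is to build the request URL programmatically. The Python sketch below rests on several assumptions that are not taken from this section and should be checked against the Apify API reference: the base URL, the `run-sync-get-dataset-items` endpoint path, the `YOUR_USERNAME~adding-actor` ID, and the `YOUR_TOKEN` placeholder. Only the **format** parameter and its **csv** value come from the text above.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"  # assumed base URL of the Apify API


def build_run_url(actor_id: str, token: str, **params: str) -> str:
    """Return a URL that runs the Actor and returns its dataset items.
    Extra keyword arguments become query parameters."""
    query = urlencode({"token": token, **params})
    # Assumed endpoint path; verify it against the API reference.
    return f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?{query}"


# Placeholders only: substitute a real Actor ID and API token.
url = build_run_url("YOUR_USERNAME~adding-actor", "YOUR_TOKEN", format="csv")
```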
sources/academy/platform/getting_started/creating_actors.md
Lines changed: 3 additions & 3 deletions
@@ -68,7 +68,7 @@ If you want to use the template locally, you can again use our [Apify CLI](/cli)
When you click on the **Use locally** button, you'll be presented with instructions on how to create an Actor from this template in your local environment.
-
With the Apify CLI installed, you can simply run the following commands in your terminal:
+
With the Apify CLI installed, you can run the following commands in your terminal:
```shell
apify create my-actor -t getting_started_node
@@ -153,7 +153,7 @@ And now we are ready to run the Actor. But before we do that, let's give the Act
The input tab is where you can provide the Actor with some meaningful input. In this case, we'll be providing the Actor with a URL to scrape. For now, we'll use the prefilled value of [Apify website](https://apify.com/) (`https://apify.com/`).
-
You can change the website you want to extract the data from by simply changing the URL in the input field.
+
You can change the website you want to extract the data from by changing the URL in the input field.

@@ -163,7 +163,7 @@ Once you have provided the Actor with some URL you want to extract the data from

-
After the Actor finishes, you can preview or download the extracted data simply by clicking on the **Export X results** button.
+
After the Actor finishes, you can preview or download the extracted data by clicking on the **Export X results** button.
sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md
Lines changed: 4 additions & 4 deletions
@@ -56,7 +56,7 @@ If we send a correct POST request to one of these endpoints, the actor/actor-tas
### Additional settings {#additional-settings}
-
We can also add settings for the Actor (which will override the default settings) as additional query parameters. For example, if we wanted to change how much memory the Actor's run should be allocated and which build to run, we could simply add the `memory` and `build` parameters separated by `&`.
+
We can also add settings for the Actor (which will override the default settings) as additional query parameters. For example, if we wanted to change how much memory the Actor's run should be allocated and which build to run, we could add the `memory` and `build` parameters separated by `&`.
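As a sketch of how such parameters might be appended, the Python helper below merges extra settings into an existing URL's query string. The URL, the token placeholder, the helper name, and the example values (`memory=1024`, `build="latest"`) are hypothetical; only the `memory` and `build` parameter names come from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_settings(url: str, **settings) -> str:
    """Return `url` with `settings` merged into its query string.
    A setting that is already present in the URL is overridden."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({key: str(value) for key, value in settings.items()})
    # urlencode joins the individual parameters with `&`.
    return urlunsplit(parts._replace(query=urlencode(query)))


run_url = with_settings(
    "https://api.apify.com/v2/acts/ACTOR_NAME_OR_ID/runs?token=YOUR_TOKEN",  # placeholder URL
    memory=1024,
    build="latest",
)
```

Building the query string with a helper rather than by hand avoids mistakes such as a missing `&` or a duplicated parameter.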
@@ -201,7 +201,7 @@ For runs longer than 5 minutes, the process consists of three steps:
### Wait for the run to finish {#wait-for-the-run-to-finish}
-
There may be cases where we need to simply run the Actor and go away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the actor/task to finish.
+
There may be cases where we need to run the Actor and go away. But in any kind of integration, we are usually interested in its output. We have three basic options for how to wait for the actor/task to finish.
@@ -221,7 +221,7 @@ Once again, the final response will be the **run info object**; however, now its
#### Webhooks {#webhooks}
-
If you have a server, [webhooks](/platform/integrations/webhooks) are the most elegant and flexible solution for integrations with Apify. You can simply set up a webhook for any Actor or task, and that webhook will send a POST request to your server after an [event](/platform/integrations/webhooks/events) has occurred.
+
If you have a server, [webhooks](/platform/integrations/webhooks) are the most elegant and flexible solution for integrations with Apify. You can set up a webhook for any Actor or task, and that webhook will send a POST request to your server after an [event](/platform/integrations/webhooks/events) has occurred.
Usually, this event is a successfully finished run, but you can also set a different webhook for failed runs, etc.
@@ -239,7 +239,7 @@ What if you don't have a server, and the run you'd like to do is much too long t
When we run the Actor with the [usual API call](#run-an-actor-or-task) shown above, we will get back a response with the **run info** object. From this JSON object, we can then extract the ID of the Actor run that we just started from the `id` field. Then, we can set an interval that will poll the Apify API (let's say every 5 seconds) by calling the [**Get run**](https://apify.com/docs/api/v2#/reference/actors/run-object/get-run) endpoint to retrieve the run's status.
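The polling approach described above could look roughly like the following Python sketch. The `fetch_status` callable is a hypothetical stand-in for whatever code calls the **Get run** endpoint and reads the run's status, and the set of terminal status names is an assumption to verify against the API reference.

```python
import time
from typing import Callable

# Assumed names of the statuses after which a run no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def wait_for_run(
    fetch_status: Callable[[], str],
    interval_secs: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `fetch_status` until the run reaches a terminal status.

    `fetch_status` is a hypothetical callable that asks the API for the
    run's current status; `sleep` is injectable so the loop can be tested
    without real delays.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_secs)
    raise TimeoutError(f"Run did not finish after {max_polls} polls")
```

Capping the number of polls keeps a stuck run from blocking the caller forever; the default of 120 polls at 5 seconds each is an arbitrary choice for the example.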
-
Simply replace the `RUN_ID` in the following URL with the ID you extracted earlier:
+
Replace the `RUN_ID` in the following URL with the ID you extracted earlier: