**content/academy/expert_scraping_with_apify/saving_useful_stats.md**
# [](#savings-useful-run-statistics) Saving useful run statistics
Using Crawlee and the Apify SDK, we are now able to collect and format data coming directly from websites and save it into a Key-Value store or Dataset. This is great, but sometimes we want to store extra data about the run itself, or about each request. We might want to store some general run information separately from our results, or include statistics about each request within its corresponding dataset item.
The types of values that are saved are totally up to you, but the most common are error scores, the total number of saved items, the number of request retries, the number of captchas hit, etc. Storing these values is not always necessary, but they can be valuable when debugging and maintaining an actor. As your projects scale, this will become more and more useful and important.
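As a rough sketch, a run-stats state holding the values listed above might be shaped like this (the key names here are illustrative, not prescribed by the lesson):

```JavaScript
// Hypothetical shape for the run stats described above.
// Key names (errors, totalSaved, retries, captchasHit) are illustrative.
const state = {
    errors: {}, // error messages keyed by URL
    totalSaved: 0, // number of items pushed to the dataset
    retries: 0, // total request retries
    captchasHit: 0, // captchas encountered during the run
};

// Small helpers keep mutations of the state in one place.
const addSaved = (count = 1) => { state.totalSaved += count; };
const addError = (url, message) => {
    state.errors[url] = [...(state.errors[url] ?? []), message];
};
```

Centralizing mutations like this makes it trivial to persist the whole state object in one call later on.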
## [](#learning) Learning 🧠
Before moving on, give these valuable resources a quick look over:
- Refamiliarize yourself with the various data available on the [Request object](https://crawlee.dev/api/core/class/Request).
- Learn about the [`failedRequestHandler` function](https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#failedRequestHandler).
- Ensure you are comfortable using [key-value stores](https://sdk.apify.com/docs/guides/data-storage#key-value-store) and [datasets](https://sdk.apify.com/docs/api/dataset#__docusaurus), and understand the differences between the two storage types.
**content/academy/expert_scraping_with_apify/solutions/saving_stats.md**
The code in this solution will be similar to what we already did in the **Handling migrations** solution; however, we'll be storing and logging different data. First, let's create a new file called **Stats.js** and write a utility class for storing our run stats:
```JavaScript
import { Actor } from 'apify';

class Stats {
    constructor() {
        // ...
    }

    async initialize() {
        const data = await Actor.getValue('STATS');

        if (data) this.state = data;

        Actor.on('persistState', async () => {
            await Actor.setValue('STATS', this.state);
        });

        setInterval(() => console.log(this.state), 10000);
    }
    // ...
}
```
Cool, very similar to the **AsinTracker** class we wrote earlier. We'll now import **Stats** into our main file and initialize it:
```JavaScript
// ...
import Stats from './Stats.js';

await Actor.init();

await asinTracker.initialize();
await Stats.initialize();

// ...
```
## [](#tracking-errors) Tracking errors
To keep track of errors, we must write a new function in the crawler's configuration called **failedRequestHandler**. This function receives an object containing the **Error** that occurred and the **Request**, as well as information about the session and proxy used for the request.
```JavaScript
const crawler = new CheerioCrawler({
    proxyConfiguration,
    useSessionPool: true,
    sessionPoolOptions: {
        // ...
    },
    // ...
});
```
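The error-tracking logic itself can be kept as a small pure function. Here is a minimal sketch; the `Stats` object is a plain stand-in for the class built above, and the wiring shown in the trailing comment is an assumption about how it would plug into the crawler configuration:

```JavaScript
// Plain stand-in for the Stats class above (assumption for illustration).
const Stats = { state: { errors: {} } };

// Record the failing request's error message, keyed by the request URL.
const trackFailure = ({ request, error }) => {
    const { url } = request;
    Stats.state.errors[url] = [...(Stats.state.errors[url] ?? []), error.message];
};

// Hypothetical wiring inside the crawler configuration:
// failedRequestHandler: async ({ request, error }) => trackFailure({ request, error }),
```

Keeping the handler body as a standalone function makes it easy to unit-test without spinning up a crawler.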
## [](#saving-stats-with-dataset-items) Saving stats with dataset items
Still in the **OFFERS** handler, we need to add a few extra keys to the items which are pushed to the dataset. Luckily, all of the data required by the task is easily accessible in the context object.
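As a sketch of what that enrichment could look like (the field names `dateHandled` and `numberOfRetries` are assumptions for illustration; `retryCount` and `handledAt` mirror data available on Crawlee's Request object):

```JavaScript
// Hypothetical: merge per-request stats into each dataset item before pushing it.
// Field names here are illustrative, not prescribed by the task.
const buildItem = (offer, request) => ({
    ...offer,
    dateHandled: request.handledAt ?? new Date().toISOString(),
    numberOfRetries: request.retryCount,
});

// Usage inside the handler might then look like:
// await Dataset.pushData(buildItem(offer, request));
```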
**Q: Is storing these types of values necessary for every single actor?**
**A:** For small actors, it might be a waste of time to do this. For large-scale actors, it can be extremely helpful when debugging, and is most definitely worth the extra 10-20 minutes of development time. Usually, though, the default statistics from Crawlee and the SDK might be enough for simple run stats.