Commit 9f4ff88

handling_migrations

1 parent 7c5c09b commit 9f4ff88

3 files changed: +37 -32 lines changed

content/academy/expert_scraping_with_apify/migrations_maintaining_state.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ paths:
# [](#migrations-maintaining-state) Migrations & maintaining state

-We already know that actors are basically just Docker containers that can be run on any server. This means that they can be allocated anywhere there is space available, making them very efficient. Unfortunately, there is one big caveat: actors move - a lot. When an actor moves, it is called **migration**.
+We already know that actors are basically just Docker containers that can be run on any server. This means that they can be allocated anywhere there is space available, making them very efficient. Unfortunately, there is one big caveat: actors move - a lot. When an actor moves, it is called a **migration**.
On migration, the process inside of an actor is completely restarted and everything in its memory is lost, meaning that any values stored within variables or classes are lost.

@@ -24,7 +24,7 @@ Before moving forward, read about actor [events](https://sdk.apify.com/docs/api/
1. Actors have an option in the **Settings** tab to **Restart on error**. Would you use this feature for regular actors? When would you use this feature?
2. Migrations happen randomly, but by [aborting **gracefully**](https://docs.apify.com/actors/running#aborting-runs), you can simulate a similar situation. Try this out on the platform and observe what happens. What changes occur, and what remains the same for the restarted actor's run?
-3. Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping actor? Are there any features in the Apify SDK that handle this under the hood?
+3. Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping actor? Are there any features in the Crawlee/Apify SDK that handle this under the hood?
4. How can you intercept the migration event? How much time do you have after this event happens and before the actor migrates?
5. When would you persist data to the default key-value store instead of to a named key-value store?

content/academy/expert_scraping_with_apify/solutions/handling_migrations.md

Lines changed: 34 additions & 29 deletions
@@ -47,10 +47,14 @@ Here is our updated **routes.js** file which is now utilizing this utility class
```JavaScript
// routes.js
-const { BASE_URL, OFFERS_URL, labels } = require('./constants');
-const tracker = require('./asinTracker');
+import { createCheerioRouter } from '@crawlee/cheerio';
+import { BASE_URL, OFFERS_URL, labels } from './constants';
+import tracker from './asinTracker';
+import { dataset } from './main.js';

-exports.handleStart = async ({ $, crawler: { requestQueue }, request }) => {
+export const router = createCheerioRouter();
+
+router.addHandler(labels.START, async ({ $, crawler, request }) => {
    const { keyword } = request.userData;

    const products = $('div > div[data-asin]:not([data-asin=""])');
@@ -65,7 +69,7 @@ exports.handleStart = async ({ $, crawler: { requestQueue }, request }) => {
        // and initialize its collected offers count to 0
        tracker.incrementASIN(element.attr('data-asin'));

-        await requestQueue.addRequest({
+        await crawler.addRequests([{
            url,
            userData: {
                label: labels.PRODUCT,
@@ -76,16 +80,16 @@ exports.handleStart = async ({ $, crawler: { requestQueue }, request }) => {
                    keyword,
                },
            },
-        });
+        }]);
    }
-};
+});

-exports.handleProduct = async ({ $, crawler: { requestQueue }, request }) => {
+router.addHandler(labels.PRODUCT, async ({ $, crawler, request }) => {
    const { data } = request.userData;

    const element = $('div#productDescription');

-    await requestQueue.addRequest({
+    await crawler.addRequests([{
        url: OFFERS_URL(data.asin),
        userData: {
            label: labels.OFFERS,
@@ -94,10 +98,10 @@ exports.handleProduct = async ({ $, crawler: { requestQueue }, request }) => {
                description: element.text().trim(),
            },
        },
-    });
-};
+    }]);
+});

-exports.handleOffers = async ({ $, request }, dataset) => {
+router.addHandler(labels.OFFERS, async ({ $, request }) => {
    const { data } = request.userData;

    const { asin } = data;
@@ -115,18 +119,18 @@ exports.handleOffers = async ({ $, request }, dataset) => {
            offer: element.find('.a-price .a-offscreen').text().trim(),
        });
    }
-};
+});
```

## [](#persisting-state) Persisting state

The **persistState** event is automatically fired (by default) every 60 seconds by the Apify SDK while the actor is running, and is also fired when the **migrating** event occurs.

-In order to persist our ASIN tracker object, let's use the `Apify.events.on` function to listen for the **persistState** event and store it in the key-value store each time it is emitted.
+In order to persist our ASIN tracker object, let's use the `Actor.on` function to listen for the **persistState** event and store it in the key-value store each time it is emitted.

```JavaScript
// asinTracker.js
-const Apify = require('apify');
+import { Actor } from 'apify';
// We've updated our constants.js file to include the name
// of this new key in the key-value store
const { ASIN_TRACKER } = require('./constants');
@@ -135,8 +139,8 @@ class ASINTracker {
    constructor() {
        this.state = {};

-        Apify.events.on('persistState', async () => {
-            await Apify.setValue(ASIN_TRACKER, this.state);
+        Actor.on('persistState', async () => {
+            await Actor.setValue(ASIN_TRACKER, this.state);
        });

        setInterval(() => console.log(this.state), 10000);
@@ -163,15 +167,15 @@ In order to fix this, let's create a method called `initialize` which will be ca
```JavaScript
// asinTracker.js
-const Apify = require('apify');
-const { ASIN_TRACKER } = require('./constants');
+import { Actor } from 'apify';
+import { ASIN_TRACKER } from './constants';

class ASINTracker {
    constructor() {
        this.state = {};

-        Apify.events.on('persistState', async () => {
-            await Apify.setValue(ASIN_TRACKER, this.state);
+        Actor.on('persistState', async () => {
+            await Actor.setValue(ASIN_TRACKER, this.state);
        });

        setInterval(() => console.log(this.state), 10000);
@@ -180,7 +184,7 @@ class ASINTracker {
    async initialize() {
        // Read the data from the key-value store. If it
        // doesn't exist, it will be undefined
-        const data = await Apify.getValue(ASIN_TRACKER);
+        const data = await Actor.getValue(ASIN_TRACKER);

        // If the data does exist, replace the current state
        // (initialized as an empty object) with the data
@@ -200,18 +204,19 @@ class ASINTracker {
module.exports = new ASINTracker();
```

-We'll now call this function at the top level of the `Apify.main` function in **main.js** to ensure it is the first thing that gets called when the actor starts up:
+We'll now call this function at the top level of the **main.js** file to ensure it is the first thing that gets called when the actor starts up:

```JavaScript
// main.js

// ...
-const tracker = require('./src/asinTracker');
+import tracker from './asinTracker';

-const { log } = Apify.utils;
+// The Actor.init() function should be executed before
+// the tracker's initialization
+await Actor.init();

-Apify.main(async () => {
-    await tracker.initialize();
+await tracker.initialize();
// ...
```

@@ -227,13 +232,13 @@ That's everything! Now, even if the actor migrates (or is gracefully aborted the

**A:** After aborting or throwing an error mid-process, it manages to start back from where it was upon resurrection.

-**Q: Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping actor? Are there any features in the Apify SDK that handle this under the hood?**
+**Q: Why don't you (usually) need to add any special migration handling code for a standard crawling/scraping actor? Are there any features in the Crawlee/Apify SDK that handle this under the hood?**

-**A:** Because Apify SDK handles all of the migration handling code for us. If you want to add custom migration-handling code, you can use `Apify.events` to listen for the `migrating` or `persistState` events to save the current state in key-value store (or elsewhere).
+**A:** Because the Apify SDK handles all of the migration handling code for us. If you want to add custom migration-handling code, you can listen for the `migrating` or `persistState` events to save the current state in the key-value store (or elsewhere).

**Q: How can you intercept the migration event? How much time do you have after this event happens and before the actor migrates?**

-**A:** By using the `Apify.events.on` function. You have a maximum of a few seconds before shutdown after the `migrating` event has been fired.
+**A:** By using the `Actor.on` function. You have a maximum of a few seconds before shutdown after the `migrating` event has been fired.

**Q: When would you persist data to the default key-value store instead of to a named key-value store?**

content/academy/expert_scraping_with_apify/solutions/using_storage_creating_tasks.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ await Actor.init();

const { keyword } = await Actor.getInput();

-const dataset = await Actor.openDataset(`amazon-offers-${keyword.replace(' ', '-')}`);
+export const dataset = await Actor.openDataset(`amazon-offers-${keyword.replace(' ', '-')}`);

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
