|
9 | 9 |
|
10 | 10 | [Options](#usage) | [Plugins](#plugins) | [Log and debug](#log-and-debug) | [Frequently Asked Questions](https://github.com/website-scraper/node-website-scraper/blob/master/docs/FAQ.md) | [Contributing](https://github.com/website-scraper/node-website-scraper/blob/master/CONTRIBUTING.md) | [Code of Conduct](https://github.com/website-scraper/node-website-scraper/blob/master/CODE_OF_CONDUCT.md) |
11 | 11 |
|
| 12 | +Download the website to the local directory (including all css, images, js, etc.) |
12 | 13 |
|
13 | | -Download website to local directory (including all css, images, js, etc.) |
14 | | - |
15 | | -Try it using [demo app](https://github.com/website-scraper/demo) |
16 | | - |
17 | | -**Note:** by default dynamic websites (where content is loaded by js) may be saved not correctly because `website-scraper` doesn't execute js, it only parses http responses for html and css files. If you need to download dynamic website take a look on [website-scraper-puppeteer](https://github.com/website-scraper/website-scraper-puppeteer). |
| 14 | +**Note:** by default, dynamic websites (where content is loaded by js) may not be saved correctly because `website-scraper` doesn't execute js; it only parses http responses for html and css files. If you need to download a dynamic website, take a look at [website-scraper-puppeteer](https://github.com/website-scraper/website-scraper-puppeteer). |
18 | 15 |
|
19 | 16 | This module is an Open Source Software maintained by one developer in free time. If you want to thank the author of this module you can use [GitHub Sponsors](https://github.com/sponsors/s0ph1e) or [Patreon](https://www.patreon.com/s0ph1e). |
20 | 17 |
|
@@ -314,47 +311,44 @@ registerAction('beforeRequest', async ({resource, requestOptions}) => { |
314 | 311 | ``` |
315 | 312 |
|
316 | 313 | ##### afterResponse |
317 | | -Action afterResponse is called after each response, allows to customize resource or reject its saving. |
| 314 | +Action afterResponse is called after each response; it allows you to customize the resource or reject its saving. |
318 | 315 |
|
319 | 316 | Parameters - object which includes: |
320 | 317 | * response - response object from http module [got](https://github.com/sindresorhus/got#response) |
321 | 318 |
|
322 | | -Should return resolved `Promise` if resource should be saved or rejected with Error `Promise` if it should be skipped. |
323 | | -Promise should be resolved with: |
324 | | -* the `response` object with the `body` modified in place as necessary. |
325 | | -* or object with properties |
326 | | - * `body` (response body, string) |
327 | | - * `encoding` (`binary` or `utf8`) used to save the file, binary used by default. |
328 | | - * `metadata` (object) - everything you want to save for this resource (like headers, original text, timestamps, etc.), scraper will not use this field at all, it is only for result. |
329 | | -* a binary `string`. This is advised against because of the binary assumption being made can foul up saving of `utf8` responses to the filesystem. |
| 319 | +Return resolved `Promise` with: |
| 320 | + * an object if the resource should be saved; the object should contain the following properties: |
| 321 | + * `body` (string, required) |
| 322 | + * `encoding` (`binary` or `utf8`) is used to save the file, binary is used by default. |
| 323 | + * `metadata` (object) - everything you want to save for this resource (like headers, original text, timestamps, etc.), scraper will not use this field at all, it is only for the result |
| 324 | + * or null if the resource should be skipped |
330 | 325 |
|
331 | | -If multiple actions `afterResponse` added - scraper will use result from last one. |
| 326 | +If multiple actions `afterResponse` are added - the scraper will use the result from the last one. |
332 | 327 | ```javascript |
333 | | -// Do not save resources which responded with 404 not found status code |
| 328 | +// Do not save resources that responded with 404 not found status code |
334 | 329 | registerAction('afterResponse', ({response}) => { |
335 | 330 | if (response.statusCode === 404) { |
336 | | - return null; |
| 331 | + return null; |
337 | 332 | } else { |
338 | | - // if you don't need metadata - you can just return Promise.resolve(response.body) |
339 | 333 | return { |
340 | 334 | body: response.body, |
| 335 | + encoding: 'utf8', |
341 | 336 | metadata: { |
342 | 337 | headers: response.headers, |
343 | 338 | someOtherData: [ 1, 2, 3 ] |
344 | | - }, |
345 | | - encoding: 'utf8' |
| 339 | + } |
346 | 340 | } |
347 | 341 | } |
348 | 342 | }); |
349 | 343 | ``` |
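
The return contract described above (an object with `body`, `encoding`, and `metadata` to save the resource, or `null` to skip it) can also be written as a standalone function, which makes it easy to check in isolation. The following is a minimal sketch, not the project's own code: `handleAfterResponse` is a hypothetical name, and choosing `utf8` for `text/*` content types is an assumption made only for illustration.

```javascript
// Hypothetical helper (not part of website-scraper): decides what an
// afterResponse action should return for a got-style response object.
function handleAfterResponse({response}) {
	// Skip resources that were not found.
	if (response.statusCode === 404) {
		return null;
	}

	// Assumption for this sketch: text/* content types are saved as utf8,
	// everything else falls back to binary (the documented default).
	const contentType = String(response.headers['content-type'] || '');
	const encoding = contentType.startsWith('text/') ? 'utf8' : 'binary';

	return {
		body: response.body,
		encoding,
		// metadata is not used by the scraper; it is only kept for the result.
		metadata: { headers: response.headers }
	};
}
```

It would be registered the same way as the inline handler in the example above, for instance `registerAction('afterResponse', handleAfterResponse);`.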
350 | 344 |
|
351 | 345 | ##### onResourceSaved |
352 | | -Action onResourceSaved is called each time after resource is saved (to file system or other storage with 'saveResource' action). |
| 346 | +Action onResourceSaved is called each time after a resource is saved (to file system or other storage with 'saveResource' action). |
353 | 347 |
|
354 | 348 | Parameters- object which includes: |
355 | 349 | * resource - [Resource](https://github.com/website-scraper/node-website-scraper/blob/master/lib/resource.js) object |
356 | 350 |
|
357 | | -Scraper ignores result returned from this action and does not wait until it is resolved |
| 351 | +Scraper ignores the result returned from this action and does not wait until it is resolved |
358 | 352 | ```javascript |
359 | 353 | registerAction('onResourceSaved', ({resource}) => console.log(`Resource ${resource.url} saved!`)); |
360 | 354 | ``` |
@@ -438,7 +432,7 @@ Array of [Resource](https://github.com/website-scraper/node-website-scraper/blob |
438 | 432 |
|
439 | 433 | ## Log and debug |
440 | 434 | This module uses [debug](https://github.com/visionmedia/debug) to log events. To enable logs you should use environment variable `DEBUG`. |
441 | | -Next command will log everything from website-scraper |
| 435 | +The following command logs everything from website-scraper: |
442 | 436 | ```bash |
443 | 437 | export DEBUG=website-scraper*; node app.js |
444 | 438 | ``` |
|