Commit ffcd02d

Update README.md
1 parent bbf0101 commit ffcd02d

README.md

Lines changed: 17 additions & 23 deletions
@@ -9,12 +9,9 @@
 
 [Options](#usage) | [Plugins](#plugins) | [Log and debug](#log-and-debug) | [Frequently Asked Questions](https://github.com/website-scraper/node-website-scraper/blob/master/docs/FAQ.md) | [Contributing](https://github.com/website-scraper/node-website-scraper/blob/master/CONTRIBUTING.md) | [Code of Conduct](https://github.com/website-scraper/node-website-scraper/blob/master/CODE_OF_CONDUCT.md)
 
+Download the website to the local directory (including all css, images, js, etc.)
 
-Download website to local directory (including all css, images, js, etc.)
-
-Try it using [demo app](https://github.com/website-scraper/demo)
-
-**Note:** by default dynamic websites (where content is loaded by js) may be saved not correctly because `website-scraper` doesn't execute js, it only parses http responses for html and css files. If you need to download dynamic website take a look on [website-scraper-puppeteer](https://github.com/website-scraper/website-scraper-puppeteer).
+**Note:** by default dynamic websites (where content is loaded by js) may be saved not correctly because `website-scraper` doesn't execute js, it only parses http responses for html and css files. If you need to download a dynamic website take a look at [website-scraper-puppeteer](https://github.com/website-scraper/website-scraper-puppeteer).
 
 This module is an Open Source Software maintained by one developer in free time. If you want to thank the author of this module you can use [GitHub Sponsors](https://github.com/sponsors/s0ph1e) or [Patreon](https://www.patreon.com/s0ph1e).
 
@@ -314,47 +311,44 @@ registerAction('beforeRequest', async ({resource, requestOptions}) => {
 ```
 
 ##### afterResponse
-Action afterResponse is called after each response, allows to customize resource or reject its saving.
+Action afterResponse is called after each response, it allows to customize resource or reject its saving.
 
 Parameters - object which includes:
 * response - response object from http module [got](https://github.com/sindresorhus/got#response)
 
-Should return resolved `Promise` if resource should be saved or rejected with Error `Promise` if it should be skipped.
-Promise should be resolved with:
-* the `response` object with the `body` modified in place as necessary.
-* or object with properties
-    * `body` (response body, string)
-    * `encoding` (`binary` or `utf8`) used to save the file, binary used by default.
-    * `metadata` (object) - everything you want to save for this resource (like headers, original text, timestamps, etc.), scraper will not use this field at all, it is only for result.
-* a binary `string`. This is advised against because of the binary assumption being made can foul up saving of `utf8` responses to the filesystem.
+Return resolved `Promise` with:
+* object if the resource should be saved, object should contain next properties:
+    * `body` (string, required)
+    * `encoding` (`binary` or `utf8`) is used to save the file, binary is used by default.
+    * `metadata` (object) - everything you want to save for this resource (like headers, original text, timestamps, etc.), scraper will not use this field at all, it is only for the result
+* or null if the resource should be skipped
 
-If multiple actions `afterResponse` added - scraper will use result from last one.
+If multiple actions `afterResponse` are added - the scraper will use the result from the last one.
 ```javascript
-// Do not save resources which responded with 404 not found status code
+// Do not save resources that responded with 404 not found status code
 registerAction('afterResponse', ({response}) => {
     if (response.statusCode === 404) {
-        return null;
+        return null;
     } else {
-        // if you don't need metadata - you can just return Promise.resolve(response.body)
         return {
             body: response.body,
+            encoding: 'utf8',
             metadata: {
                 headers: response.headers,
                 someOtherData: [ 1, 2, 3 ]
-            },
-            encoding: 'utf8'
+            }
         }
     }
 });
 ```
 
 ##### onResourceSaved
-Action onResourceSaved is called each time after resource is saved (to file system or other storage with 'saveResource' action).
+Action onResourceSaved is called each time after a resource is saved (to file system or other storage with 'saveResource' action).
 
 Parameters- object which includes:
 * resource - [Resource](https://github.com/website-scraper/node-website-scraper/blob/master/lib/resource.js) object
 
-Scraper ignores result returned from this action and does not wait until it is resolved
+Scraper ignores the result returned from this action and does not wait until it is resolved
 ```javascript
 registerAction('onResourceSaved', ({resource}) => console.log(`Resource ${resource.url} saved!`));
 ```
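
The `afterResponse` return contract described in this hunk (an object with `body`, `encoding`, and `metadata` to save, or `null` to skip) can be tried without any network access. The following is a minimal sketch, not the library's own code. `SkipNotFoundPlugin`, `afterResponseHandler`, and `fakeRegisterAction` are hypothetical names introduced for illustration, and wrapping the handler in a class with an `apply(registerAction)` method is an assumption about the plugin shape that this diff does not show.

```javascript
// Handler following the return contract from the README text in this hunk:
// return null to skip the resource, or an object describing what to save.
function afterResponseHandler({ response }) {
    if (response.statusCode === 404) {
        return null; // resource is skipped
    }
    return {
        body: response.body,   // required, string
        encoding: 'utf8',      // 'binary' is the documented default
        metadata: { headers: response.headers } // kept only for the result
    };
}

// Hypothetical plugin wrapper; apply(registerAction) is assumed, not shown in this diff.
class SkipNotFoundPlugin {
    apply(registerAction) {
        registerAction('afterResponse', afterResponseHandler);
    }
}

// Local stand-in for the scraper's registerAction, so the handler can be
// exercised with plain objects in place of real got responses.
const actions = {};
const fakeRegisterAction = (name, handler) => { actions[name] = handler; };
new SkipNotFoundPlugin().apply(fakeRegisterAction);

const saved = actions.afterResponse({
    response: { statusCode: 200, body: '<html></html>', headers: {} }
});
const skipped = actions.afterResponse({
    response: { statusCode: 404, body: '', headers: {} }
});

console.log(saved.encoding, skipped); // prints "utf8 null"
```

In a real run the plugin instance would presumably be passed through the scraper's plugins option instead of the stand-in above. The handler is synchronous here, mirroring the README example in this hunk, although the prose speaks of returning a resolved `Promise`.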
@@ -438,7 +432,7 @@ Array of [Resource](https://github.com/website-scraper/node-website-scraper/blob
 
 ## Log and debug
 This module uses [debug](https://github.com/visionmedia/debug) to log events. To enable logs you should use environment variable `DEBUG`.
-Next command will log everything from website-scraper
+The next command will log everything from website-scraper
 ```bash
 export DEBUG=website-scraper*; node app.js
 ```
