
Commit abc5f52

Update README.md
Add description of `recursive` and `maxDepth` options
1 parent 0d1ba41 commit abc5f52

File tree

1 file changed: +25 −9


README.md

Lines changed: 25 additions & 9 deletions
@@ -19,7 +19,7 @@ npm install website-scraper

## Usage
```javascript
var scraper = require('website-scraper');
var options = {
  urls: ['http://nodejs.org/'],
  directory: '/path/to/save/',
@@ -38,7 +38,7 @@ scraper.scrape(options).then(function (result) {

## API
### scrape(options, callback)
Makes requests to `urls` and saves all files found with `sources` to `directory`.

**options** - object containing the following options:

@@ -48,31 +48,34 @@ Makes requests to `urls` and saves all files found with `sources` to `directory`
- `sources:` array of objects to load; specifies selectors and attribute values to select files for loading *(optional, see default value in `lib/config/defaults.js`)*
- `subdirectories:` array of objects; specifies subdirectories for file extensions. If `null`, all files will be saved to `directory` *(optional, see example below)*
- `request`: object, custom options for [request](https://github.com/request/request#requestoptions-callback) *(optional, see example below)*
- `recursive`: boolean; if `true`, the scraper will follow anchors in HTML files. Don't forget to set `maxDepth` to avoid infinite downloading *(optional, see example below)*
- `maxDepth`: positive number, the maximum allowed depth for dependencies *(optional, see example below)*

**callback** - callback function *(optional)*, with the following parameters:

- `error:` if error - `Error` object; if success - `null`
- `result:` if error - `null`; if success - array of objects containing:
  - `url:` url of the loaded page
  - `filename:` filename where the page was saved (relative to `directory`)

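The `(error, result)` callback contract described above can be sketched as follows. This is a minimal sketch, not code from the README: `onScraped` is a hypothetical handler name, and the usage lines assume `website-scraper` is installed.

```javascript
// Hypothetical handler matching the documented (error, result) contract:
// on failure, error is an Error and result is null;
// on success, error is null and result is an array of { url, filename } objects.
function onScraped(error, result) {
  if (error) {
    console.log(error);
    return;
  }
  result.forEach(function (page) {
    console.log(page.url + ' -> ' + page.filename);
  });
}

// Usage sketch (assumes website-scraper is installed):
// var scraper = require('website-scraper');
// scraper.scrape({ urls: ['http://nodejs.org/'], directory: '/path/to/save/' }, onScraped);
```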
## Examples
#### Example 1
Let's scrape some pages from [http://nodejs.org/](http://nodejs.org/) with images, css and js files and save them to `/path/to/save/`.
Imagine we want to load:
- [Home page](http://nodejs.org/) to `index.html`
- [About page](http://nodejs.org/about/) to `about.html`
- [Blog](http://blog.nodejs.org/) to `blog.html`

and separate files into directories:

- `img` for .jpg, .png, .svg (full path `/path/to/save/img`)
- `js` for .js (full path `/path/to/save/js`)
- `css` for .css (full path `/path/to/save/css`)

```javascript
var scraper = require('website-scraper');
scraper.scrape({
  urls: [
    'http://nodejs.org/', // Will be saved with default filename 'index.html'
@@ -101,3 +101,16 @@ scraper.scrape({
  console.log(err);
});
```

#### Example 2. Recursive downloading
```javascript
// Links from example.com will be followed.
// Links on those pages will be ignored because their depth = 2 is greater than maxDepth.
var scraper = require('website-scraper');
scraper.scrape({
  urls: ['http://example.com/'],
  directory: '/path/to/save',
  recursive: true,
  maxDepth: 1
}).then(console.log).catch(console.log);
```
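The `request` option is marked *(optional, see example below)* in the API section, but no example for it appears in these hunks. A minimal sketch, assuming the object is passed straight through to the request module; the header value is illustrative, not from the README:

```javascript
// Sketch: custom options for the underlying request module.
// The User-Agent value below is illustrative, not from the README.
var options = {
  urls: ['http://example.com/'],
  directory: '/path/to/save',
  request: {
    headers: {
      'User-Agent': 'Mozilla/5.0 (compatible; my-scraper)'
    }
  }
};

// Usage sketch (assumes website-scraper is installed):
// var scraper = require('website-scraper');
// scraper.scrape(options).then(console.log).catch(console.log);
```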
