Makes requests to `urls` and saves all files found with `sources` to `directory`.
**options** - object containing the following options:
- `sources:` array of objects to load, specifies selectors and attribute values to select files for loading *(optional, see default value in `lib/config/defaults.js`)*
- `subdirectories:` array of objects, specifies subdirectories for file extensions. If `null`, all files will be saved to `directory` *(optional, see example below)*
- `request`: object, custom options for [request](https://github.com/request/request#requestoptions-callback) *(optional, see example below)*
- `recursive`: boolean, if `true` the scraper will follow anchors in HTML files. Don't forget to set `maxDepth` to avoid infinite downloading *(optional, see example below)*
- `maxDepth`: positive number, maximum allowed depth for dependencies *(optional, see example below)*
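As a quick orientation before the full examples below, an options object combining these settings might look like this (the header value is purely illustrative, not something the scraper requires):

```javascript
// A sketch of an options object using the settings described above.
// The User-Agent value is a hypothetical example, not a required header.
var options = {
  urls: ['http://nodejs.org/'],
  directory: '/path/to/save',
  request: {
    headers: { 'User-Agent': 'my-scraper-bot' } // passed through to request
  },
  recursive: true, // follow anchors in loaded HTML files
  maxDepth: 1      // stop following links beyond depth 1
};
```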
**callback** - callback function *(optional)*, includes the following parameters:
- `error:` if error - `Error` object, if success - `null`
- `result:` if error - `null`, if success - array of objects containing:
  - `url:` url of loaded page
  - `filename:` filename where page was saved (relative to `directory`)
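For instance, the entries of the `result` array can be turned into human-readable messages like this (a minimal sketch; the `describeResult` helper is hypothetical, but the `url` and `filename` fields are the ones described above):

```javascript
// Hypothetical helper: formats each entry of the callback's `result` array.
// Each entry has `url` (url of loaded page) and `filename` (path relative to `directory`).
function describeResult (result) {
  return result.map(function (page) {
    return page.url + ' was saved to ' + page.filename;
  });
}
```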
## Examples
#### Example 1
Let's scrape some pages from [http://nodejs.org/](http://nodejs.org/) with images, CSS and JS files and save them to `/path/to/save/`.
Imagine we want to load:
- [Home page](http://nodejs.org/) to `index.html`
- [About page](http://nodejs.org/about/) to `about.html`
- [Blog](http://blog.nodejs.org/) to `blog.html`
and separate files into directories:
- `img` for .jpg, .png, .svg (full path `/path/to/save/img`)
- `js` for .js (full path `/path/to/save/js`)
- `css` for .css (full path `/path/to/save/css`)
```javascript
var scraper = require('website-scraper');
scraper.scrape({
  urls: [
    'http://nodejs.org/', // Will be saved with default filename 'index.html'
    { url: 'http://nodejs.org/about', filename: 'about.html' },
    { url: 'http://blog.nodejs.org/', filename: 'blog.html' }
  ],
  directory: '/path/to/save',
  subdirectories: [
    { directory: 'img', extensions: ['.jpg', '.png', '.svg'] },
    { directory: 'js', extensions: ['.js'] },
    { directory: 'css', extensions: ['.css'] }
  ],
  sources: [
    { selector: 'img', attr: 'src' },
    { selector: 'link[rel="stylesheet"]', attr: 'href' },
    { selector: 'script', attr: 'src' }
  ]
}, function (err, result) {
  if (err) {
    console.log(err);
  }
});
```
#### Example 2. Recursive downloading
```javascript
var scraper = require('website-scraper');
scraper.scrape({
  urls: ['http://example.com/'], // Links from example.com will be followed
  directory: '/path/to/save',
  recursive: true,
  maxDepth: 1 // Links from those links will be ignored because their depth = 2 is greater than maxDepth
}, function (err, result) {
  console.log(err || result);
});
```