Commit 5f459e4

Update README.md
1 parent 2a796ae commit 5f459e4

File tree: 1 file changed (+40 −25 lines)


README.md

Lines changed: 40 additions & 25 deletions
## Introduction
Node.js module for scraping websites with their images, css, js, etc.

[![Build Status](https://travis-ci.org/s0ph1e/node-website-scraper.svg?branch=master)](https://travis-ci.org/s0ph1e/node-website-scraper)
[![Version](https://img.shields.io/npm/v/website-scraper.svg)](https://www.npmjs.org/package/website-scraper)
[![Downloads](https://img.shields.io/npm/dm/website-scraper.svg)](https://www.npmjs.org/package/website-scraper)

[![NPM Stats](https://nodei.co/npm/website-scraper.png?downloadRank=true&stars=true)](https://www.npmjs.org/package/website-scraper)
## Usage
```javascript
var scraper = require('website-scraper');
var options = {
  url: 'http://nodejs.org/',
  directory: '/path/to/save/'
};

// with callback
scraper.scrape(options, function (error, result) {
  /* some code here */
});

// or with promise
scraper.scrape(options).then(function (result) {
  /* some code here */
});
```
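The callback-or-promise style shown above is a common dual-API pattern. As a rough, illustrative sketch of how such a function can be exposed (this is an assumption about the pattern, not the module's actual implementation, and the echoed result is made up):

```javascript
// Illustrative sketch: a function usable with either a node-style
// callback or a returned promise.
function scrape(options, callback) {
  var promise = new Promise(function (resolve) {
    // A real implementation would fetch `options.url` and save files here;
    // this sketch just echoes a result shaped like the module's output.
    resolve([{url: options.url, filename: options.directory + 'index.html'}]);
  });
  if (typeof callback === 'function') {
    promise.then(
      function (result) { callback(null, result); },
      function (error) { callback(error, null); }
    );
  }
  return promise;
}
```

Callers who pass a callback get node-style `(error, result)` delivery; everyone gets a promise back, so the two styles can coexist on one function.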

## API
### scrape(options, callback)
Makes a request to `url` and saves all files found with `srcToLoad` to `directory`.

**options** - object containing the following options:

- `url:` url to load *(required)*
- `directory:` path to save loaded files *(required)*
- `paths:` array of objects containing urls or relative paths to load and filenames for them (if not set, only `url` will be loaded) *(optional, see example below)*
- `log:` boolean indicating whether to write the log to console *(optional, default: false)*
- `indexFile:` filename for the index page *(optional, default: 'index.html')*
- `srcToLoad:` array of objects specifying selectors and attribute values to select files for loading *(optional, see default value in `lib/defaults.js`)*
- `subdirectories:` array of objects specifying subdirectories for extensions; if `null`, all files will be saved to `directory` *(optional, see example below)*

**callback** - callback function *(optional)* with the following parameters:

- `error:` if error - `Error` object, if success - `null`
- `result:` if error - `null`, if success - array of objects containing:
  - `url:` url of the loaded page
  - `filename:` absolute filename where the page was saved
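
Since `result` is an array of `{url, filename}` objects, a callback might summarize it like this (the `result` data below is made up purely for illustration):

```javascript
// Illustrative only: a result array shaped like the one the callback receives.
var result = [
  {url: 'http://nodejs.org/', filename: '/path/to/save/index.html'},
  {url: 'http://nodejs.org/about/', filename: '/path/to/save/about.html'}
];

// Build a short "url -> filename" summary, one line per saved page.
var summary = result.map(function (page) {
  return page.url + ' -> ' + page.filename;
});

console.log(summary.join('\n'));
```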

## Examples
Let's scrape some pages from [http://nodejs.org/](http://nodejs.org/) with images, css and js files, and save them to `/path/to/save/`.
Imagine we want to load:

- [Home page](http://nodejs.org/) to `index.html`
- [About page](http://nodejs.org/about/) to `about.html`
- [Blog](http://blog.nodejs.org/) to `blog.html`

and separate files into directories:

- `img` for .jpg, .png, .svg (full path `/path/to/save/img`)
- `js` for .js (full path `/path/to/save/js`)
- `css` for .css (full path `/path/to/save/css`)

```javascript
scraper.scrape({
  url: 'http://nodejs.org/',
  directory: '/path/to/save',
  paths: [
    {path: '/', filename: 'index.html'},
    {path: '/about', filename: 'about.html'},
    {url: 'http://blog.nodejs.org/', filename: 'blog.html'}
  ],
  subdirectories: [
    {directory: 'img', extensions: ['.jpg', '.png', '.svg']},
    {directory: 'js', extensions: ['.js']},
    {directory: 'css', extensions: ['.css']}
  ],
  srcToLoad: [
    {selector: 'img', attr: 'src'},
    {selector: 'link[rel="stylesheet"]', attr: 'href'},
    {selector: 'script', attr: 'src'}
  ]
}).then(function (result) {
  console.log(result);
});
```