Commit 4a5610c

Merge remote-tracking branch 'upstream/dev'
2 parents b5f72fd + b410eae commit 4a5610c

9 files changed

+388
-326
lines changed

.travis.yml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
language: node_js
2+
node_js:
3+
- "0.11"
4+
- "0.10"

README.md

Lines changed: 48 additions & 22 deletions
@@ -1,67 +1,93 @@
11
##Introduction
2-
Node.js module for website's scraping with images, css, js, etc. Uses cheerio, request, bluebird, fs-extra, underscore.
2+
Node.js module for website's scraping with images, css, js, etc.
3+
4+
[![Build Status](https://img.shields.io/travis/s0ph1e/node-website-scraper/master.svg?style=flat)](https://travis-ci.org/s0ph1e/node-website-scraper)
5+
[![Code Climate](https://img.shields.io/codeclimate/github/s0ph1e/node-website-scraper.svg?style=flat)](https://codeclimate.com/github/s0ph1e/node-website-scraper)
6+
[![Version](https://img.shields.io/npm/v/website-scraper.svg?style=flat)](https://www.npmjs.org/package/website-scraper)
7+
[![Downloads](https://img.shields.io/npm/dm/website-scraper.svg?style=flat)](https://www.npmjs.org/package/website-scraper)
8+
[![Dependency Status](https://david-dm.org/s0ph1e/node-website-scraper.svg?style=flat)](https://david-dm.org/s0ph1e/node-website-scraper)
9+
10+
[![NPM Stats](https://nodei.co/npm/website-scraper.png?downloadRank=true&stars=true)](https://www.npmjs.org/package/website-scraper)
311

412
##Installation
513
`npm install website-scraper`
614

715
##Usage
816
```javascript
917
var scraper = require('website-scraper');
10-
scraper.scrape({
18+
var options = {
1119
url: 'http://nodejs.org/',
12-
path: '/path/to/save/',
13-
}, function (error, result){
20+
directory: '/path/to/save/',
21+
};
22+
23+
// with callback
24+
scraper.scrape(options, function (error, result) {
25+
/* some code here */
26+
});
27+
28+
// or with promise
29+
scraper.scrape(options).then(function (result) {
1430
/* some code here */
1531
});
1632
```
1733

1834
##API
1935
### scrape(options, callback)
20-
Makes request to `url` and saves all files found with `srcToLoad` to `path`.
36+
Makes request to `url` and saves all files found with `srcToLoad` to `directory`.
2137

2238
**options** - object containing next options:
2339

2440
- `url:` url to load *(required)*
25-
- `path:` path to save loaded files *(required)*
41+
- `directory:` path to save loaded files *(required)*
42+
- `paths:` array of objects, contains urls or relative paths to load and filenames for them (if is not set only `url` will be loaded) *(optional, see example below)*
2643
- `log:` boolean indicates whether to write the log to console *(optional, default: false)*
2744
- `indexFile:` filename for index page *(optional, default: 'index.html')*
2845
- `srcToLoad:` array of objects to load, specifies selectors and attribute values to select files for loading *(optional, see default value in `lib/defaults.js`)*
29-
- `directories:` array of objects, specifies relative directories for extensions. If `null` all files will be saved to `path` *(optional, see example below)*
46+
- `subdirectories:` array of objects, specifies subdirectories for extensions. If `null` all files will be saved to `directory` *(optional, see example below)*
3047

3148

3249
**callback** - callback function *(optional)*, includes following parameters:
3350

3451
- `error:` if error - `Error object`, if success - `null`
35-
- `result:` if error - `null`, if success - object containing:
36-
- `html:` html code of index page
52+
- `result:` if error - `null`, if success - array of objects containing:
53+
- `url:` url of loaded page
54+
- `filename:` absolute filename where page was saved
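
The callback and promise styles described above can be supported by one function. The following is only a minimal sketch of that pattern, not the module's actual implementation: `fakeScrape` is a hypothetical stand-in that performs no network requests and returns a result shaped like the documented array of `{url, filename}` objects.

```javascript
var path = require('path');

// Hypothetical stand-in for a scrape-like function (assumption: not the
// library's code). It validates options and resolves with a fake result.
function fakeScrape(options, callback) {
  var promise = new Promise(function (resolve, reject) {
    if (!options || !options.url || !options.directory) {
      reject(new Error('url and directory are required'));
      return;
    }
    // Mirrors the documented result shape: array of {url, filename}.
    resolve([{
      url: options.url,
      filename: path.join(options.directory, 'index.html')
    }]);
  });

  // If a callback was passed, forward the outcome in (error, result) form.
  if (typeof callback === 'function') {
    promise.then(
      function (result) { callback(null, result); },
      function (error) { callback(error, null); }
    );
  }

  // Returning the promise lets callers use .then() instead of a callback.
  return promise;
}
```

A caller can then pick either style: `fakeScrape(options, function (error, result) { ... })` or `fakeScrape(options).then(function (result) { ... })`.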
3755

3856

3957
##Examples
40-
Let's scrape [http://nodejs.org/](http://nodejs.org/) with images, css, js files and save them to `/path/to/save/`. Index page will be named 'myIndex.html', files will be separated into directories:
58+
Let's scrape some pages from [http://nodejs.org/](http://nodejs.org/) with images, css, js files and save them to `/path/to/save/`.
59+
Imagine we want to load:
60+
- [Home page](http://nodejs.org/) to `index.html`
61+
- [About page](http://nodejs.org/about/) to `about.html`
62+
- [Blog](http://blog.nodejs.org/) to `blog.html`
63+
64+
and separate files into directories:
4165

42-
- `img` for .jpg, .png (full path `/path/to/save/img`)
66+
- `img` for .jpg, .png, .svg (full path `/path/to/save/img`)
4367
- `js` for .js (full path `/path/to/save/js`)
4468
- `css` for .css (full path `/path/to/save/css`)
45-
- `font` for .ttf, .woff, .eot, .svg (full path `/path/to/save/font`)
4669

4770
```javascript
4871
scraper.scrape({
4972
url: 'http://nodejs.org/',
50-
path: '/path/to/save',
51-
indexFile: 'myIndex.html',
73+
directory: '/path/to/save',
74+
paths: [
75+
{path: '/', filename: 'index.html'},
76+
{path: '/about', filename: 'about.html'},
77+
{url: 'http://blog.nodejs.org/', filename: 'blog.html'}
78+
],
79+
subdirectories: [
80+
{directory: 'img', extensions: ['.jpg', '.png', '.svg']},
81+
{directory: 'js', extensions: ['.js']},
82+
{directory: 'css', extensions: ['.css']}
83+
],
5284
srcToLoad: [
5385
{selector: 'img', attr: 'src'},
5486
{selector: 'link[rel="stylesheet"]', attr: 'href'},
5587
{selector: 'script', attr: 'src'}
56-
],
57-
directories: [
58-
{directory: 'img', extensions: ['.jpg', '.png']},
59-
{directory: 'js', extensions: ['.js']},
60-
{directory: 'css', extensions: ['.css']},
61-
{directory: 'fonts', extensions: ['.ttf', '.woff', '.eot', '.svg']}
6288
]
63-
}, function (error, result){
64-
console.log(result);
89+
}).then(function (result) {
90+
console.log(result);
6591
});
6692
```
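
To illustrate what the `subdirectories` option in the example above expresses, here is a hedged sketch of an extension-to-directory lookup. `getSubdirectory` is a hypothetical helper written for this illustration; the module may resolve directories differently.

```javascript
var path = require('path');

// Hypothetical helper (assumption: not taken from the library's source).
// Returns the subdirectory configured for a file's extension, or '' when
// no rule matches or when subdirectories is null.
function getSubdirectory(filename, subdirectories) {
  if (!subdirectories) {
    return '';
  }
  var ext = path.extname(filename).toLowerCase();
  for (var i = 0; i < subdirectories.length; i++) {
    if (subdirectories[i].extensions.indexOf(ext) !== -1) {
      return subdirectories[i].directory;
    }
  }
  return '';
}

var subdirectories = [
  {directory: 'img', extensions: ['.jpg', '.png', '.svg']},
  {directory: 'js', extensions: ['.js']},
  {directory: 'css', extensions: ['.css']}
];

console.log(getSubdirectory('logo.svg', subdirectories)); // img
console.log(getSubdirectory('main.css', subdirectories)); // css
```

Under this sketch, a file with an unlisted extension such as `.woff` maps to an empty string, meaning it would be saved directly in `directory`.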
6793

lib/defaults.js renamed to lib/config/defaults.js

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ var config = {
3535
attr: 'href'
3636
},
3737
],
38-
directories: [
38+
subdirectories: [
3939
{
4040
directory: 'images',
4141
extensions: ['.png', '.jpg', '.jpeg', '.gif']
@@ -55,4 +55,4 @@ var config = {
5555
]
5656
};
5757

58-
module.exports = config;
58+
module.exports = config;
