## Introduction

Node.js module for scraping websites together with their images, CSS, JS, and other files. Uses cheerio, request, bluebird, fs-extra, and underscore.

## Installation

`npm install website-scraper`

## Usage

```javascript
var scraper = require('website-scraper');

scraper.scrape({
    url: 'http://nodejs.org/',
    path: '/path/to/save/'
}, function (error, result) {
    if (error) {
        return console.error(error);
    }
    console.log(result.html);
});
```

## API

### scrape(options, callback)

Makes a request to `url` and saves all files matched by `srcToLoad` to `path`.

**options** - object containing the following options:

- `url`: URL to load *(required)*
- `path`: path to save loaded files to *(required)*
- `log`: boolean indicating whether to write the log to the console *(optional, default: `false`)*
- `indexFile`: filename for the index page *(optional, default: `'index.html'`)*
- `srcToLoad`: array of objects specifying which files to load; each object gives a selector and the attribute that holds the file's URL *(optional, see example below)*
- `directories`: array of objects specifying relative directories by file extension. If `null`, all files will be saved directly to `path` *(optional, see example below)*

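The required options and documented defaults above can be summarized in code. This is a minimal sketch: `normalizeOptions` is a hypothetical helper, not part of website-scraper, and it assumes that values the caller supplies should override the defaults. `srcToLoad` and `directories` have no documented default, so the sketch passes them through unchanged.

```javascript
// Hypothetical helper, not part of website-scraper: checks the required
// options and fills in the documented defaults for the optional ones.
function normalizeOptions(options) {
    if (!options || typeof options.url !== 'string' || options.url === '') {
        throw new Error('options.url is required');
    }
    if (typeof options.path !== 'string' || options.path === '') {
        throw new Error('options.path is required');
    }
    // Values supplied in `options` override these defaults.
    // srcToLoad and directories are copied through unchanged (no documented default).
    return Object.assign({
        log: false,
        indexFile: 'index.html'
    }, options);
}
```
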
**callback** - callback function *(optional)*, called with the following parameters:

- `error`: an `Error` object on failure, `null` on success
- `result`: `null` on failure; on success, an object containing:
    - `html`: HTML code of the index page

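Because the callback follows Node's error-first convention, it can be wrapped in a Promise. This is a minimal sketch: `scrapeAsync` is a hypothetical helper, and it assumes `scrape` calls its callback exactly once with `(error, result)` as described above.

```javascript
// Hypothetical helper: wraps a callback-style scrape(options, callback)
// in a Promise. Assumes the callback is called exactly once as (error, result).
function scrapeAsync(scrape, options) {
    return new Promise(function (resolve, reject) {
        scrape(options, function (error, result) {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

// Usage (binding keeps `this` pointing at the module, in case scrape relies on it):
// var scraper = require('website-scraper');
// scrapeAsync(scraper.scrape.bind(scraper), {url: 'http://nodejs.org/', path: '/path/to/save/'})
//     .then(function (result) { console.log(result.html); })
//     .catch(function (error) { console.error(error); });
```

On Node.js 8 and later, `util.promisify(scraper.scrape.bind(scraper))` should give equivalent behaviour for an error-first callback.
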
## Examples

Let's scrape [http://nodejs.org/](http://nodejs.org/) together with its images, CSS, and JS files and save them to `/path/to/save/`. The index page will be named `myIndex.html`, and files will be separated into directories:

- `img` for .jpg, .png (full path `/path/to/save/img`)
- `js` for .js (full path `/path/to/save/js`)
- `css` for .css (full path `/path/to/save/css`)
- `fonts` for .ttf, .woff, .eot, .svg (full path `/path/to/save/fonts`)

```javascript
var scraper = require('website-scraper');

scraper.scrape({
    url: 'http://nodejs.org/',
    path: '/path/to/save',
    indexFile: 'myIndex.html',
    srcToLoad: [
        {selector: 'img', attr: 'src'},
        {selector: 'link[rel="stylesheet"]', attr: 'href'},
        {selector: 'script', attr: 'src'}
    ],
    directories: [
        {directory: 'img', extensions: ['.jpg', '.png']},
        {directory: 'js', extensions: ['.js']},
        {directory: 'css', extensions: ['.css']},
        {directory: 'fonts', extensions: ['.ttf', '.woff', '.eot', '.svg']}
    ]
}, function (error, result) {
    if (error) {
        return console.error(error);
    }
    console.log(result);
});
```

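The extension-to-directory mapping in this example can be sketched as follows. `directoryFor` and `savePathFor` are hypothetical helpers that mimic the documented behaviour of `directories`; they are not the module's implementation. Returning `''` (save directly to `path`) for an extension that no entry lists is an assumption.

```javascript
var path = require('path');

// Hypothetical helper, not part of website-scraper: picks the directory for a
// file from its extension, using the same shape as the `directories` option.
// Returns '' (save directly to the root path) when directories is null or
// no entry lists the extension; the unmatched case is an assumption.
function directoryFor(filename, directories) {
    if (!directories) {
        return '';
    }
    var ext = path.extname(filename);
    for (var i = 0; i < directories.length; i++) {
        if (directories[i].extensions.indexOf(ext) !== -1) {
            return directories[i].directory;
        }
    }
    return '';
}

// Full save location: <rootPath>/<directory>/<filename>.
// path.join ignores the empty segment when no directory matched.
function savePathFor(rootPath, filename, directories) {
    return path.join(rootPath, directoryFor(filename, directories), filename);
}
```
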
## Dependencies

- cheerio
- request
- bluebird
- fs-extra
- underscore
