@@ -4,17 +4,21 @@ English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.

 x-crawl is a Nodejs multifunctional crawler library.
 
-## Feature
+## Features
 
-- Crawl HTML, JSON, file resources, etc. with simple configuration.
-- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
+- Crawl pages, JSON, file resources, etc. with simple configuration.
+- The built-in puppeteer crawls pages, and the jsdom library parses them.
 - Support asynchronous/synchronous way to crawl data.
-- Support Promise/Callback way to get the result.
-- Polling function.
+- Support Promise/Callback methods to get the result.
+- Polling function, fixed-point crawling.
 - Anthropomorphic request interval.
-- Written in TypeScript, provides generics.
+- Written in TypeScript, providing generics.
 
-## Benefits provided by using puppeter
+## Relationship with puppeteer
+
+The fetchHTML API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
+
+The following can be done:
 
 - Generate screenshots and PDFs of pages.
 - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
@@ -33,6 +37,7 @@ x-crawl is a Nodejs multifunctional crawler library.
 * [fetchHTML](#fetchHTML)
 + [Type](#Type-2)
 + [Example](#Example-2)
++ [About page](#About-page)
 * [fetchData](#fetchData)
 + [Type](#Type-3)
 + [Example](#Example-3)
@@ -173,12 +178,12 @@ The first request is not to trigger the interval.

 ### fetchHTML
 
-fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl HTML.
+fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl pages.
 
 #### Type
 
 - Look at the [FetchHTMLConfig](#FetchHTMLConfig) type
-- Look at the [FetchHTML](#FetchHTML) type
+- Look at the [FetchHTML](#FetchHTML-2) type
 
 ```ts
 function fetchHTML: (
@@ -196,6 +201,10 @@ myXCrawl.fetchHTML('/xxx').then((res) => {
 })
 ```
 
+#### About page
+
+Get the page instance from res.data.page; it can be used for interactive operations such as events. For specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).
+
 ### fetchData
 
 fetchData is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl APIs to obtain JSON data and so on.
@@ -224,7 +233,7 @@ const requestConfig = [

 myXCrawl.fetchData({
   requestConfig, // Request configuration, can be RequestConfig | RequestConfig[]
-  intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when not using myXCrawl
+  intervalTime: { max: 5000, min: 1000 } // Overrides the intervalTime passed in when creating myXCrawl
 }).then(res => {
   console.log(res)
 })
@@ -380,7 +389,7 @@ interface FetchDataConfig extends FetchBaseConfigV1 {
 interface FetchFileConfig extends FetchBaseConfigV1 {
   fileConfig: {
     storeDir: string // Store folder
-    extension?: string // filename extension
+    extension?: string // Filename extension
   }
 }
 ```
@@ -409,7 +418,7 @@ interface FetchCommon<T> {
 ### FetchResCommonArrV1
 
 ```ts
-type FetchCommonArr<T> = FetchCommon<T>[]
+type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
 ```
 
 ### FileInfo
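
Three pieces of this diff point at a randomized delay between requests: the "Anthropomorphic request interval" feature, the `intervalTime: { max, min }` option in the fetchData example, and the hunk context "The first request is not to trigger the interval." The following is a minimal sketch of that idea. It assumes the delay is a random whole number of milliseconds in `[min, max]` and that the first request goes out immediately. `intervalBefore` and `crawlSequentially` are hypothetical helpers for illustration, not x-crawl's actual implementation.

```ts
// Shape mirrors the README's `intervalTime: { max, min }` option (milliseconds).
interface IntervalTime {
  max: number
  min: number
}

// Hypothetical helper (not part of x-crawl): delay in ms before the request at `index`.
// The first request (index 0) is not delayed; later ones wait a random
// whole number of milliseconds between min and max, inclusive.
function intervalBefore(
  index: number,
  { max, min }: IntervalTime,
  random: () => number = Math.random // injectable for deterministic tests
): number {
  if (index === 0) return 0
  return min + Math.floor(random() * (max - min + 1))
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: runs tasks one after another, waiting before each except the first.
async function crawlSequentially<T>(
  tasks: Array<() => Promise<T>>,
  intervalTime: IntervalTime
): Promise<T[]> {
  const results: T[] = []
  for (let i = 0; i < tasks.length; i++) {
    await sleep(intervalBefore(i, intervalTime))
    results.push(await tasks[i]())
  }
  return results
}
```

Injecting `random` keeps the helper deterministic under test, while the default `Math.random` gives the uneven, human-like spacing the feature name describes.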
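
The "About page" note says `res.data.page` is a page instance and links to puppeteer's Page documentation. Below is a minimal sketch of consuming it, assuming `res.data.page` is a puppeteer `Page`. The helper relies only on `title()` and `close()`, which puppeteer's `Page` provides. It is typed against a small structural interface, so it can be exercised without launching a browser. `PageLike`, `FetchHTMLResLike`, and `readTitleAndClose` are hypothetical names. Closing the page afterwards is also an assumption, not documented x-crawl behavior.

```ts
// Structural subset of puppeteer's Page that this sketch relies on.
interface PageLike {
  title(): Promise<string>
  close(): Promise<void>
}

// Hypothetical result shape: only `data.page` comes from the README's "About page" note.
interface FetchHTMLResLike {
  data: { page: PageLike }
}

// Reads the page title, then closes the page to release the browser tab,
// even if reading the title throws.
async function readTitleAndClose(res: FetchHTMLResLike): Promise<string> {
  const { page } = res.data
  try {
    return await page.title()
  } finally {
    await page.close()
  }
}
```

With an instance created as in the README's Example-1, a call might look like `myXCrawl.fetchHTML('/xxx').then(readTitleAndClose).then(console.log)`. This is assumed, not taken from the diff.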