- x-crawl is for legal purposes only. It is prohibited to use this tool to conduct any illegal activities, including but not limited to unauthorized data collection, network attacks, privacy violations, etc.
- Before collecting data, make sure you have explicit authorization from the target website and comply with its robots.txt file and terms of use.
- Avoid putting excessive access pressure on the target website, so as not to trigger its anti-crawling strategy or cause server downtime.
- **Discord Chat:** Ask and discuss with other x-crawl users in real time via [Discord](https://discord.gg/SF7aaebg4E), and stay up to date on x-crawl news.
- **GitHub Discussions:** Use [GitHub Discussions](https://github.com/coder-hxl/x-crawl/discussions) for message board-style questions and discussions.
Questions and discussions related to any illegal activity must not be submitted. x-crawl is for legal purposes only; it is prohibited to use this tool to conduct any illegal activities, including but not limited to unauthorized data collection, network attacks, and privacy violations. Please ensure that your usage always complies with laws, regulations, and ethical standards, and help maintain a safe and legal network environment.
The crawlPage API has built-in [puppeteer](https://github.com/puppeteer/puppeteer). You only need to pass in some configuration options to let x-crawl simplify the operation and get intact Browser and Page instances; x-crawl does not override them.
## Using crawlPage API causes the program to crash
If you need to crawl many pages in one crawlPage call, it is recommended that after each page is crawled, you use the [onCrawlItemComplete life cycle function](#onCrawlItemComplete) to process each target's result and close the page instance. If pages are not closed, the program may crash because too many pages are open (depending on the performance of the device itself).
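The key pattern here is to process each item's result as soon as it completes and release its page immediately, instead of letting open pages accumulate. A minimal library-agnostic sketch of that pattern (all names are illustrative, and a flag stands in for `page.close()`; this is not x-crawl's actual API):

```js
// Illustrative pattern only: process each crawled item as it completes,
// then release its page immediately so open pages never accumulate.
function makeItemHandler(results) {
  return function onItemComplete(item) {
    results.push(item.data) // process/store the result of this target
    item.page.closed = true // stand-in for a real page.close() call
  }
}

const results = []
const handler = makeItemHandler(results)

// A fake crawled item, used here only to demonstrate the flow.
const fakeItem = { data: { title: 'demo' }, page: { closed: false } }
handler(fakeItem)
```

The point of the design is that memory usage stays bounded by the number of in-flight pages, not by the total number of crawl targets.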
If you have **questions, requirements, and good suggestions**, you can raise **Issues** in [GitHub Issues](https://github.com/coder-hxl/x-crawl/issues).
# crawlData
crawlData is a method of the crawler instance, usually used to crawl APIs and obtain data such as JSON.
## Type
The crawlData API is a function. Its type is an [overloaded function](https://www.typescriptlang.org/docs/handbook/2/functions.html#function-overloads), so it can be called with different configuration parameters (in terms of types).
This is a mixed target array configuration. If you want to crawl multiple pieces of data, and some targets need to be retried after failure, you can try this way of writing:
```js
import { createCrawl } from 'x-crawl'

const crawlApp = createCrawl()

// NOTE: the middle of this example was truncated in the source diff;
// the crawlData call below is an illustrative reconstruction.
crawlApp
  .crawlData([
    'https://www.example.com/api-1',
    { url: 'https://www.example.com/api-2', maxRetry: 6 }
  ])
  .then((res) => {})
```
The res obtained will be an array of objects.
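Conceptually, a mixed target array is just URL strings and detail objects that all describe the same kind of target. A rough illustrative sketch of normalizing such an array into uniform detail objects (this is not x-crawl's actual internals; the function name is made up):

```js
// Illustrative only: normalize a mixed target array (strings and
// detail objects) into uniform detail objects with a url property.
function normalizeTargets(targets) {
  return targets.map((target) =>
    typeof target === 'string' ? { url: target } : { ...target }
  )
}

const normalized = normalizeTargets([
  'https://www.example.com/api-1',
  { url: 'https://www.example.com/api-2', maxRetry: 6 }
])
```

Writing a plain string is thus shorthand for a detail object with only a `url`, while the object form lets an individual target carry extra options such as its own retry count.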
This is an advanced configuration where targets is a mixed target array. If you want to crawl multiple pieces of data without repeating the crawl target configuration (proxy, cookies, retry, etc.) for each target, and you also need interval time, device fingerprint, life cycle functions, and so on, you can try this way of writing:
```js
import { createCrawl } from 'x-crawl'

const crawlApp = createCrawl()

// NOTE: the middle of this example was truncated in the source diff;
// the crawlData call below is an illustrative reconstruction.
crawlApp
  .crawlData({
    targets: [
      'https://www.example.com/api-1',
      { url: 'https://www.example.com/api-2', maxRetry: 6 }
    ],
    intervalTime: { max: 3000, min: 1000 },
    maxRetry: 1
  })
  .then((res) => {})
```
The res obtained will be an array of objects.
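Conceptually, the shared options in the advanced configuration act as defaults that each detail target can override (as a per-target `maxRetry` would override the shared one). An illustrative sketch of that merging idea, assuming nothing about x-crawl's internals (the function name is made up):

```js
// Illustrative only: shared advanced options act as defaults, and
// per-target detail options override them.
function resolveTargets(advanced) {
  const { targets, ...shared } = advanced
  return targets.map((t) => {
    const detail = typeof t === 'string' ? { url: t } : t
    return { ...shared, ...detail } // detail options win over shared ones
  })
}

const resolved = resolveTargets({
  targets: [
    'https://www.example.com/api-1',
    { url: 'https://www.example.com/api-2', maxRetry: 6 }
  ],
  maxRetry: 1
})
```

This is why the advanced form avoids repetition: options common to every target are written once, while any single target can still deviate where needed.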