Commit 777a9af

Docs: Update features
1 parent 88e35aa commit 777a9af

2 files changed: +15, -21 lines

README.md

Lines changed: 10 additions & 13 deletions
```diff
@@ -10,14 +10,14 @@ x-crawl is a flexible Node.js multifunctional crawler library. Flexible usage an
 
 - **🔥 Asynchronous Synchronous** - Just change the mode property to toggle asynchronous or synchronous crawling mode.
 - **⚙️ Multiple purposes** - It can crawl pages, crawl interfaces, crawl files and poll crawls to meet the needs of various scenarios.
+- **☁️ Crawl SPA** - Crawl SPA (Single Page Application) to generate pre-rendered content (aka "SSR" (Server Side Rendering)).
+- **⚒️ Control Page** - Automate form submission, UI testing, keyboard input, event manipulation, open browser, etc.
 - **🖋️ Flexible writing style** - The same crawling API can be adapted to multiple configurations, and each configuration method is very unique.
 - **⏱️ Interval Crawling** - No interval, fixed interval and random interval to generate or avoid high concurrent crawling.
 - **🔄 Failed Retry** - Avoid crawling failure due to short-term problems, and customize the number of retries.
 - **➡️ Proxy Rotation** - Auto-rotate proxies with failure retry, custom error times and HTTP status codes.
 - **👀 Device Fingerprinting** - Zero configuration or custom configuration, avoid fingerprinting to identify and track us from different locations.
 - **🚀 Priority Queue** - According to the priority of a single crawling target, it can be crawled ahead of other targets.
-- **☁️ Crawl SPA** - Crawl SPA (Single Page Application) to generate pre-rendered content (aka "SSR" (Server Side Rendering)).
-- **⚒️ Control Page** - You can submit form, keyboard input, event operation, generate screenshots of the page, etc.
 - **🧾 Capture Record** - Capture and record crawling, and use colored strings to remind in the terminal.
 - **🦾 TypeScript** - Own types, implement complete types through generics.
 
```

```diff
@@ -136,7 +136,7 @@ Take the automatic acquisition of some photos of experiences and homes around th
 import xCrawl from 'x-crawl'
 
 // 2.Create a crawler instance
-const myXCrawl = xCrawl({maxRetry: 3,intervalTime: { max: 3000, min: 2000 }})
+const myXCrawl = xCrawl({ maxRetry: 3, intervalTime: { max: 3000, min: 2000 } })
 
 // 3.Set the crawling task
 /*
```
```diff
@@ -164,12 +164,9 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 await new Promise((r) => setTimeout(r, 300))
 
 // Gets the URL of the page image
-const urls = await page.$$eval(
-  `${elSelectorMap[id - 1]} img`,
-  (imgEls) => {
-    return imgEls.map((item) => item.src)
-  }
-)
+const urls = await page.$$eval(`${elSelectorMap[id - 1]} img`, (imgEls) => {
+  return imgEls.map((item) => item.src)
+})
 targets.push(...urls)
 
 // Close page
```
```diff
@@ -283,7 +280,7 @@ myXCrawl.crawlPage('https://www.example.com').then((res) => {
 
 #### Browser Instance
 
-When you call crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the crawlPage API of the browser instance in the same crawler instance is shared. For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).
+When you call crawlPage API to crawl pages in the same crawler instance, the browser instance used is the same, because the crawlPage API of the browser instance in the same crawler instance is shared. For specific usage, please refer to [Browser](https://pptr.dev/api/puppeteer.browser).
 
 **Note:** The browser will keep running and the file will not be terminated. If you want to stop, you can execute browser.close() to close it. Do not call [crawlPage](#crawlPage) or [page](#page) if you need to use it later. Because the crawlPage API of the browser instance in the same crawler instance is shared.
 
```

```diff
@@ -332,9 +329,9 @@ Disable running the browser in headless mode.
 import xCrawl from 'x-crawl'
 
 const myXCrawl = xCrawl({
-  maxRetry: 3,
-  // Cancel running the browser in headless mode
-  crawlPage: { launchBrowser: { headless: false } }
+  maxRetry: 3,
+  // Cancel running the browser in headless mode
+  crawlPage: { launchBrowser: { headless: false } }
 })
 
 myXCrawl.crawlPage('https://www.example.com').then((res) => {})
```

docs/cn.md

Lines changed: 5 additions & 8 deletions
```diff
@@ -10,14 +10,14 @@ x-crawl is a flexible Node.js multifunctional crawler library. Flexible usage and
 
 - **🔥 Async & Sync** - Switch between asynchronous and synchronous crawling mode just by changing the mode property.
 - **⚙️ Multiple purposes** - Crawl pages, crawl interfaces, crawl files and poll crawls to meet the needs of various scenarios.
+- **☁️ Crawl SPA** - Crawl SPA (Single Page Application) to generate pre-rendered content (i.e. "SSR" (Server Side Rendering)).
+- **⚒️ Control Page** - Automate form submission, UI testing, keyboard input, event operation, opening the browser, etc.
 - **🖋️ Flexible writing style** - The same crawling API fits multiple configurations, and each configuration method is distinctive.
 - **⏱️ Interval Crawling** - No interval, fixed interval and random interval, to generate or avoid high-concurrency crawling.
 - **🔄 Failed Retry** - Avoid crawling failures caused by transient problems, with a customizable number of retries.
 - **➡️ Proxy Rotation** - Together with failed retry, rotate proxies automatically based on custom error counts and HTTP status codes.
 - **👀 Device Fingerprinting** - Zero or custom configuration to keep fingerprinting from identifying and tracking us from different locations.
 - **🚀 Priority Queue** - A single crawling target can be crawled ahead of other targets according to its priority.
-- **☁️ Crawl SPA** - Crawl SPA (Single Page Application) to generate pre-rendered content (i.e. "SSR" (Server Side Rendering)).
-- **⚒️ Control Page** - Form submission, keyboard input, event operation, generating screenshots of the page, etc.
 - **🧾 Capture Record** - Capture and record crawling, with colored-string reminders in the terminal.
 - **🦾 TypeScript** - Ships its own types, with complete typing implemented through generics.
 
```

```diff
@@ -162,12 +162,9 @@ myXCrawl.startPolling({ d: 1 }, async (count, stopPolling) => {
 await new Promise((r) => setTimeout(r, 300))
 
 // Get the URLs of the page images
-const urls = await page.$$eval(
-  `${elSelectorMap[id - 1]} img`,
-  (imgEls) => {
-    return imgEls.map((item) => item.src)
-  }
-)
+const urls = await page.$$eval(`${elSelectorMap[id - 1]} img`, (imgEls) => {
+  return imgEls.map((item) => item.src)
+})
 targets.push(...urls)
 
 // Close page
```
