Commit 5959d1a

docs: describe the AI assisted crawler
1 parent 1600707 commit 5959d1a

6 files changed: +43 -3 lines changed

README.md

Lines changed: 12 additions & 0 deletions
@@ -26,6 +26,16 @@ It consists of two parts:
 - **🧾 Crawl information** - Controllable crawl information, printed as colored text in the terminal.
 - **🦾 TypeScript** - Ships its own types and implements complete typing through generics.

+## AI-assisted crawler
+
+As websites are updated more and more frequently, changes to class names or page structure pose a real challenge for crawlers that depend on them. Combining crawlers with AI is an effective way to meet that challenge.
+
+First, a change to class names or structure after a website update can cause a traditional crawling strategy to fail. Crawlers typically rely on fixed class names or a fixed structure to locate and extract information, so once those change, the crawler may no longer find the data it needs, which hurts both the effectiveness and the accuracy of the crawl.
+
+Crawlers combined with AI cope with such changes better. Using natural language processing and related techniques, AI can understand and parse the semantic information of a web page and extract the required data more accurately.
+
+In short, crawlers combined with AI are better able to handle class name or structure changes after website updates.
+
 ## Example

 Combining the crawler with AI lets them fetch pictures of highly rated vacation rentals according to our instructions:
@@ -70,6 +80,8 @@ crawlApp.crawlPage('https://www.airbnb.cn/s/select_homes').then(async (res) => {
 })
 ```

+**You can even pass the entire HTML to the AI and let it do the work. Because the full page content is more complex, you need to describe the target location more precisely. Most importantly, this consumes more tokens.**
+
 Crawled pictures of highly rated vacation rentals:

 ![](https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/example.png)

docs/about/issues.md

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
-#Issues
+# Issues

 If you have **questions, requests, or suggestions**, you can raise them as **Issues** in [GitHub Issues](https://github.com/coder-hxl/x-crawl/issues).

docs/api/custom.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#custom
+# custom

 custom is a method of the AI application instance, typically used for user-defined AI functions.

docs/api/parse-elements.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#parseElements
+# parseElements

 parseElements is a method of the AI application instance, typically used for intelligent, on-demand analysis of elements.

docs/cn/guide/index.md

Lines changed: 14 additions & 0 deletions
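@@ -26,6 +26,16 @@ x-crawl is a flexible Node.js AI-assisted crawler library. Flexible usage and
 - **🧾 Crawl information** - Controllable crawl information, printed as colored text in the terminal.
 - **🦾 TypeScript** - Ships its own types and implements complete typing through generics.

+## AI-assisted crawler
+
+As web technology changes rapidly, websites are updated more and more frequently, and changes to class names or page structure pose a real challenge for crawlers that depend on them. Combining crawlers with AI is an effective way to meet that challenge.
+
+First, a change to class names or structure after a website update can cause a traditional crawling strategy to fail. Crawlers typically rely on fixed class names or a fixed structure to locate and extract information, so once those change, the crawler may no longer find the data it needs, which hurts both the effectiveness and the accuracy of the crawl.
+
+Crawlers combined with AI cope with such changes better. Using natural language processing and related techniques, AI can understand and parse the semantic information of a web page and extract the required data more accurately.
+
+In short, crawlers combined with AI are better able to handle class name or structure changes after website updates.
+
 ## Example

 Combining the crawler with AI lets them fetch pictures of highly rated vacation rentals according to our instructions:
@@ -70,6 +80,10 @@ crawlApp.crawlPage('https://www.airbnb.cn/s/select_homes').then(async (res) => {
 })
 ```

+::: tip
+You can even pass the entire HTML to the AI and let it do the work. Because the full page content is more complex, you need to describe the target location more precisely. Most importantly, this consumes more tokens.
+:::
+
 Crawled pictures of highly rated vacation rentals:

 ![](/example.png)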
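
The section added above says that crawlers tied to fixed class names stop finding data once a site renames them. A minimal TypeScript sketch of that failure mode follows. Everything in it is an assumption made for illustration: the `imageSrcsByClass` helper, the class names, and the two HTML snippets are made up, and a regular expression stands in for a real DOM query. It does not use x-crawl or any AI service; it only shows why a class-based lookup comes back empty after a redesign.

```typescript
// Hypothetical helper: collect the src of every <img> tag whose class
// attribute contains the given class name. A regular expression stands in
// for a real DOM query to keep the sketch self-contained.
function imageSrcsByClass(html: string, className: string): string[] {
  const srcs: string[] = []
  const imgTag = /<img\b[^>]*>/g
  let match: RegExpExecArray | null
  while ((match = imgTag.exec(html)) !== null) {
    const tag = match[0]
    const classAttr = /\bclass="([^"]*)"/.exec(tag)
    const srcAttr = /\bsrc="([^"]*)"/.exec(tag)
    if (classAttr === null || srcAttr === null) continue
    const classes = classAttr[1].split(/\s+/)
    if (classes.indexOf(className) !== -1) srcs.push(srcAttr[1])
  }
  return srcs
}

// Made-up markup before and after a redesign that only renames the class.
const beforeUpdate =
  '<div><img class="listing-photo" src="/a.jpg"><img class="listing-photo" src="/b.jpg"></div>'
const afterUpdate =
  '<div><img class="card__img" src="/a.jpg"><img class="card__img" src="/b.jpg"></div>'

// The crawler was written against the old class name.
const foundBefore = imageSrcsByClass(beforeUpdate, 'listing-photo')
const foundAfter = imageSrcsByClass(afterUpdate, 'listing-photo')

console.log(foundBefore) // [ '/a.jpg', '/b.jpg' ]
console.log(foundAfter) // []
```

Both snippets contain the same two images, yet the lookup keyed to `listing-photo` finds them only in the first. Describing the target in natural language, as the section proposes, is meant to avoid depending on that single attribute.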

docs/guide/index.md

Lines changed: 14 additions & 0 deletions
@@ -26,6 +26,16 @@ It consists of two parts:
 - **🧾 Crawl information** - Controllable crawl information, printed as colored text in the terminal.
 - **🦾 TypeScript** - Ships its own types and implements complete typing through generics.

+## AI-assisted crawler
+
+As websites are updated more and more frequently, changes to class names or page structure pose a real challenge for crawlers that depend on them. Combining crawlers with AI is an effective way to meet that challenge.
+
+First, a change to class names or structure after a website update can cause a traditional crawling strategy to fail. Crawlers typically rely on fixed class names or a fixed structure to locate and extract information, so once those change, the crawler may no longer find the data it needs, which hurts both the effectiveness and the accuracy of the crawl.
+
+Crawlers combined with AI cope with such changes better. Using natural language processing and related techniques, AI can understand and parse the semantic information of a web page and extract the required data more accurately.
+
+In short, crawlers combined with AI are better able to handle class name or structure changes after website updates.
+
 ## Example

 Combining the crawler with AI lets them fetch pictures of highly rated vacation rentals according to our instructions:
@@ -70,6 +80,10 @@ crawlApp.crawlPage('https://www.airbnb.cn/s/select_homes').then(async (res) => {
 })
 ```

+::: tip
+You can even pass the entire HTML to the AI and let it do the work. Because the full page content is more complex, you need to describe the target location more precisely. Most importantly, this consumes more tokens.
+:::
+
 Crawled pictures of highly rated vacation rentals:

 ![](/example.png)
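
The tip above warns that passing the entire HTML to the AI consumes more tokens. A rough way to see the difference is to compare an estimate for a whole page against one for a single fragment. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than an exact tokenizer, so real counts for HTML will differ. The `estimateTokens` helper and the sample markup are hypothetical.

```typescript
// Assumption: roughly 4 characters per token. This is only a heuristic;
// an actual tokenizer will produce different counts, especially for markup.
const CHARS_PER_TOKEN = 4

// Hypothetical helper: estimate the token count of a string from its length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Made-up fragment holding just the elements we care about.
const fragment =
  '<ul id="homes"><li><img src="/a.jpg"></li><li><img src="/b.jpg"></li></ul>'

// Made-up full page: the same fragment surrounded by unrelated markup.
let unrelated = ''
for (let i = 0; i < 50; i++) {
  unrelated += '<script>/* bundled code */</script>'
}
const fullPage =
  '<html><head>' + unrelated + '</head><body>' + fragment + '</body></html>'

const fragmentTokens = estimateTokens(fragment)
const fullPageTokens = estimateTokens(fullPage)

console.log('fragment estimate:', fragmentTokens)
console.log('full page estimate:', fullPageTokens)
```

Narrowing the HTML to the relevant container before handing it to the AI keeps the estimate small, at the cost of once again needing some way to locate that container.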
