Commit bffed08

feat: release a new version
1 parent ea33d4c commit bffed08

File tree

5 files changed: +78 additions, −47 deletions


CHANGELOG.md

Lines changed: 27 additions & 1 deletion
```diff
@@ -1,3 +1,29 @@
+# [v10.1.0](https://github.com/coder-hxl/x-crawl/compare/v10.0.2..v10.1.0) (2025-04-06)
+
+### 🚀 Features
+
+- Added ollama
+- Change the openai model type to string
+
+### ⛓️ Dependencies
+
+- puppeteer from 22.13.1 to 24.6.0
+- openai from 4.52.7 to 4.91.1
+- upgrade non-major dependencies to the latest version
+
+---
+
+### 🚀 Features
+
+- Added ollama
+- Changed the openai model type to string
+
+### ⛓️ Dependencies
+
+- puppeteer upgraded from 22.13.1 to 24.6.0
+- openai upgraded from 4.52.7 to 4.91.1
+- Upgraded non-major dependencies to the latest version
+
 # [v10.0.2](https://github.com/coder-hxl/x-crawl/compare/v10.0.1..v10.0.2) (2024-07-21)
 
 ### 🚀 Features
@@ -14,7 +40,7 @@
 
 ### 🚀 Features
 
-- OpenAIChatModel type adds 'gpt-4o' | 'gpt-4o-2024-05-13' | 'gpt-4-turbo' | 'gpt-4-turbo-2024-04-09', in sync with openai.
+- OpenAIChatModel type adds 'gpt-4o' | 'gpt-4o-2024-05-13' | 'gpt-4-turbo' | 'gpt-4-turbo-2024-04-09', in sync with openai.
 
 ### ⛓️ Dependencies
 
```
README.md

Lines changed: 21 additions & 19 deletions
````diff
@@ -7,13 +7,13 @@ x-crawl is a flexible Node.js AI-assisted crawler library. Flexible usage and po
 It consists of two parts:
 
 - Crawler: It consists of a crawler API and various functions that can work normally even without relying on AI.
-- AI: Currently based on the large AI model provided by OpenAI, AI simplifies many tedious operations.
+- AI: Integrate ollama and openai, AI simplifies many tedious operations.
 
 > If you find x-crawl helpful, or you like x-crawl, you can give [x-crawl repository](https://github.com/coder-hxl/x-crawl) a like on GitHub A star. Your support is the driving force for our continuous improvement! thank you for your support!
 
 ## Features
 
-- **🤖 AI Assistance** - Powerful AI assistance function makes crawler work more efficient, intelligent and convenient.
+- **🤖 AI Assistance** - Integrate ollama and openai, powerful AI assistance function makes crawler work more efficient, intelligent and convenient.
 - **🖋️ Flexible writing** - A single crawling API is suitable for multiple configurations, and each configuration method has its own advantages.
 - **⚙️Multiple uses** - Supports crawling dynamic pages, static pages, interface data and file data.
 - **⚒️ Control page** - Crawling dynamic pages supports automated operations, keyboard input, event operations, etc.
@@ -56,28 +56,30 @@ const crawlOpenAIApp = createCrawlOpenAI({
 })
 
 // crawlPage is used to crawl pages
-crawlApp.crawlPage('https://www.example.cn/s/select_homes').then(async (res) => {
-  const { page, browser } = res.data
+crawlApp
+  .crawlPage('https://www.example.cn/s/select_homes')
+  .then(async (res) => {
+    const { page, browser } = res.data
 
-  // Wait for the element to appear on the page and get the HTML
-  const targetSelector = '[data-tracking-id="TOP_REVIEWED_LISTINGS"]'
-  await page.waitForSelector(targetSelector)
-  const highlyHTML = await page.$eval(targetSelector, (el) => el.innerHTML)
+    // Wait for the element to appear on the page and get the HTML
+    const targetSelector = '[data-tracking-id="TOP_REVIEWED_LISTINGS"]'
+    await page.waitForSelector(targetSelector)
+    const highlyHTML = await page.$eval(targetSelector, (el) => el.innerHTML)
 
-  // Let AI obtain image links and remove duplicates
-  const srcResult = await crawlOpenAIApp.parseElements(
-    highlyHTML,
-    `Get the image link, don't source it inside, and de-duplicate it`
-  )
+    // Let AI obtain image links and remove duplicates
+    const srcResult = await crawlOpenAIApp.parseElements(
+      highlyHTML,
+      `Get the image link, don't source it inside, and de-duplicate it`
+    )
 
-  browser.close()
+    browser.close()
 
-  // crawlFile is used to crawl file resources
-  crawlApp.crawlFile({
-    targets: srcResult.elements.map((item) => item.src),
-    storeDirs: './upload'
+    // crawlFile is used to crawl file resources
+    crawlApp.crawlFile({
+      targets: srcResult.elements.map((item) => item.src),
+      storeDirs: './upload'
+    })
   })
-})
 ```
 
 > [!TIP]
````
package.json

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,7 +1,7 @@
 {
   "private": true,
   "name": "x-crawl",
-  "version": "10.0.2",
+  "version": "10.1.0",
   "author": "coderHXL",
   "description": "x-crawl is a flexible Node.js AI-assisted crawler library.",
   "license": "MIT",
@@ -68,4 +68,4 @@
     "fingerprint",
     "multifunction"
   ]
-}
+}
```

publish/README.md

Lines changed: 25 additions & 23 deletions
````diff
@@ -7,13 +7,13 @@ x-crawl is a flexible Node.js AI-assisted crawler library. Flexible usage and po
 It consists of two parts:
 
 - Crawler: It consists of a crawler API and various functions that can work normally even without relying on AI.
-- AI: Currently based on the large AI model provided by OpenAI, AI simplifies many tedious operations.
+- AI: Integrate ollama and openai, AI simplifies many tedious operations.
 
 > If you find x-crawl helpful, or you like x-crawl, you can give [x-crawl repository](https://github.com/coder-hxl/x-crawl) a like on GitHub A star. Your support is the driving force for our continuous improvement! thank you for your support!
 
 ## Features
 
-- **🤖 AI Assistance** - Powerful AI assistance function makes crawler work more efficient, intelligent and convenient.
+- **🤖 AI Assistance** - Integrate ollama and openai, powerful AI assistance function makes crawler work more efficient, intelligent and convenient.
 - **🖋️ Flexible writing** - A single crawling API is suitable for multiple configurations, and each configuration method has its own advantages.
 - **⚙️Multiple uses** - Supports crawling dynamic pages, static pages, interface data and file data.
 - **⚒️ Control page** - Crawling dynamic pages supports automated operations, keyboard input, event operations, etc.
@@ -56,28 +56,30 @@ const crawlOpenAIApp = createCrawlOpenAI({
 })
 
 // crawlPage is used to crawl pages
-crawlApp.crawlPage('https://www.example.cn/s/select_homes').then(async (res) => {
-  const { page, browser } = res.data
-
-  // Wait for the element to appear on the page and get the HTML
-  const targetSelector = '[data-tracking-id="TOP_REVIEWED_LISTINGS"]'
-  await page.waitForSelector(targetSelector)
-  const highlyHTML = await page.$eval(targetSelector, (el) => el.innerHTML)
-
-  // Let the AI get the image link and de-duplicate it (the more detailed the description, the better)
-  const srcResult = await crawlOpenAIApp.parseElements(
-    highlyHTML,
-    `Get the image link, don't source it inside, and de-duplicate it`
-  )
-
-  browser.close()
-
-  // crawlFile is used to crawl file resources
-  crawlApp.crawlFile({
-    targets: srcResult.elements.map((item) => item.src),
-    storeDirs: './upload'
+crawlApp
+  .crawlPage('https://www.example.cn/s/select_homes')
+  .then(async (res) => {
+    const { page, browser } = res.data
+
+    // Wait for the element to appear on the page and get the HTML
+    const targetSelector = '[data-tracking-id="TOP_REVIEWED_LISTINGS"]'
+    await page.waitForSelector(targetSelector)
+    const highlyHTML = await page.$eval(targetSelector, (el) => el.innerHTML)
+
+    // Let the AI get the image link and de-duplicate it (the more detailed the description, the better)
+    const srcResult = await crawlOpenAIApp.parseElements(
+      highlyHTML,
+      `Get the image link, don't source it inside, and de-duplicate it`
+    )
+
+    browser.close()
+
+    // crawlFile is used to crawl file resources
+    crawlApp.crawlFile({
+      targets: srcResult.elements.map((item) => item.src),
+      storeDirs: './upload'
+    })
   })
-})
 ```
 
 **You can even send the whole HTML to the AI to help us operate, because the website content is more complex you also need to describe the location to get more accurately, and will consume a lot of Tokens.**
````

publish/package.json

Lines changed: 3 additions & 2 deletions
```diff
@@ -1,6 +1,6 @@
 {
   "name": "x-crawl",
-  "version": "10.0.2",
+  "version": "10.1.0",
   "author": "coderHXL",
   "description": "x-crawl is a flexible Node.js AI-assisted crawler library.",
   "license": "MIT",
@@ -41,8 +41,9 @@
   "dependencies": {
     "chalk": "5.4.1",
     "https-proxy-agent": "^7.0.6",
+    "ollama": "^0.5.14",
     "openai": "^4.91.1",
     "ora": "^8.2.0",
     "puppeteer": "24.6.0"
   }
-}
+}
```
