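Xianyu (闲鱼) Product Search API

A product search API for Xianyu (闲鱼) built on FastAPI, with asynchronous concurrent requests and automatic deduplication of stored data.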

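Features

🔍 Keyword-based product search (with pagination)
⚡ High-performance asynchronous scraping (Playwright headless browser)
🛡️ Smart deduplication (based on a hash of link features)
💾 Persistent data storage (relational database)
📊 Returns statistics on newly added records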
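
The deduplication feature listed above can be illustrated with a minimal sketch. It assumes the whole item link is hashed with SHA-256 and that previously seen hashes are held in a set; the project's actual "link feature" and storage lookup are not shown in this README, so `link_hash` and `filter_new` are hypothetical names.

```python
import hashlib


def link_hash(link: str) -> str:
    """Return a stable SHA-256 hex digest for an item link (assumed hashing scheme)."""
    return hashlib.sha256(link.strip().encode("utf-8")).hexdigest()


def filter_new(links, seen_hashes):
    """Return links whose hash is not yet in seen_hashes.

    seen_hashes is updated in place, so duplicates inside the same
    batch are dropped as well.
    """
    fresh = []
    for link in links:
        digest = link_hash(link)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(link)
    return fresh
```

In a database-backed setup, the same idea is usually expressed as a unique column holding the digest, so the lookup happens in the database rather than in memory.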

Tech Stack

Component	Purpose
FastAPI	RESTful API framework
Playwright	Browser automation for scraping
Tortoise ORM	Asynchronous database ORM
SQL	Persistent data storage
Uvicorn	ASGI server

Quick Start

Environment Setup

Install dependencies

pip install -r requirements.txt
playwright install chromium

Create a .env file (replace the values with your own)

DATABASE_URL=mysql://user:password@localhost/xianyu
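
A quick sanity check of the connection string can catch typos before the service starts. The sketch below assumes `DATABASE_URL` has already been loaded into the process environment (for example by a dotenv loader); how spider.py actually reads the file is not shown here, and `load_database_url` is a hypothetical helper.

```python
import os
from urllib.parse import urlsplit


def load_database_url() -> str:
    """Read DATABASE_URL from the environment and check its basic shape."""
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(url)
    # Expect scheme://user:password@host/database
    if not parts.scheme or not parts.hostname or not parts.path.strip("/"):
        raise ValueError("DATABASE_URL must look like scheme://user:password@host/database")
    return url
```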

Start the Service

python spider.py

API Documentation

Visit http://localhost:8000/docs to view the interactive documentation.

Search Endpoint

POST /search/

Example request body:

{
  "keyword": "手机",
  "max_pages": 1
}

Example response:

{
  "status": "success",
  "keyword": "手机",
  "total_results": 30,
  "new_records": 5,
  "new_record_ids": [101,102,103,104,105]
}
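
On the client side, the response can be checked before it is used. This is a minimal sketch based only on the fields in the example above; it assumes `new_records` always equals the length of `new_record_ids`, which the example suggests but the README does not state. `parse_search_response` is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = ("status", "keyword", "total_results", "new_records", "new_record_ids")


def parse_search_response(raw: str) -> dict:
    """Parse a /search/ response body and check the fields shown in the example."""
    payload = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Assumed invariant: the count matches the number of returned IDs.
    if payload["new_records"] != len(payload["new_record_ids"]):
        raise ValueError("new_records does not match the number of new_record_ids")
    return payload
```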

Usage Examples

Apifox or Postman is recommended for testing.

cURL Request

curl -X POST "http://localhost:8000/search/" \
-H "Content-Type: application/json" \
-d '{"keyword": "笔记本电脑", "max_pages": 2}'

Python Client

import requests

response = requests.post(
    "http://localhost:8000/search/",
    json={"keyword": "数码相机", "max_pages": 3}
)
print(response.json())

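Notes

Legal compliance
Before use, make sure you comply with the Cybersecurity Law (《网络安全法》) and the Xianyu platform's robots protocol. This code is for learning and research only.
Anti-scraping measures
Configure a proxy IP pool and random intervals between requests; the default configuration may trigger anti-scraping limits.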
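
The two suggestions above can be sketched as follows. The delay bounds, the proxy addresses, and the helper names (`pick_delay`, `polite_pause`, `rotate_proxies`) are all illustrative assumptions, not part of the project's code.

```python
import asyncio
import itertools
import random


def pick_delay(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Pick a random pause length between the two bounds (inclusive)."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    return random.uniform(min_seconds, max_seconds)


async def polite_pause(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Sleep for a random interval without blocking the event loop."""
    delay = pick_delay(min_seconds, max_seconds)
    await asyncio.sleep(delay)
    return delay


def rotate_proxies(proxies):
    """Return an endless round-robin iterator over the configured proxies."""
    if not proxies:
        raise ValueError("proxy list is empty")
    return itertools.cycle(proxies)
```

A scraping loop could call `await polite_pause()` between page loads and take `next(pool)` from `rotate_proxies([...])` when launching a browser. Playwright's browser launch accepts a `proxy` option with a `server` entry, which is where such a value would typically go.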
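Performance tuning

Tune the database connection pool settings (pool_recycle and similar parameters)
For production deployments, adding a Redis cache layer is recommended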

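Copyright Notice

This project is released under the MIT License. Please use it responsibly and credit the source. Scraped data must not be used for commercial purposes.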
