Commit 08c28e6

Merge pull request #658 from persist-1/feature/sqlite-support
Add support for a local SQLite database (so that database operations remain available when running a MySQL service is inconvenient)
2 parents c5509ab + 9457455 commit 08c28e6

35 files changed: +2560 -1206 lines changed

README.md

Lines changed: 12 additions & 0 deletions

@@ -199,11 +199,23 @@ python main.py --help
 
 Supports multiple data storage options:
 
+- **SQLite database**: a lightweight database that needs no server, suitable for personal use (recommended)
+  - Parameter: `--save_data_option sqlite`
+  - The database file is created automatically
 - **MySQL database**: supports saving to the relational database MySQL (the database must be created in advance)
   - Run `python db.py` to initialize the database table structure (first run only)
 - **CSV files**: supports saving to CSV (under the `data/` directory)
 - **JSON files**: supports saving to JSON (under the `data/` directory)
 
+### Usage examples:
+```shell
+# Use SQLite (recommended for individual users)
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option sqlite
+
+# Use MySQL
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option db
+```
+
 ---
 
 [🚀 MediaCrawlerPro major release 🚀! More features, better architecture design!](https://github.com/MediaCrawlerPro)

README_en.md

Lines changed: 12 additions & 0 deletions

@@ -195,11 +195,23 @@ python main.py --help
 
 Supports multiple data storage methods:
 
+- **SQLite Database**: Lightweight database without server, ideal for personal use (recommended)
+  - Parameter: `--save_data_option sqlite`
+  - Database file created automatically
 - **MySQL Database**: Supports saving to relational database MySQL (need to create database in advance)
   - Execute `python db.py` to initialize database table structure (only execute on first run)
 - **CSV Files**: Supports saving to CSV (under `data/` directory)
 - **JSON Files**: Supports saving to JSON (under `data/` directory)
 
+### Usage Examples:
+```shell
+# Use SQLite (recommended for personal users)
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option sqlite
+
+# Use MySQL
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option db
+```
+
 ---
 
 [🚀 MediaCrawlerPro Major Release 🚀! More features, better architectural design!](https://github.com/MediaCrawlerPro)

README_es.md

Lines changed: 12 additions & 0 deletions

@@ -195,11 +195,23 @@ python main.py --help
 
 Supports multiple data storage methods:
 
+- **SQLite Database**: Lightweight serverless database, ideal for personal use (recommended)
+  - Parameter: `--save_data_option sqlite`
+  - The database file is created automatically
 - **MySQL Database**: Supports saving to the relational database MySQL (the database must be created in advance)
   - Run `python db.py` to initialize the database table structure (run only the first time)
 - **CSV Files**: Supports saving to CSV (under the `data/` directory)
 - **JSON Files**: Supports saving to JSON (under the `data/` directory)
 
+### Usage Examples:
+```shell
+# Use SQLite (recommended for personal users)
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option sqlite
+
+# Use MySQL
+uv run main.py --platform xhs --lt qrcode --type search --save_data_option db
+```
+
 ---
 
 [🚀 MediaCrawlerPro Major Release 🚀! More features, better architectural design!](https://github.com/MediaCrawlerPro)

async_sqlite_db.py

Lines changed: 111 additions & 0 deletions

@@ -0,0 +1,111 @@
+# Disclaimer: this code is for learning and research purposes only. Users should adhere to the following principles:
+# 1. Do not use it for any commercial purpose.
+# 2. Comply with the target platform's terms of service and robots.txt rules.
+# 3. Do not crawl at large scale or disrupt the platform's operations.
+# 4. Keep the request rate reasonable to avoid placing unnecessary load on the target platform.
+# 5. Do not use it for any illegal or improper purpose.
+#
+# See the LICENSE file in the project root for the detailed license terms.
+# By using this code you agree to abide by the principles above and all terms in the LICENSE.
+
+
+# -*- coding: utf-8 -*-
+# @Author : relakkes@gmail.com
+# @Time : 2024/4/6 14:21
+# @Desc : Async SQLite wrapper for create, read, update, and delete operations
+from typing import Any, Dict, List, Union
+
+import aiosqlite
+
+
+class AsyncSqliteDB:
+    def __init__(self, db_path: str) -> None:
+        self.__db_path = db_path
+
+    async def query(self, sql: str, *args: Union[str, int]) -> List[Dict[str, Any]]:
+        """
+        Run the given SQL query and return the matching records as a list
+        :param sql: the SQL query
+        :param args: dynamic parameters passed to the SQL
+        :return:
+        """
+        async with aiosqlite.connect(self.__db_path) as conn:
+            conn.row_factory = aiosqlite.Row
+            async with conn.execute(sql, args) as cursor:
+                rows = await cursor.fetchall()
+                return [dict(row) for row in rows] if rows else []
+
+    async def get_first(self, sql: str, *args: Union[str, int]) -> Union[Dict[str, Any], None]:
+        """
+        Run the given SQL query and return the first matching result
+        :param sql: the SQL query
+        :param args: dynamic parameters passed to the SQL
+        :return:
+        """
+        async with aiosqlite.connect(self.__db_path) as conn:
+            conn.row_factory = aiosqlite.Row
+            async with conn.execute(sql, args) as cursor:
+                row = await cursor.fetchone()
+                return dict(row) if row else None
+
+    async def item_to_table(self, table_name: str, item: Dict[str, Any]) -> int:
+        """
+        Insert a record into a table
+        :param table_name: table name
+        :param item: dict holding one record
+        :return:
+        """
+        fields = list(item.keys())
+        values = list(item.values())
+        fieldstr = ','.join(fields)
+        valstr = ','.join(['?'] * len(item))
+        sql = f"INSERT INTO {table_name} ({fieldstr}) VALUES({valstr})"
+        async with aiosqlite.connect(self.__db_path) as conn:
+            async with conn.execute(sql, values) as cursor:
+                await conn.commit()
+                return cursor.lastrowid
+
+    async def update_table(self, table_name: str, updates: Dict[str, Any], field_where: str,
+                           value_where: Union[str, int, float]) -> int:
+        """
+        Update records in the given table
+        :param table_name: table name
+        :param updates: key-value mapping of the fields to update and their new values
+        :param field_where: field name used in the WHERE clause of the UPDATE statement
+        :param value_where: field value used in the WHERE clause of the UPDATE statement
+        :return:
+        """
+        upsets = []
+        values = []
+        for k, v in updates.items():
+            upsets.append(f'{k}=?')
+            values.append(v)
+        upsets_str = ','.join(upsets)
+        values.append(value_where)
+        sql = f'UPDATE {table_name} SET {upsets_str} WHERE {field_where}=?'
+        async with aiosqlite.connect(self.__db_path) as conn:
+            async with conn.execute(sql, values) as cursor:
+                await conn.commit()
+                return cursor.rowcount
+
+    async def execute(self, sql: str, *args: Union[str, int]) -> int:
+        """
+        Execute a statement that updates or writes data
+        :param sql:
+        :param args:
+        :return:
+        """
+        async with aiosqlite.connect(self.__db_path) as conn:
+            async with conn.execute(sql, args) as cursor:
+                await conn.commit()
+                return cursor.rowcount
+
+    async def executescript(self, sql_script: str) -> None:
+        """
+        Execute a SQL script; used to initialize the database table structure
+        :param sql_script: SQL script content
+        :return:
+        """
+        async with aiosqlite.connect(self.__db_path) as conn:
+            await conn.executescript(sql_script)
+            await conn.commit()
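
The statement construction in `item_to_table` and `update_table` above can be sketched without an event loop. This is a minimal illustration that uses the standard-library `sqlite3` module as a synchronous stand-in for `aiosqlite`; the helper names `build_insert` and `build_update` and the `demo_note` table are hypothetical and do not appear in the commit. It shows that only the values travel as `?` parameters, while table and column names are interpolated into the SQL text.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def build_insert(table_name: str, item: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Assemble the same parameterized INSERT text that item_to_table builds (hypothetical helper)."""
    fields = ",".join(item.keys())
    placeholders = ",".join(["?"] * len(item))
    sql = f"INSERT INTO {table_name} ({fields}) VALUES({placeholders})"
    return sql, list(item.values())


def build_update(table_name: str, updates: Dict[str, Any],
                 field_where: str, value_where: Any) -> Tuple[str, List[Any]]:
    """Assemble the same parameterized UPDATE text that update_table builds (hypothetical helper)."""
    assignments = ",".join(f"{key}=?" for key in updates)
    values = list(updates.values()) + [value_where]
    sql = f"UPDATE {table_name} SET {assignments} WHERE {field_where}=?"
    return sql, values


# In-memory database and a made-up table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE demo_note (id INTEGER PRIMARY KEY, note_id TEXT, title TEXT)")

insert_sql, insert_values = build_insert("demo_note", {"note_id": "n1", "title": "hello"})
last_row_id = conn.execute(insert_sql, insert_values).lastrowid

update_sql, update_values = build_update("demo_note", {"title": "updated"}, "note_id", "n1")
updated_rows = conn.execute(update_sql, update_values).rowcount
conn.commit()

# Rows convert to plain dicts the same way query() converts them.
rows = [dict(row) for row in conn.execute("SELECT * FROM demo_note WHERE note_id = ?", ("n1",))]
conn.close()
```

Because identifiers are formatted into the statement, callers of the wrapper would need to pass trusted table and field names; the placeholders protect only the values.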

cmd_arg/arg.py

Lines changed: 1 addition & 1 deletion

@@ -33,7 +33,7 @@ async def parse_cmd():
     parser.add_argument('--get_sub_comment', type=str2bool,
                         help=''''whether to crawl level two comment, supported values case insensitive ('yes', 'true', 't', 'y', '1', 'no', 'false', 'f', 'n', '0')''', default=config.ENABLE_GET_SUB_COMMENTS)
     parser.add_argument('--save_data_option', type=str,
-                        help='where to save the data (csv or db or json)', choices=['csv', 'db', 'json'], default=config.SAVE_DATA_OPTION)
+                        help='where to save the data (csv or db or json or sqlite)', choices=['csv', 'db', 'json', 'sqlite'], default=config.SAVE_DATA_OPTION)
     parser.add_argument('--cookies', type=str,
                         help='cookies used for cookie login type', default=config.COOKIES)
 
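
The effect of extending `choices` in the hunk above can be sketched in isolation. This is a minimal illustration, assuming a standalone parser rather than the project's `parse_cmd()`; the default `"json"` mirrors `SAVE_DATA_OPTION` from `config/base_config.py`, and the rejected value `mongodb` is a made-up example.

```python
import argparse

# Standalone parser that reproduces only the --save_data_option argument.
parser = argparse.ArgumentParser()
parser.add_argument("--save_data_option", type=str,
                    help="where to save the data (csv or db or json or sqlite)",
                    choices=["csv", "db", "json", "sqlite"], default="json")

# A value listed in choices is accepted as-is.
accepted = parser.parse_args(["--save_data_option", "sqlite"]).save_data_option

# With no flag given, the default applies.
default_value = parser.parse_args([]).save_data_option

# A value outside choices makes argparse print a usage error and exit.
try:
    parser.parse_args(["--save_data_option", "mongodb"])
    rejected = False
except SystemExit:
    rejected = True
```

Before this change, `sqlite` would have been rejected in the same way as the unknown value here.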

config/base_config.py

Lines changed: 2 additions & 2 deletions

@@ -74,8 +74,8 @@
 # Set to False to keep the browser running, which makes debugging easier
 AUTO_CLOSE_BROWSER = True
 
-# Data storage option; three types are supported: csv, db, json. Saving to DB is preferred because it deduplicates records.
-SAVE_DATA_OPTION = "json" # csv or db or json
+# Data storage option; four types are supported: csv, db, json, sqlite. Saving to DB is preferred because it deduplicates records.
+SAVE_DATA_OPTION = "json" # csv or db or json or sqlite
 
 # Configuration of the browser files used for the user's browser cache
 USER_DATA_DIR = "%s_user_data_dir" # %s will be replaced by platform name

config/db_config.py

Lines changed: 9 additions & 6 deletions

@@ -12,11 +12,11 @@
 import os
 
 # mysql config
-RELATION_DB_PWD = os.getenv("RELATION_DB_PWD", "123456")
-RELATION_DB_USER = os.getenv("RELATION_DB_USER", "root")
-RELATION_DB_HOST = os.getenv("RELATION_DB_HOST", "localhost")
-RELATION_DB_PORT = os.getenv("RELATION_DB_PORT", 3306)
-RELATION_DB_NAME = os.getenv("RELATION_DB_NAME", "media_crawler")
+MYSQL_DB_PWD = os.getenv("MYSQL_DB_PWD", "123456")
+MYSQL_DB_USER = os.getenv("MYSQL_DB_USER", "root")
+MYSQL_DB_HOST = os.getenv("MYSQL_DB_HOST", "localhost")
+MYSQL_DB_PORT = os.getenv("MYSQL_DB_PORT", 3306)
+MYSQL_DB_NAME = os.getenv("MYSQL_DB_NAME", "media_crawler")
 
 
 # redis config
@@ -27,4 +27,7 @@
 
 # cache type
 CACHE_TYPE_REDIS = "redis"
-CACHE_TYPE_MEMORY = "memory"
+CACHE_TYPE_MEMORY = "memory"
+
+# sqlite config
+SQLITE_DB_PATH = os.path.join(os.path.dirname(os.path.dirname(__file__)), "schema", "sqlite_tables.db")
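
How these config values resolve at runtime can be sketched as follows. This is an illustration under stated assumptions, not code from the commit: the helper `sqlite_db_path` and the example path `project/config/db_config.py` are hypothetical, and the `int(...)` normalization is a suggestion that the commit itself does not apply. It shows two things: `os.getenv` returns the integer default only when the variable is unset and a string otherwise, and two `dirname` calls on the config file's path land on the project root.

```python
import os

# Unset: os.getenv hands back the default unchanged, here an int.
os.environ.pop("MYSQL_DB_PORT", None)
port_unset = os.getenv("MYSQL_DB_PORT", 3306)

# Set: environment values are always strings, so the type differs from the default.
os.environ["MYSQL_DB_PORT"] = "3307"
port_set = os.getenv("MYSQL_DB_PORT", 3306)

# Wrapping in int() gives one type in both cases (an assumption, not in the commit).
port_normalized = int(os.getenv("MYSQL_DB_PORT", 3306))
os.environ.pop("MYSQL_DB_PORT", None)


def sqlite_db_path(config_file: str) -> str:
    """Mirror the SQLITE_DB_PATH expression for an arbitrary config file path (hypothetical helper)."""
    project_root = os.path.dirname(os.path.dirname(config_file))
    return os.path.join(project_root, "schema", "sqlite_tables.db")


# Made-up location standing in for __file__ inside config/db_config.py.
example_path = sqlite_db_path(os.path.join("project", "config", "db_config.py"))
```

With the made-up location above, the database file resolves to `schema/sqlite_tables.db` under the project root, independent of the current working directory.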
