-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: integrate MySQL support and enhance database management #3902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Added MySQL database implementation with connection settings and session management. - Introduced a new AstrBotMySQLSettings class for configuration. - Updated database helper functions to support both SQLite and MySQL. - Enhanced platform statistics retrieval with time series data for both database types. - Refactored existing SQLite methods to align with new database structure and functionality. closes: #3848
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - 我已经审阅了你的修改,这里是一些反馈:
- 新的平台统计聚合逻辑(
get_platform_stats和get_platform_stats_time_series)目前在 SQLite 和 MySQL 之间存在重复,差异仅在 SQL 方言上;建议将共有的聚合结构提取到一个共享的辅助函数里,以减少偏差并保持各数据库间行为一致。 - 在
mysql.py中,NOT_GIVEN被定义为一个TypeVar,但在update_persona中被用作运行时的哨兵值;为了更清晰且更符合惯例,建议使用一个唯一的对象实例(例如NOT_GIVEN = object())来承担这个角色。 - 迁移辅助函数现在会在
DatabaseType.MYSQL时直接短路,这意味着如果用户切换到 MySQL,则任何现有的 SQLite 数据都会被忽略;如果这是有意为之,建议在配置或代码注释中将该行为说明清楚,以避免对“迁移数据缺失”的困惑。
给 AI Agent 的提示词
Please address the comments from this code review:
## Overall Comments
- 新的平台统计聚合逻辑(`get_platform_stats` 和 `get_platform_stats_time_series`)目前在 SQLite 和 MySQL 之间存在重复,差异仅在 SQL 方言上;建议将共有的聚合结构提取到一个共享的辅助函数里,以减少偏差并保持各数据库间行为一致。
- 在 `mysql.py` 中,`NOT_GIVEN` 被定义为一个 `TypeVar`,但在 `update_persona` 中被用作运行时的哨兵值;为了更清晰且更符合惯例,建议使用一个唯一的对象实例(例如 `NOT_GIVEN = object()`)来承担这个角色。
- 迁移辅助函数现在会在 `DatabaseType.MYSQL` 时直接短路,这意味着如果用户切换到 MySQL,则任何现有的 SQLite 数据都会被忽略;如果这是有意为之,建议在配置或代码注释中将该行为说明清楚,以避免对“迁移数据缺失”的困惑。
## Individual Comments
### Comment 1
<location> `astrbot/core/db/mysql.py:103-106` </location>
<code_context>
+ async with self.AsyncSessionLocal() as session:
+ yield session
+
+ async def initialize(self) -> None:
+ """Initialize the database by creating tables if they do not exist."""
+ async with self.engine.begin() as conn:
+ await conn.run_sync(SQLModel.metadata.create_all)
+ await conn.commit()
+
</code_context>
<issue_to_address>
**issue (bug_risk):** 避免在 `engine.begin()` 上下文中调用 `conn.commit()`;该上下文管理器已经处理事务的提交和回滚。
在 `async with self.engine.begin() as conn:` 中,SQLAlchemy 会开启一个事务,并在退出时自动提交或回滚。额外的 `await conn.commit()` 是不必要的,并且在某些配置下可能导致错误。请仅依赖上下文管理器的自动提交行为。
</issue_to_address>
### Comment 2
<location> `astrbot/core/db/mysql.py:840` </location>
<code_context>
+ result = await session.execute(query)
+ return list(result.scalars().all())
+
+ async def update_platform_session(
+ self,
+ session_id: str,
+ display_name: str | None = None,
+ ) -> None:
+ """Update a Platform session's updated_at timestamp and optionally display_name."""
+ async with self.get_db() as session:
+ session: AsyncSession
+ async with session.begin():
+ values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
+ if display_name is not None:
+ values["display_name"] = display_name
+
+ await session.execute(
+ update(PlatformSession)
+ .where(col(PlatformSession.session_id) == session_id)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** 请让 `updated_at` 的时间戳时区处理方式与模型的其他字段保持一致,以避免混用“有时区/无时区”的时间对象。
该模块中其他时间戳(例如统计信息和消息历史)使用的是无时区的 `datetime.now()`,因此这里是唯一一个使用 UTC 有时区时间的字段。根据 `PlatformSession.updated_at` 的定义方式不同,这种不一致可能会导致警告或比较问题。请在整个数据库层选择一种统一的约定(统一使用 UTC 有时区,或统一使用无时区),并一致地应用。
```suggestion
values: dict[str, T.Any] = {"updated_at": datetime.now()}
```
</issue_to_address>
### Comment 3
<location> `astrbot/core/db/migration/migra_webchat_session.py:41` </location>
<code_context>
select(
col(PlatformMessageHistory.user_id),
- col(PlatformMessageHistory.sender_name),
+ func.max(PlatformMessageHistory.sender_name).label("sender_name"),
func.min(PlatformMessageHistory.created_at).label("earliest"),
func.max(PlatformMessageHistory.updated_at).label("latest"),
</code_context>
<issue_to_address>
**question (bug_risk):** 使用 `MAX` 来聚合 `sender_name` 能解决 SQL 分组问题,但会产生一个任意的名字;请确认这是否符合预期语义。
`func.max` 可以满足 `ONLY_FULL_GROUP_BY` 的要求,但它会选择字典序中最大的那个名字,在一个分组内这基本上是任意的。如果被选中的 `sender_name` 在语义或 UI 上有意义,建议考虑基于特定行(例如最早消息或最新消息)来派生,而不是依赖 `MAX`。
</issue_to_address>帮我变得更有用!请在每条评论上点击 👍 或 👎,我会根据你的反馈改进后续评审。
Original comment in English
Hey there - I've reviewed your changes - here's some feedback:
- The new platform stats aggregation logic (
get_platform_statsandget_platform_stats_time_series) is now duplicated between SQLite and MySQL with only SQL dialect differences; consider extracting the common aggregation shape to a shared helper to reduce drift and keep behavior aligned across databases. - In
mysql.py,NOT_GIVENis defined as aTypeVarbut used as a runtime sentinel inupdate_persona; it would be clearer and more conventional to use a unique object instance (e.g.NOT_GIVEN = object()) for this purpose. - The migration helper now short-circuits for
DatabaseType.MYSQL, meaning any existing SQLite data is ignored if a user switches to MySQL; if this is intentional, consider making the behavior explicit in configuration or code comments to avoid confusion about missing migrated data.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new platform stats aggregation logic (`get_platform_stats` and `get_platform_stats_time_series`) is now duplicated between SQLite and MySQL with only SQL dialect differences; consider extracting the common aggregation shape to a shared helper to reduce drift and keep behavior aligned across databases.
- In `mysql.py`, `NOT_GIVEN` is defined as a `TypeVar` but used as a runtime sentinel in `update_persona`; it would be clearer and more conventional to use a unique object instance (e.g. `NOT_GIVEN = object()`) for this purpose.
- The migration helper now short-circuits for `DatabaseType.MYSQL`, meaning any existing SQLite data is ignored if a user switches to MySQL; if this is intentional, consider making the behavior explicit in configuration or code comments to avoid confusion about missing migrated data.
## Individual Comments
### Comment 1
<location> `astrbot/core/db/mysql.py:103-106` </location>
<code_context>
+ async with self.AsyncSessionLocal() as session:
+ yield session
+
+ async def initialize(self) -> None:
+ """Initialize the database by creating tables if they do not exist."""
+ async with self.engine.begin() as conn:
+ await conn.run_sync(SQLModel.metadata.create_all)
+ await conn.commit()
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid calling `conn.commit()` inside `engine.begin()`; the context manager already manages the transaction.
Within `async with self.engine.begin() as conn:`, SQLAlchemy opens a transaction and commits or rolls back on exit. The extra `await conn.commit()` is unnecessary and can cause errors in some configurations. Please rely on the context manager’s automatic commit instead.
</issue_to_address>
### Comment 2
<location> `astrbot/core/db/mysql.py:840` </location>
<code_context>
+ result = await session.execute(query)
+ return list(result.scalars().all())
+
+ async def update_platform_session(
+ self,
+ session_id: str,
+ display_name: str | None = None,
+ ) -> None:
+ """Update a Platform session's updated_at timestamp and optionally display_name."""
+ async with self.get_db() as session:
+ session: AsyncSession
+ async with session.begin():
+ values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)}
+ if display_name is not None:
+ values["display_name"] = display_name
+
+ await session.execute(
+ update(PlatformSession)
+ .where(col(PlatformSession.session_id) == session_id)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Align timestamp timezone handling for `updated_at` with the rest of the model to avoid mixed aware/naive datetimes.
Other timestamps in this module (e.g., stats and message history) use naive `datetime.now()`, so this is the only UTC-aware field. Depending on how `PlatformSession.updated_at` is defined, this mismatch can cause warnings or comparison issues. Please choose a single convention (UTC-aware or naive) and apply it consistently across the DB layer.
```suggestion
values: dict[str, T.Any] = {"updated_at": datetime.now()}
```
</issue_to_address>
### Comment 3
<location> `astrbot/core/db/migration/migra_webchat_session.py:41` </location>
<code_context>
select(
col(PlatformMessageHistory.user_id),
- col(PlatformMessageHistory.sender_name),
+ func.max(PlatformMessageHistory.sender_name).label("sender_name"),
func.min(PlatformMessageHistory.created_at).label("earliest"),
func.max(PlatformMessageHistory.updated_at).label("latest"),
</code_context>
<issue_to_address>
**question (bug_risk):** Aggregating `sender_name` with `MAX` fixes SQL grouping but produces an arbitrary name; confirm this matches the intended semantics.
`func.max` satisfies `ONLY_FULL_GROUP_BY` but picks the lexicographically greatest name, which is essentially arbitrary within the group. If the chosen `sender_name` has semantic or UI significance, consider deriving it from a specific row (e.g., earliest or latest message) rather than relying on `MAX`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| async with self.get_db() as session: | ||
| session: AsyncSession | ||
| async with session.begin(): | ||
| values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): 请让 updated_at 的时间戳时区处理方式与模型的其他字段保持一致,以避免混用“有时区/无时区”的时间对象。
该模块中其他时间戳(例如统计信息和消息历史)使用的是无时区的 datetime.now(),因此这里是唯一一个使用 UTC 有时区时间的字段。根据 PlatformSession.updated_at 的定义方式不同,这种不一致可能会导致警告或比较问题。请在整个数据库层选择一种统一的约定(统一使用 UTC 有时区,或统一使用无时区),并一致地应用。
| values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)} | |
| values: dict[str, T.Any] = {"updated_at": datetime.now()} |
Original comment in English
suggestion (bug_risk): Align timestamp timezone handling for updated_at with the rest of the model to avoid mixed aware/naive datetimes.
Other timestamps in this module (e.g., stats and message history) use naive datetime.now(), so this is the only UTC-aware field. Depending on how PlatformSession.updated_at is defined, this mismatch can cause warnings or comparison issues. Please choose a single convention (UTC-aware or naive) and apply it consistently across the DB layer.
| values: dict[str, T.Any] = {"updated_at": datetime.now(timezone.utc)} | |
| values: dict[str, T.Any] = {"updated_at": datetime.now()} |
| select( | ||
| col(PlatformMessageHistory.user_id), | ||
| col(PlatformMessageHistory.sender_name), | ||
| func.max(PlatformMessageHistory.sender_name).label("sender_name"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (bug_risk): 使用 MAX 来聚合 sender_name 能解决 SQL 分组问题,但会产生一个任意的名字;请确认这是否符合预期语义。
func.max 可以满足 ONLY_FULL_GROUP_BY 的要求,但它会选择字典序中最大的那个名字,在一个分组内这基本上是任意的。如果被选中的 sender_name 在语义或 UI 上有意义,建议考虑基于特定行(例如最早消息或最新消息)来派生,而不是依赖 MAX。
Original comment in English
question (bug_risk): Aggregating sender_name with MAX fixes SQL grouping but produces an arbitrary name; confirm this matches the intended semantics.
func.max satisfies ONLY_FULL_GROUP_BY but picks the lexicographically greatest name, which is essentially arbitrary within the group. If the chosen sender_name has semantic or UI significance, consider deriving it from a specific row (e.g., earliest or latest message) rather than relying on MAX.
|
干脆用pg数据库吧 |
都可以支持 |
closes: #3848
Modifications / 改动点
Screenshots or Test Results / 运行截图或测试结果
Checklist / 检查清单
requirements.txt和pyproject.toml文件相应位置。/ I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations inrequirements.txtandpyproject.toml.Summary by Sourcery
在现有 SQLite 的基础上集成可配置的 MySQL 数据库支持,并扩展共享的数据库抽象层和统计处理逻辑,使其可在两种后端上通用。
New Features:
MySQLDatabase实现,支持异步连接/会话管理,并完整支持会话(conversations)、角色配置(personas)、偏好设置(preferences)、会话(sessions)、附件(attachments)以及平台消息历史(platform message history)。AstrBotMySQLSettings和数据库工厂(database factory),通过环境变量配置在运行时选择 SQLite 或 MySQL。Enhancements:
BaseDatabase抽象,增加数据库类型元数据和时间序列统计方法,并为 MySQL 后端禁用传统的 SQLite 迁移逻辑。sender_name,以兼容分组查询。Build:
aiomysql依赖以支持异步 MySQL。Original summary in English
Summary by Sourcery
Integrate configurable MySQL database support alongside existing SQLite and extend shared database abstractions and stats handling to work across both backends.
New Features:
Enhancements:
Build: