feat(BA-3726): Implement ErrorLog Service, Repository Layer
#7803
4ad2213 to c43e4bc
Pull request overview
This PR implements the ErrorLog service and repository layer to support error logging functionality in the Backend.AI manager. The implementation follows the established patterns for services and repositories in the codebase.
Key changes:
- Introduces ErrorLogData type and ErrorLogSeverity enum for type-safe error log handling
- Implements ErrorLogRepository with database operations and resilience policies
- Adds ErrorLogService with create action support
- Provides comprehensive unit tests for both repository and service layers
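The layering described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `ErrorLogSeverity`, `ErrorLogData`, `CreateErrorLogAction`, `CreateErrorLogActionResult`, and `ErrorLogService` come from the PR, but the enum members, the flattened fields, the method signatures, and the in-memory repository are hypothetical and are not the PR's actual code.

```python
import asyncio
import enum
import uuid
from dataclasses import dataclass


class ErrorLogSeverity(enum.Enum):
    # Member names are illustrative; the PR's actual values are not shown here.
    CRITICAL = "critical"
    ERROR = "error"
    WARNING = "warning"


@dataclass(frozen=True)
class ErrorLogData:
    # Flattened for brevity; the PR groups fields into meta/content objects.
    id: uuid.UUID
    severity: ErrorLogSeverity
    source: str
    message: str


@dataclass(frozen=True)
class CreateErrorLogAction:
    severity: ErrorLogSeverity
    source: str
    message: str


@dataclass(frozen=True)
class CreateErrorLogActionResult:
    error_log_data: ErrorLogData


class InMemoryErrorLogRepository:
    """Hypothetical stand-in for ErrorLogRepository (no database involved)."""

    def __init__(self) -> None:
        self.rows: list[ErrorLogData] = []

    async def create(self, action: CreateErrorLogAction) -> ErrorLogData:
        data = ErrorLogData(uuid.uuid4(), action.severity, action.source, action.message)
        self.rows.append(data)
        return data


class ErrorLogService:
    """Plain class with an explicit __init__; delegates to the repository."""

    def __init__(self, repository: InMemoryErrorLogRepository) -> None:
        self._repository = repository

    async def create(self, action: CreateErrorLogAction) -> CreateErrorLogActionResult:
        data = await self._repository.create(action)
        return CreateErrorLogActionResult(error_log_data=data)


def demo() -> CreateErrorLogActionResult:
    # Drive the async service once from synchronous code.
    service = ErrorLogService(InMemoryErrorLogRepository())
    action = CreateErrorLogAction(ErrorLogSeverity.ERROR, "manager", "boom")
    return asyncio.run(service.create(action))
```

The service only translates an action into a repository call and wraps the returned data in a result object, which is what allows the service tests to run against a mocked repository.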
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/ai/backend/manager/data/error_log/types.py | Defines ErrorLogSeverity enum and ErrorLogData dataclass for type-safe error log representation |
| src/ai/backend/manager/data/error_log/\_\_init\_\_.py | Exports error log data types |
| src/ai/backend/manager/models/error_logs.py | Adds ErrorLogRow model with SQLAlchemy table definition and to_dataclass conversion |
| src/ai/backend/manager/repositories/error_log/creators.py | Implements ErrorLogCreatorSpec for creating error log entries |
| src/ai/backend/manager/repositories/error_log/db_source/db_source.py | Provides ErrorLogDBSource for database operations with resilience policies |
| src/ai/backend/manager/repositories/error_log/db_source/\_\_init\_\_.py | Exports ErrorLogDBSource |
| src/ai/backend/manager/repositories/error_log/repository.py | Implements ErrorLogRepository with create method and resilience |
| src/ai/backend/manager/repositories/error_log/repositories.py | Defines ErrorLogRepositories container class |
| src/ai/backend/manager/repositories/error_log/\_\_init\_\_.py | Exports error log repository components |
| src/ai/backend/manager/repositories/repositories.py | Integrates ErrorLogRepositories into main Repositories class |
| src/ai/backend/manager/services/error_log/actions/base.py | Defines base ErrorLogAction class for error log operations |
| src/ai/backend/manager/services/error_log/actions/create.py | Implements CreateErrorLogAction and CreateErrorLogActionResult |
| src/ai/backend/manager/services/error_log/actions/\_\_init\_\_.py | Exports error log actions |
| src/ai/backend/manager/services/error_log/service.py | Implements ErrorLogService with create method |
| src/ai/backend/manager/services/error_log/processors.py | Defines ErrorLogProcessors for action processing |
| src/ai/backend/manager/services/error_log/\_\_init\_\_.py | Exports error log service components |
| src/ai/backend/manager/services/processors.py | Integrates ErrorLogService and ErrorLogProcessors into service infrastructure |
| src/ai/backend/common/metrics/metric.py | Adds ERROR_LOG_REPOSITORY and ERROR_LOG_DB_SOURCE layer types for metrics |
| tests/unit/manager/repositories/error_log/test_error_log_repository.py | Provides comprehensive repository tests with database operations |
| tests/unit/manager/repositories/error_log/BUILD | Defines test build configuration |
| tests/unit/manager/services/error_log/test_error_log_service.py | Implements service layer tests with mocked repository |
| tests/unit/manager/services/error_log/BUILD | Defines test build configuration |
| changes/7803.feature.md | Documents the feature implementation |
| @dataclass | ||
| class ErrorLogService: | ||
| """Service for error log operations.""" | ||
| | ||
| _repository: ErrorLogRepository | ||
| | ||
| def __init__(self, repository: ErrorLogRepository) -> None: | ||
| self._repository = repository |
Copilot AI, Jan 7, 2026
The `@dataclass` decorator should not be used together with an explicit `__init__` method. Service classes in this codebase (such as UserService, ImageService, and DomainService) do not use `@dataclass` at all; they define `__init__` manually. Remove the `@dataclass` decorator from ErrorLogService to be consistent with the existing service patterns.
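To make the concern concrete, here is a small standalone sketch (not code from the PR; `FakeRepository`, `DecoratedService`, and `PlainService` are hypothetical names). With an explicit `__init__`, `@dataclass` leaves the constructor alone but still generates `__eq__` and sets `__hash__` to `None`, which is rarely what a service class wants.

```python
from dataclasses import dataclass


class FakeRepository:
    """Hypothetical stand-in for ErrorLogRepository."""


@dataclass
class DecoratedService:
    _repository: FakeRepository

    # The explicit __init__ is kept; @dataclass does not overwrite it.
    def __init__(self, repository: FakeRepository) -> None:
        self._repository = repository


class PlainService:
    _repository: FakeRepository

    def __init__(self, repository: FakeRepository) -> None:
        self._repository = repository


repo = FakeRepository()

# Generated __eq__ compares fields, so two distinct services compare equal.
decorated_equal = DecoratedService(repo) == DecoratedService(repo)

# A plain class falls back to identity comparison.
plain_equal = PlainService(repo) == PlainService(repo)

# eq=True without frozen=True sets __hash__ to None (instances are unhashable).
decorated_hashable = DecoratedService.__hash__ is not None
```

Dropping the decorator, as suggested, keeps identity-based equality and hashing.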
| @@ -0,0 +1 @@ | |||
| Implement `ErrorLog` Service, Repository Layer | |||
Copilot AI, Jan 7, 2026
| error_log_db_source_resilience = Resilience( | ||
| policies=[ | ||
| MetricPolicy(MetricArgs(domain=DomainType.DB_SOURCE, layer=LayerType.ERROR_LOG_DB_SOURCE)), | ||
| RetryPolicy( | ||
| RetryArgs( | ||
| max_retries=5, | ||
| retry_delay=0.1, | ||
| backoff_strategy=BackoffStrategy.FIXED, | ||
| non_retryable_exceptions=(BackendAIError,), | ||
| ) | ||
| ), | ||
| ] | ||
| ) | ||
| | ||
| | ||
| class ErrorLogDBSource: | ||
| _db: ExtendedAsyncSAEngine | ||
| | ||
| def __init__(self, db: ExtendedAsyncSAEngine) -> None: | ||
| self._db = db | ||
| | ||
| @error_log_db_source_resilience.apply() | ||
| async def create(self, creator: Creator[ErrorLogRow]) -> ErrorLogData: | ||
| async with self._db.begin_session() as db_sess: | ||
| result = await execute_creator(db_sess, creator) | ||
| return result.row.to_dataclass() |
Copilot AI, Jan 7, 2026
Redundant resilience policies are applied at both the Repository and DBSource layers. This pattern differs from existing repositories (e.g., ImageRepository) where resilience is only applied at the Repository layer, not at the DBSource layer. Having resilience at both layers could result in excessive retries (up to 50 attempts: 10 repository retries * 5 DBSource retries). Remove the resilience decorator and policies from ErrorLogDBSource to align with the established pattern.
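The multiplication in the comment above can be reproduced with a toy example. The `retry` decorator here is a hypothetical fixed-count helper, not Backend.AI's `Resilience` or `RetryPolicy`, and it treats the configured number as total attempts; whether the real policy counts the initial call separately is not shown in this PR.

```python
import functools


def retry(max_attempts):
    """Hypothetical fixed-count retry decorator (not Backend.AI's RetryPolicy)."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except RuntimeError as e:
                    last_error = e
            # Every attempt failed: surface the last error to the caller.
            raise last_error

        return wrapper

    return decorator


calls = 0


@retry(max_attempts=5)
def db_source_create():
    """Inner layer: always fails, counting each underlying call."""
    global calls
    calls += 1
    raise RuntimeError("transient failure")


@retry(max_attempts=10)
def repository_create():
    """Outer layer: retries the already-retrying inner layer."""
    return db_source_create()


def count_calls():
    global calls
    calls = 0
    try:
        repository_create()
    except RuntimeError:
        pass
    return calls
```

Under these assumptions `count_calls()` returns 50: each of the 10 outer attempts exhausts all 5 inner attempts before failing.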
| async with with_tables( | ||
| database_connection, | ||
| [ | ||
| # FK dependency order: parents before children | ||
| DomainRow, | ||
| ScalingGroupRow, | ||
| UserResourcePolicyRow, | ||
| ProjectResourcePolicyRow, | ||
| KeyPairResourcePolicyRow, | ||
| UserRoleRow, | ||
| UserRow, | ||
| KeyPairRow, | ||
| GroupRow, | ||
| AgentRow, | ||
| VFolderRow, | ||
| ImageRow, | ||
| ResourcePresetRow, | ||
| EndpointRow, | ||
| DeploymentRevisionRow, | ||
| DeploymentAutoScalingPolicyRow, | ||
| DeploymentPolicyRow, | ||
| ErrorLogRow, | ||
| ], | ||
| ): | ||
| yield database_connection |
Copilot AI, Jan 7, 2026
The test fixture includes many unnecessary table dependencies that aren't required for error log testing. ErrorLogRow only depends on UserRow, which in turn requires DomainRow and UserResourcePolicyRow. The following tables can be removed from the fixture as they're not needed: ScalingGroupRow, ProjectResourcePolicyRow, KeyPairResourcePolicyRow, UserRoleRow, KeyPairRow, GroupRow, AgentRow, VFolderRow, ImageRow, ResourcePresetRow, EndpointRow, DeploymentRevisionRow, DeploymentAutoScalingPolicyRow, and DeploymentPolicyRow. Simplifying the fixture will make tests faster and easier to maintain.
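One way to derive the reduced table list is to walk the foreign-key parents of the table under test. The sketch below is an assumption-laden illustration: `FK_PARENTS` encodes only the dependencies stated in the comment above (not verified against the actual models), and `creation_order` is a hypothetical helper, not part of the test utilities.

```python
# Dependencies as stated in the review comment; not checked against the models.
FK_PARENTS = {
    "ErrorLogRow": ["UserRow"],
    "UserRow": ["DomainRow", "UserResourcePolicyRow"],
    "DomainRow": [],
    "UserResourcePolicyRow": [],
}


def creation_order(target, parents):
    """Return target plus its transitive FK parents, parents before children."""
    ordered = []
    seen = set()

    def visit(table):
        if table in seen:
            return
        seen.add(table)
        for parent in parents.get(table, []):
            visit(parent)
        # Append only after all parents, so creation order is valid.
        ordered.append(table)

    visit(target)
    return ordered


minimal_tables = creation_order("ErrorLogRow", FK_PARENTS)
```

With the dependencies listed above, `minimal_tables` contains four tables instead of the eighteen in the fixture.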
524e633 to 7680e50
| @dataclass | ||
| class ErrorLogData: | ||
| id: uuid.UUID | ||
| meta: ErrorLogMeta | ||
| content: ErrorLogContent |
It's in the code.
| __table__ = error_logs | ||
| | ||
| def __init__( | ||
| self, | ||
| severity: ErrorLogSeverity, | ||
| source: str, | ||
| message: str, | ||
| context_lang: str, | ||
| context_env: dict[str, Any], | ||
| user: uuid.UUID | None = None, | ||
| is_read: bool = False, | ||
| is_cleared: bool = False, | ||
| request_url: str | None = None, | ||
| request_status: int | None = None, | ||
| traceback: str | None = None, | ||
| created_at: datetime | None = None, | ||
| ) -> None: | ||
| self.severity = severity.value | ||
| self.source = source | ||
| self.user = user | ||
| self.is_read = is_read | ||
| self.is_cleared = is_cleared | ||
| self.message = message | ||
| self.context_lang = context_lang | ||
| self.context_env = context_env | ||
| self.request_url = request_url | ||
| self.request_status = request_status | ||
| self.traceback = traceback | ||
| if created_at: | ||
| self.created_at = created_at |
Please remove the table code and migrate to SQLAlchemy 2.0. Let's address this in a follow-up.
resolves #7754 (BA-3726)