Skip to content

Conversation

@jopemachine
Copy link
Member

@jopemachine jopemachine commented Jan 6, 2026

resolves #7754 (BA-3726)

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention to the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

@jopemachine jopemachine changed the title feat: Apply Service, Repository pattern to ErrorLog feat(BA-3726): Apply Service, Repository pattern to ErrorLog Jan 6, 2026
@github-actions github-actions bot added size:L 100~500 LoC comp:manager Related to Manager component comp:common Related to Common component labels Jan 6, 2026
@jopemachine jopemachine added this to the 26.1 milestone Jan 6, 2026
@github-actions github-actions bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Jan 6, 2026
@jopemachine jopemachine changed the title feat(BA-3726): Apply Service, Repository pattern to ErrorLog feat(BA-3726): Implement ErrorLog Service, Repository Layer Jan 6, 2026
@jopemachine jopemachine marked this pull request as ready for review January 7, 2026 01:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the ErrorLog service and repository layer to support error logging functionality in the Backend.AI manager. The implementation follows the established patterns for services and repositories in the codebase.

Key changes:

  • Introduces ErrorLogData type and ErrorLogSeverity enum for type-safe error log handling
  • Implements ErrorLogRepository with database operations and resilience policies
  • Adds ErrorLogService with create action support
  • Provides comprehensive unit tests for both repository and service layers

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/ai/backend/manager/data/error_log/types.py Defines ErrorLogSeverity enum and ErrorLogData dataclass for type-safe error log representation
src/ai/backend/manager/data/error_log/init.py Exports error log data types
src/ai/backend/manager/models/error_logs.py Adds ErrorLogRow model with SQLAlchemy table definition and to_dataclass conversion
src/ai/backend/manager/repositories/error_log/creators.py Implements ErrorLogCreatorSpec for creating error log entries
src/ai/backend/manager/repositories/error_log/db_source/db_source.py Provides ErrorLogDBSource for database operations with resilience policies
src/ai/backend/manager/repositories/error_log/db_source/init.py Exports ErrorLogDBSource
src/ai/backend/manager/repositories/error_log/repository.py Implements ErrorLogRepository with create method and resilience
src/ai/backend/manager/repositories/error_log/repositories.py Defines ErrorLogRepositories container class
src/ai/backend/manager/repositories/error_log/init.py Exports error log repository components
src/ai/backend/manager/repositories/repositories.py Integrates ErrorLogRepositories into main Repositories class
src/ai/backend/manager/services/error_log/actions/base.py Defines base ErrorLogAction class for error log operations
src/ai/backend/manager/services/error_log/actions/create.py Implements CreateErrorLogAction and CreateErrorLogActionResult
src/ai/backend/manager/services/error_log/actions/init.py Exports error log actions
src/ai/backend/manager/services/error_log/service.py Implements ErrorLogService with create method
src/ai/backend/manager/services/error_log/processors.py Defines ErrorLogProcessors for action processing
src/ai/backend/manager/services/error_log/init.py Exports error log service components
src/ai/backend/manager/services/processors.py Integrates ErrorLogService and ErrorLogProcessors into service infrastructure
src/ai/backend/common/metrics/metric.py Adds ERROR_LOG_REPOSITORY and ERROR_LOG_DB_SOURCE layer types for metrics
tests/unit/manager/repositories/error_log/test_error_log_repository.py Provides comprehensive repository tests with database operations
tests/unit/manager/repositories/error_log/BUILD Defines test build configuration
tests/unit/manager/services/error_log/test_error_log_service.py Implements service layer tests with mocked repository
tests/unit/manager/services/error_log/BUILD Defines test build configuration
changes/7803.feature.md Documents the feature implementation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +14 to +21
@dataclass
class ErrorLogService:
"""Service for error log operations."""

_repository: ErrorLogRepository

def __init__(self, repository: ErrorLogRepository) -> None:
self._repository = repository
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @DataClass decorator should not be used with an explicit init method. Service classes in this codebase (like UserService, ImageService, DomainService) don't use @DataClass at all - they define init manually. Remove the @DataClass decorator from ErrorLogService to be consistent with existing service patterns.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1 @@
Implement `ErrorLog` Service, Repository Layer
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changelog filename is '7803.feature.md' but the PR description mentions resolving issue #7754. Verify that the changelog filename matches the correct issue number. If #7754 is the correct issue, the file should be renamed to '7754.feature.md'.

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +47
error_log_db_source_resilience = Resilience(
policies=[
MetricPolicy(MetricArgs(domain=DomainType.DB_SOURCE, layer=LayerType.ERROR_LOG_DB_SOURCE)),
RetryPolicy(
RetryArgs(
max_retries=5,
retry_delay=0.1,
backoff_strategy=BackoffStrategy.FIXED,
non_retryable_exceptions=(BackendAIError,),
)
),
]
)


class ErrorLogDBSource:
_db: ExtendedAsyncSAEngine

def __init__(self, db: ExtendedAsyncSAEngine) -> None:
self._db = db

@error_log_db_source_resilience.apply()
async def create(self, creator: Creator[ErrorLogRow]) -> ErrorLogData:
async with self._db.begin_session() as db_sess:
result = await execute_creator(db_sess, creator)
return result.row.to_dataclass()
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant resilience policies are applied at both the Repository and DBSource layers. This pattern differs from existing repositories (e.g., ImageRepository) where resilience is only applied at the Repository layer, not at the DBSource layer. Having resilience at both layers could result in excessive retries (up to 50 attempts: 10 repository retries * 5 DBSource retries). Remove the resilience decorator and policies from ErrorLogDBSource to align with the established pattern.

Copilot uses AI. Check for mistakes.
Comment on lines +56 to +83
async with with_tables(
database_connection,
[
# FK dependency order: parents before children
DomainRow,
ScalingGroupRow,
UserResourcePolicyRow,
ProjectResourcePolicyRow,
KeyPairResourcePolicyRow,
UserRoleRow,
UserRow,
KeyPairRow,
GroupRow,
AgentRow,
VFolderRow,
ImageRow,
ResourcePresetRow,
EndpointRow,
DeploymentRevisionRow,
DeploymentAutoScalingPolicyRow,
DeploymentPolicyRow,
ErrorLogRow,
],
):
yield database_connection
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test fixture includes many unnecessary table dependencies that aren't required for error log testing. ErrorLogRow only depends on UserRow, which in turn requires DomainRow and UserResourcePolicyRow. The following tables can be removed from the fixture as they're not needed: ScalingGroupRow, ProjectResourcePolicyRow, KeyPairResourcePolicyRow, UserRoleRow, KeyPairRow, GroupRow, AgentRow, VFolderRow, ImageRow, ResourcePresetRow, EndpointRow, DeploymentRevisionRow, DeploymentAutoScalingPolicyRow, and DeploymentPolicyRow. Simplifying the fixture will make tests faster and easier to maintain.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +40
@dataclass
class ErrorLogData:
id: uuid.UUID
meta: ErrorLogMeta
content: ErrorLogContent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the code.

Comment on lines +46 to +75
__table__ = error_logs

def __init__(
self,
severity: ErrorLogSeverity,
source: str,
message: str,
context_lang: str,
context_env: dict[str, Any],
user: uuid.UUID | None = None,
is_read: bool = False,
is_cleared: bool = False,
request_url: str | None = None,
request_status: int | None = None,
traceback: str | None = None,
created_at: datetime | None = None,
) -> None:
self.severity = severity.value
self.source = source
self.user = user
self.is_read = is_read
self.is_cleared = is_cleared
self.message = message
self.context_lang = context_lang
self.context_env = context_env
self.request_url = request_url
self.request_status = request_status
self.traceback = traceback
if created_at:
self.created_at = created_at
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the table code and migrate to SQLAlchemy 2.0. Let's address this in a follow-up.

@HyeockJinKim HyeockJinKim added this pull request to the merge queue Jan 9, 2026
Merged via the queue into main with commit ca7f1eb Jan 9, 2026
31 checks passed
@HyeockJinKim HyeockJinKim deleted the feat/BA-3726 branch January 9, 2026 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp:common Related to Common component comp:manager Related to Manager component size:XL 500~ LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Error Log Service, Repository Layer

4 participants