-
Notifications
You must be signed in to change notification settings - Fork 39
Description
Overview
This document outlines comprehensive logging improvements for the Frontier gRPC Connect-based RPC server to enhance internal error debugging capabilities for developers. The recommendations focus on improving visibility across all layers of the Domain-Driven Design (DDD) architecture.
Current Logging Architecture
Strengths ✅
- Structured JSON logging with Zap logger
- Request ID propagation via X-Request-ID header
- Connect RPC interceptor logging with duration, method, and error codes
- Clean error mapping between layers (Repository → Service → Handler)
- Comprehensive audit logging for user actions
- Proper HTTP status code mapping from domain errors
Current Pain Points ❌
- Generic "Internal Server Error" responses hide root causes in production
- Minimal logging in service and repository layers
- Lost context when errors bubble up through layers
- No correlation between database query performance and errors
- Difficult to trace error origins across the DDD layers
- No automated alerting on error rate spikes
Detailed Logging Improvement Recommendations
1. Enhanced Error Context Propagation
Problem: Errors lose context as they bubble up from repository → service → handler layers.
Files to modify:
internal/store/postgres/user_repository.go:67core/user/service.go:71internal/api/v1beta1connect/user.go:42- Similar patterns across all repository and service files
Repository Layer Enhancement
// Example for internal/store/postgres/user_repository.go
func (r UserRepository) GetByID(ctx context.Context, id string) (user.User, error) {
logger := grpczap.Extract(ctx)
if strings.TrimSpace(id) == "" {
return user.User{}, user.ErrInvalidID
}
var fetchedUser User
userQuery, params, err := dialect.From(TABLE_USERS).
Where(goqu.Ex{
"id": id,
}).Where(notDisabledUserExp).ToSQL()
if err != nil {
return user.User{}, fmt.Errorf("%w: %w", queryErr, err)
}
if err = r.dbc.WithTimeout(ctx, TABLE_USERS, "GetByID", func(ctx context.Context) error {
return r.dbc.GetContext(ctx, &fetchedUser, userQuery, params...)
}); err != nil {
// Log the original database error with context
logger.Error("database query failed",
zap.String("operation", "user.GetByID"),
zap.String("table", TABLE_USERS),
zap.String("user_id", id),
zap.String("query", userQuery),
zap.Any("params", params),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
err = checkPostgresError(err)
switch {
case errors.Is(err, ErrDuplicateKey):
return user.User{}, fmt.Errorf("user lookup failed for id=%s: %w", id, user.ErrConflict)
case errors.Is(err, sql.ErrNoRows):
return user.User{}, fmt.Errorf("user not found for id=%s: %w", id, user.ErrNotExist)
case errors.Is(err, ErrInvalidTextRepresentation):
return user.User{}, fmt.Errorf("invalid UUID format for id=%s: %w", id, user.ErrInvalidUUID)
default:
return user.User{}, fmt.Errorf("database error in user.GetByID id=%s: %w", id, err)
}
}
transformedUser, err := fetchedUser.transformToUser()
if err != nil {
logger.Error("user transformation failed",
zap.String("operation", "user.GetByID"),
zap.String("user_id", id),
zap.Error(err))
return user.User{}, fmt.Errorf("user transform error for id=%s: %w", id, err)
}
return transformedUser, nil
}Service Layer Enhancement
// Example for core/user/service.go
func (s Service) Create(ctx context.Context, user User) (User, error) {
logger := grpczap.Extract(ctx)
logger.Info("creating user",
zap.String("operation", "user.Create"),
zap.String("email", user.Email),
zap.String("name", user.Name))
result, err := s.repository.Create(ctx, User{
Name: strings.ToLower(user.Name),
Email: strings.ToLower(user.Email),
State: Enabled,
Avatar: user.Avatar,
Title: user.Title,
Metadata: user.Metadata,
})
if err != nil {
logger.Error("user creation failed",
zap.String("operation", "user.Create"),
zap.String("email", user.Email),
zap.String("name", user.Name),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
return User{}, fmt.Errorf("service.Create failed for user %s: %w", user.Email, err)
}
logger.Info("user created successfully",
zap.String("operation", "user.Create"),
zap.String("user_id", result.ID),
zap.String("email", result.Email))
return result, nil
}
func (s Service) GetByID(ctx context.Context, id string) (User, error) {
logger := grpczap.Extract(ctx)
logger.Debug("fetching user",
zap.String("operation", "user.GetByID"),
zap.String("id", id))
var result User
var err error
if isValidEmail(id) {
result, err = s.GetByEmail(ctx, id)
} else if utils.IsValidUUID(id) {
result, err = s.repository.GetByID(ctx, id)
} else {
result, err = s.repository.GetByName(ctx, strings.ToLower(id))
}
if err != nil {
logger.Error("user fetch failed",
zap.String("operation", "user.GetByID"),
zap.String("id", id),
zap.String("lookup_type", getLookupType(id)),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
return User{}, fmt.Errorf("service.GetByID failed for id=%s: %w", id, err)
}
return result, nil
}
func getLookupType(id string) string {
if isValidEmail(id) {
return "email"
} else if utils.IsValidUUID(id) {
return "uuid"
}
return "name"
}2. Enhanced Handler Error Logging
Problem: Handlers map all internal errors to generic "Internal Server Error" without logging specifics.
Files to modify:
internal/api/v1beta1connect/user.go- All handler files in
internal/api/v1beta1connect/*.go
// Example for internal/api/v1beta1connect/user.go
func (h *ConnectHandler) CreateUser(ctx context.Context, request *connect.Request[frontierv1beta1.CreateUserRequest]) (*connect.Response[frontierv1beta1.CreateUserResponse], error) {
logger := grpczap.Extract(ctx)
requestID := request.Header().Get(consts.RequestIDHeader)
if request.Msg.GetBody() == nil {
logger.Warn("create user request missing body",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID))
return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
}
email := strings.TrimSpace(request.Msg.GetBody().GetEmail())
if email == "" {
currentUserEmail, ok := authenticate.GetEmailFromContext(ctx)
if !ok {
logger.Warn("create user missing email and no authenticated context",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID))
return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
}
currentUserEmail = strings.TrimSpace(currentUserEmail)
if currentUserEmail == "" {
logger.Error("authenticated user has empty email",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID))
return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
}
email = currentUserEmail
}
title := request.Msg.GetBody().GetTitle()
name := strings.TrimSpace(request.Msg.GetBody().GetName())
if name == "" {
name = str.GenerateUserSlug(email)
}
var metaDataMap metadata.Metadata
if request.Msg.GetBody().GetMetadata() != nil {
metaDataMap = metadata.Build(request.Msg.GetBody().GetMetadata().AsMap())
if err := h.metaSchemaService.Validate(metaDataMap, userMetaSchema); err != nil {
logger.Warn("metadata validation failed",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID),
zap.String("email", email),
zap.Error(err))
return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadBodyMetaSchemaError)
}
}
logger.Info("creating user",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID),
zap.String("email", email),
zap.String("name", name))
newUser, err := h.userService.Create(ctx, user.User{
Title: title,
Email: email,
Name: name,
Avatar: request.Msg.GetBody().GetAvatar(),
Metadata: metaDataMap,
})
if err != nil {
// Log detailed error info before generic mapping
logger.Error("user service create failed",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID),
zap.String("email", email),
zap.String("name", name),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
switch {
case errors.Is(err, user.ErrConflict):
return nil, connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
case errors.Is(errors.Unwrap(err), user.ErrKeyDoesNotExists):
return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
default:
// Log additional context for internal errors
logger.Error("unexpected error in CreateUser",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID),
zap.String("user_email", email),
zap.String("user_name", name),
zap.String("error_chain", fmt.Sprintf("%+v", err)),
zap.Error(err))
return nil, connect.NewError(connect.CodeInternal, ErrInternalServerError)
}
}
transformedUser, err := transformUserToPB(newUser)
if err != nil {
logger.Error("user transformation to protobuf failed",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID),
zap.String("user_id", newUser.ID),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
return nil, connect.NewError(connect.CodeInternal, ErrInternalServerError)
}
// Log successful creation
logger.Info("user created successfully",
zap.String("operation", "CreateUser"),
zap.String("request_id", requestID),
zap.String("user_id", newUser.ID),
zap.String("email", newUser.Email))
audit.GetAuditor(ctx, schema.PlatformOrgID.String()).
LogWithAttrs(audit.UserCreatedEvent, audit.UserTarget(newUser.ID), map[string]string{
"email": newUser.Email,
"name": newUser.Name,
"title": newUser.Title,
"avatar": newUser.Avatar,
})
return connect.NewResponse(&frontierv1beta1.CreateUserResponse{User: transformedUser}), nil
}3. Database Query Performance Logging
Problem: No visibility into slow queries or database performance issues.
Files to modify:
- All repository files in
internal/store/postgres/
// Enhanced database operation logging
func (r UserRepository) GetByID(ctx context.Context, id string) (user.User, error) {
logger := grpczap.Extract(ctx)
startTime := time.Now()
if strings.TrimSpace(id) == "" {
return user.User{}, user.ErrInvalidID
}
var fetchedUser User
userQuery, params, err := dialect.From(TABLE_USERS).
Where(goqu.Ex{
"id": id,
}).Where(notDisabledUserExp).ToSQL()
if err != nil {
return user.User{}, fmt.Errorf("%w: %w", queryErr, err)
}
if err = r.dbc.WithTimeout(ctx, TABLE_USERS, "GetByID", func(ctx context.Context) error {
return r.dbc.GetContext(ctx, &fetchedUser, userQuery, params...)
}); err != nil {
duration := time.Since(startTime)
logger.Error("database operation failed",
zap.String("operation", "user.GetByID"),
zap.String("table", TABLE_USERS),
zap.String("user_id", id),
zap.Duration("query_duration", duration),
zap.String("query", userQuery),
zap.Any("query_params", params),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
// ... existing error handling
} else {
duration := time.Since(startTime)
// Log slow queries for performance monitoring
if duration > 100*time.Millisecond {
logger.Warn("slow database query detected",
zap.String("operation", "user.GetByID"),
zap.String("table", TABLE_USERS),
zap.String("user_id", id),
zap.Duration("query_duration", duration),
zap.String("query", userQuery),
zap.Any("query_params", params))
}
// Debug logging for all queries in development
logger.Debug("database query completed",
zap.String("operation", "user.GetByID"),
zap.String("table", TABLE_USERS),
zap.String("user_id", id),
zap.Duration("query_duration", duration))
}
transformedUser, err := fetchedUser.transformToUser()
if err != nil {
logger.Error("user transformation failed",
zap.String("operation", "user.GetByID"),
zap.String("user_id", id),
zap.Error(err))
return user.User{}, fmt.Errorf("%w: %w", parseErr, err)
}
return transformedUser, nil
}
// Enhanced Create method with transaction logging
func (r UserRepository) Create(ctx context.Context, usr user.User) (user.User, error) {
logger := grpczap.Extract(ctx)
startTime := time.Now()
if strings.TrimSpace(usr.Email) == "" || strings.TrimSpace(usr.Name) == "" {
return user.User{}, user.ErrInvalidDetails
}
insertRow := goqu.Record{
"name": strings.ToLower(usr.Name),
"email": strings.ToLower(usr.Email),
"title": usr.Title,
"avatar": usr.Avatar,
"created_at": goqu.L("now()"),
"updated_at": goqu.L("now()"),
}
if usr.Metadata != nil {
marshaledMetadata, err := json.Marshal(usr.Metadata)
if err != nil {
logger.Error("metadata marshaling failed",
zap.String("operation", "user.Create"),
zap.String("email", usr.Email),
zap.Error(err))
return user.User{}, fmt.Errorf("%w: %w", parseErr, err)
}
insertRow["metadata"] = marshaledMetadata
}
if usr.State != "" {
insertRow["state"] = usr.State
}
createQuery, params, err := dialect.Insert(TABLE_USERS).Rows(insertRow).Returning(&User{}).ToSQL()
if err != nil {
return user.User{}, fmt.Errorf("%w: %w", queryErr, err)
}
tx, err := r.dbc.BeginTxx(ctx, nil)
if err != nil {
logger.Error("transaction begin failed",
zap.String("operation", "user.Create"),
zap.String("email", usr.Email),
zap.Error(err))
return user.User{}, err
}
var userModel User
if err = r.dbc.WithTimeout(ctx, TABLE_USERS, "Create", func(ctx context.Context) error {
return tx.QueryRowxContext(ctx, createQuery, params...).StructScan(&userModel)
}); err != nil {
duration := time.Since(startTime)
logger.Error("user creation query failed",
zap.String("operation", "user.Create"),
zap.String("table", TABLE_USERS),
zap.String("email", usr.Email),
zap.String("name", usr.Name),
zap.Duration("query_duration", duration),
zap.String("query", createQuery),
zap.Any("query_params", params),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Error(err))
if rollbackErr := tx.Rollback(); rollbackErr != nil {
logger.Error("transaction rollback failed",
zap.String("operation", "user.Create"),
zap.String("email", usr.Email),
zap.Error(rollbackErr))
}
err = checkPostgresError(err)
switch {
case errors.Is(err, ErrDuplicateKey):
return user.User{}, user.ErrConflict
default:
return user.User{}, err
}
}
if err = tx.Commit(); err != nil {
duration := time.Since(startTime)
logger.Error("transaction commit failed",
zap.String("operation", "user.Create"),
zap.String("email", usr.Email),
zap.Duration("total_duration", duration),
zap.Error(err))
return user.User{}, err
}
duration := time.Since(startTime)
logger.Info("user created successfully",
zap.String("operation", "user.Create"),
zap.String("user_id", userModel.ID),
zap.String("email", usr.Email),
zap.Duration("total_duration", duration))
transformedUser, err := userModel.transformToUser()
if err != nil {
logger.Error("user transformation failed after creation",
zap.String("operation", "user.Create"),
zap.String("user_id", userModel.ID),
zap.Error(err))
return user.User{}, fmt.Errorf("%w: %w", parseErr, err)
}
return transformedUser, nil
}4. Enhanced Request Correlation
Problem: Hard to trace a request's journey through all layers.
Files to modify:
pkg/server/connect_interceptors/logger.go
// Enhanced Connect interceptor logging
func UnaryConnectLoggerInterceptor(logger *zap.Logger, opts *LoggerOptions) connect.UnaryInterceptorFunc {
if opts == nil {
opts = NewLoggerOptions()
}
return func(next connect.UnaryFunc) connect.UnaryFunc {
return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
if !opts.decider(req.Spec().Procedure) {
return next(ctx, req)
}
// Embed enhanced logger in context with request metadata
requestID := req.Header().Get(consts.RequestIDHeader)
enrichedLogger := logger.With(
zap.String("request_id", requestID),
zap.String("method", req.Spec().Procedure))
ctx = context.WithValue(ctx, loggerContextKey, enrichedLogger)
startTime := time.Now()
// Log request start
enrichedLogger.Debug("request started",
zap.String("user_agent", req.Header().Get("User-Agent")),
zap.String("remote_addr", req.Header().Get("X-Forwarded-For")),
zap.String("content_type", req.Header().Get("Content-Type")))
resp, err := next(ctx, req)
duration := time.Since(startTime)
code := connect.Code(0)
if connectErr, ok := err.(*connect.Error); ok {
code = connectErr.Code()
}
fields := []zap.Field{
zap.String("system", "connect_rpc"),
zap.Time("start_time", startTime),
zap.String("method", req.Spec().Procedure),
zap.Int64("time_ms", duration.Milliseconds()),
zap.String("code", code.String()),
zap.String("request_id", requestID),
zap.String("user_agent", req.Header().Get("User-Agent")),
zap.String("remote_addr", req.Header().Get("X-Forwarded-For")),
}
// Add response size if available
if resp != nil {
if size := resp.Header().Get("Content-Length"); size != "" {
fields = append(fields, zap.String("response_size", size))
}
}
if err != nil {
fields = append(fields,
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.Bool("is_user_error", isUserError(code)),
zap.Error(err))
}
// Log completion with appropriate level
if err == nil {
enrichedLogger.Info("request completed", fields...)
return resp, err
}
switch code {
case connect.CodeCanceled:
enrichedLogger.Warn("client cancelled request", fields...)
case connect.CodeDeadlineExceeded:
enrichedLogger.Warn("request timeout", fields...)
case connect.CodeInvalidArgument,
connect.CodeNotFound,
connect.CodeAlreadyExists,
connect.CodeUnauthenticated,
connect.CodePermissionDenied,
connect.CodeFailedPrecondition,
connect.CodeOutOfRange:
enrichedLogger.Warn("client error", fields...)
default:
enrichedLogger.Error("server error", fields...)
}
return resp, err
}
}
}
func isUserError(code connect.Code) bool {
return code == connect.CodeInvalidArgument ||
code == connect.CodeNotFound ||
code == connect.CodeAlreadyExists ||
code == connect.CodeUnauthenticated ||
code == connect.CodePermissionDenied ||
code == connect.CodeFailedPrecondition ||
code == connect.CodeOutOfRange
}5. Centralized Error Classification Utility
Problem: Error mapping is scattered across handlers with inconsistent logging.
New file: internal/api/v1beta1connect/error_handler.go
package v1beta1connect
import (
"context"
"fmt"
"connectrpc.com/connect"
"github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/ctxzap"
"github.com/pkg/errors"
"go.uber.org/zap"
"github.com/raystack/frontier/core/user"
"github.com/raystack/frontier/core/organization"
"github.com/raystack/frontier/core/project"
// ... other domain imports
)
type ErrorHandler struct{}
func NewErrorHandler() *ErrorHandler {
return &ErrorHandler{}
}
func (e *ErrorHandler) HandleServiceError(ctx context.Context, operation string, err error, contextFields ...zap.Field) error {
logger := ctxzap.Extract(ctx)
baseFields := []zap.Field{
zap.String("operation", operation),
zap.String("error_type", fmt.Sprintf("%T", err)),
zap.String("error_chain", fmt.Sprintf("%+v", err)),
zap.Error(err),
}
baseFields = append(baseFields, contextFields...)
// User domain errors
switch {
case errors.Is(err, user.ErrNotExist):
logger.Warn("user not found", baseFields...)
return connect.NewError(connect.CodeNotFound, ErrUserNotExist)
case errors.Is(err, user.ErrConflict):
logger.Warn("user conflict", baseFields...)
return connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
case errors.Is(err, user.ErrInvalidDetails):
logger.Warn("invalid user details", baseFields...)
return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
case errors.Is(err, user.ErrInvalidID):
logger.Warn("invalid user ID", baseFields...)
return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
case errors.Is(err, user.ErrInvalidUUID):
logger.Warn("invalid user UUID", baseFields...)
return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
// Organization domain errors
case errors.Is(err, organization.ErrNotExist):
logger.Warn("organization not found", baseFields...)
return connect.NewError(connect.CodeNotFound, ErrNotFound)
case errors.Is(err, organization.ErrConflict):
logger.Warn("organization conflict", baseFields...)
return connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
case errors.Is(err, organization.ErrDisabled):
logger.Warn("organization disabled", baseFields...)
return connect.NewError(connect.CodePermissionDenied, ErrOrganizationDisabled)
// Project domain errors
case errors.Is(err, project.ErrNotExist):
logger.Warn("project not found", baseFields...)
return connect.NewError(connect.CodeNotFound, ErrNotFound)
case errors.Is(err, project.ErrConflict):
logger.Warn("project conflict", baseFields...)
return connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
// Generic validation errors
case isValidationError(err):
logger.Warn("validation error", baseFields...)
return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
// Authentication/Authorization errors
case isAuthError(err):
logger.Warn("authentication/authorization error", baseFields...)
return connect.NewError(connect.CodeUnauthenticated, ErrUnauthenticated)
default:
// Log all unhandled errors as server errors
logger.Error("unhandled service error", baseFields...)
return connect.NewError(connect.CodeInternal, ErrInternalServerError)
}
}
func isValidationError(err error) bool {
// Add checks for validation-related errors
return false // Implement based on your validation error types
}
func isAuthError(err error) bool {
// Add checks for authentication-related errors
return false // Implement based on your auth error types
}
// Usage example in handlers
func (h *ConnectHandler) GetUser(ctx context.Context, request *connect.Request[frontierv1beta1.GetUserRequest]) (*connect.Response[frontierv1beta1.GetUserResponse], error) {
errorHandler := NewErrorHandler()
fetchedUser, err := h.userService.GetByID(ctx, request.Msg.GetId())
if err != nil {
return nil, errorHandler.HandleServiceError(ctx, "GetUser", err,
zap.String("user_id", request.Msg.GetId()),
zap.String("request_id", request.Header().Get(consts.RequestIDHeader)))
}
userPB, err := transformUserToPB(fetchedUser)
if err != nil {
return nil, errorHandler.HandleServiceError(ctx, "GetUser.Transform", err,
zap.String("user_id", fetchedUser.ID))
}
return connect.NewResponse(&frontierv1beta1.GetUserResponse{
User: userPB,
}), nil
}6. Metric-Based Error Monitoring
Problem: No automated alerting on error rate spikes.
Files to modify:
pkg/server/connect_interceptors/logger.go
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
// Add metrics to the interceptor
var (
requestCount = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "frontier_requests_total",
Help: "Total number of requests processed",
},
[]string{"method", "code"},
)
errorCount = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "frontier_errors_total",
Help: "Total number of errors by type and layer",
},
[]string{"method", "error_type", "layer", "code"},
)
requestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "frontier_request_duration_seconds",
Help: "Request duration in seconds",
Buckets: prometheus.DefBuckets,
},
[]string{"method", "code"},
)
slowQueryCount = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "frontier_slow_queries_total",
Help: "Total number of slow database queries",
},
[]string{"table", "operation"},
)
)
func UnaryConnectLoggerInterceptor(logger *zap.Logger, opts *LoggerOptions) connect.UnaryInterceptorFunc {
if opts == nil {
opts = NewLoggerOptions()
}
return func(next connect.UnaryFunc) connect.UnaryFunc {
return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
if !opts.decider(req.Spec().Procedure) {
return next(ctx, req)
}
startTime := time.Now()
resp, err := next(ctx, req)
duration := time.Since(startTime)
code := connect.Code(0)
if connectErr, ok := err.(*connect.Error); ok {
code = connectErr.Code()
}
// Record metrics
method := req.Spec().Procedure
codeStr := code.String()
requestCount.WithLabelValues(method, codeStr).Inc()
requestDuration.WithLabelValues(method, codeStr).Observe(duration.Seconds())
if err != nil {
errorCount.WithLabelValues(
method,
fmt.Sprintf("%T", err),
"handler",
codeStr).Inc()
}
// ... existing logging code ...
return resp, err
}
}
}
// Add to repository methods for database metrics
func (r UserRepository) recordSlowQuery(table, operation string, duration time.Duration) {
if duration > 100*time.Millisecond {
slowQueryCount.WithLabelValues(table, operation).Inc()
}
}Implementation Priority
High Priority (Immediate Impact) 🔴
- Enhanced handler error logging - Provides immediate debugging visibility
- Database operation logging with performance metrics
- Service layer contextual logging - Fills the current logging gap
- Request correlation enhancements - Improves traceability
Medium Priority (Next Phase) 🟡
- Centralized error classification utility - Reduces code duplication
- Error rate metrics - Enables monitoring and alerting
- Enhanced error context propagation - Improves error investigation
Low Priority (Future Improvements) 🟢
- Advanced query analysis and optimization logging
- Automated error alerting integration
- Error trend analysis and reporting
Example Debug Scenario
With these improvements implemented, when a user creation fails, developers will see complete tracing:
Before (Current State)
{
"level": "error",
"time": "2024-01-15T10:30:45Z",
"msg": "finished call",
"method": "/raystack.frontier.v1beta1.FrontierService/CreateUser",
"code": "internal",
"error": "Internal Server Error"
}After (With Improvements)
{
"level": "error",
"time": "2024-01-15T10:30:45Z",
"msg": "database query failed",
"system": "connect_rpc",
"request_id": "req_abc123",
"method": "/raystack.frontier.v1beta1.FrontierService/CreateUser",
"operation": "user.Create",
"table": "users",
"email": "[email protected]",
"query": "INSERT INTO users (name, email, title, avatar, created_at, updated_at) VALUES ($1, $2, $3, $4, now(), now()) RETURNING *",
"query_params": ["testuser", "[email protected]", "", "", "now()", "now()"],
"query_duration": "2ms",
"error_type": "*pq.Error",
"error_chain": "service.Create failed for user [email protected]: database error in user.Create: pq: duplicate key value violates unique constraint \"users_email_key\"",
"error": "duplicate key value violates unique constraint \"users_email_key\""
}
{
"level": "error",
"time": "2024-01-15T10:30:45Z",
"msg": "user service create failed",
"operation": "CreateUser",
"request_id": "req_abc123",
"email": "[email protected]",
"name": "testuser",
"error_type": "*fmt.wrapError",
"error": "service.Create failed for user [email protected]: database error in user.Create: pq: duplicate key value violates unique constraint \"users_email_key\""
}This provides complete context for debugging:
- Original request details (request ID, email, name)
- Failing operation (user.Create)
- Actual SQL query and parameters
- Root database error (constraint violation)
- Error propagation chain through all layers
- Performance metrics (query duration)
Benefits
- Faster Issue Resolution - Developers can immediately identify root causes
- Better Production Monitoring - Proactive detection of issues before they escalate
- Improved Code Quality - Visibility into performance bottlenecks
- Enhanced Debugging - Complete request tracing through all DDD layers
- Better Error Classification - Distinguishes between user errors and system errors
- Operational Insights - Database performance and system health visibility
Next Steps
- Start with High Priority items - Focus on immediate debugging improvements
- Implement incrementally - Roll out layer by layer to avoid disruption
- Test thoroughly - Ensure logging doesn't impact performance significantly
- Set up monitoring - Configure alerts based on new error metrics
- Document patterns - Create guidelines for consistent logging across teams
- Review and iterate - Gather feedback from developers and refine approach