Skip to content

Logging Improvements for Frontier #1280

@whoAbhishekSah

Description

@whoAbhishekSah

Overview

This document outlines comprehensive logging improvements for the Frontier gRPC Connect-based RPC server to enhance internal error debugging capabilities for developers. The recommendations focus on improving visibility across all layers of the Domain-Driven Design (DDD) architecture.

Current Logging Architecture

Strengths ✅

  • Structured JSON logging with Zap logger
  • Request ID propagation via X-Request-ID header
  • Connect RPC interceptor logging with duration, method, and error codes
  • Clean error mapping between layers (Repository → Service → Handler)
  • Comprehensive audit logging for user actions
  • Proper HTTP status code mapping from domain errors

Current Pain Points ❌

  • Generic "Internal Server Error" responses hide root causes in production
  • Minimal logging in service and repository layers
  • Lost context when errors bubble up through layers
  • No correlation between database query performance and errors
  • Difficult to trace error origins across the DDD layers
  • No automated alerting on error rate spikes

Detailed Logging Improvement Recommendations

1. Enhanced Error Context Propagation

Problem: Errors lose context as they bubble up from repository → service → handler layers.

Files to modify:

  • internal/store/postgres/user_repository.go:67
  • core/user/service.go:71
  • internal/api/v1beta1connect/user.go:42
  • Similar patterns across all repository and service files

Repository Layer Enhancement

// Example for internal/store/postgres/user_repository.go
func (r UserRepository) GetByID(ctx context.Context, id string) (user.User, error) {
    logger := grpczap.Extract(ctx)

    if strings.TrimSpace(id) == "" {
        return user.User{}, user.ErrInvalidID
    }

    var fetchedUser User
    userQuery, params, err := dialect.From(TABLE_USERS).
        Where(goqu.Ex{
            "id": id,
        }).Where(notDisabledUserExp).ToSQL()
    if err != nil {
        return user.User{}, fmt.Errorf("%w: %w", queryErr, err)
    }

    if err = r.dbc.WithTimeout(ctx, TABLE_USERS, "GetByID", func(ctx context.Context) error {
        return r.dbc.GetContext(ctx, &fetchedUser, userQuery, params...)
    }); err != nil {
        // Log the original database error with context
        logger.Error("database query failed",
            zap.String("operation", "user.GetByID"),
            zap.String("table", TABLE_USERS),
            zap.String("user_id", id),
            zap.String("query", userQuery),
            zap.Any("params", params),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))

        err = checkPostgresError(err)
        switch {
        case errors.Is(err, ErrDuplicateKey):
            return user.User{}, fmt.Errorf("user lookup failed for id=%s: %w", id, user.ErrConflict)
        case errors.Is(err, sql.ErrNoRows):
            return user.User{}, fmt.Errorf("user not found for id=%s: %w", id, user.ErrNotExist)
        case errors.Is(err, ErrInvalidTextRepresentation):
            return user.User{}, fmt.Errorf("invalid UUID format for id=%s: %w", id, user.ErrInvalidUUID)
        default:
            return user.User{}, fmt.Errorf("database error in user.GetByID id=%s: %w", id, err)
        }
    }

    transformedUser, err := fetchedUser.transformToUser()
    if err != nil {
        logger.Error("user transformation failed",
            zap.String("operation", "user.GetByID"),
            zap.String("user_id", id),
            zap.Error(err))
        return user.User{}, fmt.Errorf("user transform error for id=%s: %w", id, err)
    }

    return transformedUser, nil
}

Service Layer Enhancement

// Example for core/user/service.go
func (s Service) Create(ctx context.Context, user User) (User, error) {
    logger := grpczap.Extract(ctx)
    logger.Info("creating user",
        zap.String("operation", "user.Create"),
        zap.String("email", user.Email),
        zap.String("name", user.Name))

    result, err := s.repository.Create(ctx, User{
        Name:     strings.ToLower(user.Name),
        Email:    strings.ToLower(user.Email),
        State:    Enabled,
        Avatar:   user.Avatar,
        Title:    user.Title,
        Metadata: user.Metadata,
    })

    if err != nil {
        logger.Error("user creation failed",
            zap.String("operation", "user.Create"),
            zap.String("email", user.Email),
            zap.String("name", user.Name),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))
        return User{}, fmt.Errorf("service.Create failed for user %s: %w", user.Email, err)
    }

    logger.Info("user created successfully",
        zap.String("operation", "user.Create"),
        zap.String("user_id", result.ID),
        zap.String("email", result.Email))

    return result, nil
}

func (s Service) GetByID(ctx context.Context, id string) (User, error) {
    logger := grpczap.Extract(ctx)

    logger.Debug("fetching user",
        zap.String("operation", "user.GetByID"),
        zap.String("id", id))

    var result User
    var err error

    if isValidEmail(id) {
        result, err = s.GetByEmail(ctx, id)
    } else if utils.IsValidUUID(id) {
        result, err = s.repository.GetByID(ctx, id)
    } else {
        result, err = s.repository.GetByName(ctx, strings.ToLower(id))
    }

    if err != nil {
        logger.Error("user fetch failed",
            zap.String("operation", "user.GetByID"),
            zap.String("id", id),
            zap.String("lookup_type", getLookupType(id)),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))
        return User{}, fmt.Errorf("service.GetByID failed for id=%s: %w", id, err)
    }

    return result, nil
}

func getLookupType(id string) string {
    if isValidEmail(id) {
        return "email"
    } else if utils.IsValidUUID(id) {
        return "uuid"
    }
    return "name"
}

2. Enhanced Handler Error Logging

Problem: Handlers map all internal errors to generic "Internal Server Error" without logging specifics.

Files to modify:

  • internal/api/v1beta1connect/user.go
  • All handler files in internal/api/v1beta1connect/*.go
// Example for internal/api/v1beta1connect/user.go
func (h *ConnectHandler) CreateUser(ctx context.Context, request *connect.Request[frontierv1beta1.CreateUserRequest]) (*connect.Response[frontierv1beta1.CreateUserResponse], error) {
    logger := grpczap.Extract(ctx)
    requestID := request.Header().Get(consts.RequestIDHeader)

    if request.Msg.GetBody() == nil {
        logger.Warn("create user request missing body",
            zap.String("operation", "CreateUser"),
            zap.String("request_id", requestID))
        return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
    }

    email := strings.TrimSpace(request.Msg.GetBody().GetEmail())
    if email == "" {
        currentUserEmail, ok := authenticate.GetEmailFromContext(ctx)
        if !ok {
            logger.Warn("create user missing email and no authenticated context",
                zap.String("operation", "CreateUser"),
                zap.String("request_id", requestID))
            return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
        }
        currentUserEmail = strings.TrimSpace(currentUserEmail)
        if currentUserEmail == "" {
            logger.Error("authenticated user has empty email",
                zap.String("operation", "CreateUser"),
                zap.String("request_id", requestID))
            return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
        }
        email = currentUserEmail
    }

    title := request.Msg.GetBody().GetTitle()
    name := strings.TrimSpace(request.Msg.GetBody().GetName())
    if name == "" {
        name = str.GenerateUserSlug(email)
    }

    var metaDataMap metadata.Metadata
    if request.Msg.GetBody().GetMetadata() != nil {
        metaDataMap = metadata.Build(request.Msg.GetBody().GetMetadata().AsMap())
        if err := h.metaSchemaService.Validate(metaDataMap, userMetaSchema); err != nil {
            logger.Warn("metadata validation failed",
                zap.String("operation", "CreateUser"),
                zap.String("request_id", requestID),
                zap.String("email", email),
                zap.Error(err))
            return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadBodyMetaSchemaError)
        }
    }

    logger.Info("creating user",
        zap.String("operation", "CreateUser"),
        zap.String("request_id", requestID),
        zap.String("email", email),
        zap.String("name", name))

    newUser, err := h.userService.Create(ctx, user.User{
        Title:    title,
        Email:    email,
        Name:     name,
        Avatar:   request.Msg.GetBody().GetAvatar(),
        Metadata: metaDataMap,
    })

    if err != nil {
        // Log detailed error info before generic mapping
        logger.Error("user service create failed",
            zap.String("operation", "CreateUser"),
            zap.String("request_id", requestID),
            zap.String("email", email),
            zap.String("name", name),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))

        switch {
        case errors.Is(err, user.ErrConflict):
            return nil, connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
        case errors.Is(errors.Unwrap(err), user.ErrKeyDoesNotExists):
            return nil, connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
        default:
            // Log additional context for internal errors
            logger.Error("unexpected error in CreateUser",
                zap.String("operation", "CreateUser"),
                zap.String("request_id", requestID),
                zap.String("user_email", email),
                zap.String("user_name", name),
                zap.String("error_chain", fmt.Sprintf("%+v", err)),
                zap.Error(err))
            return nil, connect.NewError(connect.CodeInternal, ErrInternalServerError)
        }
    }

    transformedUser, err := transformUserToPB(newUser)
    if err != nil {
        logger.Error("user transformation to protobuf failed",
            zap.String("operation", "CreateUser"),
            zap.String("request_id", requestID),
            zap.String("user_id", newUser.ID),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))
        return nil, connect.NewError(connect.CodeInternal, ErrInternalServerError)
    }

    // Log successful creation
    logger.Info("user created successfully",
        zap.String("operation", "CreateUser"),
        zap.String("request_id", requestID),
        zap.String("user_id", newUser.ID),
        zap.String("email", newUser.Email))

    audit.GetAuditor(ctx, schema.PlatformOrgID.String()).
        LogWithAttrs(audit.UserCreatedEvent, audit.UserTarget(newUser.ID), map[string]string{
            "email":  newUser.Email,
            "name":   newUser.Name,
            "title":  newUser.Title,
            "avatar": newUser.Avatar,
        })

    return connect.NewResponse(&frontierv1beta1.CreateUserResponse{User: transformedUser}), nil
}

3. Database Query Performance Logging

Problem: No visibility into slow queries or database performance issues.

Files to modify:

  • All repository files in internal/store/postgres/
// Enhanced database operation logging
func (r UserRepository) GetByID(ctx context.Context, id string) (user.User, error) {
    logger := grpczap.Extract(ctx)
    startTime := time.Now()

    if strings.TrimSpace(id) == "" {
        return user.User{}, user.ErrInvalidID
    }

    var fetchedUser User
    userQuery, params, err := dialect.From(TABLE_USERS).
        Where(goqu.Ex{
            "id": id,
        }).Where(notDisabledUserExp).ToSQL()
    if err != nil {
        return user.User{}, fmt.Errorf("%w: %w", queryErr, err)
    }

    if err = r.dbc.WithTimeout(ctx, TABLE_USERS, "GetByID", func(ctx context.Context) error {
        return r.dbc.GetContext(ctx, &fetchedUser, userQuery, params...)
    }); err != nil {
        duration := time.Since(startTime)
        logger.Error("database operation failed",
            zap.String("operation", "user.GetByID"),
            zap.String("table", TABLE_USERS),
            zap.String("user_id", id),
            zap.Duration("query_duration", duration),
            zap.String("query", userQuery),
            zap.Any("query_params", params),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))

        // ... existing error handling
    } else {
        duration := time.Since(startTime)

        // Log slow queries for performance monitoring
        if duration > 100*time.Millisecond {
            logger.Warn("slow database query detected",
                zap.String("operation", "user.GetByID"),
                zap.String("table", TABLE_USERS),
                zap.String("user_id", id),
                zap.Duration("query_duration", duration),
                zap.String("query", userQuery),
                zap.Any("query_params", params))
        }

        // Debug logging for all queries in development
        logger.Debug("database query completed",
            zap.String("operation", "user.GetByID"),
            zap.String("table", TABLE_USERS),
            zap.String("user_id", id),
            zap.Duration("query_duration", duration))
    }

    transformedUser, err := fetchedUser.transformToUser()
    if err != nil {
        logger.Error("user transformation failed",
            zap.String("operation", "user.GetByID"),
            zap.String("user_id", id),
            zap.Error(err))
        return user.User{}, fmt.Errorf("%w: %w", parseErr, err)
    }

    return transformedUser, nil
}

// Enhanced Create method with transaction logging
func (r UserRepository) Create(ctx context.Context, usr user.User) (user.User, error) {
    logger := grpczap.Extract(ctx)
    startTime := time.Now()

    if strings.TrimSpace(usr.Email) == "" || strings.TrimSpace(usr.Name) == "" {
        return user.User{}, user.ErrInvalidDetails
    }

    insertRow := goqu.Record{
        "name":       strings.ToLower(usr.Name),
        "email":      strings.ToLower(usr.Email),
        "title":      usr.Title,
        "avatar":     usr.Avatar,
        "created_at": goqu.L("now()"),
        "updated_at": goqu.L("now()"),
    }

    if usr.Metadata != nil {
        marshaledMetadata, err := json.Marshal(usr.Metadata)
        if err != nil {
            logger.Error("metadata marshaling failed",
                zap.String("operation", "user.Create"),
                zap.String("email", usr.Email),
                zap.Error(err))
            return user.User{}, fmt.Errorf("%w: %w", parseErr, err)
        }
        insertRow["metadata"] = marshaledMetadata
    }

    if usr.State != "" {
        insertRow["state"] = usr.State
    }

    createQuery, params, err := dialect.Insert(TABLE_USERS).Rows(insertRow).Returning(&User{}).ToSQL()
    if err != nil {
        return user.User{}, fmt.Errorf("%w: %w", queryErr, err)
    }

    tx, err := r.dbc.BeginTxx(ctx, nil)
    if err != nil {
        logger.Error("transaction begin failed",
            zap.String("operation", "user.Create"),
            zap.String("email", usr.Email),
            zap.Error(err))
        return user.User{}, err
    }

    var userModel User
    if err = r.dbc.WithTimeout(ctx, TABLE_USERS, "Create", func(ctx context.Context) error {
        return tx.QueryRowxContext(ctx, createQuery, params...).StructScan(&userModel)
    }); err != nil {
        duration := time.Since(startTime)

        logger.Error("user creation query failed",
            zap.String("operation", "user.Create"),
            zap.String("table", TABLE_USERS),
            zap.String("email", usr.Email),
            zap.String("name", usr.Name),
            zap.Duration("query_duration", duration),
            zap.String("query", createQuery),
            zap.Any("query_params", params),
            zap.String("error_type", fmt.Sprintf("%T", err)),
            zap.Error(err))

        if rollbackErr := tx.Rollback(); rollbackErr != nil {
            logger.Error("transaction rollback failed",
                zap.String("operation", "user.Create"),
                zap.String("email", usr.Email),
                zap.Error(rollbackErr))
        }

        err = checkPostgresError(err)
        switch {
        case errors.Is(err, ErrDuplicateKey):
            return user.User{}, user.ErrConflict
        default:
            return user.User{}, err
        }
    }

    if err = tx.Commit(); err != nil {
        duration := time.Since(startTime)
        logger.Error("transaction commit failed",
            zap.String("operation", "user.Create"),
            zap.String("email", usr.Email),
            zap.Duration("total_duration", duration),
            zap.Error(err))
        return user.User{}, err
    }

    duration := time.Since(startTime)
    logger.Info("user created successfully",
        zap.String("operation", "user.Create"),
        zap.String("user_id", userModel.ID),
        zap.String("email", usr.Email),
        zap.Duration("total_duration", duration))

    transformedUser, err := userModel.transformToUser()
    if err != nil {
        logger.Error("user transformation failed after creation",
            zap.String("operation", "user.Create"),
            zap.String("user_id", userModel.ID),
            zap.Error(err))
        return user.User{}, fmt.Errorf("%w: %w", parseErr, err)
    }

    return transformedUser, nil
}

4. Enhanced Request Correlation

Problem: Hard to trace a request's journey through all layers.

Files to modify:

  • pkg/server/connect_interceptors/logger.go
// Enhanced Connect interceptor logging
func UnaryConnectLoggerInterceptor(logger *zap.Logger, opts *LoggerOptions) connect.UnaryInterceptorFunc {
    if opts == nil {
        opts = NewLoggerOptions()
    }

    return func(next connect.UnaryFunc) connect.UnaryFunc {
        return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
            if !opts.decider(req.Spec().Procedure) {
                return next(ctx, req)
            }

            // Embed enhanced logger in context with request metadata
            requestID := req.Header().Get(consts.RequestIDHeader)
            enrichedLogger := logger.With(
                zap.String("request_id", requestID),
                zap.String("method", req.Spec().Procedure))
            ctx = context.WithValue(ctx, loggerContextKey, enrichedLogger)

            startTime := time.Now()

            // Log request start
            enrichedLogger.Debug("request started",
                zap.String("user_agent", req.Header().Get("User-Agent")),
                zap.String("remote_addr", req.Header().Get("X-Forwarded-For")),
                zap.String("content_type", req.Header().Get("Content-Type")))

            resp, err := next(ctx, req)
            duration := time.Since(startTime)

            code := connect.Code(0)
            if connectErr, ok := err.(*connect.Error); ok {
                code = connectErr.Code()
            }

            fields := []zap.Field{
                zap.String("system", "connect_rpc"),
                zap.Time("start_time", startTime),
                zap.String("method", req.Spec().Procedure),
                zap.Int64("time_ms", duration.Milliseconds()),
                zap.String("code", code.String()),
                zap.String("request_id", requestID),
                zap.String("user_agent", req.Header().Get("User-Agent")),
                zap.String("remote_addr", req.Header().Get("X-Forwarded-For")),
            }

            // Add response size if available
            if resp != nil {
                if size := resp.Header().Get("Content-Length"); size != "" {
                    fields = append(fields, zap.String("response_size", size))
                }
            }

            if err != nil {
                fields = append(fields,
                    zap.String("error_type", fmt.Sprintf("%T", err)),
                    zap.Bool("is_user_error", isUserError(code)),
                    zap.Error(err))
            }

            // Log completion with appropriate level
            if err == nil {
                enrichedLogger.Info("request completed", fields...)
                return resp, err
            }

            switch code {
            case connect.CodeCanceled:
                enrichedLogger.Warn("client cancelled request", fields...)
            case connect.CodeDeadlineExceeded:
                enrichedLogger.Warn("request timeout", fields...)
            case connect.CodeInvalidArgument,
                connect.CodeNotFound,
                connect.CodeAlreadyExists,
                connect.CodeUnauthenticated,
                connect.CodePermissionDenied,
                connect.CodeFailedPrecondition,
                connect.CodeOutOfRange:
                enrichedLogger.Warn("client error", fields...)
            default:
                enrichedLogger.Error("server error", fields...)
            }
            return resp, err
        }
    }
}

func isUserError(code connect.Code) bool {
    return code == connect.CodeInvalidArgument ||
           code == connect.CodeNotFound ||
           code == connect.CodeAlreadyExists ||
           code == connect.CodeUnauthenticated ||
           code == connect.CodePermissionDenied ||
           code == connect.CodeFailedPrecondition ||
           code == connect.CodeOutOfRange
}

5. Centralized Error Classification Utility

Problem: Error mapping is scattered across handlers with inconsistent logging.

New file: internal/api/v1beta1connect/error_handler.go

package v1beta1connect

import (
    "context"
    "fmt"

    "connectrpc.com/connect"
    "github.com/grpc-ecosystem/go-grpc-middleware/logging/zap/ctxzap"
    "github.com/pkg/errors"
    "go.uber.org/zap"

    "github.com/raystack/frontier/core/user"
    "github.com/raystack/frontier/core/organization"
    "github.com/raystack/frontier/core/project"
    // ... other domain imports
)

type ErrorHandler struct{}

func NewErrorHandler() *ErrorHandler {
    return &ErrorHandler{}
}

func (e *ErrorHandler) HandleServiceError(ctx context.Context, operation string, err error, contextFields ...zap.Field) error {
    logger := ctxzap.Extract(ctx)

    baseFields := []zap.Field{
        zap.String("operation", operation),
        zap.String("error_type", fmt.Sprintf("%T", err)),
        zap.String("error_chain", fmt.Sprintf("%+v", err)),
        zap.Error(err),
    }
    baseFields = append(baseFields, contextFields...)

    // User domain errors
    switch {
    case errors.Is(err, user.ErrNotExist):
        logger.Warn("user not found", baseFields...)
        return connect.NewError(connect.CodeNotFound, ErrUserNotExist)
    case errors.Is(err, user.ErrConflict):
        logger.Warn("user conflict", baseFields...)
        return connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
    case errors.Is(err, user.ErrInvalidDetails):
        logger.Warn("invalid user details", baseFields...)
        return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
    case errors.Is(err, user.ErrInvalidID):
        logger.Warn("invalid user ID", baseFields...)
        return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)
    case errors.Is(err, user.ErrInvalidUUID):
        logger.Warn("invalid user UUID", baseFields...)
        return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)

    // Organization domain errors
    case errors.Is(err, organization.ErrNotExist):
        logger.Warn("organization not found", baseFields...)
        return connect.NewError(connect.CodeNotFound, ErrNotFound)
    case errors.Is(err, organization.ErrConflict):
        logger.Warn("organization conflict", baseFields...)
        return connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)
    case errors.Is(err, organization.ErrDisabled):
        logger.Warn("organization disabled", baseFields...)
        return connect.NewError(connect.CodePermissionDenied, ErrOrganizationDisabled)

    // Project domain errors
    case errors.Is(err, project.ErrNotExist):
        logger.Warn("project not found", baseFields...)
        return connect.NewError(connect.CodeNotFound, ErrNotFound)
    case errors.Is(err, project.ErrConflict):
        logger.Warn("project conflict", baseFields...)
        return connect.NewError(connect.CodeAlreadyExists, ErrConflictRequest)

    // Generic validation errors
    case isValidationError(err):
        logger.Warn("validation error", baseFields...)
        return connect.NewError(connect.CodeInvalidArgument, ErrBadRequest)

    // Authentication/Authorization errors
    case isAuthError(err):
        logger.Warn("authentication/authorization error", baseFields...)
        return connect.NewError(connect.CodeUnauthenticated, ErrUnauthenticated)

    default:
        // Log all unhandled errors as server errors
        logger.Error("unhandled service error", baseFields...)
        return connect.NewError(connect.CodeInternal, ErrInternalServerError)
    }
}

func isValidationError(err error) bool {
    // Add checks for validation-related errors
    return false // Implement based on your validation error types
}

func isAuthError(err error) bool {
    // Add checks for authentication-related errors
    return false // Implement based on your auth error types
}

// Usage example in handlers
func (h *ConnectHandler) GetUser(ctx context.Context, request *connect.Request[frontierv1beta1.GetUserRequest]) (*connect.Response[frontierv1beta1.GetUserResponse], error) {
    errorHandler := NewErrorHandler()

    fetchedUser, err := h.userService.GetByID(ctx, request.Msg.GetId())
    if err != nil {
        return nil, errorHandler.HandleServiceError(ctx, "GetUser", err,
            zap.String("user_id", request.Msg.GetId()),
            zap.String("request_id", request.Header().Get(consts.RequestIDHeader)))
    }

    userPB, err := transformUserToPB(fetchedUser)
    if err != nil {
        return nil, errorHandler.HandleServiceError(ctx, "GetUser.Transform", err,
            zap.String("user_id", fetchedUser.ID))
    }

    return connect.NewResponse(&frontierv1beta1.GetUserResponse{
        User: userPB,
    }), nil
}

6. Metric-Based Error Monitoring

Problem: No automated alerting on error rate spikes.

Files to modify:

  • pkg/server/connect_interceptors/logger.go
import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

// Add metrics to the interceptor
var (
    requestCount = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "frontier_requests_total",
            Help: "Total number of requests processed",
        },
        []string{"method", "code"},
    )

    errorCount = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "frontier_errors_total",
            Help: "Total number of errors by type and layer",
        },
        []string{"method", "error_type", "layer", "code"},
    )

    requestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "frontier_request_duration_seconds",
            Help:    "Request duration in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "code"},
    )

    slowQueryCount = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "frontier_slow_queries_total",
            Help: "Total number of slow database queries",
        },
        []string{"table", "operation"},
    )
)

func UnaryConnectLoggerInterceptor(logger *zap.Logger, opts *LoggerOptions) connect.UnaryInterceptorFunc {
    if opts == nil {
        opts = NewLoggerOptions()
    }

    return func(next connect.UnaryFunc) connect.UnaryFunc {
        return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
            if !opts.decider(req.Spec().Procedure) {
                return next(ctx, req)
            }

            startTime := time.Now()
            resp, err := next(ctx, req)
            duration := time.Since(startTime)

            code := connect.Code(0)
            if connectErr, ok := err.(*connect.Error); ok {
                code = connectErr.Code()
            }

            // Record metrics
            method := req.Spec().Procedure
            codeStr := code.String()

            requestCount.WithLabelValues(method, codeStr).Inc()
            requestDuration.WithLabelValues(method, codeStr).Observe(duration.Seconds())

            if err != nil {
                errorCount.WithLabelValues(
                    method,
                    fmt.Sprintf("%T", err),
                    "handler",
                    codeStr).Inc()
            }

            // ... existing logging code ...

            return resp, err
        }
    }
}

// Add to repository methods for database metrics
func (r UserRepository) recordSlowQuery(table, operation string, duration time.Duration) {
    if duration > 100*time.Millisecond {
        slowQueryCount.WithLabelValues(table, operation).Inc()
    }
}

Implementation Priority

High Priority (Immediate Impact) 🔴

  1. Enhanced handler error logging - Provides immediate debugging visibility
  2. Database operation logging with performance metrics
  3. Service layer contextual logging - Fills the current logging gap
  4. Request correlation enhancements - Improves traceability

Medium Priority (Next Phase) 🟡

  1. Centralized error classification utility - Reduces code duplication
  2. Error rate metrics - Enables monitoring and alerting
  3. Enhanced error context propagation - Improves error investigation

Low Priority (Future Improvements) 🟢

  1. Advanced query analysis and optimization logging
  2. Automated error alerting integration
  3. Error trend analysis and reporting

Example Debug Scenario

With these improvements implemented, when a user creation fails, developers will see complete tracing:

Before (Current State)

{
  "level": "error",
  "time": "2024-01-15T10:30:45Z",
  "msg": "finished call",
  "method": "/raystack.frontier.v1beta1.FrontierService/CreateUser",
  "code": "internal",
  "error": "Internal Server Error"
}

After (With Improvements)

{
  "level": "error",
  "time": "2024-01-15T10:30:45Z",
  "msg": "database query failed",
  "system": "connect_rpc",
  "request_id": "req_abc123",
  "method": "/raystack.frontier.v1beta1.FrontierService/CreateUser",
  "operation": "user.Create",
  "table": "users",
  "email": "[email protected]",
  "query": "INSERT INTO users (name, email, title, avatar, created_at, updated_at) VALUES ($1, $2, $3, $4, now(), now()) RETURNING *",
  "query_params": ["testuser", "[email protected]", "", "", "now()", "now()"],
  "query_duration": "2ms",
  "error_type": "*pq.Error",
  "error_chain": "service.Create failed for user [email protected]: database error in user.Create: pq: duplicate key value violates unique constraint \"users_email_key\"",
  "error": "duplicate key value violates unique constraint \"users_email_key\""
}

{
  "level": "error",
  "time": "2024-01-15T10:30:45Z",
  "msg": "user service create failed",
  "operation": "CreateUser",
  "request_id": "req_abc123",
  "email": "[email protected]",
  "name": "testuser",
  "error_type": "*fmt.wrapError",
  "error": "service.Create failed for user [email protected]: database error in user.Create: pq: duplicate key value violates unique constraint \"users_email_key\""
}

This provides complete context for debugging:

  • Original request details (request ID, email, name)
  • Failing operation (user.Create)
  • Actual SQL query and parameters
  • Root database error (constraint violation)
  • Error propagation chain through all layers
  • Performance metrics (query duration)

Benefits

  1. Faster Issue Resolution - Developers can immediately identify root causes
  2. Better Production Monitoring - Proactive detection of issues before they escalate
  3. Improved Code Quality - Visibility into performance bottlenecks
  4. Enhanced Debugging - Complete request tracing through all DDD layers
  5. Better Error Classification - Distinguishes between user errors and system errors
  6. Operational Insights - Database performance and system health visibility

Next Steps

  1. Start with High Priority items - Focus on immediate debugging improvements
  2. Implement incrementally - Roll out layer by layer to avoid disruption
  3. Test thoroughly - Ensure logging doesn't impact performance significantly
  4. Set up monitoring - Configure alerts based on new error metrics
  5. Document patterns - Create guidelines for consistent logging across teams
  6. Review and iterate - Gather feedback from developers and refine approach

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions