feat(log)!: introduce log v2 with OpenTelemetry and slog #25701

Closed
technicallyty wants to merge 137 commits into main from technicallyty/otel-logging

Contributor

@technicallyty commented Dec 16, 2025

Description

Closes: SDK-430

Refactors the log package to add support for OpenTelemetry trace correlation. Notable changes:

  • Extended the logging interface with context methods, which allows trace correlation.
  • Changed the logging constructor function signature.
  • Renamed all logging options to the more standard log.With<Option> form, rather than log.SomeOption.
  • If no OpenTelemetry configuration is set, we simply use zerolog as we did before.
  • If an OpenTelemetry configuration is set, we use slog, and all logs are output to the console and forwarded to the configured logger provider (otel.yml).
  • Introduced a new flag, --log_no_console, so that logs go ONLY to otel and not to the console (performance related).
  • Reduced the log levels to align with slog. Some zerolog levels are no longer supported.

@github-actions bot removed the C:Store label Jan 6, 2026
@technicallyty marked this pull request as ready for review January 6, 2026 19:46
@swift1337 self-requested a review January 6, 2026 19:56
log/CHANGELOG.md Outdated

## [Unreleased]

* [#25701](https://github.com/cosmos/cosmos-sdk/pull/25701) Introduce log v2, enabling OpenTelemetry logging with slog. The logging interface has been updated to accommodate Context logging methods, which allows correlation of logs with traces.
Contributor

can we link to the upgrade guide for this?

log/logger.go Outdated
opt(&logCfg)
// newZerologLogger creates a Logger backed by zerolog directly.
// This is the fast path with zero allocations.
func newZerologLogger(dst io.Writer, cfg *Config) Logger {
Contributor

previously we filled out the default config and overrode it. Why are we making this change and should cfg be nullable (a pointer)?

Contributor Author

we still do that, it's just pulled up to the top-level NewLogger now. I'll remove the pointers

}
}

func TestVerboseMode(t *testing.T) {
Contributor

did we remove this test accidentally?

Contributor Author

yes, adding back in

// Impl returns the underlying zerolog logger.
// It can be used to use zerolog structured API directly instead of the wrapper.
- func (l zeroLogWrapper) Impl() interface{} {
+ func (l *zeroLogWrapper) Impl() interface{} {
Contributor

should return the concrete type here

Contributor Author

this would break the interface

// nil = check LoggerProvider.
// true = force enable
// false = force disable
EnableOTEL *bool
Contributor

why pointer

Contributor Author

i wanted to have presence checking here. if set at all, we know that this was explicitly set with the WithOTEL or WithoutOTEL options. if unset, we can make a decision based on whether they had a provider set in their otel configuration.

on second thought though, perhaps we just enable it if they have otel set at all. not actually sure it's useful to have this expression here.

Member

should be a bool, false by default. no need for a pointer

Contributor Author

see above comment
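
The presence check discussed in this thread can be sketched as below. It is only an illustration of the three states documented on the `EnableOTEL *bool` field; `resolveOTEL` and its `providerConfigured` parameter are hypothetical names, not code from the PR.

```go
package main

import "fmt"

// resolveOTEL is a hypothetical helper showing the tri-state semantics:
// nil defers to whether a logger provider is configured, while a non-nil
// value forces the feature on or off.
func resolveOTEL(enable *bool, providerConfigured bool) bool {
	if enable != nil {
		return *enable // explicitly set via an option
	}
	return providerConfigured // unset: decide from the otel configuration
}

func main() {
	on, off := true, false
	fmt.Println(resolveOTEL(nil, true))   // unset, provider present
	fmt.Println(resolveOTEL(nil, false))  // unset, no provider
	fmt.Println(resolveOTEL(&off, true))  // forced off despite provider
	fmt.Println(resolveOTEL(&on, false))  // forced on without provider
}
```

A plain `bool` cannot distinguish "unset" from "explicitly false", which is the trade-off the reviewers are weighing: the pointer buys the force-disable case at the cost of a nil check.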

// this test ensures that when the With and WithContext methods are called,
// that the log wrapper is properly copied with all of its associated options
// otherwise, verbose mode will fail
func TestLoggerWith(t *testing.T) {
Contributor

seems like this test would be fine to still include?

Contributor Author

added back, but removed the wrapper3 case because the return type changed, making that test case invalid.

ctx := CreateExecuteContext(context.Background())

- rootCmd.PersistentFlags().String(flags.FlagLogLevel, zerolog.InfoLevel.String(), "The logging level (trace|debug|info|warn|error|fatal|panic|disabled or '*:<level>,<key>:<level>')")
+ rootCmd.PersistentFlags().String(flags.FlagLogLevel, "info", "The logging level (debug|info|warn|error|disabled or '*:<level>,<key>:<level>')")
Contributor

can we not use the library values from slog etc or something?

Contributor Author

yeah added slog.InfoLevel.String() here instead of "info"

server/util.go Outdated
viper.New(),
cmtcfg.DefaultConfig(),
- log.NewLogger(os.Stdout),
+ log.NewLogger("cosmos-sdk", log.WithConsoleWriter(os.Stdout)),
Contributor

should we define a constant for this name? or a default somehow?

Contributor Author

added it to a const block

@@ -1,373 +0,0 @@
//go:build !app_v1
Contributor

why did this need to be removed (not opposed)

Contributor Author

oops. meant to add back after testing.

Member

@swift1337 left a comment

Great to see OTEL in logging, left some comments related to the design

FlagVerboseLogLevel = "verbose_log_level"
FlagLogFormat = "log_format"
FlagLogNoColor = "log_no_color"
FlagLogNoConsole = "log_no_console"
Member

@swift1337, Jan 7, 2026

I think this refactoring is a good opportunity to fix flags' naming:

  • --log-level
  • --log-level-verbose (what's that?)
  • --log-format
  • --log-no-color
  • --log-disable-stdout → is a more suitable name IMO

Contributor Author

@technicallyty, Jan 7, 2026

trying to reduce API breakage here, leaning towards keeping the flags the same

as for --log-disable-stdout, kinda lean against that as well. we can log to stderr as well, and using a flag with that name could imply stderr logs will display, when the flag currently disables all console output entirely.

Member

I think it's fine to break a couple of flags since this PR already introduces breaking changes. Especially, since current log flags have this weird -- + _ casing.

> we can log to stderr as well

good call, maybe then --log-disable-console ?

log/logger.go Outdated

// InfoContext takes a context, message and key/value pairs and logs with level INFO.
// The context is used for trace/span correlation when using OpenTelemetry.
InfoContext(ctx context.Context, msg string, keyVals ...any)
Member

I find this API a bit too excessive. All of this can be expressed with:

// Logger is the Cosmos SDK logger interface.
type Logger interface {
	// Info takes a message and a set of key/value pairs and logs with level INFO.
	// The key of the tuple must be a string.
	Info(msg string, keyVals ...any)

	// Warn takes a message and a set of key/value pairs and logs with level WARN.
	// The key of the tuple must be a string.
	Warn(msg string, keyVals ...any)

	// Error takes a message and a set of key/value pairs and logs with level ERR.
	// The key of the tuple must be a string.
	Error(msg string, keyVals ...any)

	// Debug takes a message and a set of key/value pairs and logs with level DEBUG.
	// The key of the tuple must be a string.
	Debug(msg string, keyVals ...any)

	// Ctx returns logger with context attached. Usable for propagating OTEL traces.
	Ctx(ctx context.Context) Logger

	// With returns a new wrapped logger with additional context provided by a set.
	With(keyVals ...any) Logger

	// Impl returns the underlying logger implementation.
	// It is used to access the full functionalities of the underlying logger.
	// Advanced users can type cast the returned value to the actual logger.
	Impl() any
}

Example:

app.logger.InfoContext(ctx, "InitChain", "initialHeight", req.InitialHeight, "chainID", req.ChainId)

becomes →

app.logger.Ctx(ctx).Info("InitChain", "initialHeight", req.InitialHeight, "chainID", req.ChainId)

this also lets us keep most of the code that doesn't need OTEL unchanged. Another benefit is that now we can reuse the ctx logger in larger functions:

func (k Keeper) LargeMigration(...) error {
  ctx, span := tracer.Start(...)
  logger := app.logger.Ctx(ctx)

  // 20 LoC ...

  logger.Info("abc", "k", "v")

  // another 50 LoC

  logger.Info("xyz", "k2", "v2")
}

logger.Ctx() can be a noop for zerolog. for slog we can have:

type slogLogger struct {
+ ctx context.Context	
  log *slog.Logger
}

and call Info() / InfoContext() based on ctx == nil

Member

I think the rationale for all the InfoContext, etc. functions is basically to match the slog API. But your Ctx method approach also makes sense.

Contributor Author

yeah this is much better - updated here: remove contextual methods, add context attachment

The `cosmossdk.io/log` package provides a structured logging implementation for the Cosmos SDK using [zerolog](https://github.com/rs/zerolog) with optional OpenTelemetry integration.

To use a logger wrapping an instance of the standard library's `log/slog` package, use `cosmossdk.io/log/slog`.
## Features
Member

I think we miss two more features:

  • Ability to set and get a logger from ctx (zerolog example: ctx = zerolog.WithContext(ctx), zerolog.Ctx(ctx))
  • Ability to retrieve a global logger for fallback ie use log.Info(...) (where log is the package name)

This would result in a "feature parity" with other logging libs

Contributor Author

@technicallyty, Jan 7, 2026

the global thing is interesting, i think i'd prefer to kick that can down the road though. esp since logging conventions in the SDK are already pretty solidified imo. could be wrong though

Member

@swift1337, Jan 8, 2026

Agree, it's not urgent + easy to implement.

Member

I responded in the issue. Basically, I think this proposed context approach follows zerolog conventions but is quite different from the otel approach. If we're adopting otel more generally, I'd prefer we lean towards otel conventions.

ctx context.Context
}

func (l slogLogger) Ctx(ctx context.Context) Logger {
Member

Let's explicitly create a new struct

Suggested change
- func (l slogLogger) Ctx(ctx context.Context) Logger {
+ func (l *slogLogger) Ctx(ctx context.Context) Logger {
+ 	return &slogLogger{log: l.log, ctx: ctx}
+ }

return slogLogger{log: log}
}

func (l slogLogger) Info(msg string, keyVals ...any) {
Member

Let's skip slogLogger allocation for each call

Suggested change
- func (l slogLogger) Info(msg string, keyVals ...any) {
+ func (l *slogLogger) Info(msg string, keyVals ...any) {

//
// // OTEL-only (no console output)
// logger := log.NewLogger("cosmos-sdk", log.WithOTEL(), log.WithoutConsole())
func NewLogger(name string, opts ...Option) Logger {
Member

What do you think of a simpler and more idiomatic naming?

Suggested change
- func NewLogger(name string, opts ...Option) Logger {
+ func New(name string, opts ...Option) Logger {


// slogLogger satisfies Logger with logging backed by an instance of *slog.Logger.
type slogLogger struct {
log *slog.Logger
Member

This isn't quite right. We can't wrap a *slog.Logger because then file and line information will be all wrong. We can only wrap a slog.Handler, but then we basically need to reimplement the logger functionality of checking whether the level is enabled and capturing the program counter.

Labels

C:CLI, C:Cosmovisor, C:log

4 participants