Skip to content

Conversation

@shsms
Copy link
Contributor

@shsms shsms commented Feb 13, 2025

No description provided.

@shsms shsms requested a review from a team as a code owner February 13, 2025 09:40
@shsms shsms requested review from llucax and removed request for a team February 13, 2025 09:40
@github-actions github-actions bot added the part:core Affects the SDK core components (data structures, etc.) label Feb 13, 2025
@shsms shsms self-assigned this Feb 13, 2025
@shsms shsms added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Feb 13, 2025
@shsms
Copy link
Contributor Author

shsms commented Feb 13, 2025

These stack traces only add noise, and when there are many components, there is a lot of noise. I have never looked into these for debugging, and I don't think others have either.

Also, if we need them, all debug logs (or at least startup debug logs) should have them, but I'm not sure we need them because we can follow the stack trace by just following the code.

@llucax
Copy link
Contributor

llucax commented Feb 13, 2025

They are useful when debugging issues with the channel registry. Luckily this doesn't happen often, but when it does, this trace could be fundamental to figure out what's going on.

I do agree this is not needed 99.9% of the time and it is very noisy. What about printing them only if an environment variable is present? Maybe we can have something FREQUENZ_SDK_DEBUG=channel_registry,some_other_thing.

Another alternative would be to use a specific logger for this and disable it by default:

_logger_chan_create = logging.getLogger(__name__ + ":channel_create")
if _logger_chan_create.level == NOT_SET:
    _logger_chan_create.level = INFO

I'm undecided, both have pros and cons.

@llucax
Copy link
Contributor

llucax commented Feb 13, 2025

Also, if we need them, all debug logs (or at least startup debug logs) should have them, but I'm not sure we need them because we can follow the stack trace by just following the code.

The problem is usually you see the issue when trying to retrieve a channel from the registry, and if it wasn't found (but you expected it to be found), or the other way around, a new one is created and you don't know who and where it was created or not before. You could use a debugger and put breaking points in every place where a channels is retrieved (or created), but that's a lot of work and easy to miss points.

@llucax
Copy link
Contributor

llucax commented Feb 13, 2025

I'm undecided, both have pros and cons.

On the other hand, it seems like env vars are a much more common practice, almost all programming languages runtimes and libraries use this method for enabling debug info, even Python itself.

@shsms
Copy link
Contributor Author

shsms commented Feb 13, 2025

env var sounds like a good idea, and it would give us more flexibility. Maybe we can even generalize it, by making a filter or similar for the logs. Maybe FREQUENZ_SDK_LOGGING=trace:channel_registry,other;debug=, or FREQUENZ_SDK_TRACE_LOGS=registry,other. Because calling it DEBUG makes it a bit ambiguous I think.

But I would make that a separate issue so that we can take this PR in now.

@llucax
Copy link
Contributor

llucax commented Feb 14, 2025

I wouldn't remove it without a replacement, unless you are going to address it immediately after this is merged, otherwise it will be forgotten and we won't have it when we need it.

I prefer to leave it in a hacky way (even FREQUENZ_TRACE_CHANNEL_CREATION=1 if we don't want to decide on a generic mechanism yet) than removing it without a short term plan to add it back.


About the env var name, for me debug makes sense because it is what's typically used, at least in Python. For example:

But also other C libraries, like LD_DEBUG.

This could be used to enable extra logging but also to maybe include more expensive sanity checks, etc. We could call it FREQUENZ_SDK_DEBUG=trace_channel_registry if you want to make it more explicit that this particular "feature" is enabling logging/tracing.

To play devil's (Sahas') advocate, gRPC uses "trace" for tracing: GRPC_TRACE.

I find FREQUENZ_SDK_LOGGING super confusing because we already have a completely separate way to configure logging, but I guess if you prefer something like FREQUENZ_SDK_TRACE=channel_registry, I'm also fine with that. I think calling it FREQUENZ_SDK_DEBUG still is more general and allow us to add other debugging features besides tracing in the future.

@shsms
Copy link
Contributor Author

shsms commented Feb 14, 2025

I called it trace because it was about printing stack traces. But we could discuss this during the next meeting and try to come up with a comprehensive design. If you can walk me through the existing logging config mechanism (I haven't been following that), maybe we can find a way to integrate this in there as well.

@shsms shsms marked this pull request as draft February 14, 2025 09:27
@llucax
Copy link
Contributor

llucax commented Feb 14, 2025

It is basically using Python standard logging facility. It is a complex subsystem to summarize in a few lines, but the main point is it is hierarchical and you can inherit config. So this approach I mentioned before would use that:

_logger_chan_create = logging.getLogger(__name__ + ":channel_create")
if _logger_chan_create.level == NOT_SET:
    _logger_chan_create.level = INFO

Then users can configure logging at runtime, let's say __name__ = "x" to make logger names short here. You can then configure logger x:channel_create to have level INFO by default (what I did there) and log messages with DEBUG, so they won't be shown. Then users can re-configure the logger at runtime (using for example logging.getLogger(x:channel_create").setLever = DEBUG) to enable this tracing.

There is even a whole "standard" configuration that can be done via logging.config from a config dict (that can be parsed from a config file, like YAML, TOML, JSON, etc. Anything that returns a dict) and create formatters, handlers, filters as defined there. For example:

[loggers."x:channel_create"]
level = "DEBUG"

Could be used to enable this tracing.

But this assumes a lot of work from the users side, so for these kind of "debugging" features, I think using an env var makes more sense. Also hardcoding default log levels is not a very common practice.

Another solution would be to not change the default level, and let users set the log level for this particular tracing logging to INFO themselves, if they even enable DEBUG logging globally or for any parent logger, but again, that seems to require a lot of knowledge and configuration compared to a simple env var.

@shsms
Copy link
Contributor Author

shsms commented Feb 14, 2025

It is a complex subsystem to summarize in a few lines

I was thinking in the call, but this is good as well. :-)

@shsms shsms closed this Mar 21, 2025
@github-project-automation github-project-automation bot moved this from To do to Done in Python SDK Roadmap Mar 21, 2025
@shsms shsms deleted the registry-backtraces branch May 20, 2025 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cmd:skip-release-notes It is not necessary to update release notes for this PR part:core Affects the SDK core components (data structures, etc.)

Projects

Development

Successfully merging this pull request may close these issues.

2 participants