feat: Add support for logging #544
base: main
Conversation
@kylebarron A few comments. It's best practice in cloud-native systems to log to stdout/stderr and let the surrounding environment decide where the logs are routed, rather than writing log files directly.
Another common anti-pattern in logging is not structuring log statements in a way that is easily interpretable by machines. The assumption here is that the volume of logs in cloud-native systems quickly gets to the point where machines, not humans, are mainly responsible for understanding and interpreting them. The logs created here are well structured for the most part, but a few nits:
Some pieces of the log are already well structured. That being said, JSON is the recommended format for structured logging, something like the example below. The log format is less important than having a consistent log structure within whatever format you pick:
{
"level": "TRACE",
"timestamp": "2025-08-27T17:45:01.311335Z",
"message": "checkout waiting for idle connection: ('https', s3.us-east-1.amazonaws.com)",
"otel": {
"name": "S3Store.put_opts"
},
"object_store": {
"location": "test.txt",
"content_length": 13
}
}
Ideally we just expose the log level to Python, and the Rust code emits logs from there. But I'm not totally sure if that is possible, given that I'm not familiar with binding Rust into Python.
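A sketch of what exposing only the level to Python could look like from the user's side (the import path and the ability to pass just a level are assumptions; compare the init_log call later in this conversation):

# Hypothetical sketch: Python chooses only the verbosity; the Rust tracing
# subscriber does the actual emitting of structured logs.
from obstore import init_log

init_log(level="trace")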
A single log record now looks like:
{
"timestamp": "2025-08-28T20:57:08.773915Z",
"level": "TRACE",
"fields": {
"message": "checkout waiting for idle connection: (\"http\", 127.0.0.1:57907)"
},
"target": "hyper_util::client::legacy::pool",
"span": {
"object_store.prefix": "",
"otel.name": "S3Store.list",
"name": "list"
},
"spans": [
{ "object_store.prefix": "", "otel.name": "S3Store.list", "name": "list" }
]
}
and all logs in the file look like:
{"timestamp":"2025-08-28T20:57:08.773915Z","level":"TRACE","fields":{"message":"checkout waiting for idle connection: (\"http\", 127.0.0.1:57907)"},"target":"hyper_util::client::legacy::pool","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774080Z","level":"DEBUG","fields":{"message":"starting new connection: http://127.0.0.1:57907/","log.target":"reqwest::connect","log.module_path":"reqwest::connect","log.file":"/Users/kyle/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/reqwest-0.12.22/src/connect.rs","log.line":789},"target":"reqwest::connect","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774138Z","level":"TRACE","fields":{"message":"Http::connect; scheme=Some(\"http\"), host=Some(\"127.0.0.1\"), port=Some(Port(57907))"},"target":"hyper_util::client::legacy::connect::http","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774190Z","level":"DEBUG","fields":{"message":"connecting to 127.0.0.1:57907"},"target":"hyper_util::client::legacy::connect::http","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774399Z","level":"DEBUG","fields":{"message":"connected to 127.0.0.1:57907"},"target":"hyper_util::client::legacy::connect::http","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774475Z","level":"TRACE","fields":{"message":"http1 handshake complete, spawning background dispatcher task"},"target":"hyper_util::client::legacy::client","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774502Z","level":"TRACE","fields":{"message":"waiting for connection to be ready"},"target":"hyper_util::client::legacy::client","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774591Z","level":"TRACE","fields":{"message":"connection is ready"},"target":"hyper_util::client::legacy::client","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]}
{"timestamp":"2025-08-28T20:57:08.774653Z","level":"TRACE","fields":{"message":"checkout dropped for (\"http\", 127.0.0.1:57907)"},"target":"hyper_util::client::legacy::pool","span":{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"},"spans":[{"object_store.prefix":"","otel.name":"S3Store.list","name":"list"}]} |
Latest commits let you mix and match stdout/stderr/file, so you can choose a different configuration for each output destination. This config:
stderr_config = {
"format": "json",
"show_ansi": False,
"show_target": False,
"show_level": False,
}
stdout_config = {
"format": "pretty",
"show_ansi": True,
"show_target": True,
"show_level": True,
}
init_log(stderr=stderr_config, stdout=stdout_config, level="trace")
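That pairing gives humans a pretty, colorized stream on stdout while machines get JSON on stderr. A service that only wants machine-readable output could presumably pass just a stderr config, as in this sketch (whether an omitted output is simply disabled is an assumption about the API):

# Hypothetical variant: JSON on stderr only. Assumes an output destination
# is disabled when its config is not provided.
init_log(
    stderr={"format": "json", "show_ansi": False, "show_target": True, "show_level": True},
    level="debug",
)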
@geospatial-jeff Should I still support file-based logging? Or do you think I should only offer stderr and stdout?
No, just do stderr and stdout; let the end user decide where to send the logs.
I think integrating with https://www.structlog.org/en/stable/ would be great
Any Python-based logging will be relatively slow because it requires the multithreaded Rust code to acquire the GIL to emit a log. That's why, right now, this PR only supports stdout, stderr, and files. Those still need a small lock on the Rust side, but that should be much less overhead than acquiring the GIL.
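Following up on the structlog suggestion above: since the Rust side can emit JSON to stderr, an application can keep its own Python logs structurally consistent by configuring structlog to render JSON with matching "timestamp" and "level" keys. A minimal sketch (the field choices mirroring the Rust output are an assumption, not part of this PR):

import structlog

# Application-side structlog config: JSON records with "level" and
# "timestamp" keys, so Python logs line up with the Rust JSON logs.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso", utc=True, key="timestamp"),
        structlog.processors.JSONRenderer(),
    ],
)

log = structlog.get_logger()
log.info("uploaded object", location="test.txt", content_length=13)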
@kylebarron I may need this feature soon 🙈 let me know if I can be of any help
@vincentsarago would you be interested in testing from this branch?
Change list
- log_init function.
- instrumented_object_store from https://github.com/datafusion-contrib/datafusion-tracing/blob/main/instrumented-object-store/src/instrumented_object_store.rs. If we go ahead with this approach, I'll make a PR there upstream with a couple of changes (allowing T: ObjectStore instead of Arc<dyn ObjectStore>, and adding an inner(&self) -> &T method).
- init_log function for creating a global tracing subscriber.
- The init_log function can only be called once because it initializes a global subscriber. Should this instead go into a Logger class that holds the guard and avoids dropping it? We don't necessarily need to instantiate the tracing subscriber into global scope. With a Logger class, it sets one logger at a time to be the thread default. I think it's easiest for now to just have a global logger.
- Stores are wrapped in enums recording whether the user has requested logging or not. I think that the performance impact will be negligible when logging has not been opted into via init_log.

Downsides:
To test
Install with
(replace git rev if more commits have been pushed)
Then use any obstore functions. I think only S3/GCP/Azure work with logging, not local file access.
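A minimal usage sketch after installing from this branch. The import paths, the S3Store constructor arguments, and the bucket name are assumptions based on obstore's public API; the init_log call mirrors the configuration shown earlier in this conversation:

import obstore as obs
from obstore import init_log
from obstore.store import S3Store

# JSON on stderr, pretty human-readable output on stdout, TRACE-level filtering.
init_log(
    stderr={"format": "json", "show_ansi": False, "show_target": False, "show_level": False},
    stdout={"format": "pretty", "show_ansi": True, "show_target": True, "show_level": True},
    level="trace",
)

# Hypothetical bucket/region; any S3/GCS/Azure store should produce spans
# like S3Store.put_opts and S3Store.list in the emitted logs.
store = S3Store("my-bucket", region="us-east-1")
obs.put(store, "test.txt", b"hello, world!")
for batch in obs.list(store):
    print(batch)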
Closes #93