Add general logging implementation#500
Conversation
f431306 to
3834c21
Compare
RobotSail
left a comment
There was a problem hiding this comment.
Thanks for the PR! A few things need to be fixed, but overall looks good!
There was a problem hiding this comment.
Main points:
- Consider using logging module for file management (only use a custom Formatter for JSONL).
- Define and enforce the form of the input dicts ("a recursive dict of string values")
We will also need some unit tests to validate the new addition. Overall, this looks like a very good start. Kudos.
src/instructlab/training/logger.py
Outdated
|
|
||
| try: | ||
| # Third Party | ||
| import wandb |
There was a problem hiding this comment.
Should it be also declared via optional-dependencies group in pyproject.toml?
There was a problem hiding this comment.
I'm not sure what is best here, so will defer to the team. However, my reasons for not including it are:
- We will potentially have support for 3-4 different logging libraries and would then need an optional dependency for each.
- Each of these dependency groups would be one package each. It isn't necessarily easier or more logical to do
pip install instructlab-training[wandb]than it is to dopip install instructlab-training wandb - It should be relatively clear to the user that they need to install
wandbto use the WandbLogger and if not the error message should clarify that.
There was a problem hiding this comment.
I've seen packages with dozens of these one-entry dependencies. :) What it gives you is being able to request particular versions of libraries if needed.
There was a problem hiding this comment.
Either way is fine. For PyTorch you still have to install tensorboard separately even though it's technically part of the API. So I don't know if we need a one-off optional requirement just for wandb, but if there are other packages that we need to install to make it work then it could make sense.
src/instructlab/training/logger.py
Outdated
| """Create and initialize a logger of the specified type. | ||
|
|
||
| Args: | ||
| logger_type: Type of logger to create (must be one of ["file", "wandb", "tensorboard", "async"]) |
There was a problem hiding this comment.
Suggest to construct possible options programmatically to avoid drift. That said, you can probably enforce "one of" semantics with a type hint using enum constructed from allowed options.
| ) | ||
| parser.add_argument("--log_level", type=str, default="INFO") | ||
| parser.add_argument("--run_name", type=str, default=None) | ||
| parser.add_argument("--logger_type", type=str, default="async") |
There was a problem hiding this comment.
you can list allowed choices: https://docs.python.org/3/library/argparse.html#choices
There was a problem hiding this comment.
In practice it's better to avoid this though, I find choices to be super clunky and not as easy to work with.
There was a problem hiding this comment.
Can you please elaborate? As long as choices are calculated (from a enum or dict keys), one doesn't need to touch the argument at all.
There was a problem hiding this comment.
I personally agree w/ Ihar- if we maintained a mapping of available logger types in instructlab.training.logger we could generate the list of logging backends available dynamically. That's self-documenting and easier for a user to inspect via the --help message
There was a problem hiding this comment.
(Granted, that we even have argparse as an interface to the library is not optimal and we should probably get rid of it, switching to proper function arguments.)
RobotSail
left a comment
There was a problem hiding this comment.
Thank you for making the requested changes and checking functionality of TensorBoard + Wandb. LGTM!
|
One other scenario that may be useful is being able to use multiple log destinations at the same time. This would allow to collect same metrics in multiple formats and use preferred tools for different analysis. This is where integration with python logging module could be also helpful since it allows to define multiple destinations for the same messages through propagate. I'm thinking of enabling all these loggers in CI training runs and collecting all of the outputs as github artifacts. |
Yeah that's something I've been thinking about. It would also be very easy to just implement a "MultiLogger" class that just loops through its nested loggers. I will look into the logging module more and try to see if it would work well with wandb/tensorboard. I do think it is useful to have the "metric logger" separate from the regular run logging, so I would want to make sure it's possible to do that while using the logging module for regular logs. |
|
@fynnsu Yes please do that if you can. I would keep the implementation simple (re-use the existing code you've already written ) and just make it so it does the existing AsyncStructuredLogger as a default and then everything else can just be added on after-the-fact. This way we can still retain log data even when using TensorBoard or anything else. |
777a5eb to
908cb5e
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
| ) | ||
| parser.add_argument("--log_level", type=str, default="INFO") | ||
| parser.add_argument("--run_name", type=str, default=None) | ||
| parser.add_argument("--logger_type", type=str, default="async") |
There was a problem hiding this comment.
(Granted, that we even have argparse as an interface to the library is not optimal and we should probably get rid of it, switching to proper function arguments.)
| ) | ||
| ``` | ||
| """ | ||
| if not loggers: |
There was a problem hiding this comment.
Disagree on raising since it's a valid input. What it should probably mean - if doesn't already - is that all previously set loggers should be disabled.
booxter
left a comment
There was a problem hiding this comment.
Overall I think this is good to go. Some extra care with test env cleanup is advised (restore env vars; clean up loggers), and some usage document as requested by James (and me before, though perhaps I wasn't explicit as to what I ask for). Otherwise I'm ready to merge this in.
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
5f47d0a to
55a9275
Compare
|
@booxter @JamesKunstle I've added a |
booxter
left a comment
There was a problem hiding this comment.
Thank you for pulling this off and addressing all the comments (sometimes misleading!) I like how this functionality integrates with stdlib logging approach. We should strive to be pythonic.
| python src/instructlab/training/main_ds.py \ | ||
| ... \ | ||
| --run_name "my_run" \ | ||
| --logger_type "async,tensorboard,wandb" \ |
There was a problem hiding this comment.
love how easy it is to do multi-backend, or implement another backend. ❤️
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
JamesKunstle
left a comment
There was a problem hiding this comment.
awesome, very excited to have this!
Adds a general metric logger format with support for logging to json file, wandb, tensorboard, and the existing async logger.
User can select the
--logger_type:async: uses the existing AsyncStructuredLogger, this is the default option and default file names are the same for backwards compatibility reasonsfile: basic jsonl file logger (essentially same asasyncbut synchronous)tensorboard: Uses PyTorch'storch.utils.tensorboard.SummaryWriterto write logs in tensorboard format.wandb: logs to wandb. Currently untested (I don't have an account and need to set that up first)User can also specify
--run_nameas a string. Instances of{rank},{local_rank}, and{time}in the string will be replaced with their respective value.e.g.
{time}_rank{rank}->2025-04-25T17:26:01.477437_rank0