Add general logging implementation by fynnsu · Pull Request #500 · instructlab/training

fynnsu · 2025-04-25T21:28:05Z

Adds a general metric logger format with support for logging to a JSONL file, wandb, TensorBoard, and the existing async logger.

Users can select the backend with --logger_type:

async: uses the existing AsyncStructuredLogger. This is the default option, and the default file names are unchanged for backwards compatibility.
file: basic JSONL file logger (essentially the same as async, but synchronous).
tensorboard: uses PyTorch's torch.utils.tensorboard.SummaryWriter to write logs in TensorBoard format.
wandb: logs to wandb. Currently untested (I don't have an account and need to set that up first).

Users can also specify --run_name as a string. Instances of {rank}, {local_rank}, and {time} in the string are replaced with their respective values.

e.g. {time}_rank{rank} -> 2025-04-25T17:26:01.477437_rank0
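A minimal sketch of how such placeholder substitution could work. The function name `format_run_name` and its parameters are hypothetical and not taken from the PR; the real code may obtain the rank values differently (for example from environment variables).

```python
from datetime import datetime
from typing import Optional


def format_run_name(
    template: str,
    rank: int,
    local_rank: int,
    now: Optional[datetime] = None,
) -> str:
    """Replace {time}, {rank}, and {local_rank} in a run-name template.

    Hypothetical helper for illustration only.
    """
    now = now or datetime.now()
    return (
        template.replace("{time}", now.isoformat())
        .replace("{rank}", str(rank))
        .replace("{local_rank}", str(local_rank))
    )


# Using a fixed timestamp so the result matches the example above.
fixed = datetime(2025, 4, 25, 17, 26, 1, 477437)
print(format_run_name("{time}_rank{rank}", rank=0, local_rank=0, now=fixed))
```

Plain `str.replace` is used instead of `str.format` so that other braces in a user-supplied name do not raise a `KeyError`.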

RobotSail

Thanks for the PR! A few things need to be fixed, but overall looks good!

src/instructlab/training/logger.py

src/instructlab/training/main_ds.py

booxter

Main points:

Consider using logging module for file management (only use a custom Formatter for JSONL).
Define and enforce the form of the input dicts ("a recursive dict of string values")

We will also need some unit tests to validate the new addition. Overall, this looks like a very good start. Kudos.
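One way the first suggestion might look, as a sketch only: the stdlib logging module owns the destination through a handler, and a custom Formatter is the only piece that knows about JSONL. The names `JsonlFormatter` and `make_jsonl_logger` are hypothetical; a `StreamHandler` is used here so the example needs no file, and a `logging.FileHandler` would take its place for real log files.

```python
import io
import json
import logging


class JsonlFormatter(logging.Formatter):
    """Render each record as one JSON object per line (hypothetical sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        # Metric dicts are passed through as-is; anything else is wrapped.
        if isinstance(record.msg, dict):
            payload = record.msg
        else:
            payload = {"message": record.getMessage()}
        return json.dumps(payload, sort_keys=True)


def make_jsonl_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSONL to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep metrics out of the regular run log
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonlFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
metrics = make_jsonl_logger("sketch.metrics.jsonl", buffer)
metrics.info({"step": 1, "loss": 0.5})
print(buffer.getvalue(), end="")
```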

runs/2025-04-28T11:53:34.123994_rank0.jsonl

booxter · 2025-04-28T17:58:09Z

src/instructlab/training/logger.py

+
+try:
+    # Third Party
+    import wandb


Should it be also declared via optional-dependencies group in pyproject.toml?

I'm not sure what is best here, so will defer to the team. However, my reasons for not including it are:

We will potentially have support for 3-4 different logging libraries and would then need an optional dependency for each.

Each of these dependency groups would be one package each. It isn't necessarily easier or more logical to do pip install instructlab-training[wandb] than it is to do pip install instructlab-training wandb

It should be relatively clear to the user that they need to install wandb to use the WandbLogger and if not the error message should clarify that.

I've seen packages with dozens of these one-entry dependencies. :) What it gives you is being able to request particular versions of libraries if needed.

Either way is fine. For PyTorch you still have to install tensorboard separately even though it's technically part of the API. So I don't know if we need a one-off optional requirement just for wandb, but if there are other packages that we need to install to make it work then it could make sense.
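The "clear error message" option mentioned in this thread could be sketched roughly as follows. `require_optional` is a hypothetical helper, not code from the PR; it defers the import until the backend is actually requested and turns a missing package into an actionable message.

```python
import importlib


def require_optional(module_name: str, feature: str):
    """Import an optional dependency or fail with an install hint.

    Hypothetical helper: the PR itself guards `import wandb` with try/except.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


# A stdlib module stands in for wandb so the example runs anywhere.
json_module = require_optional("json", feature="demo")
print(json_module.__name__)
```

With this approach no optional-dependencies group is needed, at the cost of not being able to pin versions of the extra packages.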

src/instructlab/training/logger.py

booxter · 2025-04-28T19:58:12Z

src/instructlab/training/logger.py

+    """Create and initialize a logger of the specified type.
+
+    Args:
+        logger_type: Type of logger to create (must be one of ["file", "wandb", "tensorboard", "async"])


I suggest constructing the list of possible options programmatically to avoid drift. That said, you can probably enforce "one of" semantics with a type hint, using an enum constructed from the allowed options.
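A sketch of what the enum-based approach could look like. The `LoggerType` enum and both helpers are hypothetical names used for illustration; only the four option strings come from the docstring above.

```python
from enum import Enum


class LoggerType(str, Enum):
    """Allowed metric logger backends (hypothetical enum)."""

    ASYNC = "async"
    FILE = "file"
    TENSORBOARD = "tensorboard"
    WANDB = "wandb"


def allowed_logger_types() -> list[str]:
    """Derive the option list from the enum so docs and checks cannot drift."""
    return [member.value for member in LoggerType]


def parse_logger_type(value: str) -> LoggerType:
    """Convert a string to LoggerType, listing valid options on failure."""
    try:
        return LoggerType(value)
    except ValueError:
        raise ValueError(
            f"Unknown logger type {value!r}; expected one of {allowed_logger_types()}"
        ) from None


print(allowed_logger_types())
print(parse_logger_type("wandb"))
```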

src/instructlab/training/logger.py

booxter · 2025-04-28T20:00:48Z

src/instructlab/training/main_ds.py

    )
    parser.add_argument("--log_level", type=str, default="INFO")
+    parser.add_argument("--run_name", type=str, default=None)
+    parser.add_argument("--logger_type", type=str, default="async")


You can list the allowed choices: https://docs.python.org/3/library/argparse.html#choices

In practice it's better to avoid this though, I find choices to be super clunky and not as easy to work with.

Can you please elaborate? As long as the choices are calculated (from an enum or dict keys), one doesn't need to touch the argument at all.

I personally agree w/ Ihar: if we maintained a mapping of available logger types in instructlab.training.logger, we could generate the list of available logging backends dynamically. That's self-documenting and easier for a user to inspect via the --help message.

(Granted, that we even have argparse as an interface to the library is not optimal and we should probably get rid of it, switching to proper function arguments.)
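The computed-choices idea from this thread might look like the sketch below. `LOGGER_BACKENDS` is a hypothetical mapping standing in for whatever registry instructlab.training.logger would expose; the values are placeholder descriptions rather than real classes.

```python
import argparse

# Hypothetical registry of backends; keys come from the PR description.
LOGGER_BACKENDS = {
    "async": "existing AsyncStructuredLogger",
    "file": "synchronous JSONL file logger",
    "tensorboard": "torch.utils.tensorboard.SummaryWriter",
    "wandb": "Weights & Biases",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --logger_type choices follow the registry."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run_name", type=str, default=None)
    parser.add_argument(
        "--logger_type",
        type=str,
        default="async",
        choices=sorted(LOGGER_BACKENDS),  # derived, so it cannot drift
        help="metric logging backend",
    )
    return parser


args = build_parser().parse_args(["--logger_type", "tensorboard"])
print(args.logger_type)
```

Adding a backend to the mapping then updates both validation and the --help output without touching the argument definition.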

RobotSail

Thank you for making the requested changes and checking functionality of TensorBoard + Wandb. LGTM!

booxter · 2025-04-28T20:29:07Z

One other scenario that may be useful is being able to use multiple log destinations at the same time. This would allow collecting the same metrics in multiple formats and using the preferred tool for each kind of analysis. This is where integration with the Python logging module could also be helpful, since it allows defining multiple destinations for the same messages through propagate.

I'm thinking of enabling all these loggers in CI training runs and collecting all of the outputs as github artifacts.

fynnsu · 2025-04-28T21:21:09Z

Yeah that's something I've been thinking about. It would also be very easy to just implement a "MultiLogger" class that just loops through its nested loggers. I will look into the logging module more and try to see if it would work well with wandb/tensorboard. I do think it is useful to have the "metric logger" separate from the regular run logging, so I would want to make sure it's possible to do that while using the logging module for regular logs.

RobotSail · 2025-04-28T21:29:23Z

@fynnsu Yes, please do that if you can. I would keep the implementation simple (re-use the existing code you've already written) and just make it so it uses the existing AsyncStructuredLogger as a default, and then everything else can be added on after the fact. This way we can still retain log data even when using TensorBoard or anything else.
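A rough sketch of the multi-destination idea using stdlib logging: one dedicated metric logger with several handlers attached, and propagation turned off so metrics stay separate from regular run logs. `ListHandler` and `attach_destinations` are hypothetical stand-ins; real handlers would wrap the async, TensorBoard, or wandb writers.

```python
import logging


class ListHandler(logging.Handler):
    """Toy destination that stores received payloads in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.received = []

    def emit(self, record: logging.LogRecord) -> None:
        self.received.append(record.msg)


def attach_destinations(logger_name: str, handlers) -> logging.Logger:
    """Send every record on one metric logger to all given handlers."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not leak metrics into the root logger
    for handler in handlers:
        logger.addHandler(handler)
    return logger


first, second = ListHandler(), ListHandler()
metrics = attach_destinations("sketch.metrics.multi", [first, second])
metrics.info({"step": 10, "loss": 0.1})
print(first.received == second.received)
```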

mergify · 2025-05-02T16:02:31Z

This pull request has merge conflicts that must be resolved before it can be
merged. @fynnsu please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

tests/unit/test_logger.py

src/instructlab/training/logger.py

tests/unit/test_logger.py

booxter · 2025-05-08T15:31:58Z

src/instructlab/training/main_ds.py

    )
    parser.add_argument("--log_level", type=str, default="INFO")
+    parser.add_argument("--run_name", type=str, default=None)
+    parser.add_argument("--logger_type", type=str, default="async")


(Granted, that we even have argparse as an interface to the library is not optimal and we should probably get rid of it, switching to proper function arguments.)

booxter · 2025-05-08T15:34:34Z

src/instructlab/training/logger.py

+        )
+        ```
+    """
+    if not loggers:


I disagree on raising, since it's a valid input. What it should probably mean, if it doesn't already, is that all previously set loggers should be disabled.
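The semantics proposed here (an empty list disables whatever was configured before) could be sketched as below. `set_metric_handlers` is a hypothetical function, not the PR's setup routine.

```python
import logging


def set_metric_handlers(logger_name: str, handlers) -> logging.Logger:
    """Replace all handlers on a logger; an empty list disables output.

    Hypothetical sketch: no exception is raised for an empty input.
    """
    logger = logging.getLogger(logger_name)
    # Drop and close whatever was configured previously.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    for handler in handlers:
        logger.addHandler(handler)
    return logger


metrics = set_metric_handlers("sketch.metrics.reset", [logging.NullHandler()])
print(len(metrics.handlers))
set_metric_handlers("sketch.metrics.reset", [])
print(len(metrics.handlers))
```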

booxter

Overall I think this is good to go. Some extra care with test env cleanup is advised (restore env vars; clean up loggers), as is a usage document as requested by James (and by me before, though perhaps I wasn't explicit about what I was asking for). Otherwise I'm ready to merge this in.

src/instructlab/training/logger.py

src/instructlab/training/async_logger.py

fynnsu · 2025-05-12T14:52:01Z

@booxter @JamesKunstle I've added a docs/logging.md file that describes both stdlib logging and its integration into instructlab.training. Let me know if anything is unclear or needs further explanation.

booxter

Thank you for pulling this off and addressing all the comments (sometimes misleading!). I like how this functionality integrates with the stdlib logging approach. We should strive to be pythonic.

docs/logging.md

booxter · 2025-05-12T14:49:21Z

docs/logging.md

+python src/instructlab/training/main_ds.py \
+    ... \
+    --run_name "my_run" \
+    --logger_type "async,tensorboard,wandb" \


love how easy it is to do multi-backend, or implement another backend. ❤️
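The comma-separated --logger_type value shown in the doc excerpt could be split and validated along these lines. This is a sketch under assumptions: `split_logger_types` and its `allowed` argument are hypothetical, and the PR may parse the value differently.

```python
def split_logger_types(raw: str, allowed) -> list[str]:
    """Split 'async,tensorboard,wandb' into names and reject unknown ones."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    unknown = [name for name in names if name not in allowed]
    if unknown:
        raise ValueError(
            f"Unknown logger type(s) {unknown}; expected any of {sorted(allowed)}"
        )
    return names


ALLOWED = {"async", "file", "tensorboard", "wandb"}
print(split_logger_types("async,tensorboard,wandb", ALLOWED))
```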

JamesKunstle

awesome, very excited to have this!

mergify bot added the ci-failure label Apr 25, 2025

fynnsu force-pushed the general_logging branch 2 times, most recently from f431306 to 3834c21 Compare April 25, 2025 21:49

mergify bot added ci-failure and removed ci-failure labels Apr 25, 2025

RobotSail requested changes Apr 26, 2025

fynnsu force-pushed the general_logging branch from bbf466c to 9586f3f Compare April 28, 2025 17:51

mergify bot added ci-failure and removed ci-failure labels Apr 28, 2025

fynnsu force-pushed the general_logging branch from 9586f3f to fec2ecf Compare April 28, 2025 18:04

mergify bot added ci-failure and removed ci-failure labels Apr 28, 2025

booxter reviewed Apr 28, 2025

This was referenced Apr 28, 2025

[Epic][Improvement] Logging overhaul #369

Open

Add TensorBoard Visualization #360

Closed

CI: automate quality assessment / regression detection #503

Closed

RobotSail approved these changes Apr 28, 2025

mergify bot added the one-approval label Apr 28, 2025

mergify bot added dependencies Pull requests that update a dependency file and removed ci-failure labels Apr 30, 2025

fynnsu force-pushed the general_logging branch 4 times, most recently from 777a5eb to 908cb5e Compare May 2, 2025 13:38

mergify bot added the ci-failure label May 2, 2025

mergify bot added the needs-rebase label May 2, 2025

Add wandb and tensorboard to tox.ini py3-unit deps

5c2c7ff

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

booxter reviewed May 8, 2025

src/instructlab/training/logger.py (outdated, resolved)

src/instructlab/training/logger.py (outdated, resolved)

fynnsu added 2 commits May 8, 2025 16:47

Move AsyncStructuredLogger log folder initialization to async_logger.py

ea5514d

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

Implement suggestions

8713cd4

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

fynnsu force-pushed the general_logging branch from d0b3e37 to 8713cd4 Compare May 8, 2025 20:49

mergify bot added the ci-failure label May 8, 2025

format/isort

19170c6

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

mergify bot removed the ci-failure label May 8, 2025

booxter reviewed May 8, 2025

src/instructlab/training/async_logger.py (resolved)

Improve handling of lazy directory creation in async_logger

f999720

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

mergify bot added the ci-failure label May 12, 2025

fynnsu force-pushed the general_logging branch 2 times, most recently from 5f47d0a to 55a9275 Compare May 12, 2025 14:46

mergify bot removed the ci-failure label May 12, 2025

booxter approved these changes May 12, 2025

mergify bot removed the one-approval label May 12, 2025

booxter requested a review from JamesKunstle May 12, 2025 14:53

Add logging doc

e283c9a

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

fynnsu force-pushed the general_logging branch from 55a9275 to e283c9a Compare May 12, 2025 14:57

booxter approved these changes May 12, 2025

mergify bot added ci-failure and removed ci-failure labels May 12, 2025

JamesKunstle approved these changes May 12, 2025

JamesKunstle merged commit 7682500 into instructlab:main May 12, 2025
16 checks passed

fynnsu deleted the general_logging branch May 12, 2025 22:11

This was referenced May 27, 2025

Logging Changes Broke Data Processing Output #569

Closed

Loss of Color and Vertical Formatting in Logs After Logging Refactor #570

Closed
