
[python-package] model_to_string() returns empty string intermittently when early_stopping callback is used #7186

@nbx-liz

Description

When using LGBMRegressor.fit() with an eval_set and the lgb.early_stopping() callback, model_to_string() intermittently returns a near-empty model string containing only the pandas_categorical metadata (no tree structure). model_from_string() at engine.py:350 then fails with:

LightGBMError: Model file doesn't specify the number of classes

The booster itself is valid (correct num_trees(), num_feature(), and current_iteration()), but the C API function LGBM_BoosterSaveModelToString produces incomplete output.
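
A quick way to distinguish the two outcomes is a completeness check on the dumped string before handing it to model_from_string(). This is a minimal sketch; the "num_class=" and "Tree=" markers are assumptions about LightGBM's text model format (header key-value lines followed by per-tree sections), not a documented API guarantee:

```python
def model_string_looks_complete(model_str: str) -> bool:
    """Heuristic check that a LightGBM model dump contains tree structure.

    Assumes the usual text-format markers: a ``num_class=`` header line and
    at least one ``Tree=`` section. The truncated dump described in this
    report contains only ``pandas_categorical:`` metadata and fails both.
    """
    return "num_class=" in model_str and "Tree=" in model_str


# The kind of truncated output observed in this report (metadata only):
truncated = 'pandas_categorical:[["Ideal", "Premium"]]'
# A heavily abbreviated stand-in for a healthy dump:
healthy = "tree\nversion=v4\nnum_class=1\nTree=0\nnum_leaves=31\npandas_categorical:null"

print(model_string_looks_complete(truncated))  # False
print(model_string_looks_complete(healthy))    # True
```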

Reproducibility

  • Failure rate: ~5-10% per fit() call
  • Affected versions: 4.3.0, 4.5.0, 4.6.0 (all tested)
  • Platform: Linux (WSL2), Python 3.11
  • Non-deterministic: same data/params sometimes succeeds, sometimes fails

Minimal Reproduction

import lightgbm as lgb
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv"
df = pd.read_csv(url)
y = df["price"].values
X = df.drop(columns=["price"])
for col in X.select_dtypes(include="object").columns:
    X[col] = X[col].astype("category").cat.codes

idx = np.random.RandomState(42).permutation(len(X))
X_train, y_train = X.iloc[idx[:40000]], y[idx[:40000]]
X_valid, y_valid = X.iloc[idx[40000:45000]], y[idx[40000:45000]]

# Repeatedly fit new models — ~5-10% will crash
for i in range(100):
    n_est = np.random.randint(600, 2500)
    obj = np.random.choice(["huber", "mae"])
    model = lgb.LGBMRegressor(
        n_estimators=n_est, objective=obj,
        learning_rate=0.01, max_depth=8, verbose=-1,
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        callbacks=[lgb.early_stopping(150, verbose=False), lgb.log_evaluation(-1)],
    )
    # Crash happens inside fit() at engine.py:350

Condition Isolation

Condition                               Failure rate
eval_set + early_stopping callback      ~8%
eval_set + log_evaluation only          0%
eval_set only (no callbacks)            0%
No eval_set                             0%

The early_stopping callback does not need to actually trigger (the affected models train for the full num_boost_round); its mere presence in the callback list is sufficient.

Debug Findings

By patching engine.train() with keep_training_booster=True and manually inspecting model_to_string():

  • Empty model string: 167 bytes, containing only pandas_categorical:[...]
  • Booster state is valid: num_trees()=1597, num_feature()=9, current_iteration()=1597
  • The C API LGBM_BoosterSaveModelToString returns a truncated result despite the booster holding a valid model
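
The mismatch described above can be detected mechanically by comparing the booster-reported tree count against the number of Tree= sections in the dump. A sketch with plain-string stand-ins, since the check itself is just line counting (the "Tree=" section marker is an assumption about the text format):

```python
def count_tree_sections(model_str: str) -> int:
    # Each tree in the text dump is assumed to open with a "Tree=<index>" line.
    return sum(1 for line in model_str.splitlines() if line.startswith("Tree="))


# Stand-in for booster.model_to_string() output containing 3 trees:
dump = "\n".join([
    "tree", "num_class=1",
    "Tree=0", "num_leaves=31",
    "Tree=1", "num_leaves=31",
    "Tree=2", "num_leaves=31",
    "pandas_categorical:null",
])
expected_trees = 3  # stand-in for booster.num_trees()

# In the failing case reported here, count_tree_sections() would return 0
# while booster.num_trees() reports 1597.
print(count_tree_sections(dump) == expected_trees)  # True
```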

Expected Behavior

model_to_string() should always return the complete model string including tree structures when num_trees() > 0.
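
Until the root cause is fixed, one candidate mitigation is wrapping the dump call in a validate-and-retry loop. Whether re-dumping the same booster actually succeeds is not established by this report (the truncation may be persistent per booster), so this is only a sketch, demonstrated with a deliberately flaky stand-in for booster.model_to_string():

```python
import random


def dump_with_retry(dump_fn, is_complete, max_attempts=5):
    """Call dump_fn() until is_complete(result) is true or attempts run out."""
    result = None
    for _ in range(max_attempts):
        result = dump_fn()
        if is_complete(result):
            return result
    raise RuntimeError(f"model dump still incomplete after {max_attempts} attempts")


# Flaky stand-in: sometimes returns metadata only, like the reported failure.
rng = random.Random(0)

def flaky_dump():
    if rng.random() < 0.5:
        return "pandas_categorical:null"                       # truncated
    return "num_class=1\nTree=0\npandas_categorical:null"      # complete

s = dump_with_retry(flaky_dump, lambda m: "Tree=" in m)
print("Tree=" in s)  # True
```

In real use, dump_fn would be booster.model_to_string and is_complete a marker check like the one sketched earlier in this report.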

Environment

LightGBM: 4.3.0 / 4.5.0 / 4.6.0 (all reproduce)
OS: Linux 6.6.87 (WSL2)
Python: 3.11.14
