Skip to content

Conversation

@faresobeid
Copy link
Contributor

@faresobeid faresobeid commented Feb 1, 2026

Note

Low Risk
Adds lightweight text-scanning and logging in the orchestrator metrics path; main risk is minor performance overhead or NaN/empty-data edge cases in metric computation.

Overview
Adds tracking of Chinese (CJK) characters in generated completions and logs aggregate metrics per training step.

Introduces is_chinese_char/count_chinese_chars in utils.py, decodes each rollout’s final completion in orchestrator.py, and reports chinese/char_count, chinese/char_ratio, and chinese/rollout_ratio to the monitor.

Written by Cursor Bugbot for commit dfc1720. This will update automatically on new commits. Configure here.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

if chinese_df.total_chars.sum() > 0
else 0.0
),
"chinese/rollout_ratio": chinese_df.has_chinese.mean(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty rollouts causes KeyError on DataFrame column access

Medium Severity

When train_rollouts is empty, chinese_stats becomes an empty list, and pd.DataFrame([]) creates a DataFrame with no columns. Accessing chinese_df.chinese_chars, chinese_df.total_chars, or chinese_df.has_chinese then raises a KeyError. The existing metrics_df pattern at line 584 avoids this by iterating over .columns (which is safe when empty), but the new code directly accesses column names. This would crash the orchestrator if the scheduler returns no rollouts.

Additional Locations (1)

Fix in Cursor Fix in Web

@Jackmin801
Copy link
Member

模型用中文不好吗?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants