[fix] Applied different rnd seed in bagel so that the noise would be sample…#129

Merged
kcz358 merged 1 commit into main from fix/bagel_rnd on Jan 14, 2026

Conversation

kcz358 (Collaborator) commented Jan 14, 2026

…d differently at different rank

Motivation

Modifications

Commit Message Convention

Please follow our standardized commit message format:

  • [feat] - New features or functionality
  • [fix] - Bug fixes
  • [docs] - Documentation changes only
  • [style] - Code style changes (formatting, missing semicolons, etc.)
  • [refactor] - Code refactoring without changing functionality
  • [perf] - Performance improvements
  • [test] - Adding or updating tests
  • [chore] - Maintenance tasks, dependency updates, etc.
  • [ci] - CI/CD configuration changes

Examples:

  • [feat] add qwen omni iterable dataset support
  • [fix] resolve bagel model configuration error
  • [docs] update training guide with YAML examples

See CONTRIBUTING.md for more details.

CI/CD Checks

Your PR will automatically run the following checks:

  • Linting: Code formatting with black (line-length=120) and import sorting with isort
  • Run pre-commit run --all-files locally to verify before pushing

Checklist

  • Follow commit message convention (see above)
  • Run pre-commit run --all-files and ensure all checks pass
  • Format your code with black (line-length=120) and isort
  • Add unit tests for new functionality
  • Update documentation as needed, including docstrings or example tutorials
  • Ensure all CI/CD checks pass

kcz358 merged commit 0beb0a0 into main on Jan 14, 2026
2 checks passed
kcz358 deleted the fix/bagel_rnd branch on January 14, 2026, 08:43

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d6a092509

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 43 to +44
self.user_image_as_vae_condition = extra_kwargs.get("user_image_as_vae_condition", None)
self.set_random_seed(extra_kwargs.get("random_seed", 4396))

[P2] Avoid resetting global RNG during Bagel init

Calling set_random_seed() from BagelDataProcessor.__init__ resets the global RNG state via transformers.set_seed, which happens after TrainRunner.set_random_seed() has already been applied. In runs that use the bagel processor, this overwrites the configured training seed and becomes the base seed used by seed_worker for DataLoader workers, so shuffling/dropout/noise no longer follow the training seed and are silently forced to the bagel default (and per-rank) value. This breaks reproducibility for experiments that expect the training seed to control randomness.
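One possible way to address this, sketched under assumptions rather than taken from the repository: give the processor its own private generator seeded from `random_seed` plus the rank, so noise differs per rank while the global RNG state set by the trainer stays untouched. `LocalNoiseSampler`, its `rank` argument, and `sample_noise` are hypothetical names used only for illustration, and the standard-library `random.Random` stands in for whatever generator the real processor would use (for example a dedicated `torch.Generator`).

```python
from __future__ import annotations

import random


class LocalNoiseSampler:
    """Hypothetical stand-in for the noise sampling inside the bagel processor."""

    def __init__(self, random_seed: int = 4396, rank: int = 0):
        # Private generator: seeding it does not modify the global `random` state.
        # Adding the rank gives each rank a different noise stream.
        self._rng = random.Random(random_seed + rank)

    def sample_noise(self, n: int) -> list[float]:
        # Draw n standard-normal values from the private generator only.
        return [self._rng.gauss(0.0, 1.0) for _ in range(n)]


# Stand-in for the training seed applied by the runner (assumed value).
random.seed(42)
expected_next = random.random()

# Re-apply the training seed, then build per-rank samplers and draw noise.
random.seed(42)
sampler_rank0 = LocalNoiseSampler(rank=0)
sampler_rank1 = LocalNoiseSampler(rank=1)
noise_rank0 = sampler_rank0.sample_noise(4)
noise_rank1 = sampler_rank1.sample_noise(4)

# The global stream continues as if the samplers had never been created.
global_next = random.random()
```

With this shape, the configured training seed keeps controlling shuffling and dropout, and only the processor's own noise depends on the bagel seed and rank.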

Useful? React with 👍 / 👎.
