Skip to content

Use Generator to control the randomness for each backend dataloader#45

Merged
BirkhoffG merged 14 commits intomainfrom
generator-backends
Apr 25, 2025
Merged

Use Generator to control the randomness for each backend dataloader#45
BirkhoffG merged 14 commits intomainfrom
generator-backends

Conversation

@BirkhoffG
Copy link
Owner

@BirkhoffG BirkhoffG commented Apr 24, 2025

This fixes #43
This fixes #44

@BirkhoffG BirkhoffG added the enhancement New feature or request label Apr 24, 2025
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@BirkhoffG BirkhoffG requested a review from Copilot April 24, 2025 18:38
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the randomness handling in each backend dataloader by introducing and standardizing the use of a Generator class. Key changes include:

  • Updating the handling and initialization of random generators in notebooks and loader modules.
  • Changing the return type of the seed() method to Optional[int] to account for generators that do not set a seed.
  • Adding Generator import and handling in multiple files to streamline random seed control.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
nbs/utils.ipynb Updated generator type checking and modified seed() return type.
nbs/loader.torch.ipynb Adjusted generator initialization and ensured proper generator import.
nbs/loader.tf.ipynb Introduced get_seed function and updated generator usage in Tf loader.
nbs/loader.jax.ipynb Updated generator handling for JAX loader.
nbs/loader.base.ipynb Added generator parameter import and handling.
jax_dataloader/utils.py Updated seed() method return type and corrected generator logic.
jax_dataloader/loaders/torch.py Modified generator initialization to rely on the Generator class.
jax_dataloader/loaders/tensorflow.py Updated generator usage and added get_seed.
jax_dataloader/loaders/jax.py Updated generator usage to set the JAX PRNGKey from the Generator.
jax_dataloader/loaders/base.py Extended the signature to include the generator parameter.
jax_dataloader/_modidx.py Documented the new get_seed function in the module index.

nbs/utils.ipynb Outdated
" \"\"\"The initial seed of the generator\"\"\"\n",
" if self._seed is None:\n",
" raise ValueError(\"The seed is not specified. Please set the seed using `manual_seed()` or pass a generator.\")\n",
" # TODO: the seed might not be initizalized if the generator is a `jax.random.PRNGKey`\n",
Copy link

Copilot AI Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word 'initizalized' appears to be misspelled. Consider updating it to 'initialized'.

Suggested change
" # TODO: the seed might not be initizalized if the generator is a `jax.random.PRNGKey`\n",
" # TODO: the seed might not be initialized if the generator is a `jax.random.PRNGKey`\n",

Copilot uses AI. Check for mistakes.
"""The initial seed of the generator"""
if self._seed is None:
raise ValueError("The seed is not specified. Please set the seed using `manual_seed()` or pass a generator.")
# TODO: the seed might not be initizalized if the generator is a `jax.random.PRNGKey`
Copy link

Copilot AI Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word 'initizalized' should be corrected to 'initialized' for clarity.

Suggested change
# TODO: the seed might not be initizalized if the generator is a `jax.random.PRNGKey`
# TODO: the seed might not be initialized if the generator is a `jax.random.PRNGKey`

Copilot uses AI. Check for mistakes.
@BirkhoffG
Copy link
Owner Author

Maybe consider tf generator as well

@BirkhoffG
Copy link
Owner Author

Maybe consider tf generator as well

The tf.random.Generator is overly complicated and not usable. If the user thinks it is useful, then we might want to support it. Otherwise, I will skip it for now.

@BirkhoffG BirkhoffG merged commit 68dce91 into main Apr 25, 2025
8 checks passed
@BirkhoffG BirkhoffG deleted the generator-backends branch April 25, 2025 04:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement high-level API to use Generator in jdl.DataLoader Use Generator to control the randomness for each backend dataloader

2 participants