Conversation
Reviewer's GuideThis PR adds worker-aware seeding to Compose by computing an effective seed per DataLoader worker, synchronizing RNG state on each call, persisting the original seed for serialization, correcting API consistency in transform constructors and calls, and covering all cases with new comprehensive tests. Sequence diagram for worker-aware seeding in Compose callsequenceDiagram
participant User
participant DataLoaderWorker
participant Compose
participant Transform
User->>DataLoaderWorker: Request batch
DataLoaderWorker->>Compose: __call__(data)
Compose->>Compose: _check_worker_seed()
Compose->>Compose: Update RNG state if worker seed changed
Compose->>Transform: set_random_state()/set_random_seed(effective_seed)
Compose->>Compose: Apply transforms
Compose-->>DataLoaderWorker: Return augmented data
Class diagram for updated Compose and BaseCompose seeding logicclassDiagram
class BaseCompose {
+int|None seed
+np.random.Generator random_generator
+random.Random py_random
+set_random_seed(seed: int|None)
+_get_effective_seed(base_seed: int|None) int|None
}
class Compose {
+int|None _last_torch_seed
+_check_worker_seed()
+__setstate__(state: dict)
+set_random_seed(seed: int|None)
+__call__(*args, force_apply: bool = False, **data) dict
}
BaseCompose <|-- Compose
Class diagram for transform constructor and call API consistencyclassDiagram
class OneOf {
+__init__(transforms, p)
}
class ChannelDropout {
+__init__(transforms, p, channels)
+__call__(*args, force_apply: bool = False, **data) dict
}
OneOf --|> BaseCompose
ChannelDropout --|> BaseCompose
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey @ternaus - I've reviewed your changes - here's some feedback:
- You’re importing torch inside both _get_effective_seed and _check_worker_seed on every call, which can be expensive—consider doing the import (and worker_info retrieval) once at module load or caching the availability check.
- There’s a lot of duplicated seed‐initialization logic between init, set_random_seed, and setstate; extracting a single helper to centralize effective seed generation would reduce maintenance overhead.
- In the channel‐by‐channel loop you changed, _track_transform_params is now called with
datainstead ofsub_data, which likely breaks correct param tracking for those sub-transforms.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- You’re importing torch inside both _get_effective_seed and _check_worker_seed on every call, which can be expensive—consider doing the import (and worker_info retrieval) once at module load or caching the availability check.
- There’s a lot of duplicated seed‐initialization logic between __init__, set_random_seed, and __setstate__; extracting a single helper to centralize effective seed generation would reduce maintenance overhead.
- In the channel‐by‐channel loop you changed, _track_transform_params is now called with `data` instead of `sub_data`, which likely breaks correct param tracking for those sub-transforms.
## Individual Comments
### Comment 1
<location> `albumentations/core/composition.py:903` </location>
<code_context>
+ except (ImportError, AttributeError):
+ pass
+
+ def __setstate__(self, state: dict[str, Any]) -> None:
+ """Set state from unpickling and handle worker seed."""
+ self.__dict__.update(state)
</code_context>
<issue_to_address>
Reset _last_torch_seed in __setstate__
Reset `_last_torch_seed` to `None` before calling `set_random_seed` to guarantee the worker-seed sync runs after unpickling, even if the unpickled value matches the current seed.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def __setstate__(self, state: dict[str, Any]) -> None:
"""Set state from unpickling and handle worker seed."""
self.__dict__.update(state)
# If we have a seed, recalculate effective seed in worker context
if hasattr(self, "seed") and self.seed is not None:
# Recalculate effective seed in worker context
self.set_random_seed(self.seed)
=======
def __setstate__(self, state: dict[str, Any]) -> None:
"""Set state from unpickling and handle worker seed."""
self.__dict__.update(state)
# If we have a seed, recalculate effective seed in worker context
if hasattr(self, "seed") and self.seed is not None:
# Reset _last_torch_seed to ensure worker-seed sync runs after unpickling
self._last_torch_seed = None
# Recalculate effective seed in worker context
self.set_random_seed(self.seed)
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `albumentations/core/composition.py:828` </location>
<code_context>
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
- """Apply transformations to data.
+ """Apply transformations to data with automatic worker seed synchronization.
Args:
</code_context>
<issue_to_address>
Docstring no longer documents force_apply behavior
Please include a brief description of the `force_apply` parameter in the docstring to clarify its purpose and usage.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Pull Request Overview
This PR improves reproducibility in Compose transforms by incorporating worker-aware random seeding, ensuring deterministic behavior across single- and multi-process DataLoader contexts.
- Introduces worker-aware seed calculation and synchronization through new helper methods (_get_effective_seed, _check_worker_seed, and updates to set_random_seed and setstate).
- Updates tests to cover single- and multi-worker scenarios and adjusts serialization to include the original seed.
- Corrects parameter passing in OneOf and ChannelShuffle transforms and aligns default seed values with project guidelines.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_transforms.py | Adds configuration for FrequencyMasking and updates transform tests. |
| tests/test_serialization.py | Includes seed in serialization output. |
| tests/test_per_worker_seed.py | Adds comprehensive tests for worker-aware seed functionality. |
| albumentations/core/composition.py | Enhances seed handling with worker context and updates API usage. |
| .cursor/rules/albumentations-rules.mdc | Updates coding rules, including default seed value usage. |
Comments suppressed due to low confidence (3)
tests/test_transforms.py:1792
- Consider adding a brief comment or reference for the new transform A.FrequencyMasking to clarify its intended usage and ensure future maintainability.
A.FrequencyMasking,
albumentations/core/composition.py:830
- Since the type-check for force_apply was removed, please update the call docstring to clearly describe the accepted types and behavior of the force_apply parameter.
def __call__(self, *args: Any, force_apply: bool = False, **data: Any) -> dict[str, Any]:
albumentations/core/composition.py:869
- [nitpick] Expand the _check_worker_seed docstring to detail when the worker seed synchronization is triggered and how it propagates the updated seed to child transforms.
def _check_worker_seed(self) -> None:
Summary by Sourcery
Enable worker-aware random seeding in Compose transforms to ensure reproducible augmentations across single- and multi-process DataLoader contexts.
New Features:
Bug Fixes:
Enhancements:
Tests: