@edyoshikun
Member

This PR exposes some existing attributes of HCSDataloader that help speed up data loading and reduce the run time, avoiding the timeout error.

edyoshikun and others added 30 commits April 15, 2025 15:50
* caching dataloader

* caching data module

* black

* ruff

* Bump torch to 2.4.1 (#174)

* update torch >2.4.1

* black

* ruff

* adding timeout to ram_dataloader

* bandaid to cached dataloader

* fixing the dataloader using torch collate_fn

* replacing dictionary with single array

* loading prior to epoch 0

* Revert "replacing dictionary with single array"

This reverts commit 8c13f49.

* using multiprocessing manager

* add sharded distributed sampler

* add example script for ddp caching

* format and lint

* addding the custom distrb sampler to hcs_ram.py

* adding sampler to val train dataloader

* fix divisibility of the last shard

* hcs_ram format and lint

* data module that only crops and does not collate

* wip: execute transforms on the GPU

* path for if not ddp

* fix randomness in inversion transform

* add option to pop the normalization metadata

* move gpu transform definition back to data module

* add tiled crop transform for validation

* add stack channel transform for gpu augmentation

* fix typing

* collate before sending to gpu

* inherit gpu transforms for livecell dataset

* update fcmae engine to apply per-dataset augmentations

* format and lint hcs_ram

* fix abc type hint

* update docstring style

* disable grad for validation transforms

* improve sample image logging in fcmae

* fix dataset length when batch size is larger than the dataset

* fix docstring

* add option to disable normalization metadata

* inherit gpu transform for ctmc

* remove duplicate method overrride

* update docstring for ctmc

* allow skipping caching for large datasets

* make the fcmae module compatible with image translation

* remove prototype implementation

* fix import path

* Arbitrary prediction time transforms (#209)

* fix spelling in docstring and comment

* add batched zoom transform for tta

* add standalone lightning module for arbitrary TTA

* fix composition of different zoom factors

* add docstrings

* wip: segmentation module

* avoid casting

* update import path from iohub

* make integer array in fixture

* labels fixture

* test segmentation metrics modules

* less strings

* test non-empty

* select which wells to include in fit #205

* make well selection a mixin

* wip: mmap cache data module

* support exclusion of FOVs

* wip: precompute normalization

* add augmentations benchmark

* fix cpu threads default

* fix probability (affects cpu results)

* disable metadata tracking

* fix non-distributed initialization

* refactor transforms into submodules

* wip: bootstrap and distillation

* wip: balance distillation loss

* re-define cropping transforms

* wip: joint only

* redefine random flip dict transform

* cell classification data module

* supervised cell classifier

* do not import type hints at runtime

* update docstring

* backwards compatible import path

* fix annotations

* fix style

* fix dice score import

* fix dice score parameters

* apply formatting to exercise

* fix labels data type

* fix labels input shape

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
edyoshikun and others added 24 commits May 29, 2025 11:02
* refactor select well mixin into its own module

* refactor filter functions

* rename hcs tests

* fix import

* triplet: select fovs for training

* add test

* increase example size

* test exclude fovs

* use full fov path to exclude

* add unit test

* test fov names
* relax patch version

* tweak visualization
* point to stable version

* minor wording edit

* link tutorials on the main branch

* split demo from tutorials

* add vcp links

* fix typo

* add link to hek tutorial

* update descriptions and add neuromast

* wip: try to patch CI

* Revert "wip: try to patch CI"

This reverts commit 2ecf722.

* explain versioning policy

* add todo label
* restructure and test script

* black formatted

* rename utils.py

* add doctrings

* add numpy style doctrings

* ruff fixed import error

* remove redundant comments

* black formatted

* ruff corrected

* modified docstrings

* fix typing

* fix docstring type hint

* adding tests for general functionality and  simplifiying the normalization

* adding mahotas to metrics

* removing the classes since we will add independent unit tests.

* adding some unit tests

* cleaning up the pytests functions

* moving eps as class attribute.

* removing self.

---------

Co-authored-by: Ziwen Liu <[email protected]>
Co-authored-by: Ziwen Liu <[email protected]>
Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
@ziw-liu
Collaborator

ziw-liu commented Jun 16, 2025

The diff is already in #240?

Base automatically changed from dynaclr_v2 to main June 23, 2025 19:07
@ziw-liu
Collaborator

ziw-liu commented Jul 11, 2025

@edyoshikun do you still need this?

@edyoshikun
Member Author

closing this as most of the implementation is already in main. @ziw-liu open a new PR that exposes the prefetching factors and workers for the TripletDataModule if we think this will bring us a speed boost.
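
For whoever picks up that follow-up, here is a rough sketch of what exposing the prefetch factor and worker count could look like. It is not the TripletDataModule code: the helper name `loader_kwargs` and its parameters are hypothetical, and the only thing it relies on is the documented behavior of `torch.utils.data.DataLoader`, which rejects `prefetch_factor` and `persistent_workers=True` when `num_workers` is 0.

```python
from typing import Any, Dict, Optional


def loader_kwargs(
    batch_size: int,
    num_workers: int = 0,
    prefetch_factor: Optional[int] = None,
    persistent_workers: bool = False,
    pin_memory: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper for illustration only. It drops the options that
    DataLoader refuses when loading happens in the main process.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")

    kwargs: Dict[str, Any] = {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
        # Persistent workers only make sense when workers exist.
        "persistent_workers": persistent_workers and num_workers > 0,
    }
    # prefetch_factor is only valid with worker processes; when it is
    # omitted, DataLoader falls back to its own default.
    if num_workers > 0 and prefetch_factor is not None:
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


if __name__ == "__main__":
    # Intended use (assumes torch is installed and `dataset` exists):
    #   DataLoader(dataset, **loader_kwargs(16, num_workers=8, prefetch_factor=4))
    print(loader_kwargs(16, num_workers=8, prefetch_factor=4, persistent_workers=True))
    print(loader_kwargs(16, num_workers=0, prefetch_factor=4, persistent_workers=True))
```

A data module could accept these values in its constructor and pass the resulting dictionary to each of its dataloaders, so the same settings apply to training and validation. Whether this actually brings a speed boost would still need to be benchmarked.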

@edyoshikun edyoshikun closed this Jul 16, 2025
@ziw-liu ziw-liu deleted the trip_concatdl branch July 17, 2025 16:56
4 participants