Debugging the TripletDataModule using the ConcatDataModule #257

edyoshikun · 2025-06-10T16:55:49Z

This PR exposes some attributes that already exist in HCSDataloader and help speed up data loading and reduce run time, avoiding the timeout error.
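The attributes in question are standard PyTorch `DataLoader` options: the commits below mention `pin_memory`, workers, and prefetching, and an earlier one adds a timeout to the RAM dataloader. The following is a minimal sketch, assuming the data module ultimately forwards these values to `torch.utils.data.DataLoader`. A small helper collects them and rejects combinations that recent PyTorch versions refuse (a `prefetch_factor` or `persistent_workers=True` without worker processes, or a negative `timeout`), so a bad config fails before training starts. The helper name `dataloader_kwargs` and its defaults are hypothetical and not part of HCSDataloader or TripletDataModule.

```python
from typing import Any, Dict, Optional


def dataloader_kwargs(
    batch_size: int,
    num_workers: int = 0,
    pin_memory: bool = False,
    persistent_workers: bool = False,
    prefetch_factor: Optional[int] = None,
    timeout: float = 0.0,
) -> Dict[str, Any]:
    """Collect keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper: it mirrors checks that recent PyTorch versions
    perform, so a bad configuration fails before any worker is spawned.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    if timeout < 0:
        raise ValueError("timeout must be non-negative")
    if num_workers == 0:
        # Single-process loading: PyTorch rejects both options without workers.
        if prefetch_factor is not None:
            raise ValueError("prefetch_factor requires num_workers > 0")
        if persistent_workers:
            raise ValueError("persistent_workers requires num_workers > 0")
    kwargs: Dict[str, Any] = {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
        "persistent_workers": persistent_workers,
        "timeout": timeout,
    }
    # Leave prefetch_factor unset so PyTorch applies its own default
    # (2 batches per worker when workers are used).
    if prefetch_factor is not None:
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs
```

For example, `DataLoader(dataset, shuffle=True, **dataloader_kwargs(batch_size=32, num_workers=8, pin_memory=True, persistent_workers=True, prefetch_factor=4))`. The numbers are placeholders, not values tuned for this dataset.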

* caching dataloader
* caching data module
* black
* ruff
* Bump torch to 2.4.1 (#174)
* update torch >2.4.1
* black
* ruff
* adding timeout to ram_dataloader
* bandaid to cached dataloader
* fixing the dataloader using torch collate_fn
* replacing dictionary with single array
* loading prior to epoch 0
* Revert "replacing dictionary with single array" This reverts commit 8c13f49.
* using multiprocessing manager
* add sharded distributed sampler
* add example script for ddp caching
* format and lint
* addding the custom distrb sampler to hcs_ram.py
* adding sampler to val train dataloader
* fix divisibility of the last shard
* hcs_ram format and lint
* data module that only crops and does not collate
* wip: execute transforms on the GPU
* path for if not ddp
* fix randomness in inversion transform
* add option to pop the normalization metadata
* move gpu transform definition back to data module
* add tiled crop transform for validation
* add stack channel transform for gpu augmentation
* fix typing
* collate before sending to gpu
* inherit gpu transforms for livecell dataset
* update fcmae engine to apply per-dataset augmentations
* format and lint hcs_ram
* fix abc type hint
* update docstring style
* disable grad for validation transforms
* improve sample image logging in fcmae
* fix dataset length when batch size is larger than the dataset
* fix docstring
* add option to disable normalization metadata
* inherit gpu transform for ctmc
* remove duplicate method overrride
* update docstring for ctmc
* allow skipping caching for large datasets
* make the fcmae module compatible with image translation
* remove prototype implementation
* fix import path
* Arbitrary prediction time transforms (#209)
* fix spelling in docstring and comment
* add batched zoom transform for tta
* add standalone lightning module for arbitrary TTA
* fix composition of different zoom factors
* add docstrings
* wip: segmentation module
* avoid casting
* update import path from iohub
* make integer array in fixture
* labels fixture
* test segmentation metrics modules
* less strings
* test non-empty
* select which wells to include in fit #205
* make well selection a mixin
* wip: mmap cache data module
* support exclusion of FOVs
* wip: precompute normalization
* add augmentations benchmark
* fix cpu threads default
* fix probability (affects cpu results)
* disable metadata tracking
* fix non-distributed initialization
* refactor transforms into submodules
* wip: bootstrap and distillation
* wip: balance distillation loss
* re-define cropping transforms
* wip: joint only
* redefine random flip dict transform
* cell classification data module
* supervised cell classifier
* do not import type hints at runtime
* update docstring
* backwards compatible import path
* fix annotations
* fix style
* fix dice score import
* fix dice score parameters
* apply formatting to exercise
* fix labels data type
* fix labels input shape

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>

* refactor select well mixin into its own module
* refactor filter functions
* rename hcs tests
* fix import
* triplet: select fovs for training
* add test
* increase example size
* test exclude fovs
* use full fov path to exclude
* add unit test
* test fov names

* point to stable version
* minor wording edit
* link tutorials on the main branch
* split demo from tutorials
* add vcp links
* fix typo
* add link to hek tutorial
* update descriptions and add neuromast
* wip: try to patch CI
* Revert "wip: try to patch CI" This reverts commit 2ecf722.
* explain versioning policy
* add todo label

* restructure and test script
* black formatted
* rename utils.py
* add doctrings
* add numpy style doctrings
* ruff fixed import error
* remove redundant comments
* black formatted
* ruff corrected
* modified docstrings
* fix typing
* fix docstring type hint
* adding tests for general functionality and simplifiying the normalization
* adding mahotas to metrics
* removing the classes since we will add independent unit tests.
* adding some unit tests
* cleaning up the pytests functions
* moving eps as class attribute.
* removing self.

---------

Co-authored-by: Ziwen Liu <[email protected]>
Co-authored-by: Ziwen Liu <[email protected]>
Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>

ziw-liu · 2025-06-16T17:56:59Z

The diff is already in #240?

ziw-liu · 2025-07-11T20:09:53Z

@edyoshikun do you still need this?

edyoshikun · 2025-07-16T23:46:16Z

Closing this, as most of the implementation is already in main. @ziw-liu, open a new PR that exposes the prefetch factor and the number of workers for the TripletDataModule if we think this will give us a speed boost.
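Whether exposing the prefetch factor pays off depends on how much data is held in flight. Per the PyTorch `DataLoader` documentation, each worker loads `prefetch_factor` batches in advance, so `num_workers * prefetch_factor` batches are buffered in total. The sketch below turns that into a rough host-memory estimate. The function names and the per-sample size are illustrative assumptions, and the estimate ignores pinned-memory copies and worker overhead.

```python
def prefetched_batches(num_workers: int, prefetch_factor: int = 2) -> int:
    """Batches buffered ahead of the training loop across all workers.

    Following the PyTorch DataLoader docs, each worker loads
    `prefetch_factor` batches in advance.
    """
    if num_workers == 0:
        return 0  # main-process loading: nothing is prefetched
    return num_workers * prefetch_factor


def prefetch_memory_bytes(
    num_workers: int, prefetch_factor: int, batch_size: int, sample_bytes: int
) -> int:
    """Rough host-memory estimate for the prefetched batches.

    Ignores pinned-memory copies, worker process overhead, and the batch
    currently being consumed.
    """
    return prefetched_batches(num_workers, prefetch_factor) * batch_size * sample_bytes
```

With a hypothetical triplet sample of three 2-channel 160 × 160 float32 patches (614,400 bytes), 8 workers, a prefetch factor of 4, and a batch size of 64, `prefetch_memory_bytes` gives 1,258,291,200 bytes (about 1.17 GiB) buffered ahead of the training loop. Raising either knob multiplies that figure, so the speed gain has to be weighed against host memory.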

edyoshikun and others added 30 commits April 15, 2025 15:50

delete outdatted figure making scripts

0c73676

delte unused infection classification scripts

99ed61f

rename infection classfication README.md

75b64b8

adding old visualization code

6a646ee

deleting old evaluation code

e7f575b

cherry pick commit adding xyz coordinates to xarray

7e661aa

cherry-pick flexible number of PC components

c5edf90

cherry-pick updating distance and ALFI MSF measurments

3624861

cherry-pick moving occlusion script to figures

71481ec

adding demo prototype for dynaclr

72b7482

cleaning up demo imagenet vs dynaclr

2efcb4e

abstracting the writing to xarray format so it doesn't dependo on the pl trainer

ad201a9

fixing patch size mismatch

1f52dcf

benchmark_demos

c2f686b

making pca compute similar to phate

240bcd9

imagenet lm module

aa48bf5

update openphenom lm module

cd179c0

update embedding_writer to accept configurable PHATE,PCA,UMAP

a73e216

add plotting utils for dtw

975eb5a

update imagenet and openpheno to accept multi channel

06439c1

readme for demo

dcc6da0

moving examples and simplifying example to load and display

87904d2

update README.md with instructions to run inference

963926d

missing plotly dash in visual

6c58145

adding a test case with dash

974d058

format

4310c42

fix imports

15f837b

removing paths and use the ones relative to download

f3ab2c1

dynaclr-denv-vs and interactive visualizer update

8c20eaa

edyoshikun and others added 24 commits May 29, 2025 11:02

moving the old cli scripts for the dynaclr demo to examples

5c79d69

Tweak VCP tutorials (#251)

eab9819

* relax patch version
* tweak visualization

Merge branch 'main' into dynaclr_v2

65df54b

uploading demo as html file since jupyter cannot render it

e71ff90

updating demo readmes to point to public

7d0117e

remove the play pause buttons for the phate demo

3cfa350

standardizing viscy hosted models to have the description in the parenthesis

4e0dd73

numpy docstring on openphenom and removing torchno grad redundancy

d8ca08c

replacing go.heatmap to go.image for visualization

0daf4c8

reviewed README

918ff03

added link to DynaCLR demo

9658fc3

adding script to generate pseudotracks

9a2ec1c

adding global paths and todos to pseudo tracks

3d81e79

Update examples/DynaCLR/setup.sh

e1922b5

Co-authored-by: Ziwen Liu <[email protected]>

adding documentation to the visualization class

3a5b2ac

visualization app will take z-range as tuple

ce64513

removing the redundant 'reduction' attribute in the embedding_writter. setting configurable phate and pca.

78f9fc3

remove dynacell metrics config

70cefe8

replacing prints with assertions

62c6e87

adding convenience function to testimate the dataloader settings.

b3029ae

adding pin_memory, consistent workers and prefetching to the tripletdatamodule

b877d2c
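The PR title refers to debugging the TripletDataModule through a ConcatDataModule, which serves several data modules as one. Its internals are not shown in this thread. Assuming it concatenates the underlying datasets the way `torch.utils.data.ConcatDataset` does, a global sample index is mapped back to one dataset by binary-searching the cumulative lengths, which is handy for tracing a slow or failing batch to the data module it came from. `ConcatIndex` below is a hypothetical, dependency-free illustration of that scheme, not code from this repository.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


class ConcatIndex:
    """Map a global sample index to (dataset_id, local_index).

    Hypothetical helper using the same scheme as torch.utils.data.ConcatDataset:
    store cumulative lengths and binary-search them with bisect_right.
    """

    def __init__(self, lengths: Sequence[int]) -> None:
        if any(n < 0 for n in lengths):
            raise ValueError("dataset lengths must be non-negative")
        self.cumulative_sizes: List[int] = list(accumulate(lengths))

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def locate(self, idx: int) -> Tuple[int, int]:
        if idx < 0:
            idx += len(self)  # allow negative indices, as a list does
        if not 0 <= idx < len(self):
            raise IndexError("index out of range")
        # bisect_right skips empty datasets, whose cumulative size repeats.
        dataset_id = bisect.bisect_right(self.cumulative_sizes, idx)
        start = self.cumulative_sizes[dataset_id - 1] if dataset_id > 0 else 0
        return dataset_id, idx - start
```

For example, `ConcatIndex([3, 0, 5]).locate(3)` returns `(2, 0)`: index 3 is the first sample of the third dataset, and the empty second dataset is skipped.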

Base automatically changed from dynaclr_v2 to main June 23, 2025 19:07

edyoshikun closed this Jul 16, 2025

ziw-liu deleted the trip_concatdl branch July 17, 2025 16:56

4 participants
