28 Oct 16:43

zhenghh04

b56c741

Release v2.0.1 Latest

What's Changed

Fixed mocking for DFTracer by @hariharan-devarajan in #220
Fixed iterator to only store data for that rank. by @hariharan-devarajan in #216
Fix PyPI Publish Issue and Improve Project Metadata by @izzet in #224
Fix missing import for chunking. by @hariharan-devarajan in #223
Improve CI Performance. by @hariharan-devarajan in #227
For sample indexing we fix the uneven sampling by @hariharan-devarajan in #226
fix misleading generator message by @rayandrew in #231
fix negative value of computation time when stdev exists by @rayandrew in #233
Bugfix: fix type of number for offset and size by @hariharan-devarajan in #229
Fix wrong configuration for hdf5 chunking by @rayandrew in #237
fix last step is not executed by @rayandrew in #236
fix wrong tracing location of fetch data by @rayandrew in #238
enable option to disable pin_memory in pytorch by @rayandrew in #239
Change max to abs for preprocess time by @rayandrew in #240
New improved modelling for LLM Deepspeed. by @hariharan-devarajan in #230
Add user config to specify type of distribution of time configuration by @rayandrew in #241
fixed bug on doc action by @zhenghh04 in #251
Update jekyll-gh-pages.yml by @zhenghh04 in #254
upgrade pydftracer package by @rayandrew in #242
Support for setting different DLIO_LOG_LEVEL by @zhenghh04 in #222
Enhancing metric calculation and output functionality by @zhenghh04 in #253
Checkpointing support for transformer type models by @zhenghh04 in #247
Update jekyll-gh-pages.yml by @zhenghh04 in #257
Fix doc deployment issue by @zhenghh04 in #258
left over logging fix by @zhenghh04 in #259
docker: use pip install to match readme by @glimchb in #265
ci: add docker build and publish by @glimchb in #263
Darshan preload environment variable removed by @zhenghh04 in #260
Copyright update by @zhenghh04 in #261
Update docker.yml by @zhenghh04 in #266
ci: also publish docker image on releases by @glimchb in #267
Fix saving checkpoint print by @LouisDDN in #270
docs: small readme typo by @glimchb in #268
Fixed loading checkpoint timer by @zhenghh04 in #273
Refactor: move pydftracer dependency to extras for better management by @hariharan-devarajan in #275
Enhancement for checkpoint feature by @zhenghh04 in #276
Separate read and write checkpoints. by @zhenghh04 in #278
configs by @zhenghh04 in #284
docker: improve docker cache and remove sources by @glimchb in #287
Fixes for v2.0 benchmark by @johnugeorge in #289
Reorganized the code provided by YardenMa for O_DIRECT support with NPY and NPZ formats and pytorch by @timothy-chau in #286
RAM optimisations for checkpointing by @LouisDDN in #283
Randomize tensor data by default (checkpoint) by @LouisDDN in #291
S3 Fix by @zhenghh04 in #294
Mlperf storage v2.0 by @zhenghh04 in #303
Dimension-based Dataset Generation by @rayandrew in #301
docs(profiling): fix dftracer repo location by @glimchb in #304
Add DFTracer AI logging support with dftracer by @rayandrew in #302
increase tests timeout to 600s (10 minutes) by @rayandrew in #312

New Contributors

@rayandrew made their first contribution in #231
@glimchb made their first contribution in #265
@timothy-chau made their first contribution in #286

Full Changelog: v2.0.0...v2.0.1

Contributors

johnugeorge, izzet, and 6 other contributors

13 Aug 21:18

zhenghh04

v2.0.0

866828c

Release v2.0.0

What's Changed

Add docker image with CPU only dependencies by @johnugeorge in #8
Add dlio fixes by @johnugeorge in #10
Fixed issues related to checkpointing and profiling by @zhenghh04 in #13
Config parameters fixes by @johnugeorge in #11
Fixing folder number for evaluation by @johnugeorge in #14
fixed checkpoint issues by @zhenghh04 in #16
Adding PR unit tests for testing different data format and fixing issues for reading png and jpeg with pytorch data folder. by @zhenghh04 in #17
A bunch of minor fixes by @zhenghh04 in #18
Minor fixes by @zhenghh04 in #22
Add ckpting to UNET3D workload, remove old prefetch param by @lhovon in #23
Minor modification of configuration options to remove some confusion by @zhenghh04 in #25
Adding Storage interface for supporting multiple storage backends by @johnugeorge in #20
Code Fixes by @johnugeorge in #26
Add the UNET3D sleep time for V100 32GB batch size 4 by @lhovon in #29
Minor config changes by @johnugeorge in #31
Make hydra config folder configurable by @johnugeorge in #32
Mlperf storage v0.5 by @zhenghh04 in #33
Changes to support segregation of data loader and reader by @hariharan-devarajan in #37
Added application-level profile support for DLIO by @hariharan-devarajan in #39
Multithreading issue with TensorFlow and PyTorch dataloader by @hariharan-devarajan in #44
bug fix to free memory once file is completely read by @hariharan-devarajan in #51
Pull changes from mlperf_storage_v0.5.1 by @zhenghh04 in #52
Improved tracing utility added preprocessing support by @zhenghh04 in #53
Trace improvement. by @hariharan-devarajan in #48
Moved resize image to config by @zhenghh04 in #55
instead of using direct methods using enter and exit. by @hariharan-devarajan in #54
Reorganizing output files by @zhenghh04 in #56
Generator fixed random seed by @zhenghh04 in #58
Merging branch mlperf_storage_v0.5.1 by @zhenghh04 in #57
fixing mistakes in calculating total number of steps by @zhenghh04 in #59
Mlperf storage v0.5.1 by @zhenghh04 in #60
Added support for Dali data loader by @hariharan-devarajan in #49
Changed datatype to be np.uint8 universally in the call by @zhenghh04 in #61
Adding support for training on a subset of dataset by @zhenghh04 in #63
DLIO profiler integration by @hariharan-devarajan in #62
Added Support Power9PC by @hariharan-devarajan in #65
Update unet3d.yaml to correct the sample size for unet3d by @zhenghh04 in #68
For X86 and AMD machines, we can create a pip based dlio installations by @hariharan-devarajan in #66
Added validation to check enough core available for reading by @hariharan-devarajan in #73
Added custom plugin code for custom data loader and reader. by @hariharan-devarajan in #74
Changes required within DLIO Benchmark for creating a pip wheel by @hariharan-devarajan in #77
Update bert.yaml to be consistent with mlperf storage by @zhenghh04 in #79
Fixing subfolder issues and added subset tests by @zhenghh04 in #82
Documentation: Instructions to compile and run on Lassen machine. by @OlgaKogiou in #85
Changes to improve documentation by @hariharan-devarajan in #89
Fixed dali data loader execution. by @hariharan-devarajan in #91
Enhancing Dali data loader support by @zhenghh04 in #94
Fixing Dali Data loader Parallelism and Pipelining. by @hariharan-devarajan in #93
Update typo which gives issue for pytorch 1.3.1 by @hariharan-devarajan in #103
Added documentation for the JPEG generator issue by @kaushikvelusamy in #100
Workloads by @zhenghh04 in #97
Added Info logging for profiler and removed unnecessary bracket calls. by @hariharan-devarajan in #104
Fix the data dir path by @hariharan-devarajan in #108
Making DLIO Profiler default for dlio_benchmark. by @hariharan-devarajan in #111
Adding dlp logger. by @hariharan-devarajan in #109
Workloads by @zhenghh04 in #112
fixed readthedoc build issue by @zhenghh04 in #115
fix Docker file to use venv. by @hariharan-devarajan in #119
Switch dlio_profiler to use pypi instead of github by @hariharan-devarajan in #120
Added force install for profiler for avoiding caching issues by @hariharan-devarajan in #123
Update README.md by @venkat-1 in #121
torch checkpoint creation should use storage class methods by @krehm in #126
Reducing Github actions time by @zhenghh04 in #128
Create output_folder using os.makedirs() by @krehm in #124
Adding Native Dali Data Loader support for TFRecord, Images, and NPZ files by @zhenghh04 in #118
Add support for pytorch spawn and forkserver multiprocessing_context by @krehm in #129
Reopen dlio.log in non-fork reader_threads child processes by @krehm in #130
added checkpointing to support LLMs by @hariharan-devarajan in #114
added dlp for spawned workers pytorch by @hariharan-devarajan in #136
Fix MPI finalization. by @hariharan-devarajan in #139
Adding dlio_profiler to requirements.txt by @johnugeorge in #144
Fix dataloader initialization to only happen once. Not on every epoch. by @hariharan-devarajan in #143
Fix random sampling pytorch non-determinism. by @hariharan-devarajan in #145
Fixed printing for DLIO output. by @hariharan-devarajan in #142
Doc changes to fix DLIO profiler and remove IOStat by @hariharan-devarajan in #146
Support for custom checkpointing. by @hariharan-devarajan in #137
Feature/parallel io generator by @hariharan-devarajan in #148
fix random bugs and printing by @hariharan-devarajan in #147
Release for v2.0 by @zhenghh04 in #113
Fix requirements file by @johnugeorge in #150
fixed sample distribution bugs by @zhenghh04 in #152
Fix sample shuffling by @hariharan-devarajan in #154
Optimization to sample distribution by @TheAssembler1 in https://github.com/argonne-lcf/dlio_benchmark/pull...

Contributors

johnugeorge, izzet, and 9 other contributors

03 Feb 14:57

zhenghh04

v1.1

09bc4f7

DLIO v1.1

This release includes the following changes and enhancements:

Added support for S3 storage.
Updated config files for MLPerf Storage workloads: UNet3D and Bert.
Changes to configuration options:
- Added variability support for sample size and for training and validation computation time.
- Changed the shuffling and prefetching settings.
- Moved batch_size and batch_size_eval to the reader section.
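
As an illustration of what "variability support" for computation time can mean, the following is a minimal sketch and not DLIO's actual code. It assumes a normal distribution around a mean with a standard deviation, folded with `abs()` so that a draw is never negative. The function name `sample_time` and the numeric values are hypothetical.

```python
import random


def sample_time(mean, stdev=0.0, rng=random):
    """Draw one emulated computation time in seconds.

    Assumption: a normal distribution folded with abs() so the result is
    never negative. This is an illustrative choice, not DLIO's code.
    """
    if stdev == 0.0:
        # No variability requested: every step takes exactly `mean` seconds.
        return mean
    return abs(rng.gauss(mean, stdev))


# A seeded generator keeps the sketch reproducible.
rng = random.Random(0)
times = [sample_time(0.5, 0.2, rng) for _ in range(1000)]
```

With `stdev` left at zero the function returns the mean unchanged, so a fixed computation time is just the special case without variability.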

This release corresponds to the MLPerf Storage v0.5 prerelease: https://github.com/mlcommons/storage/releases/tag/v0.5-rc0

16 Nov 06:44

zhenghh04

v1.0.0

dd187e9

DLIO v1.0

DLIO v1.0 Release Notes

We are excited to announce the release of DLIO 1.0! It brings many new features and enhancements compared to the previous 0.0.1 version:

DLIO is now configured through YAML files in the Hydra (hydra.cc) framework. The configuration options are organized hierarchically into model, framework, workflow, dataset, train, evaluation, checkpoint, and profiling. A set of YAML files for some workloads is included.
Data loader support enhancement:
- Added a data loader layer above the data format so that users can choose the data loader and the data format independently.
- Added PyTorch data loader support, with full support for one-sample-per-file datasets.
- Enhanced the TensorFlow tf.data loader to support generic file formats beyond tfrecord (currently only the one-sample-per-file case is supported for generic formats).
New dataset support:
- Added support for png and jpeg formats.
- Added support for multiple subfolders for training and validation datasets.
- Added support for generating a validation dataset.
Profiling and logging:
- Added support for iostat profiling.
- Added detailed logging info.
Added support for validation.
Added a post-processing Python script.
Added unit tests and GitHub Actions tests.
User and developer documentation in github.io: https://argonne-lcf.github.io/dlio_benchmark
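
The hierarchical configuration described above can be pictured with a small Python sketch. Only the eight top-level section names come from these release notes. Every key and value inside the sections, and the helper names `missing_sections` and `flatten`, are hypothetical placeholders and not real DLIO options.

```python
# Section names come from the v1.0 release notes; every key and value
# inside the sections is a hypothetical placeholder, not a real DLIO option.
EXPECTED_SECTIONS = (
    "model", "framework", "workflow", "dataset",
    "train", "evaluation", "checkpoint", "profiling",
)

example_config = {
    "model": "example_model",
    "framework": "pytorch",
    "workflow": {"generate_data": True, "train": True},
    "dataset": {"format": "png", "num_subfolders_train": 2},
    "train": {"epochs": 1},
    "evaluation": {"enabled": True},
    "checkpoint": {"enabled": False},
    "profiling": {"enabled": False},
}


def missing_sections(config):
    """Return the expected top-level sections absent from config, in order."""
    return [name for name in EXPECTED_SECTIONS if name not in config]


def flatten(config, prefix=""):
    """Flatten nested sections into dotted keys such as 'train.epochs'."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{dotted}."))
        else:
            flat[dotted] = value
    return flat
```

The dotted form produced by `flatten` resembles the `section.key=value` style that Hydra accepts for command-line overrides, which is one reason a hierarchical layout is convenient.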

Releases: argonne-lcf/dlio_benchmark
