Commit 3fcabcd
feat: benchmarking framework, Docker infrastructure, and documentation overhaul
Benchmarking application layer (benchmarks/):
- Add competitive benchmark suite with 12 framework adapters (Grain, tf.data,
PyTorch DataLoader, DALI, Ray Data, SPDL, MosaicML, WebDataset, HF Datasets,
LitData, Deep Lake, jax-dataloader)
- Add scenario-driven runner with TOML config profiles (cpu, gpu_a100, tpu_v5e)
- Add datarax-bench CLI with Click (run/export/compare subcommands)
- Add analysis modules: gap detection, stability validation, comparison reports
- Add visualization: throughput bars, radar, latency CDF, memory waterfall,
scaling curves, chain depth, feature heatmap
- Add W&B export with benchkit adapter, raw results artifact persistence
- Add pre-import fixups (_preload.py) for Deep Lake/TF/JAX/Ray ordering
Benchmarking engine (src/datarax/benchmarking/):
- Add resource_monitor, results, statistics, timing modules
- Refactor profiler, comparative, regression, monitor modules
- Remove deprecated pipeline_throughput module
Cloud infrastructure:
- Add SkyPilot configs (cpu/gpu/tpu) with datarax-bench CLI integration,
W&B export, PYTHONPATH fix for console scripts, Ray Data exclusion
- Add .dockerignore to reduce build context from ~15GB to <500MB
- Update Dockerfile: switch to runtime CUDA image, two-layer dep caching,
CMD instead of ENTRYPOINT, uv binary COPY
- Add benchmark Dockerfiles (cpu/gpu/tpu) in benchmarks/docker/
- Add CI workflows: benchmark-gate (PR) and benchmark-nightly
Tools:
- Add benchkit package (tools/benchkit/) for benchmark data management:
store, exporters (W&B, JSON, HTML), metric definitions, analysis
Source and operator improvements:
- Add eager source ops, index_shuffle (Feistel cipher O(1) shuffling)
- Add image validation utilities
- Refactor source modules (memory, HF, TFDS, mixed, array_record)
- Update operator strategies (sequential, parallel, branching, ensemble, merging)
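
For readers unfamiliar with the "Feistel cipher O(1) shuffling" mentioned above, the idea can be sketched as follows. This is an illustrative sketch only, not the code added in this commit: the function name `index_shuffle` is borrowed from the message, but its signature, the `_round_fn` helper, the BLAKE2b-based round function, and the round count are all assumptions. A balanced Feistel network defines a keyed bijection on a power-of-two domain, and cycle-walking restricts it to `[0, n)`, so any shuffled position can be computed on demand without materializing a permutation array.

```python
import hashlib


def _round_fn(value: int, seed: int, rnd: int, mask: int) -> int:
    """Hypothetical keyed round function: hash (seed, round, value) to half-width bits."""
    data = f"{seed}:{rnd}:{value}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") & mask


def index_shuffle(index: int, seed: int, n: int, rounds: int = 4) -> int:
    """Map index in [0, n) to a pseudo-random position in [0, n).

    For a fixed (seed, n) the mapping is a permutation. It uses O(1)
    memory and an expected constant number of Feistel passes per lookup.
    """
    if not 0 <= index < n:
        raise IndexError(f"index {index} out of range for n={n}")

    # Split the index into two halves of half_bits each, so the Feistel
    # domain 2**(2 * half_bits) covers every value in [0, n).
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1

    x = index
    while True:
        left, right = x >> half_bits, x & mask
        for rnd in range(rounds):
            # One Feistel round: swap halves and mix with the round function.
            left, right = right, left ^ _round_fn(right, seed, rnd, mask)
        x = (left << half_bits) | right
        # Cycle-walking: if the result fell outside [0, n), encrypt again.
        if x < n:
            return x


if __name__ == "__main__":
    n = 10
    order = [index_shuffle(i, seed=42, n=n) for i in range(n)]
    print(order)  # some permutation of 0..9, determined by the seed
```

Because each lookup depends only on `(index, seed, n)`, a data source built this way could resume mid-epoch or shard across workers without sharing any shuffle state; whether the commit's implementation makes the same choices is not stated in the message.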
Documentation:
- Restructure docs/ with updated API reference, benchmarking guides,
contributing guides, source documentation
- Add Docker documentation (docs/contributing/docker.md)
- Add benchmark results, resource monitor, statistics, timing docs
- Update all examples and notebooks for current API
- Update mkdocs.yml navigation
Script consolidation:
- Remove 8 redundant scripts (run_benchmarks, run_gpu_tests, run_lint, etc.)
- Add generate_baselines.py, run_full_benchmark.sh, verify_docs.py
- Move vertex_config.yaml.template to scripts/
Tests:
- Add benchmark test suite (P0-P5 priority levels, CLI, export, performance)
- Add benchmarking engine tests (profiler, resource monitor, results, etc.)
- Add source tests (eager ops, mixed source)
- Update existing test fixtures and utilities
1 parent f82894d commit 3fcabcd
File tree
493 files changed
+52738 -7308 lines changed
- .github/workflows
- benchmarks
- adapters
- analysis
- baselines
- config
- hardware_profiles
- core
- docker
- fixtures
- runners
- scenarios
- augmentation
- datarax_unique
- distributed
- io
- multimodal
- nlp
- pipeline_complexity
- production
- tabular
- vision
- sky
- tests
- test_adapters
- test_analysis
- test_runners
- test_scenarios
- test_visualization
- visualization
- docs
- api_reference
- assets/images/examples
- benchmarking
- benchmarks
- contributing
- core
- dag
- distributed
- examples
- advanced
- differentiable
- distributed
- performance
- basic
- core
- integration/huggingface
- quick-reference
- getting_started
- operators
- sources
- user_guide
- examples
- advanced
- differentiable
- distributed
- performance
- comparison
- core
- integration/huggingface
- scripts
- src/datarax
- batching
- benchmarking
- checkpoint
- cli
- config
- control
- core
- dag
- nodes
- distributed
- memory
- monitoring
- operators
- modality
- audio
- image
- text
- strategies
- performance
- samplers
- sharding
- sources
- utils
- workers
- tests
- benchmarking
- benchmarks
- cli
- core
- dag
- distributed
- fixtures/crepe
- memory
- monitoring
- operators
- modality
- audio
- image
- strategies
- samplers
- scripts
- sources
- test_common
- tools/benchkit
- src/benchkit
- exporters
- tests