
Commit d52d8ea

chore: bump version to 0.1.1 and fix publish workflow

- Bump version from 0.1.0.post1 to 0.1.1
- Remove direct path references (benchkit @ {root:uri}) that PyPI rejects
- Add verbose output to publish steps for better error diagnostics
- Add skip-existing for TestPyPI to handle re-runs gracefully
- Add TestPyPI trusted publisher setup instructions

1 parent 3fcabcd commit d52d8ea

33 files changed: +1682 −1987 lines

.github/workflows/benchmark-nightly.yml

Lines changed: 3 additions & 1 deletion

@@ -21,6 +21,7 @@ jobs:
     env:
       XLA_FLAGS: "--xla_force_host_platform_device_count=4"
       JAX_PLATFORMS: "cpu"
+      CUDA_VISIBLE_DEVICES: ""
     steps:
       - uses: actions/checkout@v4

@@ -36,7 +37,8 @@ jobs:
         env:
           WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
         run: >
-          uv run datarax-bench run
+          uv run python -c "from benchmarks.cli import main; main()"
+          run
           --platform cpu
           --repetitions 3
           --wandb
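The hunk above replaces the `datarax-bench` console script with a direct `python -c` invocation, which works because a console script is only a thin shim that imports a callable and runs it. A minimal sketch of that pattern, where `main` is a hypothetical stand-in for `benchmarks.cli.main` (not the real implementation):

```python
# Sketch: what a console-script entry point reduces to. Invoking
# `python -c "from benchmarks.cli import main; main()" run ..."` calls the
# same function an installed `datarax-bench` wrapper would have called.
# The dispatch logic below is illustrative, not the actual CLI.
import sys


def main(argv=None):
    """Dispatch on the first positional argument, like a typical CLI."""
    argv = list(sys.argv[1:] if argv is None else argv)
    if argv and argv[0] == "run":
        # Remaining tokens are passed through as run options.
        return {"command": "run", "options": argv[1:]}
    return {"command": "help", "options": []}
```

Called with the workflow's arguments, `main(["run", "--platform", "cpu", "--repetitions", "3", "--wandb"])` routes to the `run` command with the trailing flags as options.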

.github/workflows/build-verification.yml

Lines changed: 1 addition & 1 deletion

@@ -25,7 +25,7 @@ jobs:
           python-version: ${{ matrix.python-version }}

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "${{ matrix.os }}-${{ matrix.python-version }}"

.github/workflows/ci.yml

Lines changed: 7 additions & 7 deletions

@@ -21,7 +21,7 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "lint"

@@ -64,7 +64,7 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "examples"

@@ -110,7 +110,7 @@ jobs:
           python-version: ${{ matrix.python-version }}

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "unit-${{ matrix.python-version }}-${{ matrix.os }}"

@@ -169,7 +169,7 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "integration"

@@ -219,7 +219,7 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "e2e"

@@ -270,7 +270,7 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "perf"

@@ -317,7 +317,7 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        uses: astral-sh/setup-uv@v1
+        uses: astral-sh/setup-uv@v4
         with:
           enable-cache: true
           cache-suffix: "coverage"

.github/workflows/publish.yml

Lines changed: 11 additions & 3 deletions

@@ -1,13 +1,17 @@
 # Publish to PyPI when a GitHub Release is created
 # Uses Trusted Publishers (OIDC) - no API tokens needed!
 #
-# Setup required on PyPI:
-# 1. Go to https://pypi.org/manage/project/datarax/settings/publishing/
-# 2. Add a new trusted publisher with:
+# Setup required on PyPI (https://pypi.org/manage/project/datarax/settings/publishing/):
 #   - Owner: avitai
 #   - Repository: datarax
 #   - Workflow name: publish.yml
 #   - Environment: pypi
+#
+# Setup required on TestPyPI (https://test.pypi.org/manage/project/datarax/settings/publishing/):
+#   - Owner: avitai
+#   - Repository: datarax
+#   - Workflow name: publish.yml
+#   - Environment: testpypi

 name: Publish to PyPI

@@ -81,6 +85,8 @@ jobs:
         uses: pypa/gh-action-pypi-publish@release/v1
         with:
           repository-url: https://test.pypi.org/legacy/
+          verbose: true
+          skip-existing: true

   publish-pypi:
     name: Publish to PyPI

@@ -105,3 +111,5 @@ jobs:

       - name: Publish to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          verbose: true

.github/workflows/summary.yml

Lines changed: 4 additions & 1 deletion

@@ -19,6 +19,9 @@ jobs:
       - name: Run AI inference
         id: inference
         uses: actions/ai-inference@v1
+        env:
+          ISSUE_TITLE: ${{ github.event.issue.title }}
+          ISSUE_BODY: ${{ github.event.issue.body }}
         with:
           prompt: |
             Summarize the following GitHub issue in one paragraph:

@@ -27,7 +30,7 @@ jobs:

       - name: Comment with AI summary
         run: |
-          gh issue comment $ISSUE_NUMBER --body '${{ steps.inference.outputs.response }}'
+          gh issue comment "$ISSUE_NUMBER" --body "$RESPONSE"
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           ISSUE_NUMBER: ${{ github.event.issue.number }}
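The second hunk fixes a classic workflow injection: interpolating untrusted text (an issue body or model response) directly into a `run:` script lets crafted input break out of its quotes and execute commands, while passing the value through an environment variable keeps it inert. A minimal Python sketch of the safe pattern, with an illustrative attacker string:

```python
# Demonstrates the out-of-band pattern the workflow now uses: the untrusted
# value travels via the environment, and the shell expands it only as data.
# The attacker string and RESPONSE variable name mirror the workflow but are
# illustrative, not taken from a real incident.
import os
import subprocess

untrusted = "'; echo INJECTED; '"  # would escape single quotes if spliced into a command

# Safe: the shell script is a fixed string; "$RESPONSE" is expanded as a
# single word by the shell, never re-parsed as commands.
result = subprocess.run(
    ["bash", "-c", 'printf "%s" "$RESPONSE"'],
    env={**os.environ, "RESPONSE": untrusted},
    capture_output=True,
    text=True,
)
# result.stdout contains the literal attacker string; no extra command ran.
```

The unsafe variant, `f"gh issue comment ... --body '{untrusted}'"` run through a shell, would terminate the quoted argument at the first `'` and execute `echo INJECTED` as a separate command.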

.github/workflows/test-coverage.yml

Lines changed: 3 additions & 11 deletions

@@ -23,18 +23,10 @@ jobs:
           python-version: '3.11'

       - name: Install uv
-        run: |
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-
-      - name: Configure uv cache
-        uses: actions/cache@v3
+        uses: astral-sh/setup-uv@v4
         with:
-          path: |
-            ~/.cache/uv
-            ~/.cache/uv/virtualenvs
-          key: ${{ runner.os }}-uv-coverage-report-${{ hashFiles('pyproject.toml', 'uv.lock') }}
-          restore-keys: |
-            ${{ runner.os }}-uv-coverage-report-
+          enable-cache: true
+          cache-suffix: "coverage"

       - name: Create virtual environment
         run: |

README.md

Lines changed: 21 additions & 11 deletions

@@ -18,7 +18,7 @@

 ---

-**Datarax** (*Data + Array/JAX*) is a high-performance, extensible data pipeline framework specifically engineered for JAX-based machine learning workflows. It leverages JAX's JIT compilation, automatic differentiation, and hardware acceleration to build efficient, scalable data loading, preprocessing, and augmentation pipelines on CPUs, GPUs, and TPUs.
+**Datarax** (*Data + Array/JAX*) is an extensible data pipeline framework built for JAX-based machine learning workflows. It leverages JAX's JIT compilation, automatic differentiation, and hardware acceleration to build data loading, preprocessing, and augmentation pipelines that run on CPUs, GPUs, and TPUs.

 ## Key Features

@@ -33,18 +33,28 @@

 ## Why Datarax?

-Datarax's differentiable pipeline architecture enables optimization paradigms that are impossible with traditional data loaders. Here are three real-world examples:
+JAX has mature libraries for models (Flax), optimizers (Optax), and checkpointing (Orbax), but lacks a dedicated data pipeline framework that operates at the same level of abstraction. Existing options are either framework-agnostic loaders that return NumPy arrays (losing JIT/autodiff benefits) or wrappers around tf.data/PyTorch that introduce cross-framework overhead. Datarax aims to fill this gap. The framework is under active development with ongoing performance optimization — the architecture is functional, but throughput and API surface are still being refined.

-### Learned Augmentation Policy (10,000x Faster Search)
-Traditional augmentation search (AutoAugment) requires 15,000 GPU-hours of RL. With datarax's differentiable operators, [DADA-style gradient-based search](examples/advanced/differentiable/01_dada_learned_augmentation_guide.py) achieves the same accuracy in **~0.1 GPU-hours** — because gradients flow through the augmentation pipeline.
+### JAX-Native from the Ground Up
+Every component — sources, operators, batchers, samplers, sharders — is a Flax NNX module. Pipeline state is managed through NNX's variable system, which means operators can hold learnable parameters, be serialized with Orbax, and participate in JAX transformations (`jit`, `vmap`, `grad`) without special handling.

-### Task-Optimized Image Processing (+30% Detection Accuracy)
-Camera ISPs are tuned for human perception, not AI tasks. Datarax's DAG executor lets you [build a differentiable ISP pipeline](examples/advanced/differentiable/02_learned_isp_guide.py) where detection loss backpropagates through every processing stage, automatically optimizing for **what the model actually needs**.
+### Differentiable Data Pipelines
+Because operators are NNX modules, gradients flow through the entire pipeline. This enables approaches that are not possible with standard data loaders:

-### Cross-Domain Extensibility (Audio Synthesis in 3 Operators)
-Datarax isn't just for images. By implementing [3 custom operators for DDSP audio synthesis](examples/advanced/differentiable/03_ddsp_audio_synthesis_guide.py), you get a complete differentiable audio pipeline — with **100x less training data** than neural audio models — proving the framework extends to any domain.
+- [Gradient-based augmentation search](examples/advanced/differentiable/01_dada_learned_augmentation_guide.py) — replacing RL-based methods like AutoAugment with direct optimization
+- [Task-optimized preprocessing](examples/advanced/differentiable/02_learned_isp_guide.py) — backpropagating task loss through every processing stage
+- [Differentiable audio synthesis](examples/advanced/differentiable/03_ddsp_audio_synthesis_guide.py) — extending the same pattern to non-vision domains

-> **Learn more**: [Differentiable Pipeline Examples](docs/examples/advanced/differentiable/)
+See the [differentiable pipeline examples](docs/examples/advanced/differentiable/) for details.
+
+### DAG Execution Model
+Pipelines are directed acyclic graphs, not linear chains. The `>>` operator composes sequential steps, `|` creates parallel branches, and control-flow nodes (`Branch`, `Merge`, `SplitField`) handle conditional and multi-path logic. The DAG executor manages scheduling, caching, and rebatching across the graph.
+
+### Deterministic Reproducibility
+Shuffling uses Grain's Feistel cipher permutation, which generates a full-epoch permutation in O(1) memory without materializing the index array. Combined with explicit RNG key threading through every stochastic operator, pipelines produce identical output given the same seed — across restarts, devices, and host counts.
+
+### Built-in Competitive Benchmarking
+The benchmarking engine profiles datarax against 12+ frameworks (Grain, tf.data, PyTorch DataLoader, DALI, Ray Data, and others) across standardized scenarios. Results feed a regression guard that catches performance regressions in CI and a gap analysis that identifies optimization targets relative to the fastest framework per scenario. This benchmark-driven development loop is how datarax tracks its progress toward competitive throughput — current results and optimization status are tracked in the [benchmarking documentation](docs/benchmarks/index.md).

 ## Installation

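The `>>` / `|` composition style described in the README hunk above can be illustrated with a tiny generic DAG builder. This is a hedged sketch of the *pattern* (operator overloading that builds a graph, executed later), using hypothetical `Node`/`Seq`/`Par` classes, not datarax's actual API:

```python
# Generic sketch of >> (sequential) and | (parallel) pipeline composition.
# Class and function names are illustrative stand-ins, not datarax classes.
from dataclasses import dataclass


@dataclass
class Node:
    fn: object   # callable applied to the flowing value
    name: str

    def __rshift__(self, other):      # a >> b : run a, then b
        return Seq([self, other])

    def __or__(self, other):          # a | b : fan the input out to both
        return Par([self, other])

    def run(self, x):
        return self.fn(x)


@dataclass
class Seq:
    steps: list

    def __rshift__(self, other):      # allow chaining: a >> b >> c
        return Seq(self.steps + [other])

    def run(self, x):
        for step in self.steps:
            x = step.run(x)
        return x


@dataclass
class Par:
    branches: list

    def run(self, x):                 # same input to every branch, collect results
        return [b.run(x) for b in self.branches]


normalize = Node(lambda v: v / 255.0, "normalize")
double = Node(lambda v: v * 2, "double")
negate = Node(lambda v: -v, "negate")

# A small DAG: one sequential stage feeding two parallel branches.
pipeline = normalize >> (double | negate)
```

Running `pipeline.run(255.0)` normalizes to `1.0`, then fans out to `[2.0, -1.0]`. A real executor would add scheduling, caching, and merge nodes on top of this composition skeleton.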

@@ -169,7 +179,7 @@ complex_pipeline = (

 ## Architecture

-```
+```text
 src/datarax/
   core/   # Base modules: DataSourceModule, OperatorModule, Element, Batcher, Sampler, Sharder
   dag/    # DAG executor and node system (source, operator, batch, cache, control flow)

@@ -193,7 +203,7 @@ src/datarax/

 ## Benchmarking

-Datarax includes a benchmarking suite for competitive comparison against 12 data loading frameworks across 25 scenarios spanning vision, NLP, tabular, multimodal, I/O, distributed, and pipeline complexity workloads.
+Datarax includes a benchmarking suite for comparison against 12+ data loading frameworks across a range of workload scenarios (vision, NLP, tabular, multimodal, distributed).

 ```bash
 # Install benchmark dependencies (adds PyTorch, DALI, Ray, etc.)
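The O(1)-memory shuffle the README attributes to Grain relies on a format-preserving permutation: a balanced Feistel network over a power-of-two domain, plus "cycle-walking" to restrict it to `[0, n)`. A generic sketch of the technique (not Grain's actual implementation; the hash-based round function is an arbitrary choice of PRF):

```python
# Index-level shuffle: compute the shuffled position of any index on demand,
# without ever materializing the permutation array. Generic illustration of
# the Feistel + cycle-walking construction, not Grain's code.
import hashlib


def _round_fn(value: int, key: str, rnd: int, bits: int) -> int:
    """Pseudorandom round function derived from a hash; any keyed PRF works."""
    digest = hashlib.blake2b(f"{value}:{key}:{rnd}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def feistel_index(i: int, n: int, key: str, rounds: int = 4) -> int:
    """Map index i to its shuffled position in a permutation of range(n)."""
    bits = max(2, (n - 1).bit_length())
    bits += bits % 2                      # even width so the two halves are equal
    half = bits // 2
    mask = (1 << half) - 1
    x = i
    while True:
        left, right = x >> half, x & mask
        for r in range(rounds):           # balanced Feistel: bijective by construction
            left, right = right, left ^ _round_fn(right, key, r, half)
        x = (left << half) | right
        if x < n:                         # cycle-walk values that land outside [0, n)
            return x
```

Because each Feistel pass is a bijection on the power-of-two domain and cycle-walking only re-encrypts out-of-range values, `feistel_index(·, n, key)` is a permutation of `range(n)`, and changing `key` per epoch gives a fresh deterministic shuffle.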

benchmarks/export.py

Lines changed: 4 additions & 0 deletions

@@ -159,6 +159,10 @@ def export(
         # Keep the W&B run open so we can log additional artifacts.
         url = self._exporter.export_run(run, finish=False)

+        # If W&B init failed (no auth / wandb not installed), skip all logging.
+        if not url:
+            return ""
+
         # 2. Charts (datarax-specific)
         self._log_charts(comparative, chart_dir)
