Commit cec4ae1

Automatic reservation application (#26)
* feat: Add automatic reservation application for Dataform actions, removing manual `pre_operations` from definitions.
* lint
* feat: Refactor reservation application to robustly handle all Dataform action types, introduce comprehensive testing
* lint
* install dependencies
* lint
* docs: centralize automatic reservation initialization into a dedicated file, and update ESLint globals.
* lint
* feat: Enhance reservation statement injection in `index.js` to support string or array pre-operations
* feat: Verify patching of Dataform actions with new post-initialization tests
* feat: Implement matrix testing for Dataform versions and update CI configuration
* feat: Introduce autoAssignActions method for automated reservation assignment and update documentation
* feat: Add .npmignore file and update CONTRIBUTING.md and README.md for automated testing and assignment improvements
* chore: update package.json to move @dataform/cli to devDependencies and update workflow_settings.yaml to uncomment defaultDataset
* feat: add tests for autoAssignActions function to validate reservation handling
* merge
* chore: update dataform dependencies to version 3.0.43
* chore: remove unused global 'reservations' from ESLint configuration

---------

Signed-off-by: Max Ostapenko <1611259+max-ostapenko@users.noreply.github.com>
1 parent 5751bd4 commit cec4ae1

28 files changed (+1200 -750 lines changed)

.github/copilot-instructions.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
# Copilot Instructions for Dataform Reservation Package
2+
3+
This document captures key learnings, debugging strategies, and architectural nuances discovered during the development of the `@masthead-data/dataform-package`.
4+
5+
## How to Debug
6+
7+
### 1. Tracing Compilation
8+
Dataform executes JavaScript during the compilation phase. To trace what's happening:
9+
- Use `console.error()` for debug logs. This ensures logs go to `stderr` and don't corrupt the JSON output redirected to a file.
10+
- Avoid `console.log()` inside Dataform definitions if you plan to pipe the output to a JSON parser, as it may inject plain text into the JSON stream.
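
The logging advice above can be sketched as a small helper. This is a minimal illustration, not part of the package: `formatDebug` and `debugLog` are hypothetical names, and the only behavior relied on is that `console.error()` writes to `stderr`.

```javascript
// Hypothetical helpers (not part of the package): format a debug line and
// write it to stderr so that stdout keeps containing only the compiled JSON.
function formatDebug(label, value) {
  return `[debug] ${label}: ${JSON.stringify(value)}`;
}

function debugLog(label, value) {
  // console.error writes to stderr, which `> compiled.json` does not capture.
  console.error(formatDebug(label, value));
}

debugLog('action', { name: 'orders' });
```

With this in place, `npx @dataform/cli compile --json > compiled.json` still produces a parseable file while the debug lines appear in the terminal.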
11+
12+
### 2. Inspecting the Graph
13+
To see the final state of all actions:
14+
```bash
15+
cd test-project
16+
npx @dataform/cli compile --json > compiled.json
17+
```
18+
Inspect the `tables`, `operations`, and `assertions` arrays in the resulting JSON. Check `preOps` and `queries` for the injected `SET @@reservation` statements.
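
The manual inspection described above could be automated along these lines. This is a sketch under assumptions, not the project's `verify_compilation.js`: it assumes `preOps` and `queries` are arrays of SQL strings, that each action exposes `target.name`, and it uses an inline sample graph with a placeholder reservation path instead of reading `compiled.json`.

```javascript
// Sketch: list compiled actions whose first statement is not a reservation
// statement. The field names beyond `tables`, `operations`, `preOps` and
// `queries` (for example `target.name`) are assumptions about the JSON shape.
const RESERVATION_PREFIX = 'SET @@reservation';

function startsWithReservation(statements) {
  return (
    Array.isArray(statements) &&
    statements.length > 0 &&
    statements[0].trim().startsWith(RESERVATION_PREFIX)
  );
}

function findUnreservedActions(graph) {
  const missing = [];
  for (const table of graph.tables || []) {
    if (!startsWithReservation(table.preOps)) missing.push(table.target?.name);
  }
  for (const op of graph.operations || []) {
    if (!startsWithReservation(op.queries)) missing.push(op.target?.name);
  }
  return missing;
}

// Inline sample standing in for the parsed contents of compiled.json.
// The reservation path is a placeholder, not a real resource.
const sampleGraph = {
  tables: [
    {
      target: { name: 'orders' },
      preOps: ["SET @@reservation='projects/example/locations/US/reservations/etl';"],
    },
    { target: { name: 'customers' }, preOps: [] },
  ],
  operations: [{ target: { name: 'cleanup' }, queries: ['DELETE FROM t WHERE TRUE'] }],
};

console.error(findUnreservedActions(sampleGraph));
```

Assertions are deliberately left out of the check, since the package skips them.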
19+
20+
### 3. Verification Script
21+
Use the provided verification script to check invariants:
22+
```bash
23+
node scripts/verify_compilation.js
24+
```
25+
This script validates that reservations are prepended and that assertions are skipped.
26+
27+
## Testing Configuration
28+
29+
### Local Integration Testing
30+
The `test-project` is configured to use the local version of the package. In `test-project/package.json`:
31+
```json
32+
"dependencies": {
33+
"@masthead-data/dataform-package": "file:../"
34+
}
35+
```
36+
**Note:** `npm ci` or `npm install` in the `test-project` caches the local package. If you make changes to `index.js` and don't see them reflected, you may need to force an update or avoid `npm ci` during rapid iteration.
37+
38+
### Running Tests
39+
40+
#### Matrix Testing (Default)
41+
Run from the root to test all supported versions:
42+
```bash
43+
npm test
44+
```
45+
This runs the matrix tests across Dataform v2.4.2 and the latest v3.x release and handles config file conflicts between them automatically.
46+
47+
#### Single Version (Fast Iteration)
48+
For rapid development on the current version:
49+
```bash
50+
npm run test:single
51+
```
52+
This runs:
53+
1. `jest`: Unit tests for helper functions
54+
2. `dataform compile`: Generates the actual project graph
55+
3. `verify_compilation.js`: In-depth JSON inspection
56+
57+
#### Specific Version
58+
Test a single Dataform version:
59+
```bash
60+
npm test -- 2.4.2
61+
```
62+
63+
**Note:** Matrix tests handle `dataform.json` (v2) vs `workflow_settings.yaml` (v3) conflicts automatically with cleanup traps.
64+
65+
**CI Integration:** GitHub Actions runs matrix tests on every PR.
66+
67+
## Package Architecture
68+
69+
### Exported Methods
70+
1. **`autoAssignActions(config)`** - Primary method: global monkeypatch of `publish()`, `operate()`, `assert()` and `sqlxAction()`
71+
2. **`createReservationSetter(config)`** - Secondary method: returns a function for manual per-file application
72+
3. **`getActionName(ctx)`** - Utility: extracts action names from Dataform contexts
73+
74+
### Key Implementation Details
75+
- **Monkeypatching Strategy:** Intercepts global methods immediately after the config is loaded (name the initialization file with a leading underscore, as in `_reservations.js`, so it runs first)
76+
- **Config Preprocessing:** Converts `actions` arrays to Sets for O(1) lookup performance
77+
- **Builder Modification:** Always modify `contextablePreOps`/`contextableQueries` on builders, not proto objects
78+
- **Assertions:** Explicitly skipped to avoid SQL syntax errors in BigQuery
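
The config preprocessing bullet above can be illustrated with a short sketch. The config shape (`reservation` plus an `actions` array) and the names `preprocessConfig` and `lookupReservation` are assumptions for illustration and may differ from the package's actual code.

```javascript
// Sketch: convert each entry's `actions` array to a Set once, so that later
// membership checks are O(1). The config shape here is hypothetical.
function preprocessConfig(config) {
  return config.map((entry) => ({
    reservation: entry.reservation,
    actions: new Set(entry.actions),
  }));
}

// Walks the preprocessed entries in order and returns the first reservation
// whose Set contains the action name, or null when nothing matches.
function lookupReservation(preprocessed, actionName) {
  for (const entry of preprocessed) {
    if (entry.actions.has(actionName)) return entry.reservation;
  }
  return null;
}

const preprocessed = preprocessConfig([
  { reservation: 'res-a', actions: ['orders', 'customers'] },
  { reservation: 'res-b', actions: ['events'] },
]);

console.error(lookupReservation(preprocessed, 'events'));
```

The scan over entries stays linear in the number of reservations, while each membership test inside an entry is constant time.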
79+
80+
## Hard-Learned Dataform Nuances
81+
82+
### 1. Builder vs Proto
83+
Dataform makes a distinction between **Action Builders** (the objects returned by `publish()`, `operate()`, etc.) and the final **Proto Objects** (the serialized state).
84+
- **Modification Point:** To ensure persistence, modifications should be made to `action.contextablePreOps` or `action.contextableQueries` on the **Builder**. If you only modify `proto.preOps`, Dataform's internal resolution logic might overwrite your changes during the final compilation phase.
85+
86+
### 2. SQLX Pre-operations
87+
In `.sqlx` files, `pre_operations { ... }` blocks are internal to Dataform. When monkeypatching, we must ensure our reservation statement is **prepended** (using `.unshift()`) so it executes before any user-defined variables or temporary functions.
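
A minimal sketch of the prepend step, assuming pre-operations arrive as a plain string, an array of strings, or nothing at all. `prependStatement` is a hypothetical helper, and the real builder fields may hold context-dependent functions rather than plain strings.

```javascript
// Sketch: normalize pre-operations to an array and put the reservation
// statement first. Works on a copy so the caller's array is not mutated.
function prependStatement(preOps, statement) {
  if (preOps === undefined || preOps === null) return [statement];
  const list = Array.isArray(preOps) ? [...preOps] : [preOps];
  // Skip the unshift when the statement is already first, so applying the
  // helper twice does not duplicate it (an added design choice, not from the source).
  if (list[0] !== statement) list.unshift(statement);
  return list;
}

// Placeholder reservation path for illustration only.
const reservationSql = "SET @@reservation='projects/example/locations/US/reservations/etl';";

console.error(prependStatement('DECLARE cutoff DATE;', reservationSql));
console.error(prependStatement(['DECLARE a INT64;', 'DECLARE b INT64;'], reservationSql));
```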
88+
89+
### 3. The `queries()` method
90+
For `operations`, the SQL is often set via `.queries(["SQL"])`. This method can be called multiple times or late in the script. We monkeypatch this method on the builder instance to wrap the user's input, ensuring the reservation is always at the top of the list, regardless of when `queries()` is called.
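
The wrapping technique described above might look roughly like this. The builder here is a stand-in object, not Dataform's real class, and the sketch only handles string or array input; `wrapQueries` is a hypothetical name.

```javascript
// Sketch: replace `queries` on one builder instance with a wrapper that
// always puts the reservation statement first, whenever it is called.
function wrapQueries(builder, statement) {
  const originalQueries = builder.queries.bind(builder);
  builder.queries = (input) => {
    const list = Array.isArray(input) ? input : [input];
    return originalQueries([statement, ...list]);
  };
  return builder;
}

// Stand-in builder for illustration: it just stores what it receives.
const fakeBuilder = {
  stored: [],
  queries(sql) {
    this.stored = sql;
    return this;
  },
};

wrapQueries(fakeBuilder, 'SET @@reservation=...;'); // value elided on purpose
fakeBuilder.queries('DELETE FROM t WHERE TRUE');
console.error(fakeBuilder.stored);
```

Because the wrapper lives on the instance, a late or repeated call to `queries()` still passes through it.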
91+
92+
### 4. Assertions
93+
Assertions in Dataform are strict. They expect a single `SELECT` statement. Prepending a `SET` statement will cause a syntax error in BigQuery because assertions are often wrapped in subqueries or views by Dataform. We explicitly skip assertions in this package.
94+
95+
## Release Process
96+
97+
1. Update `CHANGELOG.md` with version and changes
98+
2. Bump version in `package.json` and `README.md`
99+
3. Run `npm test` to verify matrix tests pass
100+
4. Commit and push to branch
101+
5. Create PR, ensure CI passes
102+
6. Merge to main
103+
7. Tag release: `npm run release --version=x.y.z`
104+
105+
## Known Limitations & Future Work
106+
107+
**Performance:** `findReservation` uses a linear scan (acceptable for typical projects with fewer than 1000 actions)

.github/linters/eslint.config.mjs

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ export default [
1818
publish: 'readonly',
1919
constant: 'readonly',
2020
ctx: 'readonly',
21+
operate: 'readonly',
22+
assert: 'readonly',
2123
}
2224
},
2325
rules: {

.github/workflows/ci.yml

Lines changed: 5 additions & 2 deletions
@@ -8,6 +8,9 @@ on:
88
jobs:
99
lint-and-test:
1010
runs-on: ubuntu-latest
11+
strategy:
12+
matrix:
13+
dataform-version: ['2.4.2', '3.0.43']
1114

1215
steps:
1316
- name: Checkout code
@@ -25,8 +28,8 @@ jobs:
2528
- name: Run linter
2629
run: npm run lint
2730

28-
- name: Run tests
29-
run: npm test
31+
- name: Run tests (Dataform ${{ matrix.dataform-version }})
32+
run: npm test -- ${{ matrix.dataform-version }}
3033

3134

3235
dependabot:

.gitignore

Lines changed: 0 additions & 138 deletions
@@ -1,139 +1 @@
1-
# Logs
2-
logs
3-
*.log
4-
npm-debug.log*
5-
yarn-debug.log*
6-
yarn-error.log*
7-
lerna-debug.log*
8-
9-
# Diagnostic reports (https://nodejs.org/api/report.html)
10-
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
11-
12-
# Runtime data
13-
pids
14-
*.pid
15-
*.seed
16-
*.pid.lock
17-
18-
# Directory for instrumented libs generated by jscoverage/JSCover
19-
lib-cov
20-
21-
# Coverage directory used by tools like istanbul
22-
coverage
23-
*.lcov
24-
25-
# nyc test coverage
26-
.nyc_output
27-
28-
# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
29-
.grunt
30-
31-
# Bower dependency directory (https://bower.io/)
32-
bower_components
33-
34-
# node-waf configuration
35-
.lock-wscript
36-
37-
# Compiled binary addons (https://nodejs.org/api/addons.html)
38-
build/Release
39-
40-
# Dependency directories
411
node_modules/
42-
jspm_packages/
43-
44-
# Snowpack dependency directory (https://snowpack.dev/)
45-
web_modules/
46-
47-
# TypeScript cache
48-
*.tsbuildinfo
49-
50-
# Optional npm cache directory
51-
.npm
52-
53-
# Optional eslint cache
54-
.eslintcache
55-
56-
# Optional stylelint cache
57-
.stylelintcache
58-
59-
# Optional REPL history
60-
.node_repl_history
61-
62-
# Output of 'npm pack'
63-
*.tgz
64-
65-
# Yarn Integrity file
66-
.yarn-integrity
67-
68-
# dotenv environment variable files
69-
.env
70-
.env.*
71-
!.env.example
72-
73-
# parcel-bundler cache (https://parceljs.org/)
74-
.cache
75-
.parcel-cache
76-
77-
# Next.js build output
78-
.next
79-
out
80-
81-
# Nuxt.js build / generate output
82-
.nuxt
83-
dist
84-
85-
# Gatsby files
86-
.cache/
87-
# Comment in the public line in if your project uses Gatsby and not Next.js
88-
# https://nextjs.org/blog/next-9-1#public-directory-support
89-
# public
90-
91-
# vuepress build output
92-
.vuepress/dist
93-
94-
# vuepress v2.x temp and cache directory
95-
.temp
96-
.cache
97-
98-
# Sveltekit cache directory
99-
.svelte-kit/
100-
101-
# vitepress build output
102-
**/.vitepress/dist
103-
104-
# vitepress cache directory
105-
**/.vitepress/cache
106-
107-
# Docusaurus cache and generated files
108-
.docusaurus
109-
110-
# Serverless directories
111-
.serverless/
112-
113-
# FuseBox cache
114-
.fusebox/
115-
116-
# DynamoDB Local files
117-
.dynamodb/
118-
119-
# Firebase cache directory
120-
.firebase/
121-
122-
# TernJS port file
123-
.tern-port
124-
125-
# Stores VSCode versions used for testing VSCode extensions
126-
.vscode-test
127-
128-
# yarn v3
129-
.pnp.*
130-
.yarn/*
131-
!.yarn/patches
132-
!.yarn/plugins
133-
!.yarn/releases
134-
!.yarn/sdks
135-
!.yarn/versions
136-
137-
# Vite logs files
138-
vite.config.js.timestamp-*
139-
vite.config.ts.timestamp-*

.npmignore

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
# Test files and project
2+
test/
3+
test-project/
4+
scripts/
5+
6+
# Development files
7+
.github/
8+
.eslintrc.json
9+
jest.config.js
10+
*.log
11+
node_modules/
12+
13+
# Git files
14+
.git/
15+
.gitignore
16+
.gitattributes
17+
18+
# CI/CD
19+
.github/
20+
21+
# Documentation (keep README, CHANGELOG, LICENSE)
22+
CONTRIBUTING.md

CHANGELOG.md

Lines changed: 10 additions & 1 deletion
@@ -19,6 +19,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1919

2020
### Security
2121

22+
## [0.2.0] - 2026-01-20
23+
24+
### Added
25+
26+
- **`autoAssignActions()` method** - Primary integration approach that automatically assigns all Dataform actions to reservations globally, without requiring manual code in each action file
27+
- **Matrix testing infrastructure** - Automated testing across multiple Dataform versions (currently v2.4.2 and v3.0.43)
28+
- **API Reference section** in README with comprehensive documentation of all exported methods
29+
2230
## [0.1.0] - 2025-10-27
2331

2432
### Changed
@@ -51,6 +59,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
5159
- Best practices guide
5260
- Troubleshooting section
5361

54-
[Unreleased]: https://github.com/masthead-data/dataform-package/compare/v0.1.0...HEAD
62+
[Unreleased]: https://github.com/masthead-data/dataform-package/compare/v0.2.0...HEAD
63+
[0.2.0]: https://github.com/masthead-data/dataform-package/compare/v0.1.0...v0.2.0
5564
[0.1.0]: https://github.com/masthead-data/dataform-package/compare/v0.0.1...v0.1.0
5665
[0.0.1]: https://github.com/masthead-data/dataform-package/tree/v0.0.1

CONTRIBUTING.md

Lines changed: 11 additions & 0 deletions
@@ -23,6 +23,17 @@ We welcome contributions to the Dataform package! This document provides guideli
2323
npm test
2424
```
2525

26+
This command runs the matrix test suite, which automatically:
27+
28+
1. Iterates through all supported Dataform versions (v2 and v3).
29+
2. Executes unit tests and integration tests.
30+
31+
For faster iteration on the currently installed version in `test-project`, you can run:
32+
33+
```bash
34+
npm run test:single
35+
```
36+
2637
4. **Run linting:**
2738

2839
```bash
