20 commits
813fdea
feat: Add automatic reservation application for Dataform actions, rem…
max-ostapenko Dec 21, 2025
5c5e5ff
lint
max-ostapenko Jan 15, 2026
e99cd77
feat: Refactor reservation application to robustly handle all Datafor…
max-ostapenko Jan 15, 2026
51140bd
lint
max-ostapenko Jan 15, 2026
77169b5
Merge branch 'main' into auto-apply
max-ostapenko Jan 15, 2026
8707cb6
install dependencies
max-ostapenko Jan 15, 2026
21d1385
lint
max-ostapenko Jan 16, 2026
55b32d4
docs: centralize automatic reservation initialization into a dedicate…
max-ostapenko Jan 16, 2026
83d6a0a
lint
max-ostapenko Jan 16, 2026
0b461f4
feat: Enhance reservation statement injection in `index.js` to suppor…
max-ostapenko Jan 16, 2026
76e4281
feat: Verify patching of Dataform actions with new post-initializatio…
max-ostapenko Jan 16, 2026
3f21437
feat: Implement matrix testing for Dataform versions and update CI co…
max-ostapenko Jan 19, 2026
3890957
feat: Introduce autoAssignActions method for automated reservation as…
max-ostapenko Jan 19, 2026
589fb2e
feat: Add .npmignore file and update CONTRIBUTING.md and README.md fo…
max-ostapenko Jan 19, 2026
77cf3c7
chore: update package.json to move @dataform/cli to devDependencies a…
max-ostapenko Jan 20, 2026
6dc61a3
Merge branch 'main' into auto-apply
max-ostapenko Jan 20, 2026
426de6d
feat: add tests for autoAssignActions function to validate reservatio…
max-ostapenko Jan 20, 2026
ed95b58
merge
max-ostapenko Jan 20, 2026
184c3b3
chore: update dataform dependencies to version 3.0.43
max-ostapenko Jan 20, 2026
8df8c7f
chore: remove unused global 'reservations' from ESLint configuration
max-ostapenko Jan 20, 2026
107 changes: 107 additions & 0 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
# Copilot Instructions for Dataform Reservation Package

This document captures key learnings, debugging strategies, and architectural nuances discovered during the development of the `@masthead-data/dataform-package`.

## How to Debug

### 1. Tracing Compilation
Dataform executes JavaScript during the compilation phase. To trace what's happening:
- Use `console.error()` for debug logs. This ensures logs go to `stderr` and don't corrupt the JSON output redirected to a file.
- Avoid `console.log()` inside Dataform definitions if you plan to pipe the output to a JSON parser, as it may inject plain text into the JSON stream.
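As a minimal illustration of the logging advice above, a debug helper can route everything through `console.error()`. The helper name `debugLog` and the `[reservation-debug]` prefix are hypothetical and not part of the package.

```javascript
// Hypothetical helper (not part of the package): sends debug output to stderr
// so that stdout stays valid JSON when `dataform compile --json` is redirected.
function debugLog(label, value) {
  const line = `[reservation-debug] ${label}: ${JSON.stringify(value)}`;
  console.error(line); // console.error writes to stderr, not stdout
  return line;
}

debugLog('action', { name: 'my_table', type: 'table' });
```

Because the helper never touches stdout, redirecting the compile output to `compiled.json` should still produce a parseable file.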

### 2. Inspecting the Graph
To see the final state of all actions:
```bash
cd test-project
npx @dataform/cli compile --json > compiled.json
```
Inspect the `tables`, `operations`, and `assertions` arrays in the resulting JSON. Check `preOps` and `queries` for the injected `SET @@reservation` statements.
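The manual inspection above can also be scripted. The following is a rough sketch, not the project's `verify_compilation.js`: the graph shape (`tables[].preOps`, `operations[].queries`, `target.name`) is assumed from the description above, the reservation path is a placeholder, and a small inline sample stands in for `compiled.json`.

```javascript
// Sketch: list actions whose first statement is not a reservation statement.
const RESERVATION_PREFIX = 'SET @@reservation';

function startsWithReservation(statements) {
  return Array.isArray(statements) &&
    statements.length > 0 &&
    String(statements[0]).trimStart().startsWith(RESERVATION_PREFIX);
}

function findUnpatched(graph) {
  const missing = [];
  for (const table of graph.tables || []) {
    if (!startsWithReservation(table.preOps)) missing.push(table.target.name);
  }
  for (const op of graph.operations || []) {
    if (!startsWithReservation(op.queries)) missing.push(op.target.name);
  }
  return missing;
}

// In practice: const graph = JSON.parse(fs.readFileSync('compiled.json', 'utf8'));
const sampleStatement = "SET @@reservation='projects/example/locations/US/reservations/example';";
const sampleGraph = {
  tables: [
    { target: { name: 'patched_table' }, preOps: [sampleStatement] },
    { target: { name: 'unpatched_table' }, preOps: [] },
  ],
  operations: [
    { target: { name: 'patched_op' }, queries: [sampleStatement, 'SELECT 1'] },
  ],
};

console.error(findUnpatched(sampleGraph));
```

An action appearing in this list is not necessarily a bug, since only actions matched by the configuration are expected to carry a reservation.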

### 3. Verification Script
Use the provided verification script to check invariants:
```bash
node scripts/verify_compilation.js
```
This script validates that reservations are prepended and that assertions are skipped.

## Testing Configuration

### Local Integration Testing
The `test-project` is configured to use the local version of the package. In `test-project/package.json`:
```json
"dependencies": {
  "@masthead-data/dataform-package": "file:../"
}
```
**Note:** `npm ci` or `npm install` in the `test-project` caches the local package. If you make changes to `index.js` and don't see them reflected, you may need to force an update or avoid `npm ci` during rapid iteration.

### Running Tests

#### Matrix Testing (Default)
Run from the root to test all supported versions:
```bash
npm test
```
This runs the matrix tests against Dataform v2.4.2 and the latest v3.x release, and handles config file conflicts between the two automatically.

#### Single Version (Fast Iteration)
For rapid development on the current version:
```bash
npm run test:single
```
This runs:
1. `jest`: Unit tests for helper functions
2. `dataform compile`: Generates the actual project graph
3. `verify_compilation.js`: In-depth JSON inspection

#### Specific Version
Test a single Dataform version:
```bash
npm test -- 2.4.2
```

**Note:** Matrix tests handle `dataform.json` (v2) vs `workflow_settings.yaml` (v3) conflicts automatically with cleanup traps.

**CI Integration:** GitHub Actions runs matrix tests on every PR.

## Package Architecture

### Exported Methods
1. **`autoAssignActions(config)`** - Primary method: global monkeypatch of `publish()`, `operate()`, `assert()` and `sqlxAction()`
2. **`createReservationSetter(config)`** - Secondary method: returns a function for manual per-file application
3. **`getActionName(ctx)`** - Utility: extracts action names from Dataform contexts

### Key Implementation Details
- **Monkeypatching Strategy:** Intercepts global methods immediately after config is loaded (name the initialization file with a leading underscore, such as `_reservations.js`, so that it runs first)
- **Config Preprocessing:** Converts `actions` arrays to Sets for O(1) lookup performance
- **Builder Modification:** Always modify `contextablePreOps`/`contextableQueries` on builders, not proto objects
- **Assertions:** Explicitly skipped to avoid SQL syntax errors in BigQuery
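The config preprocessing step listed above can be pictured roughly as follows. This is a sketch under assumptions: the config shape (`{ tag, reservation, actions }`), the `actionSet` field and the `findReservation` signature are illustrative guesses, not the package's actual internals.

```javascript
// Convert each entry's `actions` array into a Set once, up front.
function preprocessConfig(config) {
  return config.map((entry) => ({
    ...entry,
    actionSet: new Set(entry.actions || []),
  }));
}

// Linear scan over config entries; each membership check is O(1) via the Set.
function findReservation(preprocessed, actionName) {
  for (const entry of preprocessed) {
    if (entry.actionSet.has(actionName)) return entry.reservation;
  }
  return null;
}

// Placeholder config for illustration only.
const sampleConfig = [
  {
    tag: 'high_priority',
    reservation: 'projects/example/locations/US/reservations/high',
    actions: ['analytics.orders', 'analytics.users'],
  },
  {
    tag: 'batch',
    reservation: 'projects/example/locations/US/reservations/batch',
    actions: ['staging.events'],
  },
];

const prepared = preprocessConfig(sampleConfig);
console.error(findReservation(prepared, 'staging.events'));
```

The scan stays linear in the number of config entries, while the per-entry lookup no longer depends on how many actions each entry lists.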

## Hard-Learned Dataform Nuances

### 1. Builder vs Proto
Dataform makes a distinction between **Action Builders** (the objects returned by `publish()`, `operate()`, etc.) and the final **Proto Objects** (the serialized state).
- **Modification Point:** To ensure persistence, modifications should be made to `action.contextablePreOps` or `action.contextableQueries` on the **Builder**. If you only modify `proto.preOps`, Dataform's internal resolution logic might overwrite your changes during the final compilation phase.

### 2. SQLX Pre-operations
In `.sqlx` files, `pre_operations { ... }` blocks are internal to Dataform. When monkeypatching, we must ensure our reservation statement is **prepended** (using `.unshift()`) so it executes before any user-defined variables or temporary functions.
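The prepend described above can be sketched against a mock builder. Only the property name `contextablePreOps` comes from this document; the mock object, the helper name `prependReservation` and the reservation path are assumptions for illustration, and a real Dataform builder carries far more state.

```javascript
const RESERVATION_STATEMENT =
  "SET @@reservation='projects/example/locations/US/reservations/example';";

// Prepend on the builder so the statement runs before user-defined pre-operations.
function prependReservation(builder, statement) {
  if (!Array.isArray(builder.contextablePreOps)) {
    builder.contextablePreOps = [];
  }
  builder.contextablePreOps.unshift(statement);
  return builder;
}

// Mock standing in for the object returned by publish().
const mockBuilder = {
  contextablePreOps: ['DECLARE run_date DATE DEFAULT CURRENT_DATE();'],
};

prependReservation(mockBuilder, RESERVATION_STATEMENT);
console.error(mockBuilder.contextablePreOps);
```

Using `.unshift()` rather than `.push()` is what keeps the reservation ahead of any `DECLARE` or temporary function the user defines.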

### 3. The `queries()` method
For `operations`, the SQL is often set via `.queries(["SQL"])`. This method can be called multiple times or late in the script. We monkeypatch this method on the builder instance to wrap the user's input, ensuring the reservation is always at the top of the list, regardless of when `queries()` is called.
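A simplified sketch of that per-instance wrapping is shown below. The mock operation builder, the helper name `wrapQueries` and the reservation path are hypothetical; the real builder returned by `operate()` is more involved, so treat this as an outline of the idea rather than the package's code.

```javascript
// Mock of an operation builder that just records what queries() received.
function createMockOperation() {
  return {
    contextableQueries: [],
    queries(input) {
      this.contextableQueries = input;
      return this;
    },
  };
}

// Replace queries() on this instance so the reservation always comes first,
// whether the caller passes a string, an array, or a function of ctx.
function wrapQueries(builder, statement) {
  const originalQueries = builder.queries.bind(builder);
  const prepend = (value) => [statement, ...(Array.isArray(value) ? value : [value])];
  builder.queries = (input) =>
    originalQueries(typeof input === 'function' ? (ctx) => prepend(input(ctx)) : prepend(input));
  return builder;
}

const RESERVATION =
  "SET @@reservation='projects/example/locations/US/reservations/example';";
const op = wrapQueries(createMockOperation(), RESERVATION);
op.queries(['SELECT 1']);
console.error(op.contextableQueries);
```

Because the wrapper sits on the instance, it applies no matter how late or how often the user script calls `queries()`.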

### 4. Assertions
Assertions in Dataform are strict. They expect a single `SELECT` statement. Prepending a `SET` statement will cause a syntax error in BigQuery because assertions are often wrapped in subqueries or views by Dataform. We explicitly skip assertions in this package.
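One way to picture the skip is a patching loop that simply never touches `assert`. Everything here is a mock: the `scope` object stands in for Dataform's globals, and `patchGlobals` and `applyReservation` are hypothetical names, not the package's API.

```javascript
// 'assert' is deliberately absent, so assertion builders are left untouched.
const PATCHED_GLOBALS = ['publish', 'operate'];

function patchGlobals(scope, applyReservation) {
  for (const name of PATCHED_GLOBALS) {
    const original = scope[name];
    if (typeof original !== 'function') continue;
    scope[name] = (...args) => {
      const builder = original.apply(scope, args);
      applyReservation(builder);
      return builder;
    };
  }
}

// Mock globals that return bare builder-like objects.
const mockScope = {
  publish: (name) => ({ name, contextablePreOps: [] }),
  assert: (name) => ({ name, contextablePreOps: [] }),
};

patchGlobals(mockScope, (builder) => {
  builder.contextablePreOps.unshift(
    "SET @@reservation='projects/example/locations/US/reservations/example';"
  );
});

console.error(mockScope.publish('my_table').contextablePreOps.length);
console.error(mockScope.assert('my_check').contextablePreOps.length);
```

With this arrangement the assertion's SQL stays a single `SELECT`, which is what the compiled view or subquery wrapper expects.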

## Release Process

1. Update `CHANGELOG.md` with version and changes
2. Bump version in `package.json` and `README.md`
3. Run `npm test` to verify matrix tests pass
4. Commit and push to branch
5. Create PR, ensure CI passes
6. Merge to main
7. Tag release: `npm run release --version=x.y.z`

## Known Limitations & Future Work

**Performance:** `findReservation` uses a linear scan (acceptable for typical project sizes of fewer than 1000 actions)
2 changes: 2 additions & 0 deletions .github/linters/eslint.config.mjs
@@ -18,6 +18,8 @@ export default [
publish: 'readonly',
constant: 'readonly',
ctx: 'readonly',
operate: 'readonly',
assert: 'readonly',
}
},
rules: {
7 changes: 5 additions & 2 deletions .github/workflows/ci.yml
@@ -8,6 +8,9 @@ on:
jobs:
lint-and-test:
runs-on: ubuntu-latest
strategy:
matrix:
dataform-version: ['2.4.2', '3.0.43']

steps:
- name: Checkout code
@@ -25,8 +28,8 @@ jobs:
- name: Run linter
run: npm run lint

- name: Run tests
run: npm test
- name: Run tests (Dataform ${{ matrix.dataform-version }})
run: npm test -- ${{ matrix.dataform-version }}


dependabot:
138 changes: 0 additions & 138 deletions .gitignore
@@ -1,139 +1 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*

# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage
*.lcov

# nyc test coverage
.nyc_output

# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Bower dependency directory (https://bower.io/)
bower_components

# node-waf configuration
.lock-wscript

# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release

# Dependency directories
node_modules/
jspm_packages/

# Snowpack dependency directory (https://snowpack.dev/)
web_modules/

# TypeScript cache
*.tsbuildinfo

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Optional stylelint cache
.stylelintcache

# Optional REPL history
.node_repl_history

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variable files
.env
.env.*
!.env.example

# parcel-bundler cache (https://parceljs.org/)
.cache
.parcel-cache

# Next.js build output
.next
out

# Nuxt.js build / generate output
.nuxt
dist

# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and not Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public

# vuepress build output
.vuepress/dist

# vuepress v2.x temp and cache directory
.temp
.cache

# Sveltekit cache directory
.svelte-kit/

# vitepress build output
**/.vitepress/dist

# vitepress cache directory
**/.vitepress/cache

# Docusaurus cache and generated files
.docusaurus

# Serverless directories
.serverless/

# FuseBox cache
.fusebox/

# DynamoDB Local files
.dynamodb/

# Firebase cache directory
.firebase/

# TernJS port file
.tern-port

# Stores VSCode versions used for testing VSCode extensions
.vscode-test

# yarn v3
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/sdks
!.yarn/versions

# Vite logs files
vite.config.js.timestamp-*
vite.config.ts.timestamp-*
22 changes: 22 additions & 0 deletions .npmignore
@@ -0,0 +1,22 @@
# Test files and project
test/
test-project/
scripts/

# Development files
.github/
.eslintrc.json
jest.config.js
*.log
node_modules/

# Git files
.git/
.gitignore
.gitattributes

# CI/CD
.github/

# Documentation (keep README, CHANGELOG, LICENSE)
CONTRIBUTING.md
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -19,6 +19,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Security

## [0.2.0] - 2026-01-20

### Added

- **`autoAssignActions()` method** - Primary integration approach that automatically assigns all Dataform actions to reservations globally, without requiring manual code in each action file
- **Matrix testing infrastructure** - Automated testing across multiple Dataform versions (currently v2.4.2 and v3.0.43)
- **API Reference section** in README with comprehensive documentation of all exported methods

## [0.1.0] - 2025-10-27

### Changed
@@ -51,6 +59,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Best practices guide
- Troubleshooting section

[Unreleased]: https://github.com/masthead-data/dataform-package/compare/v0.1.0...HEAD
[Unreleased]: https://github.com/masthead-data/dataform-package/compare/v0.2.0...HEAD
[0.2.0]: https://github.com/masthead-data/dataform-package/compare/v0.1.0...v0.2.0
[0.1.0]: https://github.com/masthead-data/dataform-package/compare/v0.0.1...v0.1.0
[0.0.1]: https://github.com/masthead-data/dataform-package/tree/v0.0.1
11 changes: 11 additions & 0 deletions CONTRIBUTING.md
@@ -23,6 +23,17 @@ We welcome contributions to the Dataform package! This document provides guideli
npm test
```

This command runs the matrix test suite which automatically:

1. Iterates through all supported Dataform versions (v2 and v3).
2. Executes unit tests and integration tests.

For faster iteration on the currently installed version in `test-project`, you can run:

```bash
npm run test:single
```

4. **Run linting:**

```bash