Skip to content

feat: add external YAML formatter integration (--formatter)#323

Open
RicardoAGL wants to merge 8 commits intoz3z1ma:mainfrom
RicardoAGL:feature/external-formatter
Open

feat: add external YAML formatter integration (--formatter)#323
RicardoAGL wants to merge 8 commits intoz3z1ma:mainfrom
RicardoAGL:feature/external-formatter

Conversation

@RicardoAGL
Copy link
Contributor

Summary

Adds a --formatter CLI option to refactor, organize, and document commands that runs an external YAML formatter on files written by osmosis. This lets teams use their preferred formatter (prettier, yamlfmt, yq, etc.) in a single command instead of the common 3-step CI pattern (osmosis → formatter → git diff).

  • New --formatter flag accepts any command that takes file paths as trailing args
  • Written file paths tracked via YamlRefactorContext._written_files
  • Formatter runs once after all writes complete (batch, not per-file)
  • Failure is non-fatal — logs a warning but doesn't fail the command
  • 120-second subprocess timeout (configurable)
  • Config precedence: CLI flag > dbt-osmosis.yml > None (backward compatible)

Example usage

# With prettier
dbt-osmosis yaml refactor --formatter "prettier --write"

# With yamlfmt
dbt-osmosis yaml refactor --formatter "yamlfmt"

# With yq
dbt-osmosis yaml refactor --formatter "yq -i '.'"

CI simplification

Before (3-step):

- run: dbt-osmosis yaml refactor
- run: prettier --write "models/**/*.yml"
- run: git diff --quiet models/

After (2-step):

- run: dbt-osmosis yaml refactor --formatter "prettier --write"
- run: git diff --quiet models/

Files changed (12 files, +713 lines)

File Change
src/dbt_osmosis/core/formatting.py Newrun_external_formatter() with shlex parsing, timeout, non-fatal error handling
src/dbt_osmosis/core/settings.py formatter field, _written_files tracking, resolved_formatter property
src/dbt_osmosis/core/schema/writer.py written_file_tracker callback in _write_yaml() / commit_yamls()
src/dbt_osmosis/core/sync_operations.py Pass tracker through to _write_yaml()
src/dbt_osmosis/core/restructuring.py Pass tracker through to _write_yaml() / commit_yamls()
src/dbt_osmosis/cli/main.py --formatter Click option + _run_formatter_if_configured()
src/dbt_osmosis/core/osmosis.py Re-export run_external_formatter
tests/core/test_formatting.py New — 15 tests (success, no-op, failure, timeout, command parsing, yq, yamlfmt, frozenset)
tests/core/test_settings.py 10 new tests (formatter settings, written file tracking, resolved_formatter)
tests/core/test_schema.py 2 new tests (writer tracker callback wiring)
tests/core/test_cli.py 1 new test (--formatter in help output)

Test plan

  • 692 unit tests pass (0 failures, 6 pre-existing skips)
  • Integration-tested with prettier 3.8.1 against demo_duckdb — formatter ran on 4 files successfully
  • Integration-tested with yq 4.52.2 against demo_duckdb — formatter ran on 4 files successfully
  • Tested in production dbt project (dbt-core 1.11.2 + dbt-databricks 1.11.4) with prettier
  • Security review: approved (subprocess timeout, no shell=True, shlex parsing)
  • Backward compatible — without --formatter, behavior is identical to current

Dependencies

Note: This PR is stacked on #317 (feature/fusion-compat). The 3 formatter-specific commits are:

  • 0699e88 feat: add external YAML formatter integration (--formatter)
  • 220ca76 fix: address security review findings for external formatter
  • 57900b4 fix: resolve written file paths to absolute for external formatter

Once #317 is merged, this PR's diff will show only the formatter changes.

Auto-detect dbt version and produce Fusion-compatible YAML when
column-level config is supported (dbt >= 1.9.6). When enabled,
meta/tags are nested under config blocks instead of top-level.
Adds --fusion-compat/--no-fusion-compat CLI flags for override.
Also preserves anchors, data_tests, and semantic_models in YAML.
Fix 17 pre-existing test failures:
- Pass DbtProjectContext (yaml_context.project) instead of YamlRefactorContext
  to generator functions that expect the project context directly
- Fix mock.patch paths for lazy imports: mock at dbt_core_interface.* instead
  of dbt_osmosis.core.generators.* for classes imported inside functions
- Use tmp_path instead of absolute /custom/staging to avoid read-only filesystem
Detect dbt Fusion manifests (schema version > v12) in the target
directory before parsing, so teams running both Fusion and dbt-core
get fusion_compat=True automatically even with older dbt-core versions.

Detection also checks for dbtf/dbt-fusion binaries on PATH as a
fallback when no manifest exists. The explicit --fusion-compat flag
still takes priority over all auto-detection.
- Document --fusion-compat/--no-fusion-compat in CLI reference, settings
  reference, and configuration tutorial
- Add disclaimer that dbt-osmosis requires dbt-core and does not run on
  dbt Fusion directly, with hybrid workflow instructions
- Anchor schema version regex to path segment to avoid false matches
- Optimize manifest reading to parse only first 4KB instead of full JSON
- Downgrade Fusion binary detection log from info to debug
- Remove anchors from parser allowed_keys (preserved via writer only)
- Simplify single-element tuple filters to != comparisons
- Remove unused json import from config.py
The fusion-compat setting is only available via CLI flag or
auto-detection. The project vars path is not wired up in
YamlRefactorContext, so the example was misleading.
Allow running an external formatter (prettier, yamlfmt, yq, etc.) on
YAML files after osmosis writes them, reducing CI to a single step.

- Add --formatter CLI option to refactor/organize/document commands
- Track written files via YamlRefactorContext._written_files
- New formatting.py module with run_external_formatter() (non-fatal)
- Config precedence: CLI flag > dbt-osmosis.yml > None
- 29 new tests covering formatter execution, settings, and wiring
- Add subprocess timeout (default 120s) to prevent hung formatters
- Log warning instead of silently swallowing dbt-osmosis.yml parse errors
- Reuse cached _written_file_tracker variable in restructuring.py
- Add 3 tests for timeout behavior
Relative paths caused formatter to fail when cwd differed from the
invoking directory. Resolving to absolute in register_written_file()
ensures formatters can always find the files.

Discovered during integration testing with prettier and yq against
the demo_duckdb project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant