Merged
5 changes: 5 additions & 0 deletions .env.example
@@ -16,6 +16,11 @@ SQLMESH_DUCKDB_LOCAL_PATH=/tmp/oso.duckdb
DAGSTER_USE_LOCAL_SECRETS=True
#DAGSTER_GCP_SECRETS_PREFIX=dagster

# OSO's python libraries use JSON logging by default, which can be hard to
# read when running locally. Setting this to 0 outputs logs in a more
# human-readable format.
OSO_ENABLE_JSON_LOGS=0

## Google Cloud setup
# You will need to generate Google application credentials.
# You can log in via `gcloud auth application-default login`
37 changes: 37 additions & 0 deletions apps/docs/docs/contribute-data/setup/index.md
@@ -215,3 +215,40 @@ Notice that after `-m` the code location's module path is specified. It is
useful for newcomers to note that the `warehouse/` path in the repository is not
considered a python module: it does not contain an `__init__.py` file and does
not appear as a python module in the root `pyproject.toml`.

### Running dagster with sqlmesh locally

This is mostly for the OSO team; most contributors should not need to run
sqlmesh through the dagster UI locally. If you are only adding models, running
sqlmesh on its own is enough. The main reason to run sqlmesh under dagster
locally is to verify that the dagster-sqlmesh integration works as expected
with our particular pipeline.

Some environment variables need to be set in your `.env`:

```bash
# While not strictly necessary, you likely want the sqlmesh dagster asset
# caching enabled so restarting doesn't take so long.
DAGSTER_ASSET_CACHE_ENABLED=1
DAGSTER_ASSET_CACHE_DIR=/path/to/some/cache/dir # change this
# You can set this number to anything reasonable for your testing use case
DAGSTER_ASSET_CACHE_DEFAULT_TTL_SECONDS=3600
# `local` uses duckdb
# `local-trino` uses a locally deployed trino
# We suggest `local` as it is faster. This doc assumes duckdb.
DAGSTER_SQLMESH_GATEWAY=local
SQLMESH_TESTING_ENABLED=1
OSO_ENABLE_JSON_LOGS=0
```
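As a minimal sketch, these variables can also be exported in your shell before launching dagster. The cache directory path below is a placeholder, not an OSO default; pick any writable location:

```shell
# Sketch only: the cache directory path is a placeholder; choose any
# writable location on your machine.
export DAGSTER_ASSET_CACHE_ENABLED=1
export DAGSTER_ASSET_CACHE_DIR="${TMPDIR:-/tmp}/oso-dagster-asset-cache"
export DAGSTER_ASSET_CACHE_DEFAULT_TTL_SECONDS=3600

# Make sure the cache directory exists before dagster tries to use it.
mkdir -p "$DAGSTER_ASSET_CACHE_DIR"
echo "cache dir: $DAGSTER_ASSET_CACHE_DIR (ttl ${DAGSTER_ASSET_CACHE_DEFAULT_TTL_SECONDS}s)"
```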

Then run the sqlmesh local test setup to initialize your local sqlmesh duckdb
with the oso local seed data.

```bash
uv run oso local sqlmesh-test --duckdb
```
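A quick sanity check can confirm the seeded database file exists. This sketch assumes `SQLMESH_DUCKDB_LOCAL_PATH` is set in your `.env`; the fallback below is only the example value from `.env.example`:

```shell
# Sketch only: SQLMESH_DUCKDB_LOCAL_PATH comes from your .env; the
# fallback is the example value from .env.example.
DB_PATH="${SQLMESH_DUCKDB_LOCAL_PATH:-/tmp/oso.duckdb}"
if [ -f "$DB_PATH" ]; then
  echo "found duckdb database at $DB_PATH"
else
  echo "no duckdb database at $DB_PATH; re-run the sqlmesh-test command above"
fi
```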

It should now be possible to run sqlmesh and dagster locally. When
materializing sqlmesh assets, dagster may complain about out-of-date
dependencies. Because we ran the local test setup, the data those assets depend
on should already have been added by the oso local seed setup.
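If stale-dependency complaints persist after seeding, one workaround we assume may help (this is an assumption, not a documented OSO procedure) is clearing the dagster asset cache so assets re-materialize from scratch:

```shell
# Assumption: clearing the asset cache forces fresh materializations on the
# next dagster run. DAGSTER_ASSET_CACHE_DIR is the directory set in .env;
# the fallback here is a placeholder.
CACHE_DIR="${DAGSTER_ASSET_CACHE_DIR:-${TMPDIR:-/tmp}/oso-dagster-asset-cache}"
rm -rf "$CACHE_DIR"
mkdir -p "$CACHE_DIR"
echo "cleared asset cache at $CACHE_DIR"
```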
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -67,7 +67,7 @@ dependencies = [
"kr8s==0.20.9",
"structlog>=25.4.0",
"pandas-gbq>=0.29.2",
"dagster-sqlmesh>=0.19.0",
"dagster-sqlmesh>=0.20.0",
"oso-core",
"pyoso",
"metrics-service"
9 changes: 5 additions & 4 deletions uv.lock


7 changes: 5 additions & 2 deletions warehouse/oso_dagster/assets/sqlmesh/sqlmesh.py
@@ -194,8 +194,10 @@ def run_sqlmesh(
config.allow_destructive_models
)

# If we specify a dev_environment, we will first plan it for safety
if dev_environment:
# If we specify a dev_environment, we will first plan it for
# safety. Restatements are skipped here because planning with
# restatements can end up duplicating work.
if dev_environment and not config.restate_models:
context.log.info("Planning dev environment")
all(
sqlmesh.run(
@@ -206,6 +208,7 @@ def run_sqlmesh(
end=config.end,
restate_models=restate_models,
skip_run=True,
materializations_enabled=False,
)
)
