Conversation
extras: "postgres postgis parquet duckdb cli filesystem"
needs_postgres: true

# Clickhouse OSS (TODO: test with minio s3)
ClickHouse OSS is disabled for now. It would be good to make it work without any remote services, for example by testing it with S3 via MinIO.
hmmmm why? It only uses the local filesystem. I'm running tests on it locally with the network disconnected. We have a compose file for it:
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ports:
      - "9000:9000"
      - "8123:8123"
    environment:
      - CLICKHOUSE_DB=dlt_data
      - CLICKHOUSE_USER=loader
      - CLICKHOUSE_PASSWORD=loader
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
    volumes:
      - clickhouse_data:/var/lib/clickhouse/
      - clickhouse_logs:/var/log/clickhouse-server/
    restart: unless-stopped
    healthcheck:
      test: [ "CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8123/ping" ]
      interval: 3s
      timeout: 5s
      retries: 5
volumes:
  clickhouse_data:
  clickhouse_logs:
I think it uses the local ClickHouse but depends on s3, az, etc. Are you sure you can run the full test suite without a network connection? I will uncomment it again and have another look tomorrow, but I think it will not work without secrets.
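For the "test with minio s3" TODO, a MinIO service could sit next to ClickHouse in the same compose file so the S3 code path stays local. This is only a sketch under assumptions: the service name, ports, credentials, and healthcheck below are placeholders, not part of this PR.

```yaml
services:
  # hypothetical local S3 stand-in for ClickHouse staging tests;
  # credentials and ports are placeholders, not from the PR
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9002:9000"   # S3 API (remapped to avoid the clickhouse 9000 port)
      - "9001:9001"   # web console
    environment:
      - MINIO_ROOT_USER=minioadmin
      - MINIO_ROOT_PASSWORD=minioadmin
    volumes:
      - minio_data:/data
    healthcheck:
      test: [ "CMD", "mc", "ready", "local" ]
      interval: 3s
      timeout: 5s
      retries: 5
volumes:
  minio_data:
```

Tests would then point their s3 endpoint at `http://localhost:9002` instead of a real bucket.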
- name: postgres and redshift
  # TODO: check whether duckdb and all filesystem drivers need to be here; if so, explain why
  destinations: "[\"redshift\", \"postgres\", \"duckdb\", \"dummy\"]"
  filesystem_drivers: "[\"memory\", \"file\", \"r2\", \"s3\", \"gs\", \"az\", \"abfss\", \"gdrive\"]" # excludes sftp
I don't think we need all these drivers here. I took this from test_destinations.yml, so it is a pretty ancient part of this setup, and I was not sure whether any combinations of the above destinations need these bucket settings. But afaik filesystem_drivers only applies to filesystem tests, right?
Those were left there after I extracted filesystem to a separate workflow. I think you can leave memory and file; let's see what happens.
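If only the local drivers are kept, the matrix entry discussed above could shrink to something like this. A sketch of the suggested trim, not the final config:

```yaml
- name: postgres and redshift
  destinations: "[\"redshift\", \"postgres\", \"duckdb\", \"dummy\"]"
  # trimmed per the suggestion: keep only the drivers that need no remote services
  filesystem_drivers: "[\"memory\", \"file\"]"
```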
extras: "postgres redshift postgis s3 gs az parquet duckdb"
with: ",adbc"

# Qdrant (disabled, TODO: explain why)
This workflow is currently disabled in the repo. I'd like to add a note here explaining exactly why (because I don't know).
I think we lost our test account. We only test the local version now.
rudolfix left a comment:
thanks for doing this!
branches: [ master, devel ]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
so this workflow will cancel all the workflows it started?
yes it will, it's pretty cool!
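The cancellation behavior comes from pairing that group key with `cancel-in-progress`. A minimal sketch, assuming the flag is set (the actual workflow may word it differently):

```yaml
concurrency:
  # all runs for the same PR (or the same ref for pushes) share one group;
  # a newly started run cancels the in-progress one for that group,
  # which also takes down the reusable workflows it called
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true
```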
authorize_run_from_fork:
  name: check if fork and if so whether secrets are available
  # run when label is assigned OR when we are not a fork
  if: ${{ github.event.label.name == 'ci from fork' || (github.event.pull_request.head.repo.full_name == github.repository && (github.event.action == 'opened' || github.event.action == 'synchronize')) }}
Now that pull_request_target is deactivated, this authorization is defunct (github.event.pull_request.head.repo.full_name == github.repository is always true).
We can enable PRs from forks in a separate ticket.
# Destination and Sources local tests, do not provide secrets
# Other tests that do not require remote connections
#
test_destinations_local:
Those run in parallel, right?
Yes, all tests that have their conditions met start immediately, so you have a bunch of tests running in parallel, which also includes matrices.
needs: test_common
uses: ./.github/workflows/test_sources_local.yml

test_plus:
Maybe those 3 workflows should only depend on lint?
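That suggestion would amount to swapping the `needs` key. A hypothetical sketch, where the job and file names follow the snippets in this thread and the `secrets: inherit` line is an assumption:

```yaml
test_plus:
  # start as soon as linting passes instead of waiting for test_common
  needs: lint
  uses: ./.github/workflows/test_plus.yml
  secrets: inherit
```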
post_install_commands: "poetry run pip install sqlalchemy==2.0.18" # minimum version required by `pyiceberg`
additional_tests: "poetry run pytest tests/cli"

- name: postgres, duckdb
We were also running the dummy destination.
Not in the local tests, but I moved it here now from the remote tests (test_destinations.yml), where it also was before.
filesystem_drivers: "[\"memory\"]"
extras: "motherduck s3 gs az parquet"

# MSSQL
MSSQL was always running the full suite, and this one is fast. We should do the same for remote postgres.
poetry run pytest tests/load --ignore tests/load/sources
poetry run pytest tests/cli
env:
  DESTINATION__POSTGRES__CREDENTIALS: postgresql://loader:loader@localhost:5432/dlt_data
to this new file, see the comment above. There is one step in the workflow that copies these back in.
* use both pull request and pull request target on destination workflows
* remove additional triggers
* marks one test as smoke test and only runs this for the time being
* only run one test in common, needs to be reverted later
* run common tests only on linter success
* fix common workflow
* only start workflows on call (do not call them yet)
* test master workflow
* remove docs changes step from lint
* remove local destinations docs change
* rename master trigger workflows
* change concurrency key
* try other dependencies
* add destination tests with authorize step
* remove authorize and docs step from destination tests
* fix destination test
* rename main workflow
* test inherit secrets
* add more workflows to main file
* fix starting conditions for some workflows
* rename plus tests matrix job
* remove concurrency settings for now
* add first remote destinations workflow version
* move some more remote destinations
* remove pytest args
* try to fix extras string
* add more remote destination tests
* rename some workflows and add concurrency settings to main workflow
* move test_destinations
* fix link to called workflow
* add better main workflow labels; move clickhouse remote tests
* create local destinations test
* disabled some workflows
* disable clickhouse oss for now; split duckdb and postgres local tests into own matrix job
* copy ssh agent key
* move all local destination secrets into template secrets file
* small fixes
* enable all tests again
* fix local tests
* add missing openai dep
* try to fix qdrant creds
* fix qdrant server / local file differentiation
* fix cli test
* change workflow dependencies
* remove telemetry info and other small changes
* run dummy destination with the local tests
* remove duckdb from remote tests, always run all mssql and postgres tests
* enable clickhouse oss
* fix condition for always running all tests
* move cli commands to postgres tests
* rename clickhouse-compose to be inline with other services
* fix clickhouse local credentials and disable tests which require staging destinations
* adapt postgres to postgres example to new fixture
* fix clickhouse excluded configs
* update essential test handling
* skip gcs compat test for local clickhouse tests
Description
We need to be more economical with our CI minutes. Proposed changes are:
TODO before finalizing
Review checklist