fixes leaking datasets tests #2730

Merged
rudolfix merged 26 commits into devel from chores/fixes-leaking-datasets-tests
Jun 11, 2025

rudolfix (Collaborator) commented Jun 7, 2025

Description

  1. stores references to all active pipelines (on demand) and drops them after each test
  2. properly mocks local_dir in tests so local files automatically land in _storage (no more *duckdb databases left behind after tests)
  3. removes concurrent access to resolved config traces (parallel tests were sometimes failing)
  4. improves the code that opens duckdb and motherduck connections (opened connections were leaked when, for example, setting configs on the connection failed)
  5. finally adds a full set of settings for the duckdb connection and a way to hand the connection over to ibis with the settings applied

more in commits
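
For readers skimming item 4, here is a minimal sketch of the "close the connection if applying settings fails" pattern. It is not the code from this PR: it uses the standard-library `sqlite3` module as a stand-in for duckdb, and `open_with_settings`, `is_closed` and the `connect` parameter are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Callable, Sequence


def open_with_settings(
    database: str,
    statements: Sequence[str],
    connect: Callable[[str], sqlite3.Connection] = sqlite3.connect,
) -> sqlite3.Connection:
    """Open a connection and apply settings; close it if any setting fails."""
    conn = connect(database)
    try:
        for statement in statements:
            conn.execute(statement)
    except Exception:
        # Without this close, the half-configured connection would leak
        # because the caller never receives a handle to it.
        conn.close()
        raise
    return conn


def is_closed(conn: sqlite3.Connection) -> bool:
    """Return True if the connection can no longer execute statements."""
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError:
        return True
    return False
```

The `connect` parameter exists only so a test can observe the connection that was opened and confirm it was closed after a failing setting.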

netlify bot commented Jun 7, 2025

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 62dfaa7
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/684985508ded210008b044fe
😎 Deploy Preview https://deploy-preview-2730--dlt-hub-docs.netlify.app

@rudolfix rudolfix force-pushed the chores/fixes-leaking-datasets-tests branch from 549c5bb to 19b1139 Compare June 8, 2025 00:04
@rudolfix rudolfix marked this pull request as ready for review June 9, 2025 16:34
@rudolfix rudolfix requested a review from sh-rp June 9, 2025 16:34
@rudolfix rudolfix mentioned this pull request Jun 10, 2025
  # TODO: we need to frontload the httpfs extension for abfss for some reason
  if self.is_abfss:
-     self._conn.sql("INSTALL https; LOAD httpfs;")
+     self._conn.sql("INSTALL httpfs; LOAD httpfs;")
Collaborator:

oh :)

Collaborator Author:

they were aliases it seems!

  @pytest.fixture(autouse=True)
- def autouse_test_storage() -> FileStorage:
-     return clean_test_storage()
+ def autouse_test_storage(request) -> Optional[FileStorage]:
Collaborator:

this is maybe out of scope, but all of these fixtures need good names and docstrings and should be added to the docstring linter. I often look up what exactly they do in code when I need to disable something for testing. This one is a good example: if you never looked it up, the name autouse_test_storage gives no real hint about what it does.

Collaborator Author:

right! we'll dedicate a week to refactor our utils and release them as an OSS lib. The version in dlt+ has nice docstrings btw.

if "no_load" in request.keywords:
    # always deactivate
    Container()[PipelineContext].deactivate()
    Container()[PipelineContext].clear_activation_history()
Collaborator:

hm, maybe we want to keep the activation history here so we can clean up with drop_active_pipeline_data after a bunch of "no_load" tests have run without having to store the pipelines?

Collaborator Author:

I added this because most of the no_load tests produce empty or fake pipelines, i.e. with destinations that are not instantiated, so they were leaking into subsequent tests and breaking them on cleanup. I'll keep it until we have a good case
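
To make the trade-off in this thread concrete, here is a rough sketch of an activation history that is either used for cleanup or simply discarded for `no_load` tests. Everything in it (`FakePipeline`, `ActivationContext`, `cleanup_after_test`) is a hypothetical stand-in, not dlt's actual `PipelineContext` API; only the method names `deactivate` and `clear_activation_history` mirror the snippet above.

```python
from typing import List, Optional


class FakePipeline:
    """Hypothetical stand-in for a pipeline that owns a dataset."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dataset_dropped = False

    def drop_dataset(self) -> None:
        self.dataset_dropped = True


class ActivationContext:
    """Hypothetical context that optionally remembers every activated pipeline."""

    def __init__(self, keep_history: bool = False) -> None:
        self.keep_history = keep_history
        self.active: Optional[FakePipeline] = None
        self.history: List[FakePipeline] = []

    def activate(self, pipeline: FakePipeline) -> None:
        self.active = pipeline
        if self.keep_history and pipeline not in self.history:
            self.history.append(pipeline)

    def deactivate(self) -> None:
        self.active = None

    def clear_activation_history(self) -> None:
        self.history.clear()


def cleanup_after_test(context: ActivationContext, no_load: bool) -> None:
    """Drop datasets of every pipeline seen in the test unless it is marked no_load."""
    if not no_load:
        for pipeline in context.history:
            pipeline.drop_dataset()
    # Always deactivate and forget, so nothing leaks into the next test.
    context.deactivate()
    context.clear_activation_history()
```

In this sketch a `no_load` test never triggers a drop, which matches the author's point that such tests tend to create pipelines whose destinations cannot be cleaned up anyway.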

@dataclass
class WithLocalFiles(BaseConfiguration):
    """Mixin to BaseConfiguration that shifts relative locations into `local_dir` and allows for a few special locations.
    :pipeline: in the pipeline working folder
Collaborator:

I know this was already like this, but why not "pipeline_working_dir"? This way it is clear what this variable means.

Collaborator Author:

hmmm you can add QoL ticket for that. I like :pipeline: because it is like :memory:
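
As an illustration of what "shifts relative locations into `local_dir`" with special locations could look like, here is a small sketch. The function `resolve_local_path` and its parameters are assumptions made for this example and are not taken from the `WithLocalFiles` implementation; the only behavior borrowed from the docstring is that relative paths move under `local_dir` and `:pipeline:` points at the pipeline working folder.

```python
import os


def resolve_local_path(
    location: str,
    local_dir: str,
    pipeline_working_dir: str,
    default_name: str,
) -> str:
    """Resolve a configured location into a concrete path (hypothetical helper)."""
    if location == ":memory:":
        # In-memory databases have no file location to shift.
        return location
    if location == ":pipeline:":
        # Special location: place the file in the pipeline working folder.
        return os.path.join(pipeline_working_dir, default_name)
    if os.path.isabs(location):
        # Absolute paths are assumed to be left untouched.
        return location
    # Relative paths are shifted into local_dir.
    return os.path.join(local_dir, location)
```

With a test fixture that points `local_dir` at `_storage`, every relative database path would end up under `_storage`, which is the effect item 2 of the description is after.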

@rudolfix rudolfix merged commit f821d21 into devel Jun 11, 2025
53 of 55 checks passed
@rudolfix rudolfix deleted the chores/fixes-leaking-datasets-tests branch June 11, 2025 20:17
dat-a-man pushed a commit that referenced this pull request Jun 24, 2025
* adds optional pipeline activation history to context

* allows to configure configs and pragmas for duckdb, improves sql_client, tests

* allows query string for motherduck, tests WIP

* mocks local_dir correctly to place local files, drop duckdb in pipeline fixture in most places

* enables activation factory to drop datasets from all pipelines

* uses correct fixture scope in test read interfaces

* bumps duckdb and pyarrow

* ignores some flake8 errors

* logs resolved traces thread-wise, clears log between pipeline runs

* improves duckdb tests and docs

* bumps arrow to v20 because duckdb 1.3 needs at least 19 for its types

* fixes tests - mostly duckdb database locations

* fixes lockfile

* fixes edge cases when passing setting to duckdb connection

* disables iceberg abfss tests

* refactors WithLocalFiles so they can be used independent from destination

* more local dir test fixes

* moves WithLocalFiles to common storages configuration

* tests edge cases when setting configs on duckdb fails

* updates docs

* reverts duckdb to 1.2.1 - last stable version

* more test fixes

* moves create_secret to duckdb sqlclient

* disables building of Dockerfile until we upgrade arrow

* skip gcs compat test for local clickhouse tests

---------

Co-authored-by: dave <shrps@posteo.net>

2 participants