
Migrate delta's golden table tests: snapshot tests and comparison script#51

Open
zachschuermann wants to merge 3 commits into delta-incubator:main from zachschuermann:golden-tables
@zachschuermann zachschuermann commented Jul 23, 2024

Description

This PR adds Delta's existing golden table tests to DAT. It introduces only a small first batch (the first 'snapshot' tests) plus a comparison utility, which was used to compare the tables produced by the new pyspark code with the tables produced by the old Delta golden table code (and persisted in the Delta repo).

The tests are first translated to pyspark. They can then be checked by generating the tables and using the comparison utility to confirm that the latest snapshot matches the persisted golden table (more advanced tests might need manual checking to confirm). I also ran these tests against delta-kernel-rs, and so far all green :)

How was this patch tested?

# after adding the tests, generate tables (and expectations)
make write-generated-tables

# check against the old persisted delta tests (snapshot-vacuumed example)
poetry run python util/compare.py out/reader_tests/generated/snapshot-vacuumed/delta <path-to-delta-repo>/connectors/golden-tables/src/main/resources/golden/snapshot-vacuumed

# manually replace the acceptance tests in delta-kernel-rs to run against the new tests
# with cwd delta-kernel-rs/acceptance:
cp -r <path-to-dat>/out/reader_tests/generated tests/dat/out/reader_tests/generated
cargo t

@zachschuermann zachschuermann changed the title [wip] Add golden table snapshot tests and comparison script Add golden table snapshot tests and comparison script Jul 23, 2024
@zachschuermann zachschuermann changed the title Add golden table snapshot tests and comparison script Migrate delta's golden table tests: snapshot tests and comparison script Jul 23, 2024
Contributor

@nicklan nicklan left a comment

nice, mostly lgtm modulo a few comments.

print(df2)

# Check schema compatibility (columns and types)
if df1.schema != df2.schema:
can't we just use assertDataFrameEqual and assertSchemaEqual?
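For reference, a minimal sketch of what that could look like, assuming PySpark 3.5+ (where `pyspark.testing` provides both helpers) and a Spark session already configured to read Delta tables. The function name `assert_tables_match` and its parameters are hypothetical and are not taken from `util/compare.py`.

```python
def assert_tables_match(spark, new_table_path, golden_table_path):
    """Raise AssertionError if the two Delta tables differ in schema or rows.

    Hypothetical helper: `spark` is assumed to be a SparkSession that already
    has the Delta Lake extensions configured.
    """
    # Imported lazily so this module can be loaded without a Spark install.
    # Both helpers live in pyspark.testing as of PySpark 3.5.
    from pyspark.testing import assertDataFrameEqual, assertSchemaEqual

    new_df = spark.read.format("delta").load(new_table_path)
    golden_df = spark.read.format("delta").load(golden_table_path)

    # Check the schema first so a column or type mismatch is reported
    # before any row-level diff.
    assertSchemaEqual(new_df.schema, golden_df.schema)

    # Row order is ignored by default (checkRowOrder=False), which suits
    # comparing the latest snapshot of two independently written tables.
    assertDataFrameEqual(new_df, golden_df)
```

Both helpers raise an assertion error with a diff of the mismatching fields or rows, so the manual `df1.schema != df2.schema` check and the `print` calls might not be needed.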

.mode("overwrite") \
.save(table_path)

@reference_table(name="snapshot-data0", description="golden tables snapshot-data0 test")
i know these are the names from the golden tests in delta. they are really not descriptive :) If you know a bit more about what they are testing, maybe we can rename them to add a little more color.
