feat: Introduce functionality to read and write typed parquets #66
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##               main       #66   +/-  ##
=========================================
  Coverage    100.00%   100.00%
=========================================
  Files            41        41
  Lines          2216      2255   +39
=========================================
+ Hits           2216      2255   +39

☔ View full report in Codecov by Sentry.
AndreasAlbertQC left a comment:
Super nice and clear, thanks @borchero!
delsner left a comment:
Nice!
Sorry, I didn't get to work on this again, but I still mean to add proper support for partitioned datasets, which also introduces some additional challenges at read time 👀 I'll mark this as draft for now.
Motivation
In #57, we added support for serializing schemas to JSON. This PR now adds utility functions to serialize the schema as parquet metadata and leverage that schema when reading files.
Changes
- Add `{read,write,scan,sink}_parquet` methods to the `Schema` class