@borchero
Member

Motivation

When serializing data frames to parquet, any schema information beyond the plain polars schema is currently lost. However, persisting the full schema alongside the data has several potential benefits:

  • When reading the parquet file, the stored schema information can be used to check whether the data adheres to the current schema without having to run validation again.
  • Third-party tools can use the advanced schema information to optimize operations.

Since polars 1.30, the parquet writer supports writing file-level metadata, so dataframely schemas could naturally be stored there in serialized form.

This PR is the first step towards doing that: it introduces serialization of schemas to JSON and deserialization of schemas from JSON.

Changes

  • Introduce Schema.serialize and dy.deserialize_schema to serialize and de-serialize schemas to and from JSON
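
To illustrate the idea behind the round trip, here is a minimal, standard-library-only sketch. It is not dataframely's implementation: `ColumnSpec`, `ToySchema` and `deserialize_toy_schema` are hypothetical stand-ins for a real schema class, `Schema.serialize` and `dy.deserialize_schema`, and the JSON layout is an assumption made for this example only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical column definition (not a dataframely type)."""

    dtype: str
    nullable: bool = False
    primary_key: bool = False


@dataclass(frozen=True)
class ToySchema:
    """Hypothetical schema: a name plus an ordered set of columns."""

    name: str
    columns: tuple[tuple[str, ColumnSpec], ...]

    def serialize(self) -> str:
        # Encode everything needed to rebuild the schema as a JSON string.
        payload = {
            "name": self.name,
            "columns": {col: asdict(spec) for col, spec in self.columns},
        }
        return json.dumps(payload)


def deserialize_toy_schema(data: str) -> ToySchema:
    # Rebuild the schema from its JSON representation.
    payload = json.loads(data)
    columns = tuple(
        (col, ColumnSpec(**spec)) for col, spec in payload["columns"].items()
    )
    return ToySchema(name=payload["name"], columns=columns)


schema = ToySchema(
    name="Users",
    columns=(
        ("id", ColumnSpec("Int64", primary_key=True)),
        ("email", ColumnSpec("String", nullable=True)),
    ),
)

serialized = schema.serialize()
restored = deserialize_toy_schema(serialized)

# A faithful round trip yields a schema equal to the original one.
assert restored == schema
```

A string like `serialized` is the kind of value that could later be stored as file-level parquet metadata. A reader could then compare the restored schema with the schema it expects and, if they are equal, skip re-validating the data.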

@borchero borchero self-assigned this Jun 15, 2025
@codecov

codecov bot commented Jun 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (18481c1) to head (5873a7d).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main       #57    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           39        41     +2     
  Lines         1954      2148   +194     
==========================================
+ Hits          1954      2148   +194     

Collaborator

@AndreasAlbertQC AndreasAlbertQC left a comment

Nice!

Base automatically changed from equality to main June 17, 2025 16:46
@github-actions github-actions bot added the enhancement New feature or request label Jun 17, 2025
@borchero borchero enabled auto-merge (squash) June 17, 2025 17:21
@borchero borchero merged commit e0c30b4 into main Jun 17, 2025
18 checks passed
@borchero borchero deleted the serialization branch June 17, 2025 17:23
@borchero borchero restored the serialization branch June 17, 2025 19:54
@borchero borchero deleted the serialization branch June 17, 2025 19:54