Add support pyspark.sql.classic.dataframe.DataFrame transformer #3272
Conversation
Signed-off-by: Nelson Chen <[email protected]>
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Coverage diff (master vs. #3272):
- Coverage: 83.35% → 75.63% (-7.73%)
- Files: 347 → 215 (-132)
- Lines: 28791 → 22520 (-6271)
- Branches: 2960 → 2961 (+1)
- Hits: 23999 → 17033 (-6966)
- Misses: 3956 → 4615 (+659)
- Partials: 836 → 872 (+36)
In 4.0.0 do we have a new dataframe type?
@kumare3 Yes, I added a new DataFrame type to support it.
Tracking issue
#6478
Why are the changes needed?
There is a new DataFrame type when pyspark>=4.0.0. It is not recognized by the current structured dataset, so the type transformer fails to serialize and deserialize it.
What changes were proposed in this pull request?
Add the new Spark DataFrame type and register it.
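To illustrate why registration is needed at all, here is a hypothetical, stdlib-only sketch (not flytekit's real transformer engine, and not the PR's actual code): handlers are looked up by the value's exact Python type, so a newly introduced class such as `pyspark.sql.classic.dataframe.DataFrame` needs its own registration even if it behaves like the old one. All class and function names below are illustrative stand-ins.

```python
# Hypothetical sketch of a type-keyed handler registry, mimicking the
# lookup pattern a structured-dataset transformer engine uses. Not the
# real flytekit API.
from typing import Any, Callable, Dict, Type


class ToyTransformerEngine:
    """Minimal registry mapping a Python type to its encoder callable."""

    def __init__(self) -> None:
        self._encoders: Dict[Type, Callable[[Any], bytes]] = {}

    def register(self, python_type: Type, encoder: Callable[[Any], bytes]) -> None:
        self._encoders[python_type] = encoder

    def encode(self, value: Any) -> bytes:
        # Lookup is by exact type, so a new DataFrame class is a miss
        # until it is registered, even if it duck-types like the old one.
        encoder = self._encoders.get(type(value))
        if encoder is None:
            raise TypeError(f"no encoder registered for {type(value).__name__}")
        return encoder(value)


# Stand-ins for the pre-4.0 and 4.0 "classic" DataFrame classes.
class LegacyDataFrame: ...
class ClassicDataFrame: ...


engine = ToyTransformerEngine()
engine.register(LegacyDataFrame, lambda df: b"legacy-parquet")

# Without registering ClassicDataFrame, encoding it fails:
try:
    engine.encode(ClassicDataFrame())
    unregistered_error = ""
except TypeError as exc:
    unregistered_error = str(exc)

# After registration, encoding succeeds:
engine.register(ClassicDataFrame, lambda df: b"classic-parquet")
encoded = engine.encode(ClassicDataFrame())
```

This mirrors the failure mode described above: the old registration keeps working, but the new class raises until it gets its own handler.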
How was this patch tested?
GitHub CI fails when pyspark>=4.0.0 is installed and the new DataFrame type is not registered.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This pull request adds support for the new `pyspark.sql.classic.dataframe.DataFrame` type for compatibility with `pyspark` version 4.0.0 and above. It includes new classes for reading and writing this DataFrame type, along with serialization and deserialization handlers. Plugin requirements have also been updated accordingly.
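A minimal sketch of how a plugin might stay compatible across pyspark versions, assuming (as the PR describes) that `pyspark.sql.classic.dataframe` only exists in pyspark>=4.0.0. The `supported_dataframe_types` helper is a hypothetical name for illustration, not part of the PR.

```python
# Hedged sketch: version-gated import so the same code runs against
# pyspark<4.0.0, pyspark>=4.0.0, or no pyspark at all.
try:
    # pyspark>=4.0.0 exposes the concrete ("classic") implementation here.
    from pyspark.sql.classic.dataframe import DataFrame as ClassicDataFrame
except ImportError:
    # Older pyspark, or pyspark not installed: no separate classic module.
    ClassicDataFrame = None


def supported_dataframe_types() -> list:
    """Collect the DataFrame classes a transformer would register handlers for.

    Hypothetical helper: returns an empty list when pyspark is absent,
    one class on pyspark<4.0.0, and both classes on pyspark>=4.0.0.
    """
    types = []
    try:
        from pyspark.sql.dataframe import DataFrame
        types.append(DataFrame)
    except ImportError:
        pass  # pyspark not installed at all
    if ClassicDataFrame is not None:
        types.append(ClassicDataFrame)
    return types
```

Registering a handler per entry in that list is one way to cover both the legacy and the new classic DataFrame without pinning the plugin to a single pyspark major version.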