Vision
Right now the library is built against Spark 2.4.x. The next major version, Spark 3.x, has been out for some time already, and version 4.x is on the horizon. It would be great to update the library so that it supports Spark 3.x and perhaps 4.x as well.
The ErrorHandling capability within spark-commons (or in its own library, if it gets extracted) is useful and generic enough to be incorporated into spark-data-standardization. That would let users handle errors during the standardization process as they require, instead of forcing the single current behavior on them.
There are some known bugs, particularly in corner cases of date/timestamp parsing, that deserve to be fixed.
Some types are not really supported by spark-data-standardization, namely IntervalType and NullType. Full support for them would enhance the versatility of the library.
BinaryType could be supported as well. Conversion from StringType could include encoding metadata accepting url-encoding, Base64, or uuencode.
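As a rough illustration of what metadata-driven conversion to binary might look like, here is a minimal sketch in plain Python, independent of Spark. The metadata key `encoding`, its values (`base64`, `url`, `none`), and the function name `decode_to_binary` are all assumptions made for this example; none of them is an existing part of the library. uuencode is left out for brevity.

```python
import base64
import binascii
from urllib.parse import unquote_to_bytes

# Hypothetical metadata key; not part of the library's current API.
ENCODING_KEY = "encoding"


def decode_to_binary(value, metadata):
    """Decode a string into bytes according to the field's encoding metadata.

    `metadata` is a plain dict standing in for a field's metadata.
    """
    if value is None:
        # Nulls pass through untouched.
        return None
    encoding = metadata.get(ENCODING_KEY, "none").lower()
    if encoding == "base64":
        try:
            # validate=True rejects characters outside the Base64 alphabet.
            return base64.b64decode(value, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"invalid Base64 input: {value!r}") from exc
    if encoding == "url":
        # Percent-escapes become raw bytes; other characters are UTF-8 encoded.
        return unquote_to_bytes(value)
    if encoding == "none":
        # Assumed default: take the UTF-8 bytes of the string as-is.
        return value.encode("utf-8")
    raise ValueError(f"unsupported encoding: {encoding!r}")
```

In the library itself, a failed decode would more likely be recorded through whatever error-handling mechanism is in place than raised as an exception; the `ValueError` here only marks where that hook would go.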
The complex types support can also be improved.
Some unsystematic and Absa-specific transformations are currently implemented directly within the library. Moving them into plugins and UDFs would make the open-source library more generic and extensible.