The author of the NamedTuples feature published his examples of named tuple usage.
https://github.com/bishabosha/scalar-2025
One of them is, indeed, the obvious "dataframe" use case. I read through it quickly, and here are my notes comparing its tradeoffs and differences with this library.
Internal Representation
In the scalar-2025 design, the dataframe is represented as a collection of columns.
class DataFrame[T](
  private val cols: IArray[Col[?]],
  val len: Int,
  private val data: IArray[AnyRef]
)
Whereas Scautable has an Iterable[K], with K some known named tuple type.
Said differently, scalar-2025 uses a single (untyped!) array-backed datastore. As I understand it, that columnar representation offers some notable advantages in performance potential, particularly for numeric computations.
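To make the contrast concrete, here is a minimal sketch of the two shapes. It assumes Scala 3.7+ (where named tuples are stable); `Person`, `people` and `columns` are hypothetical names for illustration, and the columnar half is a simplified stand-in, not the actual scalar-2025 implementation.

```scala
// Row-oriented: each element is a named tuple, so field access is typed.
type Person = (name: String, age: Int)

val people: Iterable[Person] = List(
  (name = "Ada", age = 36),
  (name = "Alan", age = 41)
)
val agesFromRows: List[Int] = people.map(_.age).toList

// Column-oriented (simplified): one untyped array per column.
// Values must be cast back to their real type when read.
val columns: Map[String, Array[AnyRef]] = Map(
  "name" -> Array[AnyRef]("Ada", "Alan"),
  "age"  -> Array[AnyRef](Int.box(36), Int.box(41))
)
val agesFromColumns: List[Int] =
  columns("age").map(_.asInstanceOf[Integer].intValue).toList
```

In the row form the compiler checks `_.age` directly; in the untyped columnar form, type safety has to be recovered by a typed layer on top of the raw arrays.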
Tradeoffs
Scautable has committed to the row representation. This has the advantage of potential laziness: we don't need to read everything before starting to define transformations. It should also make implementation simpler, as we know the internal representation of the types we're working with.
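As a rough illustration of the laziness point, the sketch below defines transformations over an `Iterator` of named tuples before any row has been produced. It assumes Scala 3.7+; `Reading`, `source` and the counter are hypothetical and are not part of Scautable's API.

```scala
type Reading = (id: Int, value: Double)

// Counts how many rows the source has actually produced.
var produced = 0

// An unbounded source: rows are only created when something asks for them.
def source: Iterator[Reading] =
  Iterator.from(1).map { i =>
    produced += 1
    (id = i, value = i * 1.5)
  }

// Defining the pipeline does not read any rows yet.
val pipeline = source
  .filter(_.value > 3.0)
  .map(r => (id = r.id, doubled = r.value * 2))

val producedBeforeConsuming = produced

// Rows are pulled through the pipeline only at this point.
val firstTwoIds: List[Int] = pipeline.take(2).map(_.id).toList
```

The same pattern applies to a file-backed source: the transformations can be declared up front, and only the rows actually demanded are read.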
Most importantly, this choice massively reduces the scope of the project, because it plugs straight into the standard library. By contrast, scalar-2025 defines (for example) its own GroupBy trait. Scautable users get grouping for free, straight out of the standard library.
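For example, with rows held as an `Iterable` of named tuples, grouping needs no library-specific trait. This is a minimal sketch assuming Scala 3.7+; the `Sale` type and the data are made up for illustration.

```scala
type Sale = (region: String, amount: Double)

val sales: Iterable[Sale] = List(
  (region = "north", amount = 10.0),
  (region = "south", amount = 5.0),
  (region = "north", amount = 2.5)
)

// Plain standard-library groupBy, keyed on a named field.
val byRegion: Map[String, Iterable[Sale]] = sales.groupBy(_.region)

// groupMapReduce (also stdlib) groups and aggregates in one pass.
val totals: Map[String, Double] =
  sales.groupMapReduce(_.region)(_.amount)(_ + _)
```

Both `groupBy` and `groupMapReduce` come from the Scala collections library, so there is nothing extra to implement or maintain.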
Given the lack of time and resources available to this project, plugging straight into the stdlib is a fundamental advantage, and I think we should clearly prefer a lower implementation burden over potential future optimisation at this stage.