d2mini - d2ts without the multi-dimensional versioning #65
KyleAMathews
left a comment
Conceptually, the mini version is clear. I'm curious how much code this shaves off?
-456 lines from the code and -270 from the tests. The largest changes are in the index and the join and reduce operators.
I think we still want to explicitly tag updates with versions and send frontiers indicating stable versions. Versions are needed for operators to coordinate data that they receive from multiple inputs.

For example, say that we are joining two tables and we perform a transaction on the database that touches both tables. The transaction may execute several inserts/updates/deletes, so it results in multiple changes that will be streamed. We will want to send all those changes through the D2 graph tagged with the version that corresponds to the transaction in the DB (i.e. the LSN). Thus we need versions for two reasons: 1) so that we can batch updates in a given version, and 2) as a consistency mechanism to avoid combining data from different transactions.

More concretely, we want the join to output data only when a version is stable. Without versions and frontiers, the join can't know which version/transaction a change belongs to, and it may break transactional guarantees. Say that we executed a transaction in the DB which results in several changes. The join would then output data for each change, but we only want it to output the joined result after all changes of the transaction have been processed. Otherwise users may observe weird intermediate states.
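To make the concern above concrete, here is a minimal, self-contained TypeScript sketch of a join that buffers changes per version and only emits once a frontier declares that version closed. It does not use the d2ts or d2mini API: `BufferedJoin`, `sendLeft`, `sendRight`, `advanceTo`, and the `Change` shape are hypothetical names invented for illustration. It also simplifies heavily, keeping one value per key and emitting a full snapshot of the join rather than incremental diffs.

```typescript
// Hypothetical types for illustration only; not the d2ts or d2mini API.
type Version = number; // e.g. the LSN of the originating transaction
type Change<T> = { key: string; value: T; diff: 1 | -1 };
type JoinedRow<A, B> = [string, A, B];

// Append changes to the buffer for a given version.
function buffer<T>(pending: Map<Version, Change<T>[]>, version: Version, changes: Change<T>[]): void {
  const existing = pending.get(version) ?? [];
  pending.set(version, existing.concat(changes));
}

// Apply inserts (+1) and deletes (-1) to a keyed state (one value per key, a simplification).
function applyChanges<T>(state: Map<string, T>, changes: Change<T>[]): void {
  for (const c of changes) {
    if (c.diff === 1) state.set(c.key, c.value);
    else state.delete(c.key);
  }
}

class BufferedJoin<A, B> {
  private pendingLeft = new Map<Version, Change<A>[]>();
  private pendingRight = new Map<Version, Change<B>[]>();
  private left = new Map<string, A>();
  private right = new Map<string, B>();
  private emit: (version: Version, rows: JoinedRow<A, B>[]) => void;

  constructor(emit: (version: Version, rows: JoinedRow<A, B>[]) => void) {
    this.emit = emit;
  }

  sendLeft(version: Version, changes: Change<A>[]): void {
    buffer(this.pendingLeft, version, changes);
  }

  sendRight(version: Version, changes: Change<B>[]): void {
    buffer(this.pendingRight, version, changes);
  }

  // The frontier promises that no more data will arrive for versions < frontier.
  advanceTo(frontier: Version): void {
    const versions = new Set<Version>();
    this.pendingLeft.forEach((_, v) => versions.add(v));
    this.pendingRight.forEach((_, v) => versions.add(v));
    const closed = Array.from(versions).filter((v) => v < frontier).sort((a, b) => a - b);

    for (const v of closed) {
      // Apply every change of this version to both sides before emitting anything.
      applyChanges(this.left, this.pendingLeft.get(v) ?? []);
      applyChanges(this.right, this.pendingRight.get(v) ?? []);
      this.pendingLeft.delete(v);
      this.pendingRight.delete(v);

      // Emit a snapshot of the inner join as of version v (not incremental diffs).
      const rows: JoinedRow<A, B>[] = [];
      this.left.forEach((a, key) => {
        const b = this.right.get(key);
        if (b !== undefined) rows.push([key, a, b]);
      });
      this.emit(v, rows);
    }
  }
}

// Usage: one transaction (version 1) touches both tables.
const emitted: { version: Version; rows: JoinedRow<string, number>[] }[] = [];
const join = new BufferedJoin<string, number>((version, rows) => {
  emitted.push({ version, rows });
});

join.sendLeft(1, [{ key: "u1", value: "Alice", diff: 1 }]);
join.sendRight(1, [{ key: "u1", value: 42, diff: 1 }]);
// Nothing has been emitted yet: version 1 is still open.
join.advanceTo(2);
// emitted is now [{ version: 1, rows: [["u1", "Alice", 42]] }]
```

Without the version tag and the `advanceTo` call, the operator would have no signal for when a transaction's changes are complete, which is the intermediate-state problem described above.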
* clone d2ts to d2mini with a few omissions
* Stateless operators done
* version-index done
* join and consolidate done
* reduce
* count and distinct
* topK
* graph test
* orderBy tests
* filterBy
* groupby
* rename test files
* rename some stuff
* remove itterate from multiset
* use hash
* format
* tidy
* fix tests
* update package.json
* changeset
* improvments to index
* refactor index
* hash return string - forward compat with 128bit hash
* remove unused code
D2Mini is a minimal implementation of the D2TS dataflow graph library, simplified to remove the complexities of multi-dimensional versioning.
The API is almost identical to D2TS, but without the need to specify a version when sending data, or to send a frontier to mark the end of a version.
Basic Usage
Here's a simple example that demonstrates the core concepts: