Compare DataFrames #1556
…es, there is no difference in the logic
import org.jetbrains.kotlinx.dataframe.api.emptyDataFrame
import org.jetbrains.kotlinx.dataframe.nrow

internal class ComparisonDescription(
Data schemas are created with @DataSchema, not with : DataRowSchema.
Thank you for reviewing! I added @DataSchema; however, with the current implementation, : DataRowSchema
is still necessary because of lines 41-42.
FIXES #658
This makes it possible to compare two DataFrames using the Myers difference algorithm, whose cost is O((M+N)*D),
where M is the length of dfA, N is the length of dfB, and D is the length of the shortest edit script that produces dfB from dfA.
Returns a DataFrame<ComparisonDescription>.
ComparisonDescription is a schema created specifically for this use case.
It comes with a proper test case.
About the Myers difference algorithm:
https://neil.fraser.name/writing/diff/myers.pdf
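
For readers unfamiliar with the algorithm, here is a minimal sketch of its greedy forward pass on plain Kotlin lists. It is not the code in this PR: the function name `shortestEditLength` is hypothetical, and it only computes D (the number of insertions plus deletions), not the full edit script or a `DataFrame<ComparisonDescription>`. The outer loop runs at most D+1 times and the inner loop touches at most M+N+1 diagonals, which is where the O((M+N)*D) bound comes from.

```kotlin
// Illustrative only: computes D, the length of the shortest edit script
// (insertions + deletions) that turns `a` into `b`, following Myers' greedy algorithm.
fun <T> shortestEditLength(a: List<T>, b: List<T>): Int {
    val m = a.size
    val n = b.size
    val max = m + n
    if (max == 0) return 0
    // v[k + max] stores the furthest x reached so far on diagonal k = x - y.
    val v = IntArray(2 * max + 2)
    for (d in 0..max) {
        for (k in -d..d step 2) {
            // Choose between moving down (insertion, keeps x) and right (deletion, x + 1),
            // taking whichever neighbouring diagonal has reached further.
            var x = if (k == -d || (k != d && v[k - 1 + max] < v[k + 1 + max])) {
                v[k + 1 + max]
            } else {
                v[k - 1 + max] + 1
            }
            var y = x - k
            // Follow the diagonal for free while the elements match.
            while (x < m && y < n && a[x] == b[y]) {
                x++
                y++
            }
            v[k + max] = x
            if (x >= m && y >= n) return d
        }
    }
    error("unreachable: d = m + n always suffices")
}

fun main() {
    // The example pair used in Myers' paper.
    println(shortestEditLength("ABCABBA".toList(), "CBABAC".toList()))
}
```

A row-wise DataFrame comparison could apply the same idea with rows as the list elements, assuming row equality is well defined for the columns involved.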