-
Notifications
You must be signed in to change notification settings - Fork 132
better fix for sys_id in delta, add tests #1413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideThis PR streamlines delta handling by filtering out system columns at read time, removing a redundant system-column regeneration step, propagating delta parameters through storage reads, and adding functional tests to ensure correct delta replay behavior. Sequence diagram for delta read with system column filteringsequenceDiagram
participant "read_dataset()"
participant "SignalSchema"
participant "DataChain"
participant "read_storage()"
"read_dataset()"->>"SignalSchema": deserialize or from_column_types
"read_dataset()"->>"SignalSchema": clone_without_sys_signals (if delta)
"read_dataset()"->>"DataChain": create with filtered signals_schema
"DataChain"->>"read_storage()": propagate delta parameters
Entity relationship diagram for signals schema changes in delta readserDiagram
SIGNAL_SCHEMA {
id int
name string
type string
is_system bool
}
DATA_CHAIN {
id int
signals_schema_id int
}
DATA_CHAIN ||--o| SIGNAL_SCHEMA : uses
%% When delta is enabled, DATA_CHAIN uses SIGNAL_SCHEMA with is_system = false
Class diagram for delta handling and system column changesclassDiagram
class DataChain {
+clone()
+_query: Query
+signals_schema: SignalSchema
}
class SignalSchema {
+deserialize()
+from_column_types()
+mutate()
+clone_without_sys_signals()
}
class read_dataset {
+delta: bool
+signals_schema: SignalSchema
}
class read_storage {
+delta: bool
+delta_on
+delta_result_on
+delta_compare
+delta_retry
+delta_unsafe
}
DataChain --> SignalSchema
read_dataset --> SignalSchema
read_storage --> DataChain
read_storage --> read_dataset
%% Removed class
class _RegenerateSystemColumnsStep {
-catalog: Catalog
-hash_inputs()
-apply()
}
%% Indicate removal
_RegenerateSystemColumnsStep --|> Step
%% Mark as removed
class _RegenerateSystemColumnsStep {
<<removed>>
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1413 +/- ##
==========================================
- Coverage 87.80% 87.79% -0.02%
==========================================
Files 160 160
Lines 15207 15192 -15
Branches 2178 2178
==========================================
- Hits 13353 13338 -15
Misses 1350 1350
Partials 504 504
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
af8762a to
9e7340b
Compare
Deploying datachain-documentation with
|
| Latest commit: |
9e7340b
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://d03748bf.datachain-documentation.pages.dev |
| Branch Preview URL: | https://fix-sys-id-delta-2.datachain-documentation.pages.dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `tests/func/test_delta.py:338-343` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
Continuation of https://github.com/iterative/datachain/pull/1412/files
Since we use union in delta, we essentially properly set schema in read_dataset to not include sys columns
Summary by Sourcery
Simplify and correct delta processing by filtering out system columns at read time, removing a redundant regeneration step, propagating delta parameters in storage reads, and adding tests to validate proper behavior
Bug Fixes:
Enhancements:
Tests:
Summary by Sourcery
Improve delta handling by excluding system columns at read time, removing the hacky regeneration step, consolidating delta logic in storage reads, and expanding functional tests
Bug Fixes:
Enhancements:
Tests: