-
Notifications
You must be signed in to change notification settings - Fork 131
Renaming Internal Structs #1059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Renaming Internal Structs #1059
Conversation
src/expr.rs
Outdated
// Define the new RawExpr struct and implement Debug trait | ||
#[derive(Debug, Clone)] | ||
pub struct RawExpr { | ||
pub expr: Expr, | ||
} | ||
|
||
// Implement conversion from RawExpr to Expr | ||
impl From<RawExpr> for Expr { | ||
fn from(raw_expr: RawExpr) -> Expr { | ||
raw_expr.expr | ||
} | ||
} | ||
|
||
/// A PyExpr that can be used on a DataFrame | ||
#[pyclass(name = "Expr", module = "datafusion.expr", subclass)] | ||
#[derive(Debug, Clone)] | ||
pub struct PyExpr { | ||
pub expr: Expr, | ||
pub raw_expr: RawExpr, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need another wrapper struct. Rather, I think you need to change the pyo3 class name to RawExpr
.
#[pyclass(name = "RawExpr", module = "datafusion.expr", subclass)]
This will probably require updates to the CI script that checks for all classes that require exporting.
You should also be able to test the debug error by going into any of the methods in functions.py
and remove the .expr
to get the inner object and see the error generated is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, got it. I'll revert this and proceed as you suggested. I'll start with changing RawExpr
and the corresponding CI tests and get that reviewed again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I am getting this error:
Traceback (most recent call last):
File "/home/user/datafusion-python/rename.py", line 3, in <module>
result = f.isnan(col("a"))
File "/home/user/datafusion-python/python/datafusion/functions.py", line 284, in isnan
return Expr(f.isnan(expr))
TypeError: argument 'num': 'Expr' object cannot be converted to 'RawExpr'
When I try to run:
from datafusion import col, functions as f
result = f.isnan(col("a"))
print(result)
27ef6d3
to
80611e6
Compare
I'll be renaming all the classes in Line 77 in 9d634de
I hope that's correct. The comment above it states this for the classes in |
You'll want to get the submodules as well. Also, this can be done piecemeal if doing the entire thing in one go is too large a PR. The CI failure is the one I expected. We will need to update the check for export coverage in the wrappers to have this specific naming convention. Ideally, we would even require the |
Sure. I'll fix that CI test right away. About breaking this into several PRs, I agree. There are small changes required in different places for each renaming. One single PR would make things a lot confusing in case some future dev tries to understand what all changes were made. |
python/datafusion/expr.py
Outdated
Partitioning = expr_internal.Partitioning | ||
Placeholder = expr_internal.Placeholder | ||
Projection = expr_internal.Projection | ||
RawExpr = expr_internal.RawExpr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want this in the wrapper. Instead we want to update the CI test to identify that RawExpr
is properly covered by the Expr
class in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something along these lines?
# Skip internal context and RawExpr (which is handled by Expr class)
if attr in ["_global_ctx", "RawExpr"]:
continue
# Check if RawExpr functionality is covered by Expr class
if attr == "RawExpr" and hasattr(wrapped_obj, "Expr"):
expr_class = getattr(wrapped_obj, "Expr")
assert hasattr(expr_class, "raw_expr"), "Expr class must provide raw_expr property"
continue
# Skip internal context and RawExpr (which is handled by Expr class) | ||
if attr in ["_global_ctx", "RawExpr"]: | ||
continue | ||
|
||
# Check if RawExpr functionality is covered by Expr class | ||
if attr == "RawExpr" and hasattr(wrapped_obj, "Expr"): | ||
expr_class = getattr(wrapped_obj, "Expr") | ||
assert hasattr(expr_class, "raw_expr"), "Expr class must provide raw_expr property" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of special casing it like this, we should be able to write a more general solution that checks all class names. For the internal representation they can begin with Raw
and be converted to their non-raw counterparts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am sorry. I thought that I could include that after renaming other classes. I am now simply extracting the base_class
name from the Raw
attr name via string slicing and then use it to check if the base_class
has such an attribute:
if attr in ["_global_ctx"]:
continue
# Check if Raw* classes have corresponding wrapper classes
if attr.startswith("Raw"):
base_class = attr[3:] # Remove "Raw" prefix
assert hasattr(wrapped_obj, base_class), f"Missing implementation for {attr}"
continue
Sorry for being this clumsy. Thanks a lot for your patience.
python/datafusion/expr.py
Outdated
|
||
__all__ = [ | ||
"Expr", | ||
"RawExpr", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing this caused the test to fail. I'll add this again and run the tests locally again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its removal is causing the tests to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added a line in the test:
if isinstance(internal_attr, list):
assert isinstance(wrapped_attr, list)
for val in internal_attr:
if isinstance(val, str) and val.startswith("Raw"): #<---- added this line since we check for all Raw* classes in the previous part of the code
continue
assert val in wrapped_attr
python/datafusion/expr.py
Outdated
self.expr = expr | ||
|
||
@property | ||
def raw_expr(self) -> expr_internal.RawExpr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this.
if attr in ["_global_ctx"]: | ||
continue | ||
assert attr in dir(wrapped_obj) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you apply ruff it should correct these whitespace errors. I recommend turning on pre-commit in your local workspace to catch them before they get to CI.
It's looking good. As long as CI passes, I plan to merge. |
Now the
|
if isinstance(val, str) and val.startswith("Raw"): | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need an assert here to make certain this exposed class is in the list of the wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's being covered in the previous part of the test. Here:
elif attr.startswith("Raw"):
base_class = attr[3:] # Remove "Raw" prefix
assert hasattr(wrapped_obj, base_class)
Though I think I should change the variable name from base_class
to wrapper_class
. Perhaps I should move the aforementioned part into the isinstance
conditionional block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous part catches that the class is exposed, but this portion of the test was, if my memory is correct, to ensure that everything that is in __all__
in the internal is also in the wrapper. Possibly I need to document all of the reasons for the portions of that test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'RawExpr' isn't in __all__
as you had suggested not to in the comment here. So now the isinstance
block only checks for the non-raw
classes, essentially the wrapper classes like Expr
.
Possibly I need to document all of the reasons for the portions of that test.
That would be helpful. I've added comments next to the new sections as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more cohesive and well-structured, I have already ran the Ci test for this locally and it works, will push it if you approve this:
for attr in dir(internal_obj):
if attr in ["_global_ctx"] or attr.startswith("Raw"): #<---Skip Raw classes for now
continue
[...]
if isinstance(internal_attr, list):
assert isinstance(wrapped_attr, list)
for val in internal_attr:
if isinstance(val, str) and val.startswith("Raw"): #<---check for Raw classes and their corresponding wrapper classes here
# Check if Raw* classes have corresponding wrapper classes
wrapper_class = val[3:] # Remove "Raw" prefix
assert hasattr(wrapped_obj, wrapper_class), f"Missing wrapper class: {wrapper_class} for Raw class: {val}"
continue
assert val in wrapped_attr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say that you tested it locally and it works, have you verified that it also fails when it is supposed to fail? I won't have time to review this again today, but I hope to get to it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I did not fail when I removed Expr
from the __all__
. So essentially, if some wrapper class is not included in __all__
even then its Raw
class shall pass the test. To prevent this, I added the assert wrapper_class in dir(internal_obj)
before the assert hasattr(wrapped_obj, wrapper_class), f"Missing wrapper class: {wrapper_class} for Raw class: {val}"
Now, if the wrapper_class
isn't included in the internal_obj
, then the corresponding Raw
shall fail:
FAILED python/tests/test_wrapper_coverage.py::test_datafusion_missing_exports - AssertionError: Missing wrapper class: Expr for Raw class: RawExpr
I won't have time to review this again today, but I hope to get to it tomorrow.
Sure. I'll see if I can find out any other gaps in my implementation till then.
…ped Raw classes are checked
Thank you for all the work on this. |
I wasn't of much help in the tests part. Thanks for your patience!! |
@timsaucer any suggestions on what other internal class to rename next? If I am correct, all the classes in Line 77 in 9d634de
|
Which issue does this PR close?
Closes #853
Rationale for this change
Developers were getting confusing debugging errors such as "Cannot convert Expr to Expr". Hence, the internal struct
Expr
was renamed toRawExpr
for better understanding.What changes are included in this PR?
The internal struct
Expr
was renamed toRawExpr
.Are there any user-facing changes?
None. It only changes the internal structs.