-
Notifications
You must be signed in to change notification settings - Fork 3
[DRAFT] User collections: integer range #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
__all__ = ["PyDoughUserGeneratedCollection"] | ||
|
||
|
||
class PyDoughUserGeneratedCollection(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this + range_collection.py
into the user_collections
module since they will be referenced by multiple different IRs, and they don't really "belong" to any of them (kinda like metadata).
def __eq__(self, other) -> bool: | ||
return isinstance(other, PyDoughUserGeneratedCollection) and repr(self) == repr( | ||
other | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just do self.equals(other)
and define an equals
abstractmethod for each user collection.
class RangeGeneratedCollection(PyDoughUserGeneratedCollection): | ||
"""Integer range-based collection.""" | ||
|
||
# HA_Q: should start/end/step be int or PyDoughQDAG? Why? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say int | None
since this class will transcend all the IRs
self.start = start | ||
self.end = end | ||
self.step = step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: store all of these as self._start
etc. and have @property
to access them.
Also, let's add self._range = range(start, end, step)
so the to_string
can be: RangeCollection({self.name}!r, {self.column_name}={self.range})
@abstractmethod | ||
def is_empty(self) -> bool: | ||
"""Check if the collection is empty.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to flip this to always_non_empty
, due to the usage
if term_name not in self.collection.columns: | ||
raise PyDoughQDAGException(self.name_mismatch_error(term_name)) | ||
|
||
return Reference(self._ancestor, term_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be Reference(self, term_name)
?
def get_expression_position(self, expr_name: str) -> int: | ||
raise PyDoughQDAGException(f"Cannot call get_expression_position on {self!r}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you can, use the ordering from the collection to figure out the order of the columns
Returns a string representation of the collection in a standalone form. | ||
This is used for debugging and logging purposes. | ||
""" | ||
return f"UserGeneratedCollection[{self.name}, {', '.join(self.collection.columns)}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just use a to_string for the collection
def to_string(self) -> str: | ||
# Stringify as "name(column_name) | ||
return f"{self.name}({', '.join(self.collection.columns)})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again lets use the to_string
for the collection to help out here
def tree_item_string(self) -> str: | ||
return f"UserGeneratedCollection[{self.name}: {', '.join(self.collection.columns)}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same point about the to_string
for the collection
207f36b
to
d0ce87f
Compare
pydough/conversion/hybrid_tree.py
Outdated
case HybridUserGeneratedCollection(): | ||
# User-generated collections are always guaranteed to exist. | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No they aren't (what if the range is empty?) This is why we need the always exists field for the generated colleciton.
pydough/conversion/hybrid_tree.py
Outdated
# HA TODO: confirm is that right? | ||
case HybridUserGeneratedCollection(): | ||
# User-generated collections are always guaranteed to be | ||
# singular. | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not. It is only singular if we can guarantee it has <=1 rows.
def visit_generated_table(self, generated_table: "GeneratedTable") -> None: | ||
"""convert the `GeneratedTable` to SQL code based on which underlying | ||
`PyDoughUserGeneratedCollection` it uses | ||
|
||
Args: | ||
generated_table: The generated table node to visit. | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the logic currently inside this function should be happening inside the function. This function should just call a new method inside self.bindings
t deal with generated tables. That method should case on the type of generated table and call an appropriate method (for now, just one helper method for range). Everything currently inside this function should go inside that helper method for range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there shouldn't be a docstring for this method since there is one for the base class.
""" | ||
Returns a string representation of the collection in a standalone form. | ||
This is used for debugging and logging purposes. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have this docstring since it is already in the parent classes.
@property | ||
def unique_terms(self) -> list[str]: | ||
return self.collection.columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets make unique_column_names
a property of PyDoughUserGeneratedCollection
, that way this property only needs to return self.collection.unique_column_names
def node_equals(self, other: RelationalNode) -> bool: | ||
return isinstance(other, GeneratedTable) and self.name == other.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be checking if self.collection == other.collection
.
visitor.visit_generated_table(self) | ||
|
||
def to_string(self, compact=False) -> str: | ||
return f"GENERATED_TABLE(table={self.name}, columns={self.make_column_string(self.columns, compact)})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a very useful string representation for the generated plans. At this point, we don't really care about the names (those are mostly used for QDAG to identify which ancestors you are talking about). Instead, let's stringify it as f"GENERATED_TABLE({self.collection})
def __repr__(self): | ||
return f"USER_GEN_COLLECTION[{self.user_collection.name}]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name doesn't matter here, let's stringify using self.user_collection.to_string()
def to_string(self) -> str: | ||
return self.collection.to_string() | ||
|
||
@property | ||
def tree_item_string(self) -> str: | ||
return self.to_string() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two need to be a bit different for QDAG:
to_string()
should return a string that looks like the PyDough code that would generate this QDAG node once qualified.tree_item_string
should return a more compact/clean form that goes inside the tree diagrams from the qualification tests (so whatself.collection.to_string()
is currently doing)
|
||
def to_string(self) -> str: | ||
"""Return a string representation of the range collection.""" | ||
return f"RangeCollection({self.name}!r, {self.columns[0]}={self.range})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The !r
is supposed to go inside the {}
. Right now if you look at the strings inside test_qualification.py
they look a bit off because of this.
if not isinstance(other, RangeGeneratedCollection): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just make isinstance(other, RangeGeneratedCollection)
the first condition in the next line
@property | ||
@abstractmethod | ||
def data(self) -> Any: | ||
"""Return information about data in the collection.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a method in the base class / at all? The subclasses will have no common idea of what "data" is.
No description provided.