hadia206
Contributor

No description provided.

__all__ = ["PyDoughUserGeneratedCollection"]


class PyDoughUserGeneratedCollection(ABC):

Let's move this + range_collection.py into the user_collections module since they will be referenced by multiple different IRs, and they don't really "belong" to any of them (kinda like metadata).

Comment on lines 26 to 29
def __eq__(self, other) -> bool:
    return isinstance(other, PyDoughUserGeneratedCollection) and repr(self) == repr(
        other
    )

Let's just do self.equals(other) and define an equals abstractmethod for each user collection.
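
A minimal sketch of what this suggestion could look like, assuming simplified stand-in classes (`UserCollectionSketch` and `RangeSketch` are hypothetical names, not the classes in the PR). The base class delegates `__eq__` to an abstract `equals` hook so that each collection type defines equality on its own fields instead of on its `repr()`:

```python
from abc import ABC, abstractmethod


class UserCollectionSketch(ABC):
    """Hypothetical stand-in for PyDoughUserGeneratedCollection."""

    @abstractmethod
    def equals(self, other: object) -> bool:
        """Each concrete collection decides what equality means."""

    def __eq__(self, other: object) -> bool:
        # Delegate to the subclass hook instead of comparing repr() strings.
        return self.equals(other)


class RangeSketch(UserCollectionSketch):
    """Hypothetical stand-in for RangeGeneratedCollection."""

    def __init__(self, name: str, start: int, end: int, step: int = 1) -> None:
        self.name = name
        self.range = range(start, end, step)

    def equals(self, other: object) -> bool:
        # The isinstance check is simply the first condition of the chain.
        return (
            isinstance(other, RangeSketch)
            and self.name == other.name
            and self.range == other.range
        )
```

One side effect worth keeping in mind: a class that defines `__eq__` without `__hash__` becomes unhashable in Python, so a real implementation may also want a matching `__hash__`.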

class RangeGeneratedCollection(PyDoughUserGeneratedCollection):
    """Integer range-based collection."""

    # HA_Q: should start/end/step be int or PyDoughQDAG? Why?

I say int | None since this class will transcend all the IRs

Comment on lines 37 to 39
self.start = start
self.end = end
self.step = step

NIT: store all of these as self._start etc. and have @property to access them.

Also, let's add self._range = range(start, end, step) so the to_string can be: RangeCollection({self.name}!r, {self.column_name}={self.range})
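
As a rough illustration of the two points above, here is a sketch with private attributes, read-only properties, and a cached `range` object. The class name `RangeCollectionSketch` and the constructor signature are assumptions made for this example, and the `!r` conversion is written inside the braces so that the name is quoted:

```python
class RangeCollectionSketch:
    """Hypothetical stand-in for the range collection under review."""

    def __init__(
        self, name: str, column_name: str, start: int, end: int, step: int = 1
    ) -> None:
        self._name = name
        self._column_name = column_name
        self._start = start
        self._end = end
        self._step = step
        # Built once so that to_string and length checks can reuse it.
        self._range = range(start, end, step)

    @property
    def name(self) -> str:
        return self._name

    @property
    def column_name(self) -> str:
        return self._column_name

    @property
    def start(self) -> int:
        return self._start

    @property
    def end(self) -> int:
        return self._end

    @property
    def step(self) -> int:
        return self._step

    @property
    def range(self) -> range:
        return self._range

    def to_string(self) -> str:
        return f"RangeCollection({self.name!r}, {self.column_name}={self.range})"


print(RangeCollectionSketch("nums", "value", 0, 10, 2).to_string())
# RangeCollection('nums', value=range(0, 10, 2))
```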

Comment on lines 55 to 57
@abstractmethod
def is_empty(self) -> bool:
    """Check if the collection is empty."""

May want to flip this to always_non_empty, due to the usage

if term_name not in self.collection.columns:
    raise PyDoughQDAGException(self.name_mismatch_error(term_name))

return Reference(self._ancestor, term_name)

I think it should be Reference(self, term_name)?

Comment on lines 112 to 113
def get_expression_position(self, expr_name: str) -> int:
    raise PyDoughQDAGException(f"Cannot call get_expression_position on {self!r}")

@knassre-bodo knassre-bodo Jul 15, 2025

Yeah, you can: use the ordering from the collection to figure out the order of the columns.
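
One way to read this comment, sketched with a hypothetical `CollectionNodeSketch` class that only holds an ordered list of column names: the position of an expression is its index in that list. The real node would presumably raise `PyDoughQDAGException`; a plain `ValueError` is used here to keep the sketch self-contained.

```python
class CollectionNodeSketch:
    """Hypothetical node wrapping a collection with ordered columns."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns

    def get_expression_position(self, expr_name: str) -> int:
        # The collection's column order defines the expression positions.
        if expr_name not in self.columns:
            raise ValueError(f"Unrecognized term: {expr_name!r}")
        return self.columns.index(expr_name)
```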

Returns a string representation of the collection in a standalone form.
This is used for debugging and logging purposes.
"""
return f"UserGeneratedCollection[{self.name}, {', '.join(self.collection.columns)}]"

Let's just use a to_string for the collection

Comment on lines 131 to 133
def to_string(self) -> str:
    # Stringify as "name(column_name)
    return f"{self.name}({', '.join(self.collection.columns)})"

Again, let's use the to_string for the collection to help out here.

Comment on lines 136 to 137
def tree_item_string(self) -> str:
    return f"UserGeneratedCollection[{self.name}: {', '.join(self.collection.columns)}]"

Same point about the to_string for the collection

@hadia206 hadia206 force-pushed the Hadia/user_collections_range branch from 207f36b to d0ce87f Compare July 16, 2025 17:55
@hadia206 hadia206 changed the base branch from main to Hadia/update_reference July 16, 2025 17:56
Comment on lines 680 to 682
case HybridUserGeneratedCollection():
    # User-generated collections are always guaranteed to exist.
    pass

No, they aren't (what if the range is empty?). This is why we need the always-exists field for the generated collection.

Comment on lines 733 to 737
# HA TODO: confirm is that right?
case HybridUserGeneratedCollection():
    # User-generated collections are always guaranteed to be
    # singular.
    pass

@knassre-bodo knassre-bodo Jul 18, 2025

Definitely not. It is only singular if we can guarantee it has <=1 rows.
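
For a range-based collection, both the existence question and the singularity question can be answered from the length of the underlying `range`. The helper names below are hypothetical and only illustrate the two conditions the reviewer describes:

```python
def range_always_exists(values: range) -> bool:
    """A range collection is guaranteed to have rows only if it is non-empty."""
    return len(values) > 0


def range_is_singular(values: range) -> bool:
    """A range collection is singular only if it has at most one row."""
    return len(values) <= 1


# An empty range: no rows, so it does not always exist, but it is singular.
assert not range_always_exists(range(5, 0))
assert range_is_singular(range(5, 0))

# A ten-row range: always exists, but is plural.
assert range_always_exists(range(0, 10))
assert not range_is_singular(range(0, 10))
```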

Comment on lines +595 to +602
def visit_generated_table(self, generated_table: "GeneratedTable") -> None:
    """convert the `GeneratedTable` to SQL code based on which underlying
    `PyDoughUserGeneratedCollection` it uses

    Args:
        generated_table: The generated table node to visit.

    """

None of the logic currently inside this function should be happening inside the function. This function should just call a new method inside self.bindings to deal with generated tables. That method should case on the type of generated table and call an appropriate method (for now, just one helper method for range). Everything currently inside this function should go inside that helper method for range.

Also, there shouldn't be a docstring for this method since there is one for the base class.

Comment on lines +121 to +124
"""
Returns a string representation of the collection in a standalone form.
This is used for debugging and logging purposes.
"""

@knassre-bodo knassre-bodo Jul 25, 2025

We shouldn't have this docstring since it is already in the parent classes.

Comment on lines +115 to +117
@property
def unique_terms(self) -> list[str]:
    return self.collection.columns

Let's make unique_column_names a property of PyDoughUserGeneratedCollection; that way this property only needs to return self.collection.unique_column_names.
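
A possible shape for this, using hypothetical stand-in classes: the collection declares which of its columns are unique, and the node simply forwards that answer. For a range, the values never repeat, so its single column qualifies.

```python
from abc import ABC, abstractmethod


class UserCollectionSketch(ABC):
    """Hypothetical stand-in for PyDoughUserGeneratedCollection."""

    @property
    @abstractmethod
    def columns(self) -> list[str]:
        """All column names, in order."""

    @property
    @abstractmethod
    def unique_column_names(self) -> list[str]:
        """Names of columns whose values are guaranteed to be unique."""


class RangeSketch(UserCollectionSketch):
    def __init__(self, column_name: str, start: int, end: int, step: int = 1) -> None:
        self._column_name = column_name
        self._range = range(start, end, step)

    @property
    def columns(self) -> list[str]:
        return [self._column_name]

    @property
    def unique_column_names(self) -> list[str]:
        # A range never yields the same integer twice.
        return [self._column_name]


class CollectionNodeSketch:
    def __init__(self, collection: UserCollectionSketch) -> None:
        self.collection = collection

    @property
    def unique_terms(self) -> list[str]:
        return self.collection.unique_column_names
```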

Comment on lines +49 to +50
def node_equals(self, other: RelationalNode) -> bool:
    return isinstance(other, GeneratedTable) and self.name == other.name

We should be checking if self.collection == other.collection.
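
To see why comparing names alone is too weak, consider two generated tables that share a name but wrap different ranges. The sketch below uses hypothetical simplified classes and compares the collections themselves:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeSketch:
    """Hypothetical collection; dataclass equality compares all fields."""

    name: str
    values: range


class GeneratedTableSketch:
    def __init__(self, collection: RangeSketch) -> None:
        self.collection = collection

    @property
    def name(self) -> str:
        return self.collection.name

    def node_equals(self, other: object) -> bool:
        return (
            isinstance(other, GeneratedTableSketch)
            and self.collection == other.collection
        )


a = GeneratedTableSketch(RangeSketch("t", range(0, 5)))
b = GeneratedTableSketch(RangeSketch("t", range(0, 10)))
# Same name, different contents: a name-only check would call these equal.
assert a.name == b.name
assert not a.node_equals(b)
```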

    visitor.visit_generated_table(self)

def to_string(self, compact=False) -> str:
    return f"GENERATED_TABLE(table={self.name}, columns={self.make_column_string(self.columns, compact)})"

This is not a very useful string representation for the generated plans. At this point, we don't really care about the names (those are mostly used for QDAG to identify which ancestors you are talking about). Instead, let's stringify it as f"GENERATED_TABLE({self.collection})".

Comment on lines +519 to +520
def __repr__(self):
    return f"USER_GEN_COLLECTION[{self.user_collection.name}]"

@knassre-bodo knassre-bodo Jul 25, 2025

The name doesn't matter here; let's stringify using self.user_collection.to_string().

Comment on lines +131 to +136
def to_string(self) -> str:
    return self.collection.to_string()

@property
def tree_item_string(self) -> str:
    return self.to_string()

These two need to be a bit different for QDAG:

  • to_string() should return a string that looks like the PyDough code that would generate this QDAG node once qualified.
  • tree_item_string should return a more compact/clean form that goes inside the tree diagrams from the qualification tests (so what self.collection.to_string() is currently doing)


def to_string(self) -> str:
    """Return a string representation of the range collection."""
    return f"RangeCollection({self.name}!r, {self.columns[0]}={self.range})"

The !r is supposed to go inside the {}. Right now if you look at the strings inside test_qualification.py they look a bit off because of this.
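
The difference is easy to reproduce in isolation. In the snippet below (the variable names are made up for the demonstration), the first f-string leaves `!r` as literal text after the interpolated value, while the second applies `repr()` to the value:

```python
name = "sizes"

wrong = f"RangeCollection({name}!r)"  # "!r" is outside the braces: literal text
right = f"RangeCollection({name!r})"  # "!r" is inside the braces: repr(name)

print(wrong)  # RangeCollection(sizes!r)
print(right)  # RangeCollection('sizes')
```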

Comment on lines +86 to +87
if not isinstance(other, RangeGeneratedCollection):
    return False

Can just make isinstance(other, RangeGeneratedCollection) the first condition in the next line

Comment on lines +55 to +58
@property
@abstractmethod
def data(self) -> Any:
    """Return information about data in the collection."""

Why is this a method in the base class / at all? The subclasses will have no common idea of what "data" is.

Base automatically changed from Hadia/update_reference to main August 12, 2025 17:06

2 participants