add support for QuerySet.aggregate() #84

WaVEV · 2024-07-16T13:53:06Z

Fixes #12 and #79

timgraham

You can use git rebase -i HEAD~20 to remove the commits that shouldn't be in here.

django_mongodb/compiler.py

django_mongodb/query.py

.github/workflows/test-python.yml

django_mongodb/compiler.py

django_mongodb/query.py

timgraham · 2024-07-31T23:52:09Z

django_mongodb/compiler.py

@@ -17,15 +19,190 @@ class SQLCompiler(compiler.SQLCompiler):
    """Base class for all Mongo compilers."""

    query_class = MongoQuery
+    SEPARATOR = "10__MESSI__3"


Here we have a problem: we cannot define a separator and be sure that it won't collide with anything in the future.
This is used when we group by a field from a related (foreign) table, like

select max(A.id), B.name from A join B group by B.name

In the grouping stage I cannot use names like table.field, so I have to use a separator. But any string could collide with a table name or field name.
So I see two options:

Generate a random separator (and regenerate it if there is a collision)

Rename all the columns (as Django does with __col#)

For now, I think we can get by with an arbitrary separator like 'GROUP_FOREIGN_SEPARATOR'.

How about double underscore: __. That seems fairly unlikely to collide. I think it's important to keep the queries readable, and inserting random strings doesn't seem so nice. Django field names cannot include a double underscore because it would collide with Django's lookup separator. A developer could set db_column="something__something", but this seems unlikely (famous last words). We could just say, "don't do that!"

Also, I think we should use a more descriptive variable name than SEPARATOR. Django has LOOKUP_SEP = "__". I think GROUP_SEP could be fine here.

Perfect! But I decided to use three underscores (___): we already use __aggregation, so __ would collide.
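A minimal sketch of the separator idea, assuming GROUP_SEP = "___" as settled above. The helpers group_key() and build_group_ids() are hypothetical, not the backend's code, and the "$table.column" value path assumes the joined table's fields are embedded under the table name. Because MongoDB won't accept dotted names in the $group stage, B.name is flattened into one key. The helper refuses any pair that would not split back unambiguously, which covers the db_column="a___b" case.

```python
GROUP_SEP = "___"  # assumption: three underscores, as agreed in the review


def group_key(table, column):
    """Flatten a (table, column) pair into one $group field name (hypothetical)."""
    key = f"{table}{GROUP_SEP}{column}"
    # Round-trip check: reject names that would not split back into the
    # same pair, e.g. a db_column that itself contains "___".
    if key.split(GROUP_SEP) != [table, column]:
        raise ValueError(f"cannot encode {table!r}.{column!r} unambiguously")
    return key


def build_group_ids(columns):
    """Build the $group _id document from (table, column) pairs (hypothetical)."""
    # Assumes joined-table fields are reachable as "$<table>.<column>".
    return {group_key(table, column): f"${table}.{column}" for table, column in columns}
```

For the query above, build_group_ids([("B", "name")]) gives {"B___name": "$B.name"}. The round-trip check costs one split per key and turns a silent mis-grouping into an explicit error.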

django_mongodb/functions.py

django_mongodb/query.py

django_mongodb/features.py

django_mongodb/functions.py

timgraham · 2024-08-05T17:35:38Z

django_mongodb/compiler.py

+                group[alias] = sub_expr.as_mql(self, self.connection)
+                replacing_expr = inner_column
+            # Count must return 0 rather than null.
+            if isinstance(sub_expr, Count):


I wonder if this special handling of Count/Variance can live in aggregates.py?

I thought so too, but I couldn't find a way. Consider the following example:

select count(*) from T1 where 1 = 0

The WHERE clause is evaluated first and produces an empty set, so the count is 0. But if in MongoDB I run:

[ { "$match": {"$expr": {"$eq": [1, 0]}} }, { "$group": {"_id": null, "group": {"$sum": 1}} } ]

it returns an empty list, because $group emits no documents when its input is empty. To handle this, I wrapped the group-by-null case in a $facet. Some post-processing is still needed, and doing it this way is shorter.
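A sketch of the same idea expressed entirely in MQL, under the assumption that the default can be applied inside the pipeline. The PR itself does part of this as post-processing. count_all_pipeline() and the "result"/"count" field names are hypothetical. The key fact is that $facet always outputs exactly one document, even on empty input, so there is always a document to attach the 0 to.

```python
def count_all_pipeline(match_expr):
    """Sketch: COUNT(*) over a filtered collection that yields 0, not [].

    Hypothetical helper; the stage layout is an assumption, not the
    backend's actual output.
    """
    return [
        {"$match": {"$expr": match_expr}},
        # $facet always emits exactly one document, even when $match
        # lets nothing through, so the result is never an empty list.
        {"$facet": {"result": [{"$group": {"_id": None, "count": {"$sum": 1}}}]}},
        # "$result.count" is [] when the facet is empty; $arrayElemAt then
        # returns nothing and $ifNull substitutes 0.
        {"$project": {"count": {"$ifNull": [{"$arrayElemAt": ["$result.count", 0]}, 0]}}},
    ]
```

With match_expr = {"$eq": [1, 0]}, this pipeline should produce a single document with count 0 instead of no documents.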

timgraham · 2024-08-05T17:36:16Z

django_mongodb/compiler.py

+        return Col(self.collection_name, column_target)
+
+    def _prepare_expressions_for_pipeline(self, expression, target, count):
+        """Prepare expressions for the aggregation pipeline."""


These methods would benefit from some more inline comments or at least an explanation of what it means to "prepare an expression".

Yes, indeed. I will add them.
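One reading of "preparing an expression", inferred from the quoted group[alias] = sub_expr.as_mql(...) / replacing_expr = inner_column lines and the __aggregation prefix mentioned earlier. Each aggregate inside an expression is moved into the $group stage under a fresh alias and replaced by a plain reference to that alias. Agg, Add, Ref, and prepare() are hypothetical toy classes, not Django's expression API.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Agg:
    """Toy stand-in for an aggregate such as Max("id") (hypothetical)."""
    op: str   # MQL accumulator, e.g. "$max"
    arg: str  # field path, e.g. "$id"


@dataclass
class Add:
    """Toy stand-in for an expression that wraps other expressions."""
    lhs: object
    rhs: object


@dataclass
class Ref:
    """Reference to a field produced by the $group stage."""
    name: str


def prepare(expr, group, counter):
    """Replace every aggregate inside expr with a Ref to a $group output.

    The accumulator is registered in `group` under a fresh alias, so the
    $group stage computes it and later stages only see a column reference.
    """
    if isinstance(expr, Agg):
        alias = f"__aggregation{next(counter)}"
        group[alias] = {expr.op: expr.arg}
        return Ref(alias)
    if isinstance(expr, Add):
        return Add(prepare(expr.lhs, group, counter), prepare(expr.rhs, group, counter))
    return expr  # literals and other leaves pass through unchanged
```

For Max("id") + Min("id"), group becomes {"__aggregation0": {"$max": "$id"}, "__aggregation1": {"$min": "$id"}}. The expression becomes Add(Ref("__aggregation0"), Ref("__aggregation1")), which a later $project can evaluate.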

timgraham · 2024-08-05T17:39:55Z

django_mongodb/functions.py

+        if resolve_inner_expression:
+            return inner_expression
+        return {"$sum": inner_expression}
+    # If distinct=True or resolve_inner_expression=False, sum the size


I edited this from "When count is called with distinct without the flag". I think "the flag" meant resolve_inner_expression=False but please confirm.

Yes, it is.

django_mongodb/compiler.py

WaVEV · 2024-08-05T19:59:41Z

django_mongodb/aggregates.py

+            source_expressions = node.get_source_expressions()
+            filter_ = deepcopy(self.filter)
+            filter_.add(
+                WhereNode([Exact(source_expressions[0], Value(None))], negated=True),


Sorry @timgraham, I deleted your comment by mistake. The answer is:
Count only counts values that aren't None: if the expression evaluates to a string, a number, or any other non-null value, it adds 1. Looking at the code again, there may be a bug when both the filter and distinct options are used. I will check.

I understand that both branches are excluding null values. My question is how does the query end up with null values? I could probably work through some tests to understand it better. Thought you might be able to give a quick example.

OK, there are about three places with nulls 😬, or maybe two.
An example where this code is used:

Select count(name) filter (where surname = 'Lupi') from T1

Here we have to sum the elements that satisfy the filter. We handle it with a Case, and the first (and only) source expression is the transformed one (I don't know if there can be others; I just copied the mechanism from the source).

I will go through them one by one.

But we have to replace Exact(value, None) with IsNull(value, True); they are not the same.
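A sketch of the MQL that COUNT(name) FILTER (WHERE surname = 'Lupi') boils down to: a $sum of 1 for each row that passes the filter and has a non-null argument, and 0 otherwise. count_expr() is a hypothetical builder. The PR reaches the same shape through Django's Case/When and a negated WhereNode rather than building dicts directly. Wrapping the argument in $ifNull is an assumption chosen so that a missing field and an explicit null are treated the same way.

```python
def count_expr(expr, filter_expr=None):
    """MQL accumulator for COUNT(expr) [FILTER (WHERE filter_expr)].

    Hypothetical helper: adds 1 for each row whose expr is non-null (and
    that passes the filter), 0 otherwise, so nulls are skipped as in SQL.
    """
    # $ifNull maps both null and a missing field to null, so a single
    # $ne check covers both cases.
    not_null = {"$ne": [{"$ifNull": [expr, None]}, None]}
    condition = not_null if filter_expr is None else {"$and": [filter_expr, not_null]}
    return {"$sum": {"$cond": [condition, 1, 0]}}
```

Usage: count_expr("$name", {"$eq": ["$surname", "Lupi"]}) gives the accumulator to place under an alias in the $group stage.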

timgraham · 2024-08-05T22:33:10Z

django_mongodb/aggregates.py

+
+def count(self, compiler, connection, resolve_inner_expression=False, **extra_context):  # noqa: ARG001
+    """
+    When resolve_inner_expression is True, return the argument as MQL that


Could you give some examples to help me understand this docstring?

Sure. Maybe it should be in another function, and I forgot to support that flag in the aggregates (no test covers it 😬). But I can explain the idea: I translated SELECT COUNT(DISTINCT name) FROM T1 into an $addToSet followed by the aggregation operation. So I only need to compute the RHS (the inner expression) to store its values in a set. Once they are stored, we can compute the sum, size, variance, or whatever is needed.
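A sketch of the two-step translation described above, assuming a single group over the whole collection. count_distinct_pipeline() and the "values"/"count" names are hypothetical. Only the inner expression goes into $addToSet, and the aggregate is then applied to the resulting array. The $filter that drops null is an added assumption to match SQL, where COUNT(DISTINCT ...) ignores NULLs.

```python
def count_distinct_pipeline(field_path):
    """Sketch of SELECT COUNT(DISTINCT name) FROM T1 as two stages.

    Hypothetical helper. $addToSet collects the distinct values of the
    inner expression; the second stage applies the aggregate to that set.
    """
    return [
        {"$group": {"_id": None, "values": {"$addToSet": field_path}}},
        {
            "$project": {
                "_id": 0,
                # For COUNT the follow-up is $size; other aggregates would
                # apply their own operator to the same array. Nulls are
                # dropped first, as SQL's COUNT(DISTINCT ...) ignores them.
                "count": {"$size": {"$filter": {"input": "$values", "cond": {"$ne": ["$$this", None]}}}},
            }
        },
    ]
```

Splitting the work this way means the $group stage never needs to know which aggregate will follow. It only has to evaluate the argument.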

timgraham · 2024-08-05T22:34:54Z

django_mongodb/aggregates.py

+            source_expressions = node.get_source_expressions()
+            filter_ = deepcopy(self.filter)
+            filter_.add(
+                WhereNode([Exact(source_expressions[0], Value(None))], negated=True),


I understand that both branches are excluding null values. My question is how does the query end up with null values? I could probably work through some tests to understand it better. Thought you might be able to give a quick example.

timgraham · 2024-08-05T22:41:19Z

django_mongodb/compiler.py

@@ -17,15 +19,190 @@ class SQLCompiler(compiler.SQLCompiler):
    """Base class for all Mongo compilers."""

    query_class = MongoQuery
+    SEPARATOR = "10__MESSI__3"


How about double underscore: __. That seems fairly unlikely to collide. I think it's important to keep the queries readable, and inserting random strings doesn't seem so nice. Django field names cannot include a double underscore because it would collide with Django's lookup separator. A developer could set db_column="something__something", but this seems unlikely (famous last words). We could just say, "don't do that!"

Also, I think we should use a more descriptive variable name than SEPARATOR. Django has LOOKUP_SEP = "__". I think GROUP_SEP could be fine here.

timgraham · 2024-08-05T22:55:48Z

django_mongodb/compiler.py

+        else:
+            group["_id"] = ids
+            pipeline.append({"$group": group})
+            sets = {}


Rename sets -> add_fields ?

timgraham · 2024-08-05T22:57:31Z

django_mongodb/compiler.py

+                value = f"$_id.{key}"
+                if self.SEPARATOR in key:
+                    subtable, field = key.split(self.SEPARATOR)
+                    if subtable not in sets:


How about sets = defaultdict(dict) so this isn't necessary?
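A sketch of the suggestion applied to the loop quoted above, using the add_fields name proposed earlier and assuming GROUP_SEP = "___". The function name is hypothetical. With defaultdict(dict), the if subtable not in ... guard disappears because the inner dict is created on first access.

```python
from collections import defaultdict

GROUP_SEP = "___"  # assumption: same separator used when building $group


def add_fields_from_group_ids(keys):
    """Rebuild nested table documents from flattened $group _id keys.

    Hypothetical helper: "B___name" becomes {"B": {"name": "$_id.B___name"}},
    while keys without the separator are copied to the top level.
    """
    add_fields = defaultdict(dict)  # inner dicts are created on demand
    for key in keys:
        value = f"$_id.{key}"
        if GROUP_SEP in key:
            subtable, field = key.split(GROUP_SEP)
            add_fields[subtable][field] = value
        else:
            add_fields[key] = value
    return {"$addFields": dict(add_fields)}
```

Converting back to a plain dict at the end keeps the defaultdict from leaking into the pipeline that is sent to the driver.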

timgraham · 2024-08-05T23:04:41Z

django_mongodb/compiler.py

        try:
-            query.mongo_query = {"$expr": self.query.where.as_mql(self, self.connection)}
+            where = getattr(self, "where", self.query.where)


This fallback is a little opaque. It looks to me self.query.where is used for Update and Delete compilers. I would either add a comment about this or add a get_where() hook to the compilers that returns the appropriate value.

Yes, the where from the query is very confusing: sometimes it is the same where that the compiler has; other times it also contains the HAVING expressions. The main compiler uses self.where, which has the expressions split out. But adding a get_where() to the compiler could be good enough.
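A sketch of the get_where() hook suggested above, replacing getattr(self, "where", self.query.where) with an explicit override per compiler. The class names and build_match() are hypothetical toys, not Django's or the backend's classes, and the {"$expr": where} line stands in for where.as_mql(compiler, connection).

```python
class MongoCompilerBase:
    """Toy base compiler (hypothetical names)."""

    def __init__(self, query, where=None):
        self.query = query
        self.where = where  # populated only by the main SELECT compiler

    def get_where(self):
        # Main compiler: self.where holds the WHERE part with HAVING split out.
        return self.where


class UpdateCompiler(MongoCompilerBase):
    def get_where(self):
        # Update/delete compilers don't split the filters, so they read the
        # query's where directly instead of relying on a getattr() fallback.
        return self.query.where


def build_match(compiler):
    where = compiler.get_where()
    return {"$expr": where}  # stands in for where.as_mql(compiler, connection)
```

The override makes the choice of where visible in each compiler class, rather than depending on whether an attribute happens to exist.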

timgraham · 2024-08-05T23:10:22Z

django_mongodb/compiler.py

+                stack.extend(expr.get_source_expressions())
+
+    def get_aggregation_pipeline(self):
+        return self._group_pipeline


Does it need to be a private variable with a get_ method? Could it be self.aggregation_pipeline like other variables in SQLCompiler.__init__()?

Yes, it can.

timgraham · 2024-08-07T19:05:42Z

django_mongodb/features.py

+        "aggregation.tests.AggregateTestCase.test_aggregation_default_passed_another_aggregate",
+        "aggregation.tests.AggregateTestCase.test_annotation_expressions",
+        "aggregation.tests.AggregateTestCase.test_reverse_fkey_annotate",
+        # Manage empty result when the flag elide_empty is False


Is this a TODO even though the test probably won't work with SQL-specific: Func("book", function="COUNT")?

Yes, it is a TODO, but maybe this isn't the right test for it. I think it should be moved to the raw SQL query tests...

Also add support for Count() in QuerySet.annotate().

timgraham reviewed Jul 16, 2024

django_mongodb/compiler.py (outdated)

django_mongodb/compiler.py

django_mongodb/query.py (outdated)

WaVEV force-pushed the support-aggregate branch from 54de66a to 17dfa68 July 16, 2024 15:46

timgraham changed the title ~~Support aggregate~~ add support for QuerySet.aggregate() Jul 16, 2024

WaVEV force-pushed the support-aggregate branch 2 times, most recently from c43a1df to 9ff6438 July 24, 2024 04:55

timgraham mentioned this pull request Jul 25, 2024

made QuerySet iteration respect chunk_size #88

Merged

timgraham reviewed Jul 25, 2024

.github/workflows/test-python.yml

django_mongodb/compiler.py

django_mongodb/compiler.py (outdated)

django_mongodb/query.py (outdated)

WaVEV force-pushed the support-aggregate branch from a528112 to 7e80c1a July 26, 2024 04:56

WaVEV marked this pull request as ready for review July 26, 2024 04:57

WaVEV requested a review from timgraham July 29, 2024 13:41

WaVEV force-pushed the support-aggregate branch from f1e9281 to f6b0a9d July 30, 2024 00:10

timgraham mentioned this pull request Jul 30, 2024

fix crash of DecimalField lookup with F expression #92

Merged

WaVEV force-pushed the support-aggregate branch from 10fd0ca to 3f85f34 July 30, 2024 03:19

timgraham reviewed Aug 1, 2024

timgraham force-pushed the support-aggregate branch from 80d0c57 to 90cb739 August 3, 2024 01:17

WaVEV mentioned this pull request Aug 4, 2024

add support for ordering by expressions #94

Merged

timgraham force-pushed the support-aggregate branch 2 times, most recently from 554dacb to 4a70fd0 August 5, 2024 18:26

timgraham reviewed Aug 5, 2024

mongodb deleted a comment from timgraham Aug 5, 2024

WaVEV commented Aug 5, 2024

timgraham reviewed Aug 5, 2024

timgraham force-pushed the support-aggregate branch from a9fc29c to 1d8a64e August 7, 2024 19:19

timgraham reviewed Aug 7, 2024

timgraham force-pushed the support-aggregate branch from 1d8a64e to f074d96 August 7, 2024 19:51

timgraham approved these changes Aug 8, 2024

add support for QuerySet.aggregate()

6f409db

Also add support for Count() in QuerySet.annotate().

timgraham force-pushed the support-aggregate branch from 4d1cbac to 6f409db August 8, 2024 18:11

timgraham merged commit 6f409db into mongodb:main Aug 8, 2024
3 checks passed

timgraham mentioned this pull request Aug 12, 2024

Count() in QuerySet.annotate() crashes: Unknown expression $count #79

Closed
