-
Notifications
You must be signed in to change notification settings - Fork 26
add support for QuerySet.aggregate() #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use git rebase -i HEAD~20
to remove the commits that shouldn't be in here.
c43a1df
to
9ff6438
Compare
django_mongodb/compiler.py
Outdated
@@ -17,15 +19,190 @@ class SQLCompiler(compiler.SQLCompiler): | |||
"""Base class for all Mongo compilers.""" | |||
|
|||
query_class = MongoQuery | |||
SEPARATOR = "10__MESSI__3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we have a problem, We cannot define a separator and be sure that it wouldn't have any collision in the future.
This things is used in the case that we are grouping with a foreign field. like
select max(A.id), B.name
from A
join B
group by B.name
In the grouping stage I cannot set names like table.field, so I have to use a separator. But any string could be a collision with the table name or field name.
Then I think in two options:
- Generate a random separator (and regenerate if it has a collision)
- Rename all the columns (as Django does __col#),
For now, I think we can be ok with some random separator like 'GROUP_FOREIGN_SEPARATOR'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about double underscore: __
. That seems fairly unlikely to collide. I think it's important to keep the queries readable and inserting random strings doesn't seem so nice. Django field names cannot include double underscore due to a collision with Djanog's lookup separator. A developer could set db_column="something__something"
but this seems unlikely (famous last words). We could just say, "don't do that!"
Also, I think we should use a more descriptive variable name than SEPARATOR
. Django has LOOKUP_SEP = "__"
. I think GROUP_SEP
could be fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! but I decide to add three, we use __aggregation so, __ was wrong.
80d0c57
to
90cb739
Compare
554dacb
to
4a70fd0
Compare
group[alias] = sub_expr.as_mql(self, self.connection) | ||
replacing_expr = inner_column | ||
# Count must return 0 rather than null. | ||
if isinstance(sub_expr, Count): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this special handling of Count/Variance can live in aggregations.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so as well, but I couldn't find a solution. Let's see the following example:
select count(*) from T1 where 1 = 0
The where
clause is executed first, resulting in an empty set, so the count will be 0. But if in MongoDB I do:
[
{
"$match": {"$expr": {"$eq": [1, 0]}}
},
{
"$group": {"_id": null, "group": {"$sum": 1}}
}
]
It will return an empty list. To handle this, I created a $facet
for the group by null elements. Then post-processing is needed, and this way is shorter.
django_mongodb/compiler.py
Outdated
return Col(self.collection_name, column_target) | ||
|
||
def _prepare_expressions_for_pipeline(self, expression, target, count): | ||
"""Prepare expressions for the aggregation pipeline.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods would benefit from some more inline comments or at least an explanation of what it means to "prepare an expression".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Indeed. Will add
django_mongodb/functions.py
Outdated
if resolve_inner_expression: | ||
return inner_expression | ||
return {"$sum": inner_expression} | ||
# If distinct=True or resolve_inner_expression=False=False, sum the size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I edited this from "When count is called with distinct without the flag". I think "the flag" meant resolve_inner_expression=False
but please confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is.
source_expressions = node.get_source_expressions() | ||
filter_ = deepcopy(self.filter) | ||
filter_.add( | ||
WhereNode([Exact(source_expressions[0], Value(None))], negated=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @timgraham I deleted your comment by mistake, the answer is:
The count only counts values if they aren't none. if the expression result is a string, number or something it sums as 1. Looking again the code, maybe there is a bug when there is filter and distinct options. Will check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that both branches are excluding null values. My question is how does the query end up with null values? I could probably work through some tests to understand it better. Thought you might be able to give a quick example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there are like 3 places with nulls. 😬 or maybe two.
The example when this code is use is:
Select count(name) filter (where surname = 'Lupi')
from T1
Here we have to sum of the elements that fulfill the filter, we handle it with a case and the first (and only) source_expression would be the transformed (idk if there is others, but I just copy the mechanism from source)
I will explain it one by one.
But we have to change the exact(value, None) for IsNull(value, True) they are not the same.
django_mongodb/aggregates.py
Outdated
|
||
def count(self, compiler, connection, resolve_inner_expression=False, **extra_context): # noqa: ARG001 | ||
""" | ||
When resolve_inner_expression is True, return the argument as MQL that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give some examples to help me understand this docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, maybe it should be in another function, and I forgot to support that flag in aggregates (there isn't any test covering it 😬). But I can explain the idea: I translated SELECT COUNT(DISTINCT name) FROM T1
into addToSet
and then the aggregation operation. So, I only need to calculate the RHS to store its values in a set. After storing, we can calculate the sum, size, variance, or whatever is needed.
source_expressions = node.get_source_expressions() | ||
filter_ = deepcopy(self.filter) | ||
filter_.add( | ||
WhereNode([Exact(source_expressions[0], Value(None))], negated=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that both branches are excluding null values. My question is how does the query end up with null values? I could probably work through some tests to understand it better. Thought you might be able to give a quick example.
django_mongodb/compiler.py
Outdated
@@ -17,15 +19,190 @@ class SQLCompiler(compiler.SQLCompiler): | |||
"""Base class for all Mongo compilers.""" | |||
|
|||
query_class = MongoQuery | |||
SEPARATOR = "10__MESSI__3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about double underscore: __
. That seems fairly unlikely to collide. I think it's important to keep the queries readable and inserting random strings doesn't seem so nice. Django field names cannot include double underscore due to a collision with Djanog's lookup separator. A developer could set db_column="something__something"
but this seems unlikely (famous last words). We could just say, "don't do that!"
Also, I think we should use a more descriptive variable name than SEPARATOR
. Django has LOOKUP_SEP = "__"
. I think GROUP_SEP
could be fine here.
django_mongodb/compiler.py
Outdated
else: | ||
group["_id"] = ids | ||
pipeline.append({"$group": group}) | ||
sets = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename sets -> add_fields ?
django_mongodb/compiler.py
Outdated
value = f"$_id.{key}" | ||
if self.SEPARATOR in key: | ||
subtable, field = key.split(self.SEPARATOR) | ||
if subtable not in sets: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about sets = defaultdict(dict)
so this isn't necessary?
django_mongodb/compiler.py
Outdated
try: | ||
query.mongo_query = {"$expr": self.query.where.as_mql(self, self.connection)} | ||
where = getattr(self, "where", self.query.where) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fallback is a little opaque. It looks to me self.query.where
is used for Update and Delete compilers. I would either add a comment about this or add a get_where()
hook to the compilers that returns the appropriate value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the where from query is very confusing, sometimes it is the same where that camper has, other times it has a where with the having
expressions. The compiler (the main one) uses self.where, that has the expressions separated. but add a get_where in the compiler could be good enough.
django_mongodb/compiler.py
Outdated
stack.extend(expr.get_source_expressions()) | ||
|
||
def get_aggregation_pipeline(self): | ||
return self._group_pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be a private variable with a get_
method? Could it be self.aggregation_pipeline
like other variables in SQLCompiler.__init__()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, It can
a9fc29c
to
1d8a64e
Compare
django_mongodb/features.py
Outdated
"aggregation.tests.AggregateTestCase.test_aggregation_default_passed_another_aggregate", | ||
"aggregation.tests.AggregateTestCase.test_annotation_expressions", | ||
"aggregation.tests.AggregateTestCase.test_reverse_fkey_annotate", | ||
# Manage empty result when the flag elide_empty is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO even though the test probably won't work with SQL-specific: Func("book", function="COUNT")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a TODO, but maybe this test is not the right to do that. I think, this should be moved to the raw sql queries...
1d8a64e
to
f074d96
Compare
Also add support for Count() in QuerySet.annotate().
4d1cbac
to
6f409db
Compare
Fixes #12 and #79