Skip to content

add support for queries with joins #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

WaVEV
Copy link
Collaborator

@WaVEV WaVEV commented Jun 28, 2024

Fixes #35

@WaVEV WaVEV requested review from timgraham and Jibola June 28, 2024 23:35
@WaVEV WaVEV marked this pull request as draft June 28, 2024 23:35
@WaVEV WaVEV changed the title Support lookup from related collections (#1) Support lookup from related collections Jun 29, 2024
@WaVEV WaVEV marked this pull request as ready for review July 1, 2024 01:50
@WaVEV WaVEV marked this pull request as draft July 1, 2024 03:05
@WaVEV WaVEV force-pushed the support-lookup-from-related-collections branch from 9915c05 to 60e7eb1 Compare July 1, 2024 15:02
@WaVEV WaVEV marked this pull request as ready for review July 2, 2024 05:11
@WaVEV WaVEV force-pushed the support-lookup-from-related-collections branch from 0a6316e to 8ec51cd Compare July 2, 2024 05:41
@@ -70,6 +72,8 @@ class DatabaseFeatures(BaseDatabaseFeatures):
"model_fields.test_jsonfield.TestQuerying.test_order_grouping_custom_decoder",
"model_fields.test_jsonfield.TestQuerying.test_ordering_by_transform",
"model_fields.test_jsonfield.TestQuerying.test_ordering_grouping_by_key_transform",
# Ordering 'OrderBy' is not iterable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing section above: "Lookup in order_by() not supported"

@@ -159,6 +159,12 @@ def execute_sql_flush(self, tables):
if not options.get("capped", False):
collection.drop()

def prepare_join_on_clause(self, lhs_table, lhs_field, rhs_table, rhs_field):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same implementation as BaseDatabaseOperations?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that this implementation was in Postgres's backend. So I will remove.

@@ -0,0 +1,75 @@
from django.db.models.sql.constants import INNER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if datastructers.py is the best name. Do you anticipate we need to modify any of the other classes in that file? If not, I might suggest join.py, or even put this in query.py between MongoQuery and where_node? No need to make a decision immediately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we will modify BaseTable, I will move it to query.

@@ -202,8 +220,24 @@ def _get_ordering(self):
field_ordering.append((opts.get_field(name), ascending))
return field_ordering

@property
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to use from django.utils.functional import cached_property.

result = tuple(result)
return result

def _apply_converters(self, entity, columns, converters):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor of a refactor to add def apply_converters since that's consistent with the API of Django's SQLCompiler. I tried to refactor this to add that and get rid of _make_result() (which came from django-nonrel) but wasn't successful. Anyway, is it a necessary part of this PR? It looks like it was initially but may no longer be. If possible, I'd try to keep cosmetic cleanups in a separate PR from functionality changes.

README.md Outdated
@@ -114,7 +114,6 @@ Migrations for 'admin':
- `datetimes()`
- `distinct()`
- `extra()`
- `select_related()`

- Queries with joins aren't supported.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also obsolete, kind of. I guess we're in a weird state where some cases are supported. Anyway, we don't need to update the documentation at every stage if this is going to evolve through a series of PRs.

pipeline = []
if self.mongo_lookups:
lookups = self.mongo_lookups
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lookups variable seems unnecessary (possibly a debugging or refactoring artifact)

# maybe I have to create a new object or named tuple.
# it will save lookups, some filters (in case of inner) and project to rename field
# don't know if the rename is needed
self.mongo_lookups = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if mongo_lookups is the best name. It kind of confuses me with Django lookups / lookups.py. Maybe something like lookup_pipeline?

@@ -102,7 +107,17 @@ def get_cursor(self):
# If name != column, then this is an annotatation referencing
# another column.
fields[name] = 1 if name == column else f"${column}"

# Add the subquery results if fields is defined.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "subquery" the best word choice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I changed it for related.

@WaVEV WaVEV force-pushed the support-lookup-from-related-collections branch 3 times, most recently from 52a2d47 to 91e9b3a Compare July 5, 2024 15:26
lhs_fields = []
rhs_fields = []
for lhs, rhs in join_fields:
if isinstance(lhs, str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What case is this? I haven't been able to find the tests that use this branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imported the logic from https://github.com/WaVEV/django/blob/main/django/db/models/sql/datastructures.py#L88
In that join function there is a second part of the join, but I didn't find where it is used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. So it's to support a deprecated private API. I think we should just simplify the code now and not worry about this. It's untested by Django's test suite anyway (checked running with PostgreSQL). Any third-party apps should be on their way to supporting the new API by now.

rhs_fields.append(rhs_mql)

parent_template = "parent__field__"
lookups_pipeline = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@timgraham timgraham changed the title Support lookup from related collections add support for queries with joins Jul 6, 2024
return self.connection.get_collection(self.query.get_meta().db_table)
return self.connection.get_collection(self.collection_name)

def get_lookup_pipeline(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this method be private?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it would be consistent with _get_ordering(). On the other hand, I'm not sure I would have underscore prefixed that method (that code came from the original fork). When I look at Django's SQL compiler, underscore methods are mainly used for utility methods used more than once in other methods.

This class doesn't have any methods Django developers call themselves, so whether or not to underscore is mainly a decision about how we want to organize things.

Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Adding some questions to get clarification.

Comment on lines +46 to +48
prefix = f"{self.alias}." if self.alias != compiler.collection_name else ""
return f"${prefix}{self.target.column}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we prefix the target column with the alias instead of using only the alias?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you misread it as "column's alias" instead of "column's collection's alias"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will explain the idea of the prefix:
When a collection is joined with the main collection, it generates a new "field" with the same as the joined collection (Now I am thinking if a collision is possible 🤔 ) this is why a prefix is computed if the column table (alias) is different from the current collection.


def get_lookup_pipeline(self):
result = []
for alias in tuple(self.query.alias_map):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to cast this as a tuple before iterating on it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was copied from Django, but may be obsolete: django/django#18357

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an answer there, saying that the query.alias_map is modified during the for loops. 🤔 I didn't noticed that, maybe I have to check it.

Comment on lines +84 to +93
# Wrong results in queries with multiple tables.
"annotations.tests.NonAggregateAnnotationTestCase.test_annotation_aggregate_with_m2o",
"annotations.tests.NonAggregateAnnotationTestCase.test_annotation_reverse_m2m",
"annotations.tests.NonAggregateAnnotationTestCase.test_chaining_annotation_filter_with_m2m",
"lookup.tests.LookupTests.test_lookup_collision",
"expressions_case.tests.CaseExpressionTests.test_join_promotion",
"expressions_case.tests.CaseExpressionTests.test_join_promotion_multiple_annotations",
"ordering.tests.OrderingTests.test_order_by_grandparent_fk_with_expression_in_default_ordering",
"ordering.tests.OrderingTests.test_order_by_parent_fk_with_expression_in_default_ordering",
"ordering.tests.OrderingTests.test_order_by_ptr_field_with_default_ordering_by_expression",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way are these results wrong?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is more than one issue represented here. I'll investigate in more detail later but don't want to hold up the merge for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Also fixes mongodb#35: add support for QuerySet.select_related()
@timgraham timgraham force-pushed the support-lookup-from-related-collections branch from 8a2fe23 to 5e77b83 Compare July 9, 2024 13:25
@timgraham timgraham merged commit 5e77b83 into mongodb:main Jul 9, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add support for QuerySet.select_related()
3 participants