Skip to content

INTPYTHON-348 add support for QuerySet.raw_mql() #173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

INTPYTHON-348 add support for QuerySet.raw_mql() #173

wants to merge 8 commits into from

Conversation

aclark4life
Copy link
Collaborator

@aclark4life aclark4life commented Oct 24, 2024

For #163. This PR adds a new file manager.py with class MongoManager and updates query.py to include classes

  • MongoQuerySet
  • RawQuerySet
  • RawQuery
  • RawModelIterable

which returns:


python manage.py shell -c "from polls.models import Question; q = Question(); q.save(); raw_query = Question.objects.raw_mql('db.polls_question.find()'); [i for i in raw_query]"
DEBUG 2024-10-24 01:40:10,444 utils (0.005) db.polls_question.insert_many([{'question_text': '', 'pub_date': None}])
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/just-django/manage.py", line 22, in <module>
    main()
  File "/Users/alex.clark/Developer/just-django/manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/commands/shell.py", line 117, in handle
    exec(options["command"], globals())
  File "<string>", line 1, in <module>
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 414, in __iter__
    self._fetch_all()
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 401, in _fetch_all
    self._result_cache = list(self.iterator())
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 428, in iterator
    yield from RawModelIterable(self)
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 565, in __iter__
    query_iterator = iter(query)
                     ^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 511, in __iter__
    self._execute_query()
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 551, in _execute_query
    self.cursor.execute(self.sql, params)
    ^^^^^^^^^^^^^^^^^^^
AttributeError: 'Cursor' object has no attribute 'execute'

which perhaps can be fixed by ensuring cursor is a MongoDB cursor and/or query is a MongoQuery. Is this a good approach or am I missing a shorter path to returning iterable results of raw_mql query?

@aclark4life aclark4life requested a review from timgraham October 24, 2024 01:48
@timgraham
Copy link
Collaborator

db.polls_question.find() isn't the raw syntax we're after. Since we're using the Question manager (Question.objects), we know what model to use. Take a look at https://github.com/search?q=repo%3Adjango-nonrel%2Fmongodb-engine%20raw_query&type=code for how this worked in the project we forked. The raw query in that case is the argument to collection.find().

This project was changed (9f0ad73) to use collection.aggregate() instead of find() so the raw query syntax will differ. Now raw_query:
https://github.com/mongodb-labs/django-mongodb/blob/17d8a5d09fee3edc04924092b61f278ab5c56f48/django_mongodb/query.py#L52
enables customizing the $match critieria:
https://github.com/mongodb-labs/django-mongodb/blob/17d8a5d09fee3edc04924092b61f278ab5c56f48/django_mongodb/query.py#L88

I'm not sure that really answers your question, but if you can't figure out how to connect the implementation pieces from this tip, I'll have to dive into the code and take a longer look.

@aclark4life
Copy link
Collaborator Author

raw query syntax will differ.

Right, for example folks can pass in an aggregation pipeline, I assume, e.g.

I'm not sure that really answers your question

That helps! A MongoQuery's raw_query is already included in the Query's aggregation pipeline, if it exists.

connect the implementation pieces from this tip

When someone writes Question.objects.all() we have to translate that to MQL then get the data from MongoDB. When someone writes MQL, we just need to get the data from MQL and that code already exists. So the answer to my question is tentatively "maybe we don't need all those methods". Or if we need them, we need them for something that happens after what happens when someone writes Question.objects.all and gets a QS back.

@timgraham
Copy link
Collaborator

The user could write QuerySet.objects.raw({size: "medium"}) in the example you linked. raw() only allows custom matching conditions, not aggregations like $group.

@aclark4life
Copy link
Collaborator Author

@Jibola pointed out that @timgraham pointed out nonrel already supports this, so here is a 2nd pass. Aside from using

class MongoManager(BaseManager.from_queryset(MongoQuerySet)):

instead of a mixin, I think this is nonrel's implementation. In my testing, I get this output:

>>> Question.objects.raw_mql({'question_text': ""})
DEBUG 2024-10-24 21:34:23,749 utils (0.008) db.polls_question.aggregate([{'$match': {'$expr': {}}}, {'$limit': 21}])
<QuerySet [<Question: Question object (671962ec10fa6748204eed47)>, <Question: Question object (671966bddd9ecff50427569f)>, <Question: Question object (671967ad2b741c96f8292dfb)>, <Question: Question object (67196d1caa2a0b5b8b0beb1f)>, <Question: Question object (67196d4fbb9d46be6ad94f39)>, <Question: Question object (67196d5df268a3b28513c194)>, <Question: Question object (67196e7bc10b5aa61b7343f0)>, <Question: Question object (67196f0b0f7adb7283376800)>, <Question: Question object (67196f336da9e5328f28d5f4)>, <Question: Question object (67196f938274fa736e91b303)>, <Question: Question object (67196fbdb793dd49bf8d0503)>, <Question: Question object (67196fc41cf36110cccc3959)>, <Question: Question object (671976f050fbef6cbbda2c99)>, <Question: Question object (6719773d8505ca3b64207b73)>, <Question: Question object (6719775ab053c1560a90b957)>, <Question: Question object (6719777c3cb4cb3496730ba1)>, <Question: Question object (67197795ffcf17f160b72032)>, <Question: Question object (6719793ebf49036de46da5c9)>, <Question: Question object (671979787eeedb341600ec6a)>, <Question: Question object (671979a24c6f3dfba8282946)>, '...(remaining elements truncated)...']>
>>> Question.objects.raw_mql({'question_text': ""})

Thanks both!

@aclark4life aclark4life requested a review from Jibola October 24, 2024 21:41
Copy link
Collaborator

@timgraham timgraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it's working you should see your query reflected like this: {'$match': {'question_text': ""}}.

@Jibola
Copy link
Contributor

Jibola commented Oct 25, 2024

When it's working you should see your query reflected like this: {'$match': {'question_text': ""}}.

I actually think the approach here should even remove the $match.

We should have the expectation be truly "hands free".

i.e.

# For readability
question_collection = mongo_client[DATABASES['default']['NAME']][Question.db_table()]

# Should raise an error (return from the server) that this is invalid MQL -- because it is.
Question.objects.raw_mql([{'question_text': ""}])
# is equivalent to
question_collection.aggregate([{'question_text': ""}])

# This is what the end-user should be expected to write
Question.objects.raw_mql([{'$match': {'question_text': ""}}])
# is equivalent to
question_collection.aggregate([{'$match': {'question_text': ""}}])

Ahhh, I see what you mean now. It only supports match conditions because it still has to fit into the returned object. I think that's good to highlight. I will still say, to my previous point, I'd still rather we remove the guardrails. This way, we can use the translations to even empower them to do groups/projections that fit the object. (That's hacky, but I see this being useful for folks using $search/$vectorSearch operators that would want to do this)

@timgraham
Copy link
Collaborator

Yes, we could require the user to provide an entire pipeline. Perhaps the first step is to read the QuerySet.raw() documentation and understand the pipelines necessary to rewrite the QuerySet.raw() tests. Most of them look like fairly simple match queries but some appear they will be more complicated projects and even group bys.

@Jibola
Copy link
Contributor

Jibola commented Oct 25, 2024

Yes, we could require the user to provide an entire pipeline. Perhaps the first step is to read the QuerySet.raw() documentation and understand the pipelines necessary to rewrite the QuerySet.raw() tests. Most of them look like fairly simple match queries but some appear they will be more complicated projects and even group bys.

Yeah! based on the documentation, my rough understanding is that we can append the $project to maintain expected fields.
I will look through the tests and circle back on some stronger design thoughts.

@timgraham
Copy link
Collaborator

Right. This will have the same caveat from the Django docs as when the user provides a SQL statement: "No checking is done on the pipeline that is passed in to .raw(). Django expects that the pipeline will return a set of documents from the database, but does nothing to enforce that. If the query does not return documents, a (possibly cryptic) error will result."

@aclark4life
Copy link
Collaborator Author

I'm in favor of allowing foot guns here, but I want to clarify that if the end user writes MQL that references data that is not managed by Django, we return an empty query set correct?

@timgraham
Copy link
Collaborator

I'm not sure exactly what you have in mind, but I'd focus on matching the behavior in the raw_query tests. We can consider additional cases (if any) afterward.

- Update raw_mql function signature
@timgraham timgraham changed the title INTPYTHON-348 aggregate via raw_mql INTPYTHON-348 add support for QuerySet.raw_mql() Oct 28, 2024
@aclark4life aclark4life closed this Nov 4, 2024
@aclark4life aclark4life deleted the INTPYTHON-348 branch November 4, 2024 17:34
@aclark4life
Copy link
Collaborator Author

Superseded by #183

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants