INTPYTHON-348 add support for QuerySet.raw_mql() #173

aclark4life · 2024-10-24T01:47:38Z

For #163. This PR adds a new file manager.py with class MongoManager and updates query.py to include classes

MongoQuerySet
RawQuerySet
RawQuery
RawModelIterable

which returns:


python manage.py shell -c "from polls.models import Question; q = Question(); q.save(); raw_query = Question.objects.raw_mql('db.polls_question.find()'); [i for i in raw_query]"
DEBUG 2024-10-24 01:40:10,444 utils (0.005) db.polls_question.insert_many([{'question_text': '', 'pub_date': None}])
Traceback (most recent call last):
  File "/Users/alex.clark/Developer/just-django/manage.py", line 22, in <module>
    main()
  File "/Users/alex.clark/Developer/just-django/manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django/django/core/management/commands/shell.py", line 117, in handle
    exec(options["command"], globals())
  File "<string>", line 1, in <module>
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 414, in __iter__
    self._fetch_all()
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 401, in _fetch_all
    self._result_cache = list(self.iterator())
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 428, in iterator
    yield from RawModelIterable(self)
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 565, in __iter__
    query_iterator = iter(query)
                     ^^^^^^^^^^^
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 511, in __iter__
    self._execute_query()
  File "/Users/alex.clark/Developer/just-django/src/django-mongodb/django_mongodb/query.py", line 551, in _execute_query
    self.cursor.execute(self.sql, params)
    ^^^^^^^^^^^^^^^^^^^
AttributeError: 'Cursor' object has no attribute 'execute'

which perhaps can be fixed by ensuring cursor is a MongoDB cursor and/or query is a MongoQuery. Is this a good approach or am I missing a shorter path to returning iterable results of raw_mql query?

timgraham · 2024-10-24T12:40:25Z

db.polls_question.find() isn't the raw syntax we're after. Since we're using the Question manager (Question.objects), we know what model to use. Take a look at https://github.com/search?q=repo%3Adjango-nonrel%2Fmongodb-engine%20raw_query&type=code for how this worked in the project we forked. The raw query in that case is the argument to collection.find().

This project was changed (9f0ad73) to use collection.aggregate() instead of find() so the raw query syntax will differ. Now raw_query:
https://github.com/mongodb-labs/django-mongodb/blob/17d8a5d09fee3edc04924092b61f278ab5c56f48/django_mongodb/query.py#L52
enables customizing the $match critieria:
https://github.com/mongodb-labs/django-mongodb/blob/17d8a5d09fee3edc04924092b61f278ab5c56f48/django_mongodb/query.py#L88

I'm not sure that really answers your question, but if you can't figure out how to connect the implementation pieces from this tip, I'll have to dive into the code and take a longer look.

aclark4life · 2024-10-24T14:06:52Z

raw query syntax will differ.

Right, for example folks can pass in an aggregation pipeline, I assume, e.g.

https://www.mongodb.com/docs/manual/aggregation/#aggregation-pipeline-example

I'm not sure that really answers your question

That helps! A MongoQuery's raw_query is already included in the Query's aggregation pipeline, if it exists.

connect the implementation pieces from this tip

When someone writes Question.objects.all() we have to translate that to MQL then get the data from MongoDB. When someone writes MQL, we just need to get the data from MQL and that code already exists. So the answer to my question is tentatively "maybe we don't need all those methods". Or if we need them, we need them for something that happens after what happens when someone writes Question.objects.all and gets a QS back.

timgraham · 2024-10-24T14:12:34Z

The user could write QuerySet.objects.raw({size: "medium"}) in the example you linked. raw() only allows custom matching conditions, not aggregations like $group.

aclark4life · 2024-10-24T21:41:02Z

@Jibola pointed out that @timgraham pointed out nonrel already supports this, so here is a 2nd pass. Aside from using

class MongoManager(BaseManager.from_queryset(MongoQuerySet)):

instead of a mixin, I think this is nonrel's implementation. In my testing, I get this output:

>>> Question.objects.raw_mql({'question_text': ""})
DEBUG 2024-10-24 21:34:23,749 utils (0.008) db.polls_question.aggregate([{'$match': {'$expr': {}}}, {'$limit': 21}])
<QuerySet [<Question: Question object (671962ec10fa6748204eed47)>, <Question: Question object (671966bddd9ecff50427569f)>, <Question: Question object (671967ad2b741c96f8292dfb)>, <Question: Question object (67196d1caa2a0b5b8b0beb1f)>, <Question: Question object (67196d4fbb9d46be6ad94f39)>, <Question: Question object (67196d5df268a3b28513c194)>, <Question: Question object (67196e7bc10b5aa61b7343f0)>, <Question: Question object (67196f0b0f7adb7283376800)>, <Question: Question object (67196f336da9e5328f28d5f4)>, <Question: Question object (67196f938274fa736e91b303)>, <Question: Question object (67196fbdb793dd49bf8d0503)>, <Question: Question object (67196fc41cf36110cccc3959)>, <Question: Question object (671976f050fbef6cbbda2c99)>, <Question: Question object (6719773d8505ca3b64207b73)>, <Question: Question object (6719775ab053c1560a90b957)>, <Question: Question object (6719777c3cb4cb3496730ba1)>, <Question: Question object (67197795ffcf17f160b72032)>, <Question: Question object (6719793ebf49036de46da5c9)>, <Question: Question object (671979787eeedb341600ec6a)>, <Question: Question object (671979a24c6f3dfba8282946)>, '...(remaining elements truncated)...']>
>>> Question.objects.raw_mql({'question_text': ""})

Thanks both!

timgraham

When it's working you should see your query reflected like this: {'$match': {'question_text': ""}}.

django_mongodb/manager.py

Jibola · 2024-10-25T13:34:49Z

When it's working you should see your query reflected like this: {'$match': {'question_text': ""}}.

I actually think the approach here should even remove the $match.

We should have the expectation be truly "hands free".

i.e.

# For readability
question_collection = mongo_client[DATABASES['default']['NAME']][Question.db_table()]

# Should raise an error (return from the server) that this is invalid MQL -- because it is.
Question.objects.raw_mql([{'question_text': ""}])
# is equivalent to
question_collection.aggregate([{'question_text': ""}])

# This is what the end-user should be expected to write
Question.objects.raw_mql([{'$match': {'question_text': ""}}])
# is equivalent to
question_collection.aggregate([{'$match': {'question_text': ""}}])

Ahhh, I see what you mean now. It only supports match conditions because it still has to fit into the returned object. I think that's good to highlight. I will still say, to my previous point, I'd still rather we remove the guardrails. This way, we can use the translations to even empower them to do groups/projections that fit the object. (That's hacky, but I see this being useful for folks using $search/$vectorSearch operators that would want to do this)

timgraham · 2024-10-25T13:49:02Z

Yes, we could require the user to provide an entire pipeline. Perhaps the first step is to read the QuerySet.raw() documentation and understand the pipelines necessary to rewrite the QuerySet.raw() tests. Most of them look like fairly simple match queries but some appear they will be more complicated projects and even group bys.

Jibola · 2024-10-25T14:09:55Z

Yes, we could require the user to provide an entire pipeline. Perhaps the first step is to read the QuerySet.raw() documentation and understand the pipelines necessary to rewrite the QuerySet.raw() tests. Most of them look like fairly simple match queries but some appear they will be more complicated projects and even group bys.

Yeah! based on the documentation, my rough understanding is that we can append the $project to maintain expected fields.
I will look through the tests and circle back on some stronger design thoughts.

timgraham · 2024-10-25T14:15:01Z

Right. This will have the same caveat from the Django docs as when the user provides a SQL statement: "No checking is done on the pipeline that is passed in to .raw(). Django expects that the pipeline will return a set of documents from the database, but does nothing to enforce that. If the query does not return documents, a (possibly cryptic) error will result."

aclark4life · 2024-10-25T15:26:16Z

I'm in favor of allowing foot guns here, but I want to clarify that if the end user writes MQL that references data that is not managed by Django, we return an empty query set correct?

timgraham · 2024-10-25T15:32:45Z

I'm not sure exactly what you have in mind, but I'd focus on matching the behavior in the raw_query tests. We can consider additional cases (if any) afterward.

- Update raw_mql function signature

aclark4life · 2024-11-04T17:47:11Z

Superseded by #183

INTPYTHON-348 aggregate via raw_mql

dc5238a

aclark4life requested a review from timgraham October 24, 2024 01:48

INTPYTHON-348 aggregate via raw_mql

3e63192

aclark4life requested a review from Jibola October 24, 2024 21:41

timgraham reviewed Oct 24, 2024

View reviewed changes

django_mongodb/manager.py Outdated Show resolved Hide resolved

INTPYTHON-348 aggregate via raw_mql

ce2138b

- Update raw_mql function signature

timgraham changed the title ~~INTPYTHON-348 aggregate via raw_mql~~ INTPYTHON-348 add support for QuerySet.raw_mql() Oct 28, 2024

aclark4life added 5 commits October 28, 2024 12:37

Merge branch 'mongodb-labs:main' into INTPYTHON-348

3cf968e

Merge branch 'main' into INTPYTHON-348

5707234

INTPYTHON-348 aggregate via raw_mql

d6ad9a1

INTPYTHON-348 aggregate via raw_mql

ebecbd7

INTPYTHON-348 aggregate via raw_mql

3933288

aclark4life closed this Nov 4, 2024

aclark4life deleted the INTPYTHON-348 branch November 4, 2024 17:34

INTPYTHON-348 add support for QuerySet.raw_mql() #173

INTPYTHON-348 add support for QuerySet.raw_mql() #173

Uh oh!

Conversation

aclark4life commented Oct 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timgraham commented Oct 24, 2024

Uh oh!

aclark4life commented Oct 24, 2024

Uh oh!

timgraham commented Oct 24, 2024

Uh oh!

aclark4life commented Oct 24, 2024

Uh oh!

timgraham left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Jibola commented Oct 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timgraham commented Oct 25, 2024

Uh oh!

Jibola commented Oct 25, 2024

Uh oh!

timgraham commented Oct 25, 2024

Uh oh!

aclark4life commented Oct 25, 2024

Uh oh!

timgraham commented Oct 25, 2024

Uh oh!

aclark4life commented Nov 4, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

aclark4life commented Oct 24, 2024 •

edited

Loading

Jibola commented Oct 25, 2024 •

edited

Loading