INTPYTHON-522, INTPYTHON-524 Add support for Atlas and vector search queries #325

WaVEV · 2025-06-24T14:29:51Z

No description provided.

django_mongodb_backend/functions.py

django_mongodb_backend/compiler.py

tests/queries_/test_search.py

django_mongodb_backend/expressions/builtins.py

docs/source/ref/models/search.rst

timgraham · 2025-08-05T15:00:55Z

docs/source/ref/models/search.rst

+``SearchEquals`` objects can be reused and combined with other search
+expressions.
+
+See :ref:`search-operations-combinable`


I wonder if we could structure things so we don't need to repeat this boilerplate on every(?) expression.

🤔 I think we cannot escape it, unless we list the operations that can be combined in the section on combinable operations. I like having this link while I am reading the docs, since it introduces some (cool?) behaviour.

We need to adjust our Sphinx theme (or find a new one) so that it has a "Contents" sidebar on the right of the page like Django's docs do (see https://docs.djangoproject.com/en/dev/ref/models/fields/). It will help the browsability greatly and the "combined expressions" section won't be buried.

I'm trying to improve the heading structure by adding an "Atlas Search expressions" top-level heading, then "combined expressions" might also be at the top level (to address your concern and make it more visible). Maybe CombinedSearchExpression should be a subsection of "combined expressions" since it's more of a private/advanced API compared to bitwise operators?

Probably "Vector search queries" is another top-level heading (for SearchVector) and SearchScoreOption is like a utility/helper class?

I didn't make my way through the entire documentation page, but if you're going to work more this weekend, please take a look at my edits and try to make similar adjustments, like adding .. class:: under each heading.

docs/source/ref/models/search.rst

tests/queries_/test_search.py

timgraham · 2025-08-05T15:10:37Z

tests/queries_/test_search.py

+    delayedAssertCountEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertCountEqual)
+    delayedAssertListEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertListEqual)
+    delayedAssertQuerySetEqual = _delayed_assertion(timeout=2)(
+        TransactionTestCase.assertQuerySetEqual
+    )


Are the non-delayed versions ever used? Maybe it's better to overwrite the original names so we don't have to write "delayedXXXXX" everywhere. Or maybe the waiting could be done in setUp() after data is inserted? Unless some test inserts more data, essentially only the first test's waiting is needed, right?

No, all the checks are delayed.
Regarding the second question: right, any test that inserts data needs to wait. If the data is inserted in the class-level setup, we only need to wait once. So if we want to get rid of the delayed assertions, we can wait right after the data is created.

timgraham · 2025-08-05T15:12:06Z

tests/queries_/test_search.py

+
+
+@skipUnlessDBFeature("supports_atlas_search")
+class SearchEqualsTest(SearchUtilsMixin):


I've tried to be consistent in this project about using "Tests" (plural) in the class names.

🤔 Hmm, I didn't notice that. Will change.

timgraham · 2025-08-05T15:16:37Z

tests/queries_/test_search.py

+        boost_score = SearchScoreOption({"boost": {"value": 3}})
+
+        qs = Article.objects.annotate(
+            score=SearchEquals(path="headline", value="cross", score=boost_score)
+        )


I'd inline boost_score, or at least omit the blank line. (Only some tests are inconsistent.)

Jibola

Things look great, but I've gone through about half of the code (due to size). I will check the test code tomorrow!

django_mongodb_backend/expressions/builtins.py

django_mongodb_backend/compiler.py

Jibola · 2025-08-05T18:32:33Z

django_mongodb_backend/compiler.py

+            if not has_search:
+                raise ValueError(
+                    "Cannot combine two `$vectorSearch` operator. "
+                    "If you need to combine them, consider restructuring your query logic or "
+                    "running them as separate queries."
+                )
+            raise ValueError(
+                "Only one $search operation is allowed per query. "
+                f"Received {len(search_replacements)} search expressions. "
+                "To combine multiple search expressions, use either a CompoundExpression for "
+                "fine-grained control or CombinedSearchExpression for simple logical combinations."
+            )


I think these two ValueErrors need to be switched.

🤔 The second is the case where there is a vector search (has_vector_search) but no search (has_search is false). I think I should refactor this. It is a bit confusing, and the "not" at the beginning is not helping.
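
One way the check could be restructured so that each condition is stated positively is sketched below. The function name `check_search_expressions` and its parameters are hypothetical, the messages are shortened, and it assumes (the diff does not show this) that the check only matters when more than one search expression was collected. It mirrors the branch order of the diff above rather than deciding which message is correct.

```python
def check_search_expressions(has_search: bool, has_vector_search: bool, count: int) -> None:
    """Raise ValueError when a query carries more search expressions than allowed.

    All names are illustrative; this is not the backend's compiler code.
    """
    if count <= 1:
        return
    # Only vector searches were found: name that case explicitly instead of
    # relying on a leading "not has_search".
    if has_vector_search and not has_search:
        raise ValueError("Cannot combine two `$vectorSearch` operators.")
    raise ValueError(
        "Only one $search operation is allowed per query. "
        f"Received {count} search expressions."
    )
```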

django_mongodb_backend/compiler.py

django_mongodb_backend/expressions/builtins.py

django_mongodb_backend/compiler.py

django_mongodb_backend/expressions/search.py

Jibola · 2025-08-05T21:28:53Z

django_mongodb_backend/expressions/search.py

+        # Apply De Morgan's Laws.
+        operator = node.operator.negate() if negated else node.operator
+        negated = negated != (node.operator == Operator.NOT)


This logic is a little confusing because it requires some understanding of negate and the state changes.
I'll leave this as a comment here to be reviewed later.

What's an example of a NOT combinable?
I.e., how would I construct NOT (A AND B) or can this only be done via negate?

I applied De Morgan's laws to get something of the form A' operator B'. So:
NOT (A AND B) = NOT A OR NOT B => SHOULD: [MUST_NOT(A), MUST_NOT(B)] with minimumShouldMatch set to 1.
The other way to handle this is to push everything into a must not, but in order to handle NOT (NOT A) as A, I decided to apply this kind of simplification.

And, I almost forgot, we could handle double negation on should (these are the ORs when minimumShouldMatch is 1). Long story short:
A and B => MUST
not C => MUST_NOT
A or B => SHOULD with minimumShouldMatch set to 1
not (A or B) => not A and not B => MUST(MUST_NOT(A), MUST_NOT(B))

where A, B, and C are atomic.

I refactored it a bit. It is simpler now, though I don't know if it is simple enough 😄
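
The mapping described above can be sketched as a small recursive function that pushes negation down to the atoms. This is an illustration under assumptions, not the PR's implementation: expressions are modelled as plain tuples, atoms are represented by their names instead of real operator documents, and `to_compound` is a hypothetical helper.

```python
def to_compound(node, negated=False):
    """Translate a boolean expression tree into compound-style clauses.

    Nodes are tuples: ("atom", name), ("not", child),
    ("and", left, right) or ("or", left, right).
    """
    kind = node[0]
    if kind == "atom":
        # A negated atom becomes a mustNot clause; otherwise it is used as-is.
        return {"compound": {"mustNot": [node[1]]}} if negated else node[1]
    if kind == "not":
        # Flip the negation flag, so NOT (NOT A) collapses back to A.
        return to_compound(node[1], not negated)
    children = [to_compound(child, negated) for child in node[1:]]
    # De Morgan: a negated AND behaves like OR, and a negated OR like AND.
    is_or = (kind == "or") != negated
    if is_or:
        return {"compound": {"should": children, "minimumShouldMatch": 1}}
    return {"compound": {"must": children}}


A, B = ("atom", "A"), ("atom", "B")
not_a_and_b = to_compound(("not", ("and", A, B)))
not_a_or_b = to_compound(("not", ("or", A, B)))
```

Here `not_a_and_b` is a should clause over the two mustNot clauses with minimumShouldMatch 1, and `not_a_or_b` is a must clause over the same two mustNot clauses, matching the table in the comment.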

django_mongodb_backend/expressions/search.py

Jibola

Overall PR looks great! I've got some minor corrections, but other than that, it is good to merge from me. Great work! 🚀

It also looks like there's a ReadTheDocs error:

/home/docs/checkouts/readthedocs.org/user_builds/django-mongodb-backend/checkouts/325/docs/source/ref/models/search.rst:654: WARNING: unknown document: 'atlas:atlas-search/scoring/' [ref.doc]

docs/source/releases/5.2.x.rst

Jibola · 2025-08-06T15:18:21Z

tests/queries_/test_search.py

+    def create_search_index(cls, model, index_name, definition, type="search"):
+        collection = cls._get_collection(model)
+        idx = SearchIndexModel(definition=definition, name=index_name, type=type)
+        collection.create_search_index(idx)


NIT: For the sake of testing, we can make this a blocking call and check for the index before continuing.

🤔 If I understand correctly, I have to add a wait on a predicate, right?
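
A blocking wait of this kind could be sketched as a polling loop. The sketch assumes PyMongo's `Collection.list_search_indexes()` and the `queryable` field that Atlas reports for each search index; the default timeout and interval are placeholders, and the body is a guess at the shape rather than the code that landed in the PR.

```python
import time


def wait_until_index_ready(collection, index_name, timeout: float = 5, interval: float = 0.5):
    """Block until the named search index reports itself as queryable."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # list_search_indexes() yields one document per matching index.
        for index in collection.list_search_indexes(index_name):
            if index.get("queryable"):
                return
        time.sleep(interval)
    raise TimeoutError(f"Index {index_name} not ready after {timeout} seconds")
```

Calling it right after `collection.create_search_index(idx)` in the test helper would make index creation effectively synchronous for the tests.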

tests/queries_/models.py

tests/queries_/test_search.py

timgraham · 2025-08-09T22:23:53Z

docs/source/ref/models/search.rst

+    <QuerySet [<Article: headline: title>]>
+
+The ``path`` argument can be either the name of a field (as a string), or a
+:class:`~django.db.models.F` instance. The ``value`` argument


Why would it be preferable to wrap the field name in an F object? I don't see F in any tests. (and similarly why wrap strings in Value?)

🤔 I can add that to the tests. For the values, I copied the idea from here. It could work if we don't wrap them, because MongoDB supports direct values here. In the case of the path, it should be an F object. Sometimes the string refers to an embedded model or a column, and in that case resolving the F returns the corresponding column.

For Django's version: "The arguments to SearchVector can be any Expression or the name of a field."

I don't see any F usage, but there is: SearchVector(Value("This week everything is 10% off")). For paths, they are strings: SearchVector("scene__setting", "dialogue").

I think the point is that while F could be passed (since it's an expression), that's not really something that needs to be mentioned because it doesn't make much sense to add the extra complication compared to a raw string.

🤔 I see. So removing models.F from the documentation is the way to go.

Yes, isn't wrapping in F not semantically correct? An F object represents the value of a field, not its path. That's why you had to add the as_path argument as a workaround. Should the path strings be wrapped in Col instead?

timgraham · 2025-08-09T23:21:02Z

docs/source/ref/models/search.rst

+  ``{"maxEdits": 1}``.
+- ``token_order``: Controls token sequence behavior. Accepts values like
+  ``"sequential"`` or ``"any"``.
+- ``score``: An optional score expression such as ``{"boost": {"value": 5}}``.


It's a bit redundant to say "optional" in the section "Optional arguments". ;-)

Isn't the expression actually SearchScoreOption({...}), not a dictionary?

Maybe something like:

A :class:`SearchScoreOption` to tune the relevance score.

There's a lot of wording variations:
"An optional score argument can be used to customize relevance scoring."
"An optional score expression to adjust relevance."
"An optional score argument may be used to adjust relevance scoring."
"An optional score expression to influence relevance"

Yes, it was a dictionary. It should be changed.

docs/source/ref/models/search.rst

timgraham · 2025-08-09T23:31:49Z

docs/source/ref/models/search.rst

+        )
+    )
+
+Args:


The wording of "Args" is inconsistent. Sometimes we have "Required arguments", "Arguments", "Args", "Optional", "Optional arguments", sometimes no heading. I don't demand absolute adherence to one standard if it doesn't make sense, but using so many different styles doesn't help the reader.

Due to the size of this PR, I'm okay with having that consistency be made in a separate PR.

I didn't read this comment; sorry for getting to it so late. The idea was:
if there is only one parameter (optional or not), don't add a heading, just list it and that's it. But it is inconsistent, as I forgot to add the heading in some sections 😬.
Now I am wondering whether this class should be documented at all, because it is a private class.

timgraham · 2025-08-09T23:35:51Z

docs/source/ref/models/search.rst

+
+This expression is used internally when combining search expressions with
+Python’s bitwise operators (``&``, ``|``, ``~``), and corresponds to
+logical operators such as ``and``, ``or``, and ``not``.


"such as" ... Are there any more not listed?

No, that's all of them.

Jibola · 2025-08-11T13:22:53Z

django_mongodb_backend/expressions/search.py

            params["score"] = self.score.as_mql(compiler, connection)
-        if self.fuzzy is not None:
+        if self.fuzzy:


This should still be if self.fuzzy is not None because fuzzy={} is valid input
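
The difference between the two checks can be seen in a small standalone example. The two functions below are hypothetical stand-ins for the relevant part of `search_operator`, not the backend's code.

```python
def build_params_truthy(query, fuzzy=None):
    """Variant from the diff: drops any falsy value, including {}."""
    params = {"query": query}
    if fuzzy:
        params["fuzzy"] = fuzzy
    return params


def build_params_not_none(query, fuzzy=None):
    """Suggested variant: only skips the option when it was not passed."""
    params = {"query": query}
    if fuzzy is not None:
        params["fuzzy"] = fuzzy
    return params


# An empty dict is falsy, so the truthiness check silently discards it.
dropped = build_params_truthy("cross", fuzzy={})
kept = build_params_not_none("cross", fuzzy={})
```

With the truthiness check, `fuzzy={}` never reaches the generated query, whereas the `is not None` check passes it through.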

Jibola · 2025-08-11T13:24:50Z

django_mongodb_backend/expressions/search.py

@@ -548,30 +544,31 @@ def search_operator(self, compiler, connection):
        }
        if self.score:
            params["score"] = self.score.as_mql(compiler, connection)
-        if self.fuzzy is not None:
+        if self.fuzzy:


Suggested change

-        if self.fuzzy:
+        if self.fuzzy is not None:

django_mongodb_backend/expressions/search.py

Jibola · 2025-08-11T13:37:27Z

docs/source/ref/models/search.rst

+        )
+    )
+
+Args:


Due to the size of this PR, I'm okay with having that consistency be made in a separate PR.


timgraham · 2025-08-11T18:22:45Z

docs/source/ref/models/search.rst

+An optional ``score`` :class:`SearchScoreOption` argument to tune the
+relevance score.


This isn't a standalone sentence. Suggestion: "The optional score argument is a SearchScoreOption that tunes the relevance score."

timgraham · 2025-08-11T18:23:03Z

docs/source/ref/models/search.rst

+``SearchAutocomplete``
+----------------------
+
+.. class:: SearchAutocomplete(path, query, *, fuzzy=None, token_order=None, score=None)


Add these for the rest of the file.

timgraham · 2025-08-11T18:24:03Z

docs/source/ref/models/search.rst

+``SearchExists``
+----------------
+
+Atlas Search expression that matches documents where a field exists.


Adjust the style:

chop "Atlas Search expression that ..." prefixes in favor of "Matches..." and "Uses..." (next paragraph).

timgraham · 2025-08-11T18:25:15Z

docs/source/ref/models/search.rst

+This expression uses the
+:doc:`moreLikeThis operator <atlas:atlas-search/morelikethis>` to retrieve


reflow cases like this. You can break anywhere inside a refs/doc/, e.g.

This expression uses the :doc:`moreLikeThis operator <atlas:atlas-search/morelikethis>` to retrieve...

timgraham · 2025-08-11T18:27:36Z

tests/atlas_search_/test_search.py

+        self.assertAlmostEqual(scored.score, 10.0, places=2)
+
+
+@unittest.expectedFailure


What's the problem? (add a comment...)

timgraham · 2025-08-11T18:29:33Z

docs/source/ref/models/index.rst

@@ -11,3 +11,4 @@ Model API reference.
   querysets
   models
   indexes
+   search


need a mention on docs/source/index.rst too (have forgotten this on some recent changes)

timgraham · 2025-08-11T20:44:43Z

django_mongodb_backend/expressions/search.py

+    def __str__(self):
+        cls = self.identity[0]
+        kwargs = dict(self.identity[1:])
+        arg_str = ", ".join(f"{k}={v!r}" for k, v in kwargs.items())
+        return f"<{cls.__name__}({arg_str})>"


Is it tested? (I'm trying to get a coverage report working but I didn't spot any str( in tests.)

It is not. Will add a test.
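
As a rough idea of what such a test could assert, the sketch below reuses the `__str__` body from the diff on a hypothetical `FakeExpression` class. It assumes `identity` is a tuple of the class followed by (name, value) pairs, which is how Django's expression `identity` is laid out; the real search expressions may expose different arguments.

```python
class FakeExpression:
    """Hypothetical stand-in for a search expression, for illustration only."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @property
    def identity(self):
        # Assumed layout: (class, (name, value), (name, value), ...)
        return (self.__class__, *self.kwargs.items())

    def __str__(self):
        cls = self.identity[0]
        kwargs = dict(self.identity[1:])
        arg_str = ", ".join(f"{k}={v!r}" for k, v in kwargs.items())
        return f"<{cls.__name__}({arg_str})>"


rendered = str(FakeExpression(path="headline", value="cross"))
```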

timgraham · 2025-08-11T20:46:21Z

tests/atlas_search_/test_search.py

+    assertCountEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertCountEqual)
+    assertListEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertListEqual)
+    assertQuerySetEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertQuerySetEqual)


For future readers, please add a comment explaining the reasons the delayed assertions are needed (as discussed in PR comments). Likewise for index waiting.

timgraham · 2025-08-11T20:47:21Z

tests/atlas_search_/test_search.py

+from .models import Article, Location, Writer
+
+
+def wait_until_index_ready(collection, index_name, timeout: float = 30, interval: float = 0.5):


30 seconds is a long time. It would be nice to explain if it really could take that long.

I don't think so, and I hope it never takes that long. I will set 5 seconds as the default.

timgraham · 2025-08-11T20:47:56Z

tests/atlas_search_/test_search.py

+    raise TimeoutError(f"Index {index_name} not ready after {timeout} seconds")
+
+
+def _delayed_assertion(timeout: float = 120, interval: float = 0.5):


Could it really take 120 seconds... or 2 minutes?

timgraham · 2025-08-12T00:49:20Z

To generate a coverage report:

$ pip install coverage
$ coverage run --source=../django-mongodb/django_mongodb_backend ./tests/runtests.py --settings=test_mongo atlas_search_ expressions_
$ coverage html

Pretty good. Tests for some methods and some expression parameters are missing. No tests for SearchQueryString!

timgraham · 2025-08-12T00:53:12Z

django_mongodb_backend/expressions/search.py

+        token_order: Optional value for `"tokenOrder"`; controls sequential vs.
+                     any-order token matching.
+        score: Optional[SearchScore] expression to adjust score relevance
+               (e.g., `{"boost": {"value": 5}}`).


SearchScore({...}) (or maybe the example isn't needed, since none of the other docstrings have one). I don't think docstrings should be very elaborate documentation; they are more to help developers working on the code, who can always consult the proper docs if need be.

timgraham · 2025-08-12T01:02:48Z

docs/source/ref/models/search.rst

+    <QuerySet [<Article: headline: title>]>
+
+The ``path`` argument can be either the name of a field (as a string), or a
+:class:`~django.db.models.F` instance. The ``value`` argument


Yes, isn't wrapping in F not semantically correct? An F object represents the value of a field, not its path. That's why you had to add the as_path argument as a workaround. Should the path strings be wrapped in Col instead?

WaVEV · 2025-08-12T03:08:02Z

SearchQueryString

Added.

timgraham reviewed Jun 25, 2025

django_mongodb_backend/functions.py (outdated)

timgraham reviewed Jun 25, 2025

django_mongodb_backend/functions.py (outdated)

WaVEV force-pushed the atlas-search-lookups branch from 449b6a3 to ca8a7cf Compare June 26, 2025 02:56

WaVEV commented Jul 7, 2025

django_mongodb_backend/compiler.py

WaVEV force-pushed the atlas-search-lookups branch 3 times, most recently from 9935b25 to a467a57 Compare July 12, 2025 23:32

WaVEV changed the title ~~[WIP] Atlas search lookups~~ Atlas search lookups Jul 14, 2025

WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from ea2118b to 206b554 Compare July 21, 2025 19:29

timgraham reviewed Jul 22, 2025

WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from 456028d to 65f22e6 Compare July 22, 2025 05:16

WaVEV marked this pull request as ready for review July 24, 2025 19:39

WaVEV force-pushed the atlas-search-lookups branch from eb6eb07 to e7f4d22 Compare July 26, 2025 02:40

WaVEV force-pushed the atlas-search-lookups branch 2 times, most recently from eed2499 to 99f6548 Compare August 5, 2025 13:35

timgraham reviewed Aug 5, 2025

Jibola reviewed Aug 5, 2025

Jibola requested changes Aug 6, 2025

timgraham changed the title ~~Atlas search lookups~~ INTPYTHON-522 Add support for Atlas search queries Aug 9, 2025

timgraham reviewed Aug 9, 2025

timgraham changed the title ~~INTPYTHON-522 Add support for Atlas search queries~~ INTPYTHON-522, INTPYTHON-524 Add support for Atlas and vector search queries Aug 11, 2025

Jibola approved these changes Aug 11, 2025

Create django_mongodb_backend.expressions package

395dd7e

INTPYTHON-522, INTPYTHON-524 Add support for Atlas and vector search queries

87c2ecd

timgraham force-pushed the atlas-search-lookups branch from b7ea348 to 87c2ecd Compare August 11, 2025 13:51

timgraham reviewed Aug 11, 2025

WaVEV added 4 commits August 11, 2025 20:40

Update docs.

6c116b3

check fuzzy nullability

2a6a128

Add str unit tests

2e999fd

Fix unit tests.

5948d22

timgraham reviewed Aug 12, 2025

WaVEV added 3 commits August 11, 2025 23:48

Add SearchQueryString tests

a5cbd23

remove super().setUpClass

537dda0

Fix str test.

d6302bf

WaVEV added 2 commits August 12, 2025 00:59

Improve coverage.

428fe91

improve coverage

7735351


