INTPYTHON-522, INTPYTHON-524 Add support for Atlas and vector search queries #325

Open: wants to merge 2 commits into main from atlas-search-lookups

Collaborator

WaVEV commented Jun 24, 2025

No description provided.

WaVEV force-pushed the atlas-search-lookups branch from 449b6a3 to ca8a7cf (June 26, 2025 02:56)
WaVEV force-pushed the atlas-search-lookups branch 3 times, most recently from 9935b25 to a467a57 (July 12, 2025 23:32)
WaVEV changed the title from "[WIP] Atlas search lookups" to "Atlas search lookups" (Jul 14, 2025)
WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from ea2118b to 206b554 (July 21, 2025 19:29)
WaVEV force-pushed the atlas-search-lookups branch 4 times, most recently from 456028d to 65f22e6 (July 22, 2025 05:16)
WaVEV marked this pull request as ready for review (July 24, 2025 19:39)
WaVEV force-pushed the atlas-search-lookups branch from eb6eb07 to e7f4d22 (July 26, 2025 02:40)
WaVEV force-pushed the atlas-search-lookups branch 2 times, most recently from eed2499 to 99f6548 (August 5, 2025 13:35)
Comment on lines 51 to 54
``SearchEquals`` objects can be reused and combined with other search
expressions.

See :ref:`search-operations-combinable`
Collaborator

I wonder if we could structure things so we don't need to repeat this boilerplate on every(?) expression.

Collaborator Author

🤔 I don't think we can avoid it, unless we list the operations that can be combined in the combinable-operations section. I like having this link while I'm reading the docs, since it introduces some (cool?) behavior.

Collaborator

We need to adjust our Sphinx theme (or find a new one) so that it has a "Contents" sidebar on the right of the page like Django's docs do (see https://docs.djangoproject.com/en/dev/ref/models/fields/). It will greatly help browsability, and the "combined expressions" section won't be buried.

I'm trying to improve the heading structure by adding an "Atlas Search expressions" top-level heading; then "combined expressions" might also be at the top level (to address your concern and make it more visible). Maybe CombinedSearchExpression should be a subsection of "combined expressions" since it's more of a private/advanced API compared to bitwise operators?

Probably "Vector search queries" is another top-level heading (for SearchVector), and SearchScoreOption is more of a utility/helper class?

I didn't make my way through the entire documentation page, but if you're going to work more this weekend, please take a look at my edits and try to make similar adjustments, like adding .. class:: under each heading.

Comment on lines 64 to 68
delayedAssertCountEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertCountEqual)
delayedAssertListEqual = _delayed_assertion(timeout=2)(TransactionTestCase.assertListEqual)
delayedAssertQuerySetEqual = _delayed_assertion(timeout=2)(
    TransactionTestCase.assertQuerySetEqual
)
Collaborator

Are the non-delayed versions ever used? Maybe it's better to overwrite the original names so we don't have to write "delayedXXXXX" everywhere. Or maybe the waiting could be done in setUp() after data is inserted? Unless some test inserts more data, essentially only the first test's waiting is needed, right?

Collaborator Author

No, all the checks are delayed.
Regarding the second question: right, any test that inserts data needs to wait. If the data is inserted during class setup, we only need to wait once. So if we want to get rid of the delayed assertions, we can do the waiting in the data-creation step instead.
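The `_delayed_assertion` helper itself isn't shown in this excerpt. As a rough illustration only, a retrying decorator of that shape might look like the sketch below; the name `delayed_assertion`, the `interval` parameter, and the retry-on-`AssertionError` behavior are assumptions, not the PR's implementation.

```python
import functools
import time


def delayed_assertion(timeout=2.0, interval=0.1):
    """Hypothetical sketch: retry an assertion until it passes or `timeout` expires.

    Atlas Search indexes are updated asynchronously, so a freshly inserted
    document may not be searchable right away.
    """

    def decorator(assert_func):
        @functools.wraps(assert_func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                try:
                    return assert_func(*args, **kwargs)
                except AssertionError:
                    # Out of time: re-raise the most recent failure unchanged.
                    if time.monotonic() >= deadline:
                        raise
                    time.sleep(interval)

        return wrapper

    return decorator
```

Because `wrapper` is a plain function, assigning it as a class attribute (as with `delayedAssertCountEqual = ...`) still binds `self` when it is called as a method. One caveat under these assumptions: a Django QuerySet caches its results after the first evaluation, so a real retry would need a fresh queryset (for example `qs.all()`) on each attempt, which this sketch does not handle.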



@skipUnlessDBFeature("supports_atlas_search")
class SearchEqualsTest(SearchUtilsMixin):
Collaborator

I've tried to be consistent in this project about using "Tests" (plural) in the class names.

Collaborator Author

🤔 Hmm, I didn't notice that. Will change.

Comment on lines 112 to 116
boost_score = SearchScoreOption({"boost": {"value": 3}})

qs = Article.objects.annotate(
    score=SearchEquals(path="headline", value="cross", score=boost_score)
)
Collaborator

I'd inline boost_score, or at least omit the blank line. (Only some tests are inconsistent.)

Contributor

Jibola left a comment

Things look great, but I've gone through about half of the code (due to size). I will check the test code tomorrow!

Comment on lines 253 to 294
if not has_search:
    raise ValueError(
        "Cannot combine two `$vectorSearch` operators. "
        "If you need to combine them, consider restructuring your query logic or "
        "running them as separate queries."
    )
raise ValueError(
    "Only one $search operation is allowed per query. "
    f"Received {len(search_replacements)} search expressions. "
    "To combine multiple search expressions, use either a CompoundExpression for "
    "fine-grained control or CombinedSearchExpression for simple logical combinations."
)
Contributor

I think these two ValueErrors need to be switched.

Collaborator Author

🤔 The second is the case where there is a vector search (has_vector_search) but no search.
I think I should refactor this; it's a bit confusing, and the leading not isn't helping.
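One way to drop the leading `not` is to test each failure condition positively. The sketch below is hypothetical: it assumes the caller has already counted the `$search` and `$vectorSearch` replacements (`n_search` and `n_vector_search` are illustrative names), and it models only the two cases in the excerpt, not a query that mixes both stage types.

```python
def check_search_stage_counts(n_search, n_vector_search):
    """Hypothetical helper: raise if a query would need more than one search stage."""
    if n_vector_search > 1:
        raise ValueError(
            "Cannot combine two `$vectorSearch` operators. "
            "If you need to combine them, consider restructuring your query logic or "
            "running them as separate queries."
        )
    if n_search > 1:
        raise ValueError(
            "Only one $search operation is allowed per query. "
            f"Received {n_search} search expressions. "
            "To combine multiple search expressions, use either a CompoundExpression for "
            "fine-grained control or CombinedSearchExpression for simple logical combinations."
        )
```

Each branch now states the situation it reports, so the order of the two checks no longer carries hidden meaning.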

Comment on lines 869 to 871
# Apply De Morgan's Laws.
operator = node.operator.negate() if negated else node.operator
negated = negated != (node.operator == Operator.NOT)
Contributor

This logic is a little confusing because it requires some understanding of negate and the state changes.
I'll leave this as a comment here to be reviewed later.

What's an example of a NOT combinable?
I.e., how would I construct NOT (A AND B) or can this only be done via negate?

Collaborator Author

I applied De Morgan's laws to get everything into the form A' operator B'. So:
NOT (A AND B) = NOT A OR NOT B => {SHOULD: [MUST_NOT(A), MUST_NOT(B)]} with minimumShouldMatch set to 1.
The other way to handle this is to push everything into a MUST_NOT, but in order to handle NOT (NOT A) as A, I decided to apply these kinds of simplifications.
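The rewrite described here (push each NOT down to the atoms with De Morgan's laws, cancelling double negations along the way) can be sketched in plain Python, similar in spirit to the `negated` flag in the excerpt above. The `Atom`/`Not`/`And`/`Or` classes are hypothetical stand-ins for the PR's search expressions and operators, not its actual API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str  # stands in for an atomic operator such as SearchEquals


@dataclass(frozen=True)
class Not:
    child: "Node"


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


Node = Union[Atom, Not, And, Or]


def push_negation(node: Node, negated: bool = False) -> Node:
    """Return an equivalent tree in which Not wraps only atoms."""
    if isinstance(node, Atom):
        return Not(node) if negated else node
    if isinstance(node, Not):
        # NOT (NOT A) == A: flip the pending negation instead of nesting it.
        return push_negation(node.child, not negated)
    left = push_negation(node.left, negated)
    right = push_negation(node.right, negated)
    # De Morgan: NOT (A AND B) == NOT A OR NOT B; NOT (A OR B) == NOT A AND NOT B.
    if isinstance(node, And):
        return Or(left, right) if negated else And(left, right)
    return And(left, right) if negated else Or(left, right)
```

For example, `push_negation(Not(And(Atom("A"), Atom("B"))))` yields `Or(Not(Atom("A")), Not(Atom("B")))`, which is the SHOULD-of-MUST_NOT shape described in the comment.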

Collaborator Author

And, I almost forgot: we can also handle double negation on SHOULD (those are the ORs, when minimumShouldMatch is 1). Long story short:
A and B => MUST
not C => MUST_NOT
A or B => SHOULD with minimumShouldMatch: 1
not (A or B) => not A and not B => MUST(MUST_NOT(A), MUST_NOT(B))

where A, B, and C are atomic.
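Following that mapping, a tree whose negations have already been pushed down to the atoms could be compiled to an Atlas Search `compound` operator roughly as below. This is an illustrative sketch: the node classes are hypothetical, the atoms hold raw operator documents, and the PR may flatten or nest clauses differently.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    operator: dict  # e.g. {"text": {"query": "cross", "path": "headline"}}


@dataclass
class Not:
    child: object


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


def to_compound(node):
    """Translate a negation-normal-form tree into an Atlas `compound` document."""
    if isinstance(node, Atom):
        return node.operator
    if isinstance(node, Not):
        # not C => MUST_NOT
        return {"compound": {"mustNot": [to_compound(node.child)]}}
    if isinstance(node, And):
        # A and B => MUST
        return {"compound": {"must": [to_compound(node.left), to_compound(node.right)]}}
    if isinstance(node, Or):
        # A or B => SHOULD with minimumShouldMatch: 1
        return {
            "compound": {
                "should": [to_compound(node.left), to_compound(node.right)],
                "minimumShouldMatch": 1,
            }
        }
    raise TypeError(f"Unsupported node: {node!r}")
```

With this sketch, `not (A or B)` (already rewritten to `And(Not(A), Not(B))`) becomes a `must` whose two clauses are each a `compound` with a single `mustNot`, matching MUST(MUST_NOT(A), MUST_NOT(B)).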

Collaborator Author

I refactored it a bit. It's simpler now; I don't know if it's simple enough 😄

Contributor

Jibola left a comment

Overall PR looks great! I've got some minor corrections, but other than that, it is good to merge from me. Great work! 🚀

It also looks like there's a ReadTheDocs error:

/home/docs/checkouts/readthedocs.org/user_builds/django-mongodb-backend/checkouts/325/docs/source/ref/models/search.rst:654: WARNING: unknown document: 'atlas:atlas-search/scoring/' [ref.doc]

@@ -16,6 +16,12 @@ New features
- Added :class:`~.fields.PolymorphicEmbeddedModelField` and
:class:`~.fields.PolymorphicEmbeddedModelArrayField` for storing a model
instance or list of model instances that may be of more than one model class.
- Added support for MongoDB Atlas Search expressions, including
``SearchAutocomplete``, :class:`.SearchEquals`, ``SearchVector``, and others.
Contributor

Suggested change
``SearchAutocomplete``, :class:`.SearchEquals`, ``SearchVector``, and others.
``SearchAutocomplete``, :class:`SearchEquals`, ``SearchVector``, and others.

Collaborator

This suggestion isn't correct. Without the leading dot, the class won't be resolved properly.

Collaborator Author

🤔 Maybe I forgot to add the ~.expressions prefix. But I don't know much about docs.

Collaborator

It works as is. The dot allows the path to be looked up rather than resolved as an exact match.

def create_search_index(cls, model, index_name, definition, type="search"):
    collection = cls._get_collection(model)
    idx = SearchIndexModel(definition=definition, name=index_name, type=type)
    collection.create_search_index(idx)
Contributor

Jibola, Aug 6, 2025

NIT: For the sake of testing, we can make this a blocking call and check for the index before continuing.

Collaborator Author

🤔 If I understand correctly, I have to add a wait-for-predicate. Right?
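A wait-for-predicate helper of the kind discussed here might look like the following sketch. `wait_for` and `search_index_is_queryable` are hypothetical names; the latter assumes PyMongo's `Collection.list_search_indexes()` (available since PyMongo 4.5) and the `queryable` field that Atlas reports for each search index.

```python
import time


def wait_for(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate()` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(interval)


def search_index_is_queryable(collection, index_name):
    """Hypothetical predicate: True once the named Atlas Search index can serve queries."""
    return any(
        index.get("queryable") for index in collection.list_search_indexes(index_name)
    )
```

Under these assumptions, `create_search_index()` could follow `collection.create_search_index(idx)` with `wait_for(lambda: search_index_is_queryable(collection, index_name))`, making index creation effectively blocking for tests.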

timgraham changed the title from "Atlas search lookups" to "INTPYTHON-522 Add support for Atlas search queries" (Aug 9, 2025)
<QuerySet [<Article: headline: title>]>
The ``path`` argument can be either the name of a field (as a string), or a
:class:`~django.db.models.F` instance. The ``value`` argument
Collaborator

timgraham, Aug 9, 2025

Why would it be preferable to wrap the field name in an F object? I don't see F in any tests. (and similarly why wrap strings in Value?)

Collaborator Author

🤔 I can add that to the tests. For the values, I copied the idea from here. It would also work without wrapping them, because MongoDB supports direct values here. The path, however, should be an F object: sometimes the string refers to an embedded model or a column, and resolving F returns the corresponding column.

Collaborator

timgraham, Aug 9, 2025

For Django's version: "The arguments to SearchVector can be any Expression or the name of a field."

I don't see any F usage, but there is: SearchVector(Value("This week everything is 10% off"). For paths, they are strings: SearchVector("scene__setting", "dialogue").

I think the point is that while F could be passed (since it's an expression), that's not really something that needs to be mentioned because it doesn't make much sense to add the extra complication compared to a raw string.

Collaborator Author

🤔 I see. So removing models.F from the documentation is the way to go.

``{"maxEdits": 1}``.
- ``token_order``: Controls token sequence behavior. Accepts values like
``"sequential"`` or ``"any"``.
- ``score``: An optional score expression such as ``{"boost": {"value": 5}}``.
Collaborator

timgraham, Aug 9, 2025

It's a bit redundant to say "optional" in the section "Optional arguments". ;-)

Isn't the expression actually SearchScoreOption({...}), not a dictionary?

Maybe something like:

A :class:`SearchScoreOption` to tune the relevance score.

There's a lot of wording variations:
"An optional score argument can be used to customize relevance scoring."
"An optional score expression to adjust relevance."
"An optional score argument may be used to adjust relevance scoring."
"An optional score expression to influence relevance"

Collaborator Author

Yes, it was a dictionary. It should be changed.

Comment on lines 545 to 548
fuzzy=None,
match_criteria=None,
synonyms=None,
score=None
Collaborator

Do you really want to pass all these Nones?

Collaborator Author

Nope, will remove.

)
)
Args:
Collaborator

The wording of "Args" is inconsistent. Sometimes we have "Required arguments", "Arguments", "Args", "Optional", "Optional arguments", sometimes no heading. I don't demand absolute adherence to one standard if it doesn't make sense, but using so many different styles doesn't help the reader.

Contributor

Due to the size of this PR, I'm okay with having that consistency be made in a separate PR.


This expression is used internally when combining search expressions with
Python’s bitwise operators (``&``, ``|``, ``~``), and corresponds to
logical operators such as ``and``, ``or``, and ``not``.
Collaborator

"such as" ... Are there any more not listed?

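For readers unfamiliar with the pattern, the bitwise-operator hook works through Python's `__and__`, `__or__`, and `__invert__` special methods. The classes below are a hypothetical, stripped-down illustration of how `&`, `|`, and `~` can build a combined expression tree; they are not the PR's `CombinedSearchExpression`.

```python
class SearchExpression:
    """Hypothetical base class: bitwise operators build a combined expression."""

    def __and__(self, other):
        return Combined("and", self, other)

    def __or__(self, other):
        return Combined("or", self, other)

    def __invert__(self):
        return Combined("not", self)


class Leaf(SearchExpression):
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return self.label


class Combined(SearchExpression):
    def __init__(self, operator, lhs, rhs=None):
        self.operator = operator
        self.lhs = lhs
        self.rhs = rhs  # None for the unary "not"

    def __repr__(self):
        if self.operator == "not":
            return f"not({self.lhs!r})"
        return f"({self.lhs!r} {self.operator} {self.rhs!r})"
```

Python's usual precedence applies (`~` binds tightest, then `&`, then `|`), so `~(Leaf("a") & Leaf("b")) | Leaf("c")` builds an "or" whose left side is the negated "and".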
timgraham changed the title from "INTPYTHON-522 Add support for Atlas search queries" to "INTPYTHON-522, INTPYTHON-524 Add support for Atlas and vector search queries" (Aug 11, 2025)
params["score"] = self.score.as_mql(compiler, connection)
if self.fuzzy is not None:
if self.fuzzy:
Contributor

This should still be if self.fuzzy is not None, because fuzzy={} is valid input.

@@ -548,30 +544,31 @@ def search_operator(self, compiler, connection):
}
if self.score:
params["score"] = self.score.as_mql(compiler, connection)
if self.fuzzy is not None:
if self.fuzzy:
Contributor

Suggested change
if self.fuzzy:
if self.fuzzy is not None:
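The distinction matters because an empty dict is falsy in Python. The hypothetical builder below (not the PR's `search_operator()`) shows the `is not None` pattern for an Atlas `autocomplete` operator: `fuzzy={}` is kept, which asks Atlas to apply fuzzy matching with its default settings, while omitted options stay out of the document. For simplicity it takes `score` as a plain dict rather than a `SearchScoreOption`.

```python
def build_autocomplete(query, path, fuzzy=None, token_order=None, score=None):
    """Hypothetical helper building an Atlas Search `autocomplete` operator document."""
    params = {"query": query, "path": path}
    # Compare against None: {} is falsy but is still a meaningful fuzzy setting.
    if fuzzy is not None:
        params["fuzzy"] = fuzzy
    if token_order is not None:
        params["tokenOrder"] = token_order
    if score is not None:
        params["score"] = score
    return {"autocomplete": params}
```

With a plain truthiness check (`if fuzzy:`), `build_autocomplete("cro", "headline", fuzzy={})` would silently drop the `fuzzy` key.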

@@ -636,7 +638,7 @@ def get_search_fields(self, compiler, connection):
def get_source_expressions(self):
return [self.path, self.relation, self.geometry]

def set_source_expressions(self, exprs):
def set_source_expressionsOptional(self, exprs):
Contributor

accident?

Contributor

Suggested change
def set_source_expressionsOptional(self, exprs):
def set_source_expressions(self, exprs):

timgraham force-pushed the atlas-search-lookups branch from b7ea348 to 87c2ecd (August 11, 2025 13:51)
3 participants