Mango match_failures/2 function #5858

Open
jcoglan wants to merge 12 commits into apache:main from neighbourhoodie:mango-match-failures
@jcoglan
Contributor

@jcoglan jcoglan commented Jan 21, 2026

Overview

This PR represents work so far on a version of mango_selector:match/2 that can return a representation of how the input fails to match the selector, rather than just a boolean. There are a few commits that develop this behaviour incrementally before everything "snaps into place" to give us code paths that can produce failure descriptions, while retaining the original boolean behaviour that avoids creating a lot of ephemeral failure lists and can short-circuit on compound operators.

The rough steps here are as follows; I'm retaining all these commits for now in case we want to compare different designs for code complexity and performance.

  • Add a lot of unit tests to mango_selector that check every match operator and its negation. This is necessary since various compound operators like $or and $allMatch have surprising edge case behaviour on empty lists and we need to make sure this is not broken. Mostly these tests check that if selector S returns true on an input then { "$not": S } returns false and vice versa. There are some exceptions to this due to how $or works and how $and and $or are normalised.
  • Replace the existing implementation with one where every operator returns a possibly-empty list of failures, instead of a boolean. match_int then converts this to a bool on the way out.
  • Replace the Cmp argument to everything with a ctx record that contains cmp, as well as other things needed for failure generation, e.g. the path to the current value, whether matching is currently negated, etc.
  • Try to fix $allMatch and $elemMatch normalisation to avoid the complexity that comes from not being able to apply DeMorgan to them, due to $allMatch being defined to return false for empty lists. This is a breaking change that is reverted later since we decided to retain existing behaviour above all.
  • Implement negation handling, where the presence of a $not has to be communicated to nodes lower down the tree in order to produce good failure messages. Not all negation can be normalised out of the tree and so all operators need to handle being negated during evaluation.
  • Finally, collapse all the complexity into an implementation that supports both the old and new behaviours.

The idea in the design I've ended up with that tries to minimise both complexity and runtime cost is:

  • Add #ctx.verbose which indicates whether a detailed failure description is wanted.
  • Keep the original implementations of all operators as the response when passed #ctx{verbose=false}, i.e. when only a boolean result is needed.
  • For all leaf operators, the #ctx{verbose=true} case can be implemented by calling the #ctx{verbose=false} case, and creating a #failure record if this returns false.
  • #ctx{verbose=false} cases do not need to deal with #ctx{negate=true}; they continue to return their original result and let $not invert it. We only need special handling of #ctx{negate=true} in verbose mode, where the $not operator passes its effect down via the #ctx. This reduces the number of cases each operator has to deal with to basically: non-verbose mode, and positive and negative verbose cases.
  • For compound operators, special code is needed to gather up the failures from internal selectors and deal with edge cases in a way that's consistent with the original implementation.
  • #ctx.path is only updated in #ctx{verbose=true} code paths so this expense is avoided in non-verbose mode. Path items are added to the front of this list as that's cheaper than doing Path ++ [Item]; we would reverse these before returning to a client.
  • #failure records retain a #ctx from where they can access the path and negation state, in order to generate a good human-readable error message later on.
  • The tests are updated to make sure that both verbose modes return consistent results, i.e. if verbose=false returns true, then verbose=true returns [], and if the former returns false, the latter gives a non-empty list. These are all passing.
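The design above can be sketched in miniature as follows. This is a toy, not the code in this PR: the selector grammar (`{eq, Arg}`, `{'not', Sel}`, `{field, Name, Sel}`), the module name, and the exact record fields are assumptions made for illustration, and documents are plain maps rather than EJSON. It shows the three cases per operator (non-verbose, verbose positive, verbose negated), the leaf operator reusing its boolean implementation, and paths being built front-first and reversed on the way out.

```erlang
-module(verbose_match_sketch).
-export([match/3]).

%% Hypothetical records: field names follow the description above, but the
%% definitions in mango_selector.erl may differ.
-record(ctx, {verbose = false, negate = false, path = []}).
-record(failure, {op, arg, ctx}).

%% Verbose = false returns a boolean; Verbose = true returns a list of
%% {Op, Arg, Path, Negated} tuples, empty when the document matches.
match(Selector, Doc, Verbose) ->
    case do_match(Selector, Doc, #ctx{verbose = Verbose}) of
        Failures when is_list(Failures) ->
            %% Paths are built front-first, so reverse them before returning.
            [
                {Op, Arg, lists:reverse(C#ctx.path), C#ctx.negate}
             || #failure{op = Op, arg = Arg, ctx = C} <- Failures
            ];
        Bool ->
            Bool
    end.

%% 'not', non-verbose: just invert the boolean result.
do_match({'not', Sel}, Doc, #ctx{verbose = false} = Ctx) ->
    not do_match(Sel, Doc, Ctx);
%% 'not', verbose: pass the negation down via the context instead.
do_match({'not', Sel}, Doc, #ctx{negate = Neg} = Ctx) ->
    do_match(Sel, Doc, Ctx#ctx{negate = not Neg});
%% field, non-verbose: the path is not tracked at all.
do_match({field, Name, Sel}, Doc, #ctx{verbose = false} = Ctx) when is_map(Doc) ->
    case maps:find(Name, Doc) of
        {ok, Sub} -> do_match(Sel, Sub, Ctx);
        error -> false
    end;
%% field, verbose: push the field name onto the front of the path.
do_match({field, Name, Sel}, Doc, #ctx{negate = Neg, path = Path} = Ctx) when is_map(Doc) ->
    Ctx1 = Ctx#ctx{path = [Name | Path]},
    case maps:find(Name, Doc) of
        {ok, Sub} -> do_match(Sel, Sub, Ctx1);
        error when Neg -> [];
        error -> [#failure{op = field, arg = Name, ctx = Ctx1}]
    end;
%% eq, non-verbose: the original boolean implementation.
do_match({eq, Arg}, Value, #ctx{verbose = false}) ->
    Value =:= Arg;
%% eq, verbose: call the boolean case and wrap a mismatch in a #failure.
do_match({eq, Arg} = Sel, Value, #ctx{negate = Neg} = Ctx) ->
    Matched = do_match(Sel, Value, Ctx#ctx{verbose = false}),
    case Matched =/= Neg of
        true -> [];
        false -> [#failure{op = eq, arg = Arg, ctx = Ctx}]
    end.
```

For any selector and document in this toy, `match(S, D, false)` returning `true` corresponds to `match(S, D, true)` returning `[]`, which is the consistency property the updated tests check.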

Testing recommendations

We should benchmark this in its current version, and both verbose modes of this version, against a substantial indexing workload to look for performance regressions. Or, if performance is equivalent in both verbose modes, we can remove a lot of redundancy by removing the verbose flag entirely.

Related Issues or Pull Requests

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@nickva
Contributor

nickva commented Jan 21, 2026

That's a nice approach using a context record in place of the Cmp arg.

All the extra eunit tests are awesome. If you want, could even put them in a separate PR and we'd merge them right away. It would make it easier to review subsequent PRs because we can obviously see all the existing tests pass.

@jcoglan jcoglan force-pushed the mango-match-failures branch 3 times, most recently from 5076677 to 7f8f999 Compare February 4, 2026 09:57
@jcoglan jcoglan force-pushed the mango-match-failures branch 2 times, most recently from 69fe09f to fa65b74 Compare February 11, 2026 14:12
@jcoglan jcoglan changed the base branch from main to 3.5.x February 13, 2026 09:38
@jcoglan jcoglan mentioned this pull request Feb 13, 2026
6 tasks
@jcoglan jcoglan force-pushed the mango-match-failures branch from fa65b74 to 1c23788 Compare February 13, 2026 11:39
@jcoglan jcoglan changed the base branch from 3.5.x to main February 13, 2026 11:40
@jcoglan jcoglan mentioned this pull request Feb 20, 2026
6 tasks
@janl janl added this to the 3.8 milestone Feb 27, 2026
@jcoglan jcoglan force-pushed the mango-match-failures branch from 1c23788 to 82142aa Compare March 13, 2026 16:31
@jcoglan jcoglan force-pushed the mango-match-failures branch from 82142aa to 74d02e0 Compare March 23, 2026 14:28
@jcoglan jcoglan marked this pull request as ready for review March 23, 2026 14:31
@jcoglan jcoglan force-pushed the mango-match-failures branch 3 times, most recently from 407b81a to e21b4b8 Compare April 9, 2026 12:00
@janl
Member

janl commented Apr 22, 2026

Nice work James. I’ve done some benchmarks with the bulkbench script to see what this gives us:

   no vdu: real 0m18.122s
   js vdu: real 0m55.438s (3.06x)
mango vdu: real 0m21.722s (1.19x)

@jcoglan jcoglan force-pushed the mango-match-failures branch from e21b4b8 to ba11d38 Compare April 22, 2026 13:15
Contributor

@nickva nickva left a comment

This is a large PR and I don't know mango very well so I did the best I could. Sorry it took a while to get to. Those are just a few things I noticed at first, I haven't run it yet locally to play with.

Comment thread src/mango/src/mango_selector.erl Outdated
Comment thread src/couch/src/couch_query_servers.erl Outdated
    ok ->
        ok;
    {[{<<"forbidden">>, Message}, {<<"failures">>, Failures}]} ->
        throw({forbidden, Message, Failures});
Contributor

@nickva nickva Apr 29, 2026

We're changing the forbidden tuple shape. Make sure to check all the places which handle {forbidden, _}, since they may now have to handle the three-element version as well. I noticed fabric_doc_update (src/fabric/src/fabric_doc_update.erl) needs to handle this.

We should also call out and check whether this will affect online cluster upgrades (new worker nodes throwing it and old coordinator nodes getting function clause errors). It may be fine; it just needs an extra careful look.

Comment thread src/mango/src/mango_native_proc.erl
Comment thread src/mango/src/mango_native_proc.erl Outdated
Comment thread src/mango/src/mango_native_proc.erl Outdated
    case mango_selector:has_allowed_fields(Selector, [<<"newDoc">>, <<"oldDoc">>]) of
        false ->
            Msg =
                <<"'validate_doc_update' may only contain 'newDoc' and 'oldDoc' as top-level fields">>,
Contributor

@nickva nickva Apr 29, 2026

I wonder when this would fire. Would we crash the prompt every time we attempt to insert a doc?

Contributor Author

We added this b/c @janl encountered a really confusing error during testing due to typing "selector": { "x": 0 } instead of "selector": { "newDoc": { "x": 0 } }. If you omit newDoc, the resulting validation failure is confusing, and we thought it better to alert the user that they are probably making a mistake instead of returning an empty result set.
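As a rough illustration of the check being discussed, a top-level field test over an EJSON selector might look like the sketch below. The function name mirrors `mango_selector:has_allowed_fields/2` from the diff, but this body is an assumption written for illustration; the real function may also handle top-level operators or normalised selectors differently.

```erlang
-module(allowed_fields_sketch).
-export([has_allowed_fields/2]).

%% Hypothetical stand-in for mango_selector:has_allowed_fields/2.
%% Returns true only if every top-level key of the EJSON object
%% {[{Key, Value}, ...]} appears in the Allowed list.
has_allowed_fields({Props}, Allowed) when is_list(Props) ->
    lists:all(fun({Key, _Value}) -> lists:member(Key, Allowed) end, Props).
```

With `Allowed = [<<"newDoc">>, <<"oldDoc">>]`, the selector `{ "x": 0 }` is rejected while `{ "newDoc": { "x": 0 } }` is accepted.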

Contributor

@nickva nickva May 7, 2026

What I meant is that it seems we are not validating the inserted document here; instead we're validating the VDU itself.

Contributor Author

We're not validating the design doc when it is uploaded; this would not be consistent with the existing behaviour of PUT /db/_design/doc. You are currently allowed to upload a JS design doc with invalid code in it. Instead, this triggers when normal doc writes occur; if the Mango VDU does not make sense then we return an error about that, rather than giving the user a misleading error implying their doc is invalid.

Member

we should consider validating on ddoc-write here, it feels like the better behaviour

Contributor

> You are currently allowed to upload a JS design doc with invalid code in it. Instead, this triggers when normal doc writes occur; if the Mango VDU does not make sense then we return an error about that, rather than giving the user a misleading error implying their doc is invalid.

But if we could, we would validate the JS VDU during insertion. It's a bit like how we check compilation during ddoc inserts: "does it even compile?" If not, we fail the ddoc insertion to start with. If we can do that early with Mango VDUs, we should. Then we don't have to worry about misleading the user later, because by rejecting invalid VDUs up front we will only have to deal with invalid user documents, as all Mango VDUs will be valid.

Contributor Author

Huh, I thought the last time I checked this that CouchDB allows writing ddocs with malformed (i.e. does not even parse) JS code in them, but I just checked and this is not the case. I can see about moving this check to when the Mango ddoc is uploaded.

Contributor Author

Actually I wasn't quite right. map functions are checked when the ddoc is updated, but validate_doc_update is not. The compilation error is surfaced when other docs are written and we attempt to invoke the VDU function.

$ cdb '/asd/_design/foo' -X PUT -d '{ "validate_doc_update": "function (doc) {" }'
{"ok":true,"id":"_design/foo","rev":"1-416ed8128d308d5abc6b4745a64394e5"}

$ cdb '/asd/doc' -X PUT -d '{}'
{"error":"compilation_error","reason":"SyntaxError: unexpected token in expression: '' (function (doc) {)"}

Personally I think this behaviour should be consistent between JS and Mango VDUs, and if we want to preemptively validate the v-d-u field we should do this for both backends.

Comment thread src/mango/src/mango_selector.erl Outdated
@jcoglan jcoglan force-pushed the mango-match-failures branch from ba11d38 to 557d5ed Compare May 21, 2026 15:49
@jcoglan
Contributor Author

jcoglan commented May 26, 2026

I think the only thing left to sort out here is how the errors are passed back to the HTTP/fabric frontend and back to the client, i.e. the forbidden structure. What I have done here is a hack b/c I found it hard to figure out how I was supposed to implement this, and this was something I managed to get working just enough to have a client interact with the functionality. I could use some guidance on doing it properly.

@jcoglan jcoglan force-pushed the mango-match-failures branch from 557d5ed to 6107913 Compare May 26, 2026 13:06
@jcoglan
Contributor Author

jcoglan commented May 28, 2026

Been looking into how the errors should be passed from mango_native_proc back to the client and how this affects the use of {forbidden, _} tuples.

Many places in the codebase throw or catch (or otherwise create/use) the structure {forbidden, Msg}. forbidden is an atom and Msg is usually a binary, though it can be a string, a list (e.g. ["by_node not an object"]) or a tuple (e.g. chttpd_auth_request throws {forbidden, {Error, Reason}}).

The specific code path for VDUs is that the actual VDU engine (a JS process or mango_native_proc) throws a JS object like { forbidden: Msg }, which is {[{<<"forbidden">>, Msg}]} in Erlang. couch_query_servers:validate_doc_update() catches {[{<<"forbidden">>, Msg}]} and converts it into {forbidden, Msg} i.e. a normal Erlang pair with forbidden as an atom. This is re-thrown and passed back through fabric and other request-handling machinery until it ends up in the HTTP layer.

chttpd has two handlers relevant here:

  • error_info({forbidden, Msg}) produces a 403 response with { "error": "forbidden", "reason": Msg }
  • error_info({forbidden, Error, Msg}) produces a 403 response with { "error": Error, "reason": Msg }

The problem we have is: how to send the list of failure objects from mango_native_proc back to the client. The current design means it's safe for couch_query_servers to throw a 3-tuple and chttpd will be able to handle it, so there's no problem there vis-a-vis rolling upgrades. However, this only lets us change the error field in the response, whereas we probably want to keep "error": "forbidden" as the response for failed Mango VDU validations.

If we have couch_query_servers throw the 2-tuple {forbidden, {[{<<"failures">>, Failures}]}} (Failures is the list of validation failures) then this ends up putting the failures in the reason field of the response, e.g.:

{
  "error": "forbidden",
  "reason": {
    "failures": [
      {
        "path": ["newDoc", "ok"],
        "message": "must be present"
      }
    ]
  }
}
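The flow described above can be sketched as a small self-contained module. The function names echo `couch_query_servers:validate_doc_update` and `chttpd:error_info`, but the bodies here are simplified assumptions written for illustration, and `respond/1` is a hypothetical helper standing in for everything between the VDU engine and the HTTP layer.

```erlang
-module(forbidden_flow_sketch).
-export([respond/1, error_info/1]).

%% Simplified stand-in for the conversion done in couch_query_servers:
%% the EJSON object {"forbidden": Msg} becomes a thrown {forbidden, Msg}.
vdu_result(ok) ->
    ok;
vdu_result({[{<<"forbidden">>, Msg}]}) ->
    throw({forbidden, Msg}).

%% Simplified stand-in for the two chttpd handlers listed above.
%% Returns {StatusCode, ErrorField, ReasonField}.
error_info({forbidden, Msg}) ->
    {403, <<"forbidden">>, Msg};
error_info({forbidden, Error, Msg}) ->
    {403, Error, Msg}.

%% Hypothetical glue: run the VDU result through both steps.
respond(VduResult) ->
    try vdu_result(VduResult) of
        ok -> ok
    catch
        throw:Err -> error_info(Err)
    end.
```

Because the second element of the pair is passed through untouched, an EJSON object such as `{[{<<"failures">>, Failures}]}` ends up as the `reason` field, which is how the JSON response above arises.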

Technically, this is not a breaking change; it was already legal for a JS VDU to throw { forbidden: { failures: [...] } } and this would end up producing the response above. However, there has been concern that making reason something other than a string would be surprising to most users and could break existing programs, so it should be considered a breaking change.

If we want to retain reason as a string and put the failures list somewhere else then we need to invent some other way for couch_query_servers to throw an error message and failure list, and put these into the HTTP response. One such way is to make it throw {forbidden, {failures, Failures}}, and then make chttpd:send_error() inspect ReasonStr. If it's {failures, Failures} then we make it put this in the response:

  "reason": "document is not valid",
  "failures": Failures

Otherwise we make it just emit "reason": ReasonStr as it currently does. The problem with this is that it couples some generic HTTP error handling code to the specifics of VDUs, which seems like a bad idea. We could weaken this coupling by making couch_query_servers throw a more complex object containing the reason and any additional fields, and make send_error() emit all that data into the response, i.e. it lets ReasonStr be either a string, or a set of JSON fields. This removes the coupling but still adds some complexity to how errors are communicated.
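The two options in this paragraph can be sketched as clauses of one body-building function. Everything here is hypothetical: `error_body/2` is not a chttpd function, and the `{reason_with_fields, Reason, Fields}` tag is invented purely to illustrate the "reason plus additional fields" idea.

```erlang
-module(send_error_sketch).
-export([error_body/2]).

%% Option 1 (VDU-specific coupling): recognise {failures, Failures}.
error_body(Error, {failures, Failures}) ->
    {[
        {<<"error">>, Error},
        {<<"reason">>, <<"document is not valid">>},
        {<<"failures">>, Failures}
    ]};
%% Option 2 (generic): a reason string plus arbitrary extra JSON fields.
%% The tag name is an assumption made for this sketch.
error_body(Error, {reason_with_fields, Reason, Fields}) when is_list(Fields) ->
    {[{<<"error">>, Error}, {<<"reason">>, Reason} | Fields]};
%% Existing behaviour: emit the reason as-is.
error_body(Error, Reason) ->
    {[{<<"error">>, Error}, {<<"reason">>, Reason}]}.
```

The second clause keeps the HTTP layer unaware of VDUs, at the cost of a more complex error term travelling between nodes.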

@jcoglan
Contributor Author

jcoglan commented May 28, 2026

The simplest thing to do is to have the response include "reason": { "failures": [...] }, since this has no rolling-upgrade compatibility concerns and is easiest to implement. However, we may consider this a breaking change to the reason field.

Returning "reason": "message", "failures": [...] instead probably requires us to ship a version where chttpd understands more complex error structure first, and then later ship a couch_query_servers that emits said structure, to avoid problems with rolling upgrades where couch_query_servers emits something that chttpd does not understand.

@jcoglan
Contributor Author

jcoglan commented May 28, 2026

The occurrences of {forbidden, _} in the codebase that I have been able to determine are actually involved in the path for updating a doc are:

  • chttpd:error_info({forbidden, Msg})
  • couch_query_servers:validate_doc_update(Db, DDoc, EditDoc, DiskDoc, Ctx, SecObj)
  • fabric_doc_update:force_reply(Doc, [FirstReply | _] = Replies, {Health, W, SWS, Acc}) and its calls to check_forbidden_msg(Replies)

The last one is not obvious because the uses of {forbidden, _} occur only on edge cases in quorum logic and won't be hit if all nodes return the same reply. Nevertheless, the logic here looks as though it depends on the 2-tuple {forbidden, Msg} but never uses the 2nd item in any way other than returning/throwing it without inspecting its content. The relevant function on main is here:

force_reply(Doc, [FirstReply | _] = Replies, {Health, W, SWS, Acc}) ->
    case update_quorum_met(W, Replies, SWS) of
        {true, Reply} ->
            % corner case new_edits:false and vdu: [noreply, forbidden, noreply]
            case check_forbidden_msg(Replies) of
                {forbidden, ForbiddenReply} ->
                    {Health, W, SWS, [{Doc, ForbiddenReply} | Acc]};
                false ->
                    {Health, W, SWS, [{Doc, Reply} | Acc]}
            end;
        false ->
            case [Reply || {ok, Reply} <- Replies] of
                [] ->
                    % check if all errors are identical, if so inherit health
                    case lists:all(fun(E) -> E =:= FirstReply end, Replies) of
                        true ->
                            CounterKey = [fabric, doc_update, errors],
                            couch_stats:increment_counter(CounterKey),
                            {Health, W, SWS, [{Doc, FirstReply} | Acc]};
                        false ->
                            CounterKey = [fabric, doc_update, mismatched_errors],
                            couch_stats:increment_counter(CounterKey),
                            case check_forbidden_msg(Replies) of
                                {forbidden, ForbiddenReply} ->
                                    {Health, W, SWS, [{Doc, ForbiddenReply} | Acc]};
                                false ->
                                    {error, W, SWS, [{Doc, FirstReply} | Acc]}
                            end
                    end;
                [AcceptedRev | _] ->
                    CounterKey = [fabric, doc_update, write_quorum_errors],
                    couch_stats:increment_counter(CounterKey),
                    NewHealth =
                        case Health of
                            ok -> accepted;
                            _ -> Health
                        end,
                    {NewHealth, W, SWS, [{Doc, {accepted, AcceptedRev}} | Acc]}
            end
    end.

This leans us toward sticking with couch_query_servers throwing a 2-tuple, not a 3-tuple, and putting {[{<<"failures">>, [...]}]} as the second item. This causes the HTTP response to look like { "error": "forbidden", "reason": { "failures": [...] } }.
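To make the tuple-shape argument concrete, here is a minimal sketch of why a two-element tuple with a richer payload is safe where a three-element tuple is not. `find_forbidden/1` is a hypothetical stand-in, not the real `check_forbidden_msg/1`, and the shape of the reply list is an assumption; the point is only that a `{forbidden, _}` pattern accepts any second element but never matches a 3-tuple.

```erlang
-module(forbidden_shape_sketch).
-export([find_forbidden/1]).

%% Hypothetical stand-in for check_forbidden_msg/1: return the first
%% reply shaped like {forbidden, _}, without inspecting its payload.
find_forbidden(Replies) ->
    case [R || {forbidden, _} = R <- Replies] of
        [First | _] -> {forbidden, First};
        [] -> false
    end.
```

A reply such as `{forbidden, {[{<<"failures">>, [...]}]}}` is picked up exactly like `{forbidden, <<"msg">>}`, whereas `{forbidden, Msg, Failures}` is skipped by the pattern and would fall through to the generic error handling.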

We could choose to further tweak the HTTP response, but this would need changes to chttpd deployed before changes to couch_query_servers in order for rolling upgrades to go smoothly.

@jcoglan jcoglan force-pushed the mango-match-failures branch 3 times, most recently from 247d9ac to 64038cd Compare May 28, 2026 14:56
@jcoglan jcoglan force-pushed the mango-match-failures branch from e80e0a0 to 45dee90 Compare May 28, 2026 15:20
@nickva
Contributor

nickva commented May 30, 2026

> This leans us toward sticking with couch_query_servers throwing a 2-tuple, not a 3-tuple, and putting {[{<<"failures">>, [...]}]} as the second item. This causes the HTTP response to look like { "error": "forbidden", "reason": { "failures": [...] } }.

I think that could work and it's probably the cleanest solution but would be a minor incompatibility. During online cluster upgrade we could also take the approach perhaps that there won't yet be too many existing mango vdus. Unless the cluster is left in that intermediate state for a long while and the user starts exercising the new feature.

@jcoglan jcoglan force-pushed the mango-match-failures branch 4 times, most recently from 8a899d0 to 5c0fcd7 Compare June 2, 2026 15:30
jcoglan added 11 commits June 3, 2026 11:31
Rather than returning a boolean to indicate just success or failure,
`mango_selector:match/2` now returns a list of "failures" describing the
ways in which the selector failed to match the input. If this list is
empty, the match was a success.
We will need to pass other things around between `match` calls as well
as the current `Cmp` function, so here we replace this argument with a
`#ctx` record that initially just contains a `cmp` field.
To give detailed feedback to the caller, the `#ctx` argument to
`mango_selector:match/3` now records the path that was taken to reach
each value, and this path is added to the `#failure` records.

Each path segment is either a binary, if it represents an object
property, or an integer if it represents an array index. Items are
pushed on the front of `#ctx.path` as this is faster than pushing onto
the back of a list. This list can then be reversed once the final list
of failures has been generated, before the failures are presented to the
caller.
Collecting detailed `#failure` records rather than a boolean true/false
when evaluating selectors imposes a performance penalty, so we would
like to only do this when a selector is used for a VDU, not when it is
used for indexing/filtering.

To this end we introduce "verbose" mode signalled via the `#ctx.verbose`
field, and each branch of `mango_selector:match/3` now has 3 distinct
versions:

- `#ctx{verbose = false}`: this is the original version that returns
  true/false, taken when a selector is used for Mango queries.

- `#ctx{verbose = true, negate = false}`: verbose mode, when the
  operator is not negated by an enclosing `$not` operator. Returns a
  list of `#failure` records which may be empty.

- `#ctx{verbose = true, negate = true}`: verbose mode, when the operator
  is negated by an enclosing `$not` operator. Returns a list of
  `#failure` records.

The different negation modes are needed because, in order to generate
meaningful failure messages, we need to record whether an operator was
negated. The behaviour of combinators like `$and`, `$or`, `$allMatch`
and `$elemMatch` means not all `$not` operators can be normalized out of
the selector before evaluation. Instead, when we encounter a `$not`
during evaluation, we flip the `#ctx.negate` field before evaluating the
inner operator.
Until now, document updates rejected by a Mango VDU returned an opaque
"forbidden" message to the client. This commit adds a detailed list of
failures, obtained by converting the `#failure` records returned by
`mango_selector:match/3` into human-readable messages.
@jcoglan jcoglan force-pushed the mango-match-failures branch from 5c0fcd7 to d499b33 Compare June 3, 2026 11:58
Currently, when a design doc is updated, we validate the `map` and
`reduce` fields, but not `validate_doc_update`. Instead, trying to
update any other doc while an invalid `validate_doc_update` exists will
trigger an error.

This commit makes VDU validation more 'eager' by performing it when the
ddoc itself is updated. Normal doc writes will still trigger an error if
an invalid `validate_doc_update` already exists, but now we try to
prevent this happening by validating VDUs when they are first created.
@jcoglan jcoglan force-pushed the mango-match-failures branch from d499b33 to b4acce2 Compare June 4, 2026 14:41

3 participants