Mango `match_failures/2` function by jcoglan · Pull Request #5858 · apache/couchdb

jcoglan · 2026-01-21T16:30:42Z

Overview

This PR represents work so far on a version of mango_selector:match/2 that can return a representation of how the input fails to match the selector, rather than just a boolean. A few commits develop this behaviour incrementally before everything "snaps into place", giving us code paths that can produce failure descriptions while retaining the original boolean behaviour, which avoids creating a lot of ephemeral failure lists and can short-circuit on compound operators.

The rough steps here are as follows; I'm retaining all these commits for now in case we want to compare different designs for code complexity and performance.

Add a lot of unit tests to mango_selector that check every match operator and its negation. This is necessary since various compound operators like $or and $allMatch have surprising edge case behaviour on empty lists and we need to make sure this is not broken. Mostly these tests check that if selector S returns true on an input then { "$not": S } returns false and vice versa. There are some exceptions to this due to how $or works and how $and and $or are normalised.
Replace the existing implementation with one where every operator returns a possibly-empty list of failures, instead of a boolean. match_int then converts this to a bool on the way out.
Replace the Cmp argument to everything with a ctx record that contains cmp, as well as other things needed for failure generation, e.g. the path to the current value, whether matching is currently negated, etc.
Try to fix $allMatch and $elemMatch normalisation to avoid the complexity that comes from not being able to apply De Morgan's laws to them, due to $allMatch being defined to return false for empty lists. This is a breaking change that is reverted later, since we decided to retain existing behaviour above all.
Implement negation handling, where the presence of a $not has to be communicated to nodes lower down the tree in order to produce good failure messages. Not all negation can be normalised out of the tree and so all operators need to handle being negated during evaluation.
Finally, collapse all the complexity into an implementation that supports both the old and new behaviours.
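
The empty-list edge case mentioned in the steps above can be illustrated with a toy model. This is only a sketch: the module and function names are hypothetical, predicates stand in for selectors, and it assumes $elemMatch is false on an empty list. It shows why negating $allMatch cannot simply be rewritten as $elemMatch over the negated selector.

```erlang
-module(allmatch_sketch).
-export([all_match/2, elem_match/2, demo/0]).

%% Toy model (assumption): $allMatch is false on an empty list,
%% unlike lists:all/2, which returns true for [].
all_match(_Pred, []) -> false;
all_match(Pred, List) -> lists:all(Pred, List).

%% Toy model (assumption): $elemMatch is true if any element matches.
elem_match(Pred, List) -> lists:any(Pred, List).

%% For each input, compare not($allMatch P) with $elemMatch(not P).
demo() ->
    IsPos = fun(X) -> X > 0 end,
    NotPos = fun(X) -> not IsPos(X) end,
    [
        {L, not all_match(IsPos, L), elem_match(NotPos, L)}
     || L <- [[], [1, -1], [1, 2]]
    ].
```

In this model the two columns agree for non-empty lists but differ for [], which is the case that prevents the rewrite.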

The idea in the design I've ended up with that tries to minimise both complexity and runtime cost is:

Add #ctx.verbose which indicates whether a detailed failure description is wanted.
Keep the original implementations of all operators as the response when passed #ctx{verbose=false}, i.e. when only a boolean result is needed.
For all leaf operators, the #ctx{verbose=true} case can be implemented by calling the #ctx{verbose=false} case, and creating a #failure record if this returns false.
#ctx{verbose=false} cases do not need to deal with #ctx{negate=true}; they continue to return their original result and let $not invert it. We only need special handling of #ctx{negate=true} in verbose mode, where the $not operator passes its effect down via the #ctx. This reduces the number of cases each operator has to deal with to basically: non-verbose mode, and positive and negative verbose cases.
For compound operators, special code is needed to gather up the failures from internal selectors and deal with edge cases in a way that's consistent with the original implementation.
#ctx.path is only updated in #ctx{verbose=true} code paths so this expense is avoided in non-verbose mode. Path items are added to the front of this list as that's cheaper than doing Path ++ [Item]; we would reverse these before returning to a client.
#failure records retain a #ctx from where they can access the path and negation state, in order to generate a good human-readable error message later on.
The tests are updated to make sure that both verbose modes return consistent results, i.e. if verbose=false returns true, then verbose=true returns [], and if the former returns false, the latter gives a non-empty list. These are all passing.
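
As a rough illustration of the leaf-operator idea in this list, the sketch below reuses the boolean clause from the verbose clause and checks that the two modes agree. The record fields follow the names used in this description, but the selector shape `{gt, Arg}`, the `#failure` fields, and the -1/0/1 comparison function are assumptions made for the example, not the code in this PR.

```erlang
-module(verbose_sketch).
-export([match/3, consistent/2]).

%% Hypothetical records; defaults are assumptions.
-record(ctx, {cmp, verbose = false, path = []}).
-record(failure, {op, arg, ctx}).

%% Non-verbose leaf: the original boolean result.
match({gt, Arg}, Value, #ctx{verbose = false, cmp = Cmp}) ->
    Cmp(Value, Arg) > 0;
%% Verbose leaf: call the boolean clause and wrap `false` in a #failure.
match({gt, Arg} = Sel, Value, #ctx{verbose = true} = Ctx) ->
    case match(Sel, Value, Ctx#ctx{verbose = false}) of
        true -> [];
        false -> [#failure{op = gt, arg = Arg, ctx = Ctx}]
    end.

%% The consistency property the tests check: the boolean result is true
%% exactly when the verbose result is an empty list.
consistent(Sel, Value) ->
    Cmp = fun
        (A, B) when A < B -> -1;
        (A, B) when A > B -> 1;
        (_, _) -> 0
    end,
    Bool = match(Sel, Value, #ctx{cmp = Cmp}),
    Failures = match(Sel, Value, #ctx{cmp = Cmp, verbose = true}),
    Bool =:= (Failures =:= []).
```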

Testing recommendations

We should benchmark this in its current version, and both verbose modes of this version, against a substantial indexing workload to look for performance regressions. Or, if performance is equivalent in both verbose modes, we can remove a lot of redundancy by removing the verbose flag entirely.

Related Issues or Pull Requests

RFC: proposal for declarative VDUs #5792

Checklist

Code is written and works correctly
Changes are covered by tests
Any new configurable parameters are documented in rel/overlay/etc/default.ini
Documentation changes were made in the src/docs folder
Documentation changes were backported (separated PR) to affected branches

nickva · 2026-01-21T16:56:53Z

That's a nice approach using a context record in place of the Cmp arg.

All the extra eunit tests are awesome. If you want, you could even put them in a separate PR and we'd merge them right away. It would make it easier to review subsequent PRs because we can obviously see all the existing tests pass.

janl · 2026-04-22T12:32:07Z

Nice work James. I’ve done some benchmarks with the bulkbench script to see what this gives us:

   no vdu: real 0m18.122s
   js vdu: real 0m55.438s (3.06x)
mango vdu: real 0m21.722s (1.19x)

nickva

This is a large PR and I don't know mango very well so I did the best I could. Sorry it took a while to get to. These are just a few things I noticed at first; I haven't run it locally yet to play with it.

nickva · 2026-04-29T07:15:54Z

        ok ->
            ok;
+        {[{<<"forbidden">>, Message}, {<<"failures">>, Failures}]} ->
+            throw({forbidden, Message, Failures});


We're changing the forbidden tuple shape. Make sure to check all the places which handle {forbidden, _}, as they may now have to handle the 3-tuple version. I noticed that fabric_doc_update (src/fabric/src/fabric_doc_update.erl) needs to handle this.

We should also call this out and check whether it will affect online cluster upgrades (new worker nodes throwing it and old coordinator nodes getting function clause errors). It may be fine; it just needs an extra careful look.
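
To make the upgrade concern concrete, here is a minimal sketch of what happens when code that only knows the pair receives the triple. The handler names and return values are hypothetical and are not taken from fabric_doc_update or chttpd.

```erlang
-module(forbidden_shape_sketch).
-export([old_handler/1, new_handler/1, try_old/1]).

%% An "old node" handler that only understands the 2-tuple.
old_handler({forbidden, Msg}) -> {403, Msg}.

%% A "new node" handler that accepts both shapes.
new_handler({forbidden, Msg}) -> {403, Msg, []};
new_handler({forbidden, Msg, Failures}) -> {403, Msg, Failures}.

%% Shows the failure mode: the 3-tuple matches no clause of old_handler/1.
try_old(Reply) ->
    try
        old_handler(Reply)
    catch
        error:function_clause -> crashed
    end.
```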

nickva · 2026-04-29T07:26:43Z

+            case mango_selector:has_allowed_fields(Selector, [<<"newDoc">>, <<"oldDoc">>]) of
+                false ->
+                    Msg =
+                        <<"'validate_doc_update' may only contain 'newDoc' and 'oldDoc' as top-level fields">>,


I wonder when this would fire. Would we crash the prompt every time we attempt to insert a doc?

We added this b/c @janl encountered a really confusing error during testing due to typing "selector": { "x": 0 } instead of "selector": { "newDoc": { "x": 0 } }. If you omit newDoc, the resulting validation failure is confusing, and we thought it better to alert the user that they are probably making a mistake instead of returning an empty result set.

What I meant is that it seems we are not validating the inserted document here; instead we're validating the VDU itself.

We're not validating the design doc when it is uploaded; this would not be consistent with the existing behaviour of PUT /db/_design/doc. You are currently allowed to upload a JS design doc with invalid code in it. Instead, this triggers when normal doc writes occur; if the Mango VDU does not make sense then we return an error about that, rather than giving the user a misleading error implying their doc is invalid.
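
For readers unfamiliar with the check under discussion, a top-level field check of this kind might look roughly like the sketch below. It is not the implementation of mango_selector:has_allowed_fields/2 in this PR; it assumes the selector is an un-normalised EJSON object of the form {[{Key, Value}, ...]} with binary keys.

```erlang
-module(allowed_fields_sketch).
-export([has_allowed_fields/2]).

%% True when every top-level key of the selector is in the Allowed list.
%% Assumes an EJSON object: {[{BinaryKey, Value}, ...]}.
has_allowed_fields({Props}, Allowed) when is_list(Props) ->
    lists:all(fun({Key, _Value}) -> lists:member(Key, Allowed) end, Props).
```

With this sketch, a selector written as { "x": 0 } is rejected, while { "newDoc": { "x": 0 } } is accepted.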

We should consider validating on ddoc write here; it feels like the better behaviour.

You are currently allowed to upload a JS design doc with invalid code in it. Instead, this triggers when normal doc writes occur; if the Mango VDU does not make sense then we return an error about that, rather than giving the user a misleading error implying their doc is invalid.

But if we could, we would validate the JS VDU during insertion. It's a bit like how we check compilation during ddoc inserts: "does it even compile? If not, we fail the ddoc insertion to start with". If we can do that early with Mango VDUs, we should. Then we don't have to worry about misleading the user later because, by virtue of rejecting invalid VDUs, we will only have to deal with invalid user documents, as all Mango VDUs will be valid.

Huh, I thought the last time I checked this that CouchDB allows writing ddocs with malformed (i.e. does not even parse) JS code in them, but I just checked and this is not the case. I can see about moving this check to when the Mango ddoc is uploaded.

Actually I wasn't quite right. map functions are checked when the ddoc is updated, but validate_doc_update is not. The compilation error is surfaced when other docs are written and we attempt to invoke the VDU function.

$ cdb '/asd/_design/foo' -X PUT -d '{ "validate_doc_update": "function (doc) {" }'
{"ok":true,"id":"_design/foo","rev":"1-416ed8128d308d5abc6b4745a64394e5"}
$ cdb '/asd/doc' -X PUT -d '{}'
{"error":"compilation_error","reason":"SyntaxError: unexpected token in expression: '' (function (doc) {)"}

Personally I think this behaviour should be consistent between JS and Mango VDUs, and if we want to preemptively validate the v-d-u field we should do this for both backends.

jcoglan · 2026-05-26T08:40:46Z

I think the only thing left to sort out here is how the errors are passed back to the HTTP/fabric frontend and back to the client, i.e. the forbidden structure. What I have done here is a hack b/c I found it hard to figure out how I was supposed to implement this, and this was something I managed to get working just enough to have a client interact with the functionality. I could use some guidance on doing it properly.

jcoglan · 2026-05-28T10:07:34Z

Been looking into how the errors should be passed from mango_native_proc back to the client and how this affects the use of {forbidden, _} tuples.

Many places in the codebase throw or catch (or otherwise create/use) the structure {forbidden, Msg}. forbidden is an atom and Msg is usually a binary, though it can be a string, a list (e.g. ["by_node not an object"]) or a tuple (e.g. chttpd_auth_request throws {forbidden, {Error, Reason}}).

The specific code path for VDUs is that the actual VDU engine (a JS process or mango_native_proc) throws a JS object like { forbidden: Msg }, which is {[{<<"forbidden">>, Msg}]} in Erlang. couch_query_servers:validate_doc_update() catches {[{<<"forbidden">>, Msg}]} and converts it into {forbidden, Msg} i.e. a normal Erlang pair with forbidden as an atom. This is re-thrown and passed back through fabric and other request-handling machinery until it ends up in the HTTP layer.
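
The conversion described in this paragraph can be sketched as follows. The function names and the shape of the final result are hypothetical; only the two term shapes, {[{<<"forbidden">>, Msg}]} and {forbidden, Msg}, come from the description above.

```erlang
-module(vdu_reply_sketch).
-export([check/1, run/1]).

%% Turn the VDU engine's EJSON reply into an Erlang throw.
check(ok) -> ok;
check({[{<<"forbidden">>, Msg}]}) -> throw({forbidden, Msg}).

%% Stand-in for the request-handling layers that eventually catch the pair
%% and turn it into a 403 (the 3-tuple returned here is illustrative only).
run(Resp) ->
    try
        check(Resp)
    catch
        throw:{forbidden, Msg} -> {403, <<"forbidden">>, Msg}
    end.
```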

chttpd has two handlers relevant here:

error_info({forbidden, Msg}) produces a 403 response with { "error": "forbidden", "reason": Msg }
error_info({forbidden, Error, Msg}) produces a 403 response with { "error": Error, "reason": Msg }

The problem we have is: how to send the list of failure objects from mango_native_proc back to the client. The current design means it's safe for couch_query_servers to throw a 3-tuple and chttpd will be able to handle it, so there's no problem there vis-a-vis rolling upgrades. However, this only lets us change the error field in the response, whereas we probably want to keep "error": "forbidden" as the response for failed Mango VDU validations.

If we have couch_query_servers throw the 2-tuple {forbidden, {[{<<"failures">>, Failures}]}} (Failures is the list of validation failures) then this ends up putting the failures in the reason field of the response, e.g.:

{
  "error": "forbidden",
  "reason": {
    "failures": [
      {
        "path": ["newDoc", "ok"],
        "message": "must be present"
      }
    ]
  }
}

Technically, this is not a breaking change; it was already legal for a JS VDU to throw { forbidden: { failures: [...] } } and this would end up producing the response above. However, there has been concern that making reason not be a string would be surprising to most users and could break existing programs, so it should be considered a breaking change.

If we want to retain reason as a string and put the failures list somewhere else then we need to invent some other way for couch_query_servers to throw an error message and failure list, and put these into the HTTP response. One such way is to make it throw {forbidden, {failures, Failures}}, and then make chttpd:send_error() inspect ReasonStr. If it's {failures, Failures} then we make it put this in the response:

  "reason": "document is not valid",
  "failures": Failures

Otherwise we make it just emit "reason": ReasonStr as it currently does. The problem with this is that it couples some generic HTTP error handling code to the specifics of VDUs, which seems like a bad idea. We could weaken this coupling by making couch_query_servers throw a more complex object containing the reason and any additional fields, and make send_error() emit all that data into the response, i.e. it lets ReasonStr be either a string, or a set of JSON fields. This removes the coupling but still adds some complexity to how errors are communicated.
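
The weaker-coupling option could look something like the sketch below, where the error reason is either a binary or an EJSON object of extra fields. This is an assumption-laden illustration: error_body/2 is a hypothetical helper, not an existing chttpd function, and the field layout is just one possibility.

```erlang
-module(error_body_sketch).
-export([error_body/2]).

%% Reason is a plain binary: emit the usual two fields.
error_body(Error, Reason) when is_binary(Reason) ->
    {[{<<"error">>, Error}, {<<"reason">>, Reason}]};
%% Reason is an EJSON object of fields (hypothetical shape): emit them all
%% alongside "error", so the HTTP layer needs no VDU-specific knowledge.
error_body(Error, {Fields}) when is_list(Fields) ->
    {[{<<"error">>, Error} | Fields]}.
```

Under this sketch, the thrower would supply both "reason" and "failures" in the object, and the generic handler would pass them through unchanged.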

jcoglan · 2026-05-28T10:12:31Z

The simplest thing to do is to have the response include "reason": { "failures": [...] } since this has no compatibility concerns and is easiest to implement. However, we might consider this a breaking change to the reason field.

Returning "reason": "message", "failures": [...] instead probably requires us to ship a version where chttpd understands more complex error structure first, and then later ship a couch_query_servers that emits said structure, to avoid problems with rolling upgrades where couch_query_servers emits something that chttpd does not understand.

jcoglan · 2026-05-28T14:19:13Z

The occurrences of {forbidden, _} in the codebase that I have been able to determine are actually involved in the path for updating a doc are:

chttpd:error_info({forbidden, Msg})
couch_query_servers:validate_doc_update(Db, DDoc, EditDoc, DiskDoc, Ctx, SecObj)
fabric_doc_update:force_reply(Doc, [FirstReply | _] = Replies, {Health, W, SWS, Acc}) and its calls to check_forbidden_msg(Replies)

The last one is not obvious because the uses of {forbidden, _} occur only on edge cases in quorum logic and won't be hit if all nodes return the same reply. Nevertheless, the logic here looks as though it depends on the 2-tuple {forbidden, Msg} but never uses the 2nd item in any way other than returning/throwing it without inspecting its content. The relevant function on main is here:

couchdb/src/fabric/src/fabric_doc_update.erl

Lines 214 to 253 in 7505692

    
force_reply(Doc, [FirstReply | _] = Replies, {Health, W, SWS, Acc}) ->
    case update_quorum_met(W, Replies, SWS) of
        {true, Reply} ->
            % corner case new_edits:false and vdu: [noreply, forbidden, noreply]
            case check_forbidden_msg(Replies) of
                {forbidden, ForbiddenReply} ->
                    {Health, W, SWS, [{Doc, ForbiddenReply} | Acc]};
                false ->
                    {Health, W, SWS, [{Doc, Reply} | Acc]}
            end;
        false ->
            case [Reply || {ok, Reply} <- Replies] of
                [] ->
                    % check if all errors are identical, if so inherit health
                    case lists:all(fun(E) -> E =:= FirstReply end, Replies) of
                        true ->
                            CounterKey = [fabric, doc_update, errors],
                            couch_stats:increment_counter(CounterKey),
                            {Health, W, SWS, [{Doc, FirstReply} | Acc]};
                        false ->
                            CounterKey = [fabric, doc_update, mismatched_errors],
                            couch_stats:increment_counter(CounterKey),
                            case check_forbidden_msg(Replies) of
                                {forbidden, ForbiddenReply} ->
                                    {Health, W, SWS, [{Doc, ForbiddenReply} | Acc]};
                                false ->
                                    {error, W, SWS, [{Doc, FirstReply} | Acc]}
                            end
                    end;
                [AcceptedRev | _] ->
                    CounterKey = [fabric, doc_update, write_quorum_errors],
                    couch_stats:increment_counter(CounterKey),
                    NewHealth =
                        case Health of
                            ok -> accepted;
                            _ -> Health
                        end,
                    {NewHealth, W, SWS, [{Doc, {accepted, AcceptedRev}} | Acc]}
            end
    end.

This leans us toward sticking with couch_query_servers throwing a 2-tuple, not a 3-tuple, and putting {[{<<"failures">>, [...]}]} as the second item. This causes the HTTP response to look like { "error": "forbidden", "reason": { "failures": [...] } }.

We could choose to further tweak the HTTP response, but this would need changes to chttpd deployed before changes to couch_query_servers in order for rolling upgrades to go smoothly.

nickva · 2026-05-30T04:55:10Z

This leans us toward sticking with couch_query_servers throwing a 2-tuple, not a 3-tuple, and putting {[{<<"failures">>, [...]}]} as the second item. This causes the HTTP response to look like { "error": "forbidden", "reason": { "failures": [...] } }.

I think that could work and it's probably the cleanest solution, but it would be a minor incompatibility. For online cluster upgrades we could perhaps also assume that there won't yet be many existing Mango VDUs, unless the cluster is left in that intermediate state for a long while and the user starts exercising the new feature.

Rather than returning a boolean to indicate just success or failure, `mango_selector:match/2` now returns a list of "failures" describing the ways in which the selector failed to match the input. If this list is empty, the match was a success.

We will need to pass other things around between `match` calls as well as the current `Cmp` function, so here we replace this argument with a `#ctx` record that initially just contains a `cmp` field.

To give detailed feedback to the caller, the `#ctx` argument to `mango_selector:match/3` now records the path that was taken to reach each value, and this path is added to the `#failure` records. Each path segment is either a binary, if it represents an object property, or an integer if it represents an array index. Items are pushed on the front of `#ctx.path` as this is faster than pushing onto the back of a list. This list can then be reversed once the final list of failures has been generated, before the failures are presented to the caller.
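
The path bookkeeping described here can be sketched in isolation. The example below is not the PR's code: it uses maps instead of EJSON for brevity, reports every `null` value as a stand-in "failure", and assumes zero-based array indexes. It only shows segments being pushed on the front of the path and reversed once at the end.

```erlang
-module(path_sketch).
-export([failures/1]).

%% Returns the path to every `null` in a nested value. Paths are built
%% in reverse during the walk and reversed once before being returned.
failures(Value) ->
    [lists:reverse(Path) || Path <- walk(Value, [])].

walk(null, Path) ->
    [Path];
walk(Map, Path) when is_map(Map) ->
    %% Object property: push the binary key on the front of the path.
    Pairs = lists:sort(maps:to_list(Map)),
    lists:append([walk(V, [K | Path]) || {K, V} <- Pairs]);
walk(List, Path) when is_list(List) ->
    %% Array element: push the integer index on the front of the path.
    Indexed = lists:zip(lists:seq(0, length(List) - 1), List),
    lists:append([walk(V, [I | Path]) || {I, V} <- Indexed]);
walk(_Other, _Path) ->
    [].
```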

Collecting detailed `#failure` records rather than a boolean true/false when evaluating selectors imposes a performance penalty, so we would like to only do this when a selector is used for a VDU, not when it is used for indexing/filtering. To this end we introduce "verbose" mode signalled via the `#ctx.verbose` field, and each branch of `mango_selector:match/3` now has 3 distinct versions:

- `#ctx{verbose = false}`: this is the original version that returns true/false, taken when a selector is used for Mango queries.
- `#ctx{verbose = true, negate = false}`: verbose mode, when the operator is not negated by an enclosing `$not` operator. Returns a list of `#failure` records which may be empty.
- `#ctx{verbose = true, negate = true}`: verbose mode, when the operator is negated by an enclosing `$not` operator. Returns a list of `#failure` records.

The different negation modes are needed because, in order to generate meaningful failure messages, we need to record whether an operator was negated. The behaviour of combinators like `$and`, `$or`, `$allMatch` and `$elemMatch` means not all `$not` operators can be normalized out of the selector before evaluation. Instead, when we encounter a `$not` during evaluation, we flip the `#ctx.negate` field before evaluating the inner operator.
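
A minimal sketch of how `$not` might interact with the three versions is shown below. The selector shapes `{'not', Inner}` and `{eq, Arg}` and the `{eq, Arg, Negated}` failure tuple are hypothetical simplifications; the real code works on normalised selectors and `#failure` records.

```erlang
-module(negate_sketch).
-export([match/3, demo/0]).

-record(ctx, {verbose = false, negate = false}).

%% Non-verbose: $not just inverts the inner boolean result.
match({'not', Inner}, Value, #ctx{verbose = false} = Ctx) ->
    not match(Inner, Value, Ctx);
%% Verbose: $not flips #ctx.negate and lets the inner operator report.
match({'not', Inner}, Value, #ctx{verbose = true, negate = Neg} = Ctx) ->
    match(Inner, Value, Ctx#ctx{negate = not Neg});
%% Leaf, non-verbose: ignores negation entirely.
match({eq, Arg}, Value, #ctx{verbose = false}) ->
    Value =:= Arg;
%% Leaf, verbose: a failure is recorded when the outcome, adjusted for
%% negation, is not a match. The failure remembers the negation state.
match({eq, Arg}, Value, #ctx{verbose = true, negate = Neg}) ->
    case (Value =:= Arg) =/= Neg of
        true -> [];
        false -> [{eq, Arg, Neg}]
    end.

demo() ->
    V = #ctx{verbose = true},
    {
        match({'not', {eq, 1}}, 1, #ctx{}),
        match({'not', {eq, 1}}, 1, V),
        match({'not', {eq, 1}}, 2, V),
        match({'not', {'not', {eq, 1}}}, 2, V)
    }.
```

Carrying the negation flag in the failure is what would later allow a message such as "must not equal 1" rather than a bare "did not match".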

Until now, document updates rejected by a Mango VDU returned an opaque "forbidden" message to the client. This commit adds a detailed list of failures, obtained by converting the `#failure` records returned by `mango_selector:match/3` into human-readable messages.

Currently, when a design doc is updated, we validate the `map` and `reduce` fields, but not `validate_doc_update`. Instead, trying to update any other doc while an invalid `validate_doc_update` exists will trigger an error. This commit makes VDU validation more 'eager' by performing it when the ddoc itself is updated. Normal doc writes will still trigger an error if an invalid `validate_doc_update` already exists, but now we try to prevent this happening by validating VDUs when they are first created.

jcoglan force-pushed the mango-match-failures branch 3 times, most recently from 5076677 to 7f8f999 Compare February 4, 2026 09:57

jcoglan force-pushed the mango-match-failures branch 2 times, most recently from 69fe09f to fa65b74 Compare February 11, 2026 14:12

jcoglan changed the base branch from main to 3.5.x February 13, 2026 09:38

jcoglan mentioned this pull request Feb 13, 2026

$data operator for VDUs #5889

Draft

6 tasks

jcoglan force-pushed the mango-match-failures branch from fa65b74 to 1c23788 Compare February 13, 2026 11:39

jcoglan changed the base branch from 3.5.x to main February 13, 2026 11:40

jcoglan mentioned this pull request Feb 20, 2026

Mango unit tests #5895

Merged

6 tasks

janl added this to the 3.8 milestone Feb 27, 2026

jcoglan force-pushed the mango-match-failures branch from 1c23788 to 82142aa Compare March 13, 2026 16:31

jcoglan force-pushed the mango-match-failures branch from 82142aa to 74d02e0 Compare March 23, 2026 14:28

jcoglan marked this pull request as ready for review March 23, 2026 14:31

jcoglan force-pushed the mango-match-failures branch 3 times, most recently from 407b81a to e21b4b8 Compare April 9, 2026 12:00

jcoglan force-pushed the mango-match-failures branch from e21b4b8 to ba11d38 Compare April 22, 2026 13:15

nickva requested changes Apr 29, 2026

jcoglan force-pushed the mango-match-failures branch from ba11d38 to 557d5ed Compare May 21, 2026 15:49

jcoglan force-pushed the mango-match-failures branch from 557d5ed to 6107913 Compare May 26, 2026 13:06

jcoglan force-pushed the mango-match-failures branch 3 times, most recently from 247d9ac to 64038cd Compare May 28, 2026 14:56

jcoglan force-pushed the mango-match-failures branch from e80e0a0 to 45dee90 Compare May 28, 2026 15:20

jcoglan force-pushed the mango-match-failures branch 4 times, most recently from 8a899d0 to 5c0fcd7 Compare June 2, 2026 15:30

jcoglan added 11 commits June 3, 2026 11:31

fix: The match/2 rule for filtering out <<>> should filter []

407b0e8

fix: Normalize negations inside $allMatch and other operators

c3c7435

chore: Replace Cmp argument to mango_selector:match/3 with a record

e747bf0

We will need to pass other things around between `match` calls as well as the current `Cmp` function, so here we replace this argument with a `#ctx` record that initially just contains a `cmp` field.

chore: Add some benchmarks for Mango selector matching

34f2cf5

fix: Mango VDUs should omit the oldDoc field from the matching stru…

24db80f

…ct if the doc is newly created

docs: More detailed explanation of the behaviour of the oldDocs field…

c705b39

… in Mango VDUs

fix: Raise an error if a Mango VDU contains invalid top-level fields

e2e042f

jcoglan force-pushed the mango-match-failures branch from 5c0fcd7 to d499b33 Compare June 3, 2026 11:58

jcoglan force-pushed the mango-match-failures branch from d499b33 to b4acce2 Compare June 4, 2026 14:41
