Mango match_failures/2 function#5858
Conversation
|
That's a nice approach using a context record in place of the Cmp arg. All the extra eunit tests are awesome. If you want, could even put them in a separate PR and we'd merge them right away. It would make it easier to review subsequent PRs because we can obviously see all the existing tests pass. |
5076677 to
7f8f999
Compare
69fe09f to
fa65b74
Compare
fa65b74 to
1c23788
Compare
1c23788 to
82142aa
Compare
82142aa to
74d02e0
Compare
407b81a to
e21b4b8
Compare
|
Nice work James. I’ve done some benchmarks with the bulkbench script to see what this gives us: |
e21b4b8 to
ba11d38
Compare
nickva
left a comment
There was a problem hiding this comment.
This is a large PR and I don't know mango very well so I did the best I could. Sorry it took a while to get to. Those are just a few things I noticed at first, I haven't run it yet locally to play with.
| ok -> | ||
| ok; | ||
| {[{<<"forbidden">>, Message}, {<<"failures">>, Failures}]} -> | ||
| throw({forbidden, Message, Failures}); |
There was a problem hiding this comment.
We're changing the forbidden tuple shape. Make sure to check all the places which handle {forbidden, _} and now they may have to handle the triple-arg version. I noticed fabric_doc_update needs to handle this src/fabric/src/fabric_doc_update.erl
We should also call out and see if this will affect online cluster upgrades (new worker nodes throwing it and old coordinator nodes getting function clause errors). It maybe be fine, just needs an extra careful look at it.
| case mango_selector:has_allowed_fields(Selector, [<<"newDoc">>, <<"oldDoc">>]) of | ||
| false -> | ||
| Msg = | ||
| <<"'validate_doc_update' may only contain 'newDoc' and 'oldDoc' as top-level fields">>, |
There was a problem hiding this comment.
I wonder when this would fire, would it be on every time we attempt to insert a doc we'd crash the prompt?
There was a problem hiding this comment.
We added this b/c @janl encountered a really confusing error during testing due to typing "selector": { "x": 0 } instead of "selector": { "newDoc": { "x": 0 } }. If you omit newDoc, the resulting validation failure is confusing, and we thought it better to alert the user that they are probably making a mistake instead of returning an empty result set.
There was a problem hiding this comment.
What I meant is that it seems we are not validating the inserted document here, instead we're validating the VDU itself
There was a problem hiding this comment.
We're not validating the design doc when it is uploaded; this would not be consistent with the existing behaviour of PUT /db/_design/doc. You are currently allowed to upload a JS design doc with invalid code in it. Instead, this triggers when normal doc writes occur; if the Mango VDU does not make sense then we return an error about that, rather than giving the user a misleading error implying their doc is invalid.
There was a problem hiding this comment.
we should consider validating on ddoc-write here, it feels like the better behaviour
There was a problem hiding this comment.
You are currently allowed to upload a JS design doc with invalid code in it. Instead, this triggers when normal doc writes occur; if the Mango VDU does not make sense then we return an error about that, rather than giving the user a misleading error implying their doc is invalid.
But if we could we would validate the JS VDU during insertion. It's a bit how we check compilation during ddoc inserts "does it even compile? -if not, we fail the ddoc insertion to start with". If we can do that early with Mango VDUs, we should, then we don't have to worry about misleading the user later because by the virtue of rejecting invalid VDUs we will only have deal with invalid user documents as all Mango VDU will be valid
There was a problem hiding this comment.
Huh, I thought the last time I checked this that CouchDB allows writing ddocs with malformed (i.e. does not even parse) JS code in them, but I just checked and this is not the case. I can see about moving this check to when the Mango ddoc is uploaded.
There was a problem hiding this comment.
Actually I wasn't quite right. map functions are checked when the ddoc is updated, but validate_doc_update is not. The compilation error is surfaced when other docs are written and we attempt to invoke the VDU function.
$ cdb '/asd/_design/foo' -X PUT -d '{ "validate_doc_update": "function (doc) {" }'
{"ok":true,"id":"_design/foo","rev":"1-416ed8128d308d5abc6b4745a64394e5"}
$ cdb '/asd/doc' -X PUT -d '{}'
{"error":"compilation_error","reason":"SyntaxError: unexpected token in expression: '' (function (doc) {)"}Personally I think this behaviour should be consistent between JS and Mango VDUs, and if we want to preemptively validate the v-d-u field we should do this for both backends.
ba11d38 to
557d5ed
Compare
|
I think the only thing left to sort out here is how the errors are passed back to the HTTP/fabric frontend and back to the client, i.e. the |
557d5ed to
6107913
Compare
|
Been looking into how the errors should be passed from Many places in the codebase throw or catch (or otherwise create/use) the structure The specific code path for VDUs is that the actual VDU engine (a JS process or
The problem we have is: how to send the list of failure objects from If we have {
"error": "forbidden",
"reason": {
"failures": [
{
"path": ["newDoc", "ok"],
"message": "must be present"
}
]
}
}Technically, this is not a breaking change; it was already legal for a JS VDU to throw If we want to retain "reason": "document is not valid",
"failures": FailuresOtherwise we make it just emit |
|
The simplest thing to do is to have the response include Returning |
|
The occurrences of
The last one is not obvious because the uses of couchdb/src/fabric/src/fabric_doc_update.erl Lines 214 to 253 in 7505692 This leans us toward sticking with We could choose to further tweak the HTTP response, but this would need changes to |
247d9ac to
64038cd
Compare
e80e0a0 to
45dee90
Compare
I think that could work and it's probably the cleanest solution but would be a minor incompatibility. During online cluster upgrade we could also take the approach perhaps that there won't yet be too many existing mango vdus. Unless the cluster is left in that intermediate state for a long while and the user starts exercising the new feature. |
8a899d0 to
5c0fcd7
Compare
Rather than returning a boolean to indicate just success or failure, `mango_selector:match/2` now returns a list of "failures" describing the ways in which the selector failed to match the input. If this list is empty, the match was a success.
We will need to pass other things around between `match` calls as well the current `Cmp` function, so here we replace this argument with a `#ctx` record that intially just contains a `cmp` field.
To give detailed feedback to the caller, the `#ctx` argument to `mango_selector:match/3` now records the path that was taken to reach each value, and this path is added to the `#failure` records. Each path segment is either a binary, if it represents an object property, or an integer if it represents an array index. Items are pushed on the front of `#ctx.path` as this is faster than pushing onto the back of a list. This list can then be reversed once the final list of failures has been generated, before the failures are presented to the caller.
Collecting detailed `#failure` records rather than a boolean true/false
when evaluating selectors imposes a performance penalty, so we would
like to only do this when a selector is used for a VDU, not when it is
used for indexing/filtering.
To this end we introduce "verbose" mode signalled via the `#ctx.verbose`
field, and each branch of `mango_selector:match/3` now has 3 distinct
versions:
- `#ctx{verbose = false}`: this is the original version that returns
true/false, taken when a selector is used for Mango queries.
- `#ctx{verbose = true, negate = false}`: verbose mode, when the
operator is not negated by an enclosing `$not` operator. Returns a
list of `#failure` records which may be empty.
- `#ctx{verbose = true, negate = true}`: verbose mode, when the operator
is negated by an enclosing `$not` operator. Returns a list of
`#failure` records.
The different negation modes are needed because, in order to generate
meaningful failure messages, we need to record whether an operator was
negated. The behaviour of combinators like `$and`, `$or`, `$allMatch`
and `$elemMatch` means not all `$not` operators can be normalized out of
the selector before evaluation. Instead, when we encounter a `$not`
during evaluation, we flip the `#ctx.negate` field before evaluating the
inner operator.
Until now, document updates rejected by a Mango VDU returned an opaque "forbidden" message to the client. This commit adds a detailed list of failures, obtained by converting the `#failure` records returned by `mango_selector:match/3` into human-readable messages.
…ct if the doc is newly created
5c0fcd7 to
d499b33
Compare
Currently, when a design doc is updated, we validate the `map` and `reduce` fields, but not `validate_doc_update`. Instead, trying to update any other doc while an invalid `validate_doc_update` exists will trigger an error. This comment makes VDU validation more 'eager' by performing it when the ddoc itself is updated. Normal doc writes will still trigger an error if an invalid `validate_doc_update` already exists, but now we try to prevent this happening by validating VDUs when they are first created.
d499b33 to
b4acce2
Compare
Overview
This PR represents work so far on a version of
mango_selector:match/2than can return a representation of how the input fails to match the selector, rather than just a boolean. There are a few commits of developing this behaviour incrementally before everything "snaps into place" to get us code paths that can produce failure descriptions, and retain the original boolean behaviour that avoids creating a lot of ephemeralfailurelists, and can short-circuit on compound operators.The rough steps here are as follows; I'm retaining all these commits for now in case we want to compare different designs for code complexity and performance.
mango_selectorthat check every match operator and its negation. This is necessary since various compound operators like$orand$allMatchhave surprising edge case behaviour on empty lists and we need to make sure this is not broken. Mostly these tests check that if selectorSreturnstrueon an input then{ "$not": S }returnsfalseand vice versa. There are some exceptions to this due to how$orworks and how$andand$orare normalised.match_intthen converts this to a bool on the way out.Cmpargument to everything with actxrecord that containscmp, as well as other things needed for failure generation, e.g. the path to the current value, whether matching is currently negated, etc.$allMatchand$elemMatchnormalisation to avoid the complexity that comes from not being able to apply DeMorgan to them, due to$allMatchbeing defined to return false for empty lists. This is a breaking change that is reverted later since we decided to retain existing behaviour above all.$nothas to be communicated to nodes lower down the tree in order to produce good failure messages. Not all negation can be normalised out of the tree and so all operators need to handle being negated during evaluation.The idea in the design I've ended up with that tries to minimise both complexity and runtime cost is:
#ctx.verbosewhich indicates whether a detailed failure description is wanted.#ctx{verbose=false}, i.e. when only a boolean result is needed.#ctx{verbose=true}case can be implemented by calling the#ctx{verbose=false}case, and creating a#failurerecord if this returns false.#ctx{verbose=false}cases do not need to deal with#ctx{negate=true}; they continue to return their original result and let$notinvert it. We only need special handling of#ctx{negate=true}in verbose mode, where the$notoperator passes its effect down via the#ctx. This reduces the number of cases each operator has to deal with to basically: non-verbose mode, and positive and negative verbose cases.#ctx.pathis only updated in#ctx{verbose=true}code paths so this expense is avoided in non-verbose mode. Path items are added to the front of this list as that's cheaper than doingPath ++ [Item]; we would reverse these before returning to a client.#failurerecords retain a#ctxfrom where they can access the path and negation state, in order to generate a good human-readable error message later on.verbosemodes return consistent results, i.e. ifverbose=falsereturnstrue, thenverbose=truereturns[], and if the former returnsfalse, the latter gives a non-empty list. These are all passing.Testing recommendations
We should benchmark this in its current version, and both
verbosemodes of this version, against a substantial indexing workload to look for performance regressions. Or, if performance is equivalent in bothverbosemodes, we can remove a lot of redundancy by removing theverboseflag entirely.Related Issues or Pull Requests
Checklist
rel/overlay/etc/default.inisrc/docsfolder