-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql: audit DistSQL supportability for streamer #160423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
da16a48 to
b0f3f6d
Compare
971c63b to
33c48f1
Compare
5a01250 to
3c19a73
Compare
3c19a73 to
a63e6e3
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement. I just have some nits you can take or leave. Also, it seems like there's some opportunity to call this set of properties something more general now that we're using it for both DistSQL and streamer decisions. What do you think?
@DrewKimball reviewed 23 files and all commit messages, and made 4 comments.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich).
pkg/sql/distsql_check.go line 38 at r4 (raw file):
// It is separate from distSQLProhibitedCauses so that we can generate the // stringer for each cause. type distSQLProhibitedCause uint64
Alternate names (though feel free to keep the current one):
- distSQLBlocker
- distSQLRestriction
- distSQLConstraint
pkg/sql/distsql_check.go line 64 at r4 (raw file):
// distSQLProhibitedCauses is a bit-map describing all reasons for why a query // couldn't be distributed. type distSQLProhibitedCauses uint64
nit: worth having an Empty() or HasNone() method for readability?
pkg/sql/distsql_check.go line 222 at r4 (raw file):
sd *sessiondata.SessionData, txnHasBufferedWrites bool, ) (retRec distRecommendation, retCauses distSQLProhibitedCauses) {
nit: What if we instead had a method on distSQLProhibitedCauses mapping to a distRecommendation, and had this function only return the causes?
In f6038a1 we made it so that if LocalPlanNodeSpec handles `scanBufferNode` or `bufferNode`, that doesn't prevent usage of the streamer. However - theoretically - it's possible that a single LocalPlanNodeSpec handles multiple planNodes. This is the case since we traverse the tree of planNodes until we find first DistSQL-plannable one ("firstNotWrapped"), and all planNodes before that are handled via the single processor. Thus we could've applied an exception to more cases than we initially wanted (but in practice it should be unlikely if possible), so this commit hardens that change by tracking whether a single planNode is being wrapped or not and applying the exception only if so. Release note: None
This commit fixes a data race that was exposed by the following commit (but has existed since a4ed1a1 where we started reusing the same planner's visitor for DistSQL-supportability check). Namely, if we have concurrent parallel checks, each will run `checkSupportForPlanNode` which might internally use `distSQLExprCheckVisitor` from the planner. In order to fix the race we now will allocate a fresh visitor for all worker goroutines but keep on using the planner's one in all other contexts. The impact of the bug is minor if any - we don't expect the parallel checks to actually have any distsql-unsupported features, so `err` field might never be modified to have a race. Release note: None
This commit extracts the code for checking distsql supportability into a separate file. This is done because the following commit will be extending the state we're storing during the distsql check. It's purely a mechanical move, with a couple of tweaks: - `checkSupportForInvertedFilterNode` is inlined in its single call site (to get symmetry with all other planNodes). - all unsupported errors have been extracted into global singletons. The following commit will take advantage of this by replacing errors with bit-map of causes - the loss of some extra context should be negligible. Release note: None
a63e6e3 to
27d3798
Compare
In c17591e we made so that if we have a plan that cannot be distributed for any reason, we would disable the streamer out of caution. This was known to be a super-set of problematic scenarios, and this commit addresses that oversight. In particular, it revamps `checkSupportForPlanNode` function so that we accumulate all possible blockers for DistSQL, and then we examine each one to decide whether it's safe to use the streamer given the blockers found. The blockers are accumulated into a bit-map, and the methods have been adjusted to do a full walk over the planNode tree (as well as over all expressions) to capture all blockers - this should have negligible overhead. An additional improvement is that we now replace errors with different bits and we can log all blockers for DistSQL not being supported. What prompted this work was a customer query where we didn't use the streamer because of "unsupported planNode" distsql error. This was attempted to be fixed in f6038a1, but the fix was incomplete, and now it is. Release note: None
27d3798 to
6676559
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting question. I feel like if we were to find the common denominator, we'd have to call it something like "execution blockers" or "execution properties", which seems rather generic. I'll keep it as distSQLBlocker since it's clearly related to the distsql exec engine, and the way we examine blockers for the streamer - to decide whether it's ok to use the leaf txn - is also somewhat related to distsql exec engine.
Thanks for the quick review!
bors r+
@yuzefovich made 4 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball).
pkg/sql/distsql_check.go line 38 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Alternate names (though feel free to keep the current one):
- distSQLBlocker
- distSQLRestriction
- distSQLConstraint
Thanks for the ideas, I like distSQLBlocker the most. Done.
pkg/sql/distsql_check.go line 64 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
nit: worth having an
Empty()orHasNone()method for readability?
Hm, I went back and forth on this and didn't quite land on anything I liked. It seems like comparing against zero is simple enough and perhaps the most clear.
pkg/sql/distsql_check.go line 222 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
nit: What if we instead had a method on
distSQLProhibitedCausesmapping to adistRecommendation, and had this function only return the causes?
Single return value is not enough - if we have any distsql blockers, then it must be cannotDistribute, but otherwise it could either be canDistribute or shouldDistribute.
|
Build succeeded: |
This PR contains several commits that address some edge cases around disabling the streamer due to features that are not supported by DistSQL. See each commit for details.
Epic: None
Release note: None