Commit 7029127
sql: fix rare race around concurrent remote flows setup

A few years ago, in 0c1095e, we changed the way we set up distributed
query plans. Namely, we now set up the gateway (i.e. local) flow first
and then issue SetupFlowRequest RPCs concurrently to set up the remote
flows, without blocking on the gateway until that setup is complete.

We have seen about 5 occurrences where the protobuf marshaling code
crashed when handling those concurrent RPCs. My hypothesis is that
this is due to the main goroutine of the gateway flow not waiting until
the RPCs are done. In particular, we put `PhysicalInfrastructure`
objects into a `sync.Pool`, and they are released by the function that
`PlanningCtx.getCleanupFunc` returns. That function is executed in
a defer after `Run`ning the local flow completes. However, it's possible
that it'll be executed _before_ the concurrent SetupFlowRequest RPCs
(evaluated via the distsql worker goroutines) are performed, and I'm
guessing the flow specs might get corrupted because of that.

In order to prevent this race, we now block execution of the
`Flow.Cleanup` function of the gateway flow until all concurrent RPCs
are done.

I tried injecting a sleep right before executing the concurrent RPCs
but was still unable to reproduce the problem on the gceworker. Given
that we've only seen this a handful of times, I decided to omit the
release note.

Release note: None

1 parent 8e5af1e commit 7029127
1 file changed, +27 -31 lines changed