Conversation
```diff
 pp.runnerGroup.Go(func() error {
 	defer pp.state.SetState(PPStateStopping)
-	return join.CatchupForever(runnerCtx, false)
+	return join.CatchupForever(runnerCtx, true)
```
This makes the join table try to reconnect while the processor is running (together with the other table some lines below).
```go
// wait for the runner to be done
runningErrs := multierror.Append(pp.runnerGroup.Wait().ErrorOrNil())
// ...
close(pp.input)
```
Channels are now closed after the runner group is done; visitors attach to the runner group for this.
```go
var wg sync.WaitGroup
// ...
// drains the channel and drops out when closed.
```
There was actually no point in distinguishing between draining until close and draining until empty, because this function is the one writing to the channel.
In case two visitors are started at the same time and one of them panics or is stopped, it'll drain the other's messages too; but that is an issue that existed before, so we'll ignore it here :)
```diff
 if errors.As(err, &errProc) {
-	g.log.Debugf("error processing message (non-transient), shutting down processor: %v", err)
+	g.log.Printf("error processing message (non-transient), shutting down processor: %v", err)
```
Let's not log these important errors at debug level.
```diff
 err := broker.Open(config)
-if err != nil {
+if err != nil && !errors.Is(err, sarama.ErrAlreadyConnected) {
```
According to the docs, Open might return this error if the broker is already connected, in which case it's not actually an error.
```go
brokers := initSystemTest(t)
var (
	topic = goka.Stream(fmt.Sprintf("goka_systemtest_proc_shutdown_disconnect-%d", time.Now().Unix()))
	join  = goka.Stream(fmt.Sprintf("goka_systemtest_proc_shutdown_disconnect-%d-join", time.Now().Unix()))
```
Adding some join tables to the tests so we can exercise the reconnecting-joins change from above.
What this PR tries to improve
Stability of processors in the face of restarting or otherwise unstable Kafka clusters.
Background
We're facing the issue that our Kafka cluster restarts or rebalances from time to time, which makes all processors restart. Since the processors will rebalance anyway, this PR uses reconnecting views for the join/lookup tables.