Improve node pooling: circuit breaker

## What, why, who

As a user of the node service, I would like unhealthy nodes to be timed out so that they are not stressed with unnecessary load and node selection focuses on healthy nodes.

## Acceptance criteria

* Count erroneous calls to nodes within a sliding window
  * Error: the node does not respond properly (call times out, I/O timeout, connection refused, ...).
  * Also treat a tick delay relative to the reliable node tick as an error.
  * Sliding window: call-based (not time-based)
* If the error threshold is reached within the sliding window, time out the node.
* After a timeout, send a probe request to check whether the node is working again.
  * If the probe fails, time out the node again.
  * If the probe succeeds, reset the error counter and treat the node as healthy.
* If a node fails a configurable number of probes without recovering, remove it from the node pool completely (it will be added again if it is still a public peer). This feature is only enabled if the public node strategy is used.
* Statically configured nodes *must never* be removed from the node pool!
* Configuration
  * Error threshold (percentage of erroneous calls)
  * Sliding window (number of counted calls)
  * Tick delay that is treated as an error
  * Number of failed probes for node removal
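
To make the criteria above easier to discuss, here is a minimal Go sketch of a call-based sliding window with the timeout, probe, and removal states. It is not an existing implementation: all type, field, and function names (`Config`, `Breaker`, `Record`, `ProbeResult`, ...) are hypothetical. It also assumes that the threshold is only evaluated once the window is full, that a tick delay at or above the configured value counts as an error, and that the caller decides when a timeout has elapsed and a probe should be sent.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors the configuration bullets; every name here is hypothetical.
type Config struct {
	ErrorThresholdPercent int    // percentage of erroneous calls that triggers a timeout
	WindowSize            int    // number of most recent calls counted (must be > 0)
	MaxTickDelay          uint32 // tick delay at or above which a call counts as an error
	MaxFailedProbes       int    // failed probes before the node is removed
}

type State int

const (
	Healthy State = iota
	TimedOut
	Removed
)

// Breaker tracks one node. removable should be true only for nodes that come
// from the public node strategy; statically configured nodes are never removed.
type Breaker struct {
	cfg          Config
	removable    bool
	window       []bool // ring buffer of call outcomes, true = error
	next, filled int
	errCount     int
	failedProbes int
	state        State
}

func NewBreaker(cfg Config, removable bool) *Breaker {
	return &Breaker{cfg: cfg, removable: removable, window: make([]bool, cfg.WindowSize)}
}

func (b *Breaker) State() State { return b.state }

// Record registers the outcome of one call to the node.
func (b *Breaker) Record(callErr error, tickDelay uint32) {
	if b.state != Healthy {
		return // timed-out or removed nodes receive no regular calls
	}
	failed := callErr != nil || tickDelay >= b.cfg.MaxTickDelay
	if b.filled == len(b.window) {
		if b.window[b.next] {
			b.errCount-- // the oldest outcome leaves the window
		}
	} else {
		b.filled++
	}
	b.window[b.next] = failed
	if failed {
		b.errCount++
	}
	b.next = (b.next + 1) % len(b.window)
	// Assumption: only evaluate the threshold once the window is full.
	if b.filled == len(b.window) && b.errCount*100 >= b.cfg.ErrorThresholdPercent*b.filled {
		b.state = TimedOut
	}
}

// ProbeResult registers the outcome of a probe sent after a timeout.
func (b *Breaker) ProbeResult(ok bool) {
	if b.state != TimedOut {
		return
	}
	if ok {
		// Successful probe: reset all counters and treat the node as healthy.
		*b = *NewBreaker(b.cfg, b.removable)
		return
	}
	b.failedProbes++
	if b.removable && b.failedProbes >= b.cfg.MaxFailedProbes {
		b.state = Removed
	}
}

func main() {
	cfg := Config{ErrorThresholdPercent: 50, WindowSize: 4, MaxTickDelay: 5, MaxFailedProbes: 2}
	b := NewBreaker(cfg, true)
	b.Record(nil, 0)
	b.Record(errors.New("i/o timeout"), 0)
	b.Record(nil, 7) // tick delay counts as an error
	b.Record(nil, 0)
	fmt.Println("timed out:", b.State() == TimedOut)
	b.ProbeResult(false)
	b.ProbeResult(false)
	fmt.Println("removed:", b.State() == Removed)
}
```

The ring buffer keeps memory per node constant and makes each update O(1). A time-based window, if we decide to support one, would need timestamps per entry instead of a fixed-size buffer.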

## Open questions

* It is unclear where to implement this, because go-qubic also implements node pooling. Where should we add it?
* Should we support time-based sliding windows, too?
* Can we use a library?

## References

Circuit breaker example: https://resilience4j.readme.io/docs/circuitbreaker
