Description
Current behaviour
If a child of the Shape.Supervisor crashes more than 3 times in 5 seconds, the Shape.Supervisor shuts down and is not restarted because it is transient. The WAL then starts to build up. If the connection restarts, the Connection.Manager will then report that the Shape.Supervisor is :already_present.
This happened in production for AutoArc when the ShapeLogCollector crashed 4 times in under a second due to trying to call a missing Materializer.
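The failure sequence above can be modelled in a few lines. This is only a sketch of the generic OTP supervisor rules, not Electric's code: it assumes the default restart intensity (3 restarts in 5 seconds) and relies on the fact that a supervisor which exceeds its intensity exits with reason shutdown, which a parent treats as a normal exit for a transient child. The names RestartIntensity and parent_restarts are hypothetical, and the crash timestamps are illustrative rather than taken from the production logs.

```python
from collections import deque


class RestartIntensity:
    """Sliding-window restart counter modelled on max_restarts / max_seconds."""

    def __init__(self, max_restarts=3, max_seconds=5.0):
        self.max_restarts = max_restarts
        self.max_seconds = max_seconds
        self.restarts = deque()

    def record_crash(self, now):
        """Return True if the child is restarted, False if the supervisor gives up."""
        self.restarts.append(now)
        # Forget restarts that have fallen out of the time window.
        while self.restarts and now - self.restarts[0] > self.max_seconds:
            self.restarts.popleft()
        return len(self.restarts) <= self.max_restarts


def parent_restarts(restart_type, exit_reason):
    """Decide whether a parent supervisor restarts a child that has exited."""
    if restart_type == "permanent":
        return True
    if restart_type == "temporary":
        return False
    # transient: restart only on abnormal exits; normal, shutdown and
    # (shutdown, term) all count as normal exits.
    is_shutdown_tuple = isinstance(exit_reason, tuple) and exit_reason[0] == "shutdown"
    return exit_reason not in ("normal", "shutdown") and not is_shutdown_tuple


# Four crashes in under a second (illustrative timestamps, in seconds).
intensity = RestartIntensity()
outcomes = [intensity.record_crash(t) for t in (0.0, 0.2, 0.4, 0.6)]

# Having exceeded its intensity, the supervisor itself exits with "shutdown".
restarted_if_transient = parent_restarts("transient", "shutdown")
restarted_if_permanent = parent_restarts("permanent", "shutdown")
```

In this model the first three crashes are absorbed and the fourth takes the supervisor down. Because the exit reason is shutdown, a transient supervisor stays down while a permanent one would be restarted by its parent.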
Suggested behaviour
If a child of the Shape.Supervisor crashes more than 3 times in 5 seconds, the Shape.Supervisor should be restarted, but with all shape data wiped. It is important that the shape data is wiped in this situation, as otherwise the ShapeLogCollector will go on trying to process the txn/shape combinations it crashes on, going into an infinite crash loop. Ideally the Shape.Supervisor should be permanent, not transient.
Please note that the Connection.Manager should not match on :already_present. :already_present indicates that a serious issue has already happened (the shape subsystem has gone down and has not been restarted), so there is no point in handling it gracefully.