
Enable the TraceBuffer in production #697

@etorreborre

Abstract

We already support emitting execution traces at runtime while a node runs.
These execution traces can be replayed to investigate a bug.
We should enable them in production, make sure that we control their impact on local disk usage and on performance, and verify that we can indeed replay them if necessary.

Why?

Concurrency bugs can be extremely hard to reproduce. Good support for reproducing real production bugs can be invaluable.

How?

By setting a proper trace buffer in the TokioBuilder.
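
To make the disk and performance concern concrete, here is a minimal sketch of a trace buffer that stays within a fixed byte budget by evicting its oldest entries, and that can be read back in recording order for replay. `BoundedTrace` and all of its methods are hypothetical names invented for illustration; this is not the existing TraceBuffer or the TokioBuilder API, and the byte budget is an assumed policy rather than a decided one.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a trace buffer (NOT the real TraceBuffer type).
/// It keeps serialized trace entries within a fixed byte budget.
#[derive(Debug)]
pub struct BoundedTrace {
    entries: VecDeque<Vec<u8>>,
    used_bytes: usize,
    max_bytes: usize,
    dropped: u64,
}

impl BoundedTrace {
    /// Create an empty buffer that never holds more than `max_bytes` of entries.
    pub fn new(max_bytes: usize) -> Self {
        Self {
            entries: VecDeque::new(),
            used_bytes: 0,
            max_bytes,
            dropped: 0,
        }
    }

    /// Record one serialized entry, evicting the oldest entries until the
    /// budget is respected again. An entry larger than the whole budget is
    /// counted as dropped and not stored.
    pub fn push(&mut self, entry: Vec<u8>) {
        if entry.len() > self.max_bytes {
            self.dropped += 1;
            return;
        }
        self.used_bytes += entry.len();
        self.entries.push_back(entry);
        while self.used_bytes > self.max_bytes {
            match self.entries.pop_front() {
                Some(old) => {
                    self.used_bytes -= old.len();
                    self.dropped += 1;
                }
                None => break,
            }
        }
    }

    /// Number of bytes currently held.
    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }

    /// Number of entries evicted or rejected so far.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }

    /// Iterate over the retained entries from oldest to newest, which is the
    /// order a replay would consume them in.
    pub fn replay(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.entries.iter().map(|e| e.as_slice())
    }
}
```

With a budget like this, the cost of tracing in production is bounded up front, at the price of losing the oldest entries. Whether the retained window is long enough to replay a real bug is exactly what the acceptance test should tell us.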

Testing Strategy / Acceptance Criteria

Run a node, simulate a bug, then collect and replay the trace. Check whether the trace animation lets us understand what went wrong.

Discussion points

No response

Dependencies & Related Tasks

No response

Checklist

  • I understand that feature requests and unrefined work items should be opened as GitHub Discussions instead.
  • I have assigned this item to an existing milestone from the roadmap
  • I have added a label capturing the impact of this item (i.e. value for users/stakeholders if successful)
  • I have added a label capturing the delivery risk of this item (i.e. how likely is it that this task will succeed as planned)
  • I have added a label capturing the effort of this item (i.e. how large is the task?)

Metadata

Labels

  • .EFFORT.Low: Days
  • .RISK.Low: Well-understood task, mostly mechanical
  • .VALUE.Medium: Improves reliability, performance, or developer experience
  • TOPIC.Consensus: Mostly related to amaru-consensus / amaru-ouroboros

