Improve the deletion process of the `last_error_event` from the error history of a machine #599

A machine's `last_error_event` should be cleared from its issue history after some time (about 6 days).

While deploying metal-stack on the new Supermicro nodes, we encountered the following problem: machines that were already allocated (and integrated into a Kubernetes cluster) still had their `last_error_event` set to `unexpectedly received in state pxe booting`.

`metal-api-liveliness` is running in the `metal-control-plane` namespace, and its logs do not show any errors for the machines:
```
{... "msg":"machine liveliness was requested"}
{... "msg":"machine liveliness evaluated","alive":x,"dead":0,"unknown":0,"errors":0}
```

However, listing the machines with `metalctl machine ls` returns some allocated machines with a `⭕ crashloop` issue.
