Bug: Safe task error is sporadically handled by a regular worker instead of the safety worker #20

@PiotrKorkus

Original report: https://github.com/qorix-group/inc_orchestrator_internal/issues/397

Describe the bug:

The error of a safe task is sporadically handled by a regular worker instead of the safety worker.

The test safely spawns a task that returns an error. The task is expected to continue on the safety worker, but it sporadically continues on a regular worker.

The safety worker thread ID is retrieved from the kyron logs, e.g.:
{"timestamp":"2126","level":"DEBUG","fields":{"message":"Safety worker UniqueWorkerId(14144299263607051467) started"},"target":"kyron::scheduler::workers::safety_worker","threadId":"ThreadId(2)"}

ThreadId(2) is then compared with the thread ID in the trace emitted by the given task.
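For readers without access to the test logic, the comparison can be sketched roughly as below. This is only an illustration, not the actual test code: the helper names `safety_worker_thread_id` and `handled_by_safety_worker` are hypothetical, and it assumes the task trace is a JSON log line carrying the same `threadId` field as the kyron log entry shown above.

```python
import json
import re

# Matches the start-up message shown in the kyron log line above.
SAFETY_START = re.compile(r"Safety worker UniqueWorkerId\(\d+\) started")
SAFETY_TARGET = "kyron::scheduler::workers::safety_worker"


def safety_worker_thread_id(log_lines):
    """Return the threadId of the first 'Safety worker ... started' entry, or None."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip output that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        fields = entry.get("fields") or {}
        message = fields.get("message", "")
        if entry.get("target") == SAFETY_TARGET and SAFETY_START.search(message):
            return entry.get("threadId")
    return None


def handled_by_safety_worker(log_lines, task_trace_line):
    """Compare the safety worker thread ID with the thread ID of a task trace.

    Assumption: the task trace is a JSON line with a "threadId" field.
    """
    expected = safety_worker_thread_id(log_lines)
    actual = json.loads(task_trace_line).get("threadId")
    return expected is not None and expected == actual
```

With the log line above, `safety_worker_thread_id` yields `"ThreadId(2)"`; a task trace reporting any other thread ID would indicate the failure described in this issue.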

Steps to reproduce the behavior:

(No pure Rust repro this time, as the failure cannot be quickly distinguished without the test logic.)

  1. git checkout igorostrowskiq_safety_worker_tests
  2. cd component_integration_tests/python_test_cases
  3. ./run_tests.py --count 1000 --repeat-scope=session -x tests/runtime/worker/test_safety_worker.py::TestTaskHandling (runs the test in a loop up to 1000 times and stops on the first failure; all logs are then printed)

Observed behavior:

The error of a safely spawned task is handled by a regular worker instead of the safety worker.

Expected behavior:

The error of a safely spawned task is handled by the safety worker.

Occurrence:

Sporadic (~1% repro rate)
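As a rough sanity check on the loop size used in the reproduction steps, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below assumes independent runs and takes the ~1% rate at face value; the function names are illustrative only.

```python
import math


def prob_at_least_one_failure(p, runs):
    """Probability of at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - p) ** runs


def runs_needed(p, confidence):
    """Smallest number of independent runs that hits a failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    print(prob_at_least_one_failure(0.01, 1000))
    print(runs_needed(0.01, 0.95))
```

Under these assumptions, a few hundred iterations are already likely to hit the failure, so `--count 1000` leaves a comfortable margin.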

Attachments / Logs:

log.txt

Labels:

bug

Status:

In Progress