Description
Original report: https://github.com/qorix-group/inc_orchestrator_internal/issues/397
Describe the bug:
Sporadically, the error from a safely spawned task is handled by a regular worker instead of the safety worker.
The test safely spawns a task that returns an error. The task is expected to continue on the safety worker, but sporadically it continues on a regular worker.
The safety worker thread ID is retrieved from the kyron logs, e.g.:
{"timestamp":"2126","level":"DEBUG","fields":{"message":"Safety worker UniqueWorkerId(14144299263607051467) started"},"target":"kyron::scheduler::workers::safety_worker","threadId":"ThreadId(2)"}
ThreadId(2) is then compared with the thread ID in the trace of the given task.
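The comparison described above can be sketched in Python. This is a minimal illustration, not the actual test logic: the helper names `find_safety_worker_thread_id` and `handled_by_safety_worker` are hypothetical, and it assumes that the task trace is a JSON log entry carrying the same `threadId` field as the safety worker log line shown above.

```python
import json
import re

# Target and message shape are taken from the sample log line in this report.
SAFETY_WORKER_TARGET = "kyron::scheduler::workers::safety_worker"
STARTED_PATTERN = re.compile(r"Safety worker UniqueWorkerId\(\d+\) started")


def find_safety_worker_thread_id(log_lines):
    """Return the threadId string of the safety worker, or None if not found."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON log entries
        if not isinstance(entry, dict):
            continue
        message = entry.get("fields", {}).get("message", "")
        if entry.get("target") == SAFETY_WORKER_TARGET and STARTED_PATTERN.search(message):
            return entry.get("threadId")
    return None


def handled_by_safety_worker(task_entry, safety_thread_id):
    """Check whether a task trace entry was emitted on the safety worker thread.

    Assumption: the task trace entry is a dict with a "threadId" field.
    """
    return safety_thread_id is not None and task_entry.get("threadId") == safety_thread_id
```

With the sample line above, `find_safety_worker_thread_id` yields `"ThreadId(2)"`; a task trace reporting any other `threadId` would then indicate the failure described in this issue.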
Steps to reproduce the behavior:
(No pure Rust repro this time, as there is no way to quickly distinguish the failure without the test logic)
git checkout igorostrowskiq_safety_worker_tests
cd component_integration_tests/python_test_cases
./run_tests.py --count 1000 --repeat-scope=session -x tests/runtime/worker/test_safety_worker.py::TestTaskHandling
The last command runs the test in a loop up to 1000 times and stops at the first failure. All logs are printed at that point.
Observed behavior:
The error from a safely spawned task is handled by a regular worker, not by the safety worker.
Expected behavior:
The error from a safely spawned task is handled by the safety worker.
Occurrence:
Sporadic (~1% repro rate)
Attachments / Logs: