Description
What happened?
It was detected that the resource-usage-tracker service did not have a lock inside Redis and was reporting the following error: redis.exceptions.LockNotOwnedError: Cannot reacquire a lock that's no longer owned.
What triggered this situation?
The cause is currently unknown.
What could trigger this situation?
- The lock is removed manually or somehow expires.
- The Redis database becomes unavailable for a short amount of time (a minimal reproduction of the resulting error is sketched below).
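For illustration only, here is a minimal, hypothetical way to surface the same error outside the service, assuming a local Redis instance and redis-py >= 5 (the lock name is invented):

    import asyncio

    from redis.asyncio import Redis
    from redis.exceptions import LockNotOwnedError


    async def main() -> None:
        client = Redis.from_url("redis://localhost:6379")
        lock = client.lock("some-protected-resource", timeout=5)
        await lock.acquire()

        # simulate the key expiring or being removed manually while the
        # lock is still believed to be held
        await client.delete("some-protected-resource")

        try:
            # reacquire() refreshes the TTL, but the key (and its token) is gone:
            # redis.exceptions.LockNotOwnedError: Cannot reacquire a lock
            # that's no longer owned.
            await lock.reacquire()
        except LockNotOwnedError as err:
            print(f"lock was lost: {err}")
        finally:
            await client.aclose()


    asyncio.run(main())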
Why is this bad?
Imagine having multiple instances relying on this lock: if it is somehow freed (and lock_context does nothing about it), a different instance will acquire it and use the resource protected by the lock as if it were free.
How does it present itself?
Consider the following code:
    async with lock_context(...):
        # some user defined
        # code here

Two situations compete to produce the issue:
- lock_context creates a task which raises redis.exceptions.LockNotOwnedError, but only logs the issue without handling it.
- lock_context (a context manager) has no way of stopping the execution of the code defined by the user (a rough sketch of this pattern follows).
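The pattern looks roughly like this (a simplification for illustration, not the actual lock_context implementation):

    import asyncio
    import logging
    from contextlib import asynccontextmanager, suppress

    from redis.asyncio.lock import Lock
    from redis.exceptions import LockError, LockNotOwnedError

    logger = logging.getLogger(__name__)


    @asynccontextmanager
    async def lock_context_sketch(lock: Lock, extend_interval: float):
        await lock.acquire()

        async def _auto_extend() -> None:
            while True:
                await asyncio.sleep(extend_interval)
                try:
                    await lock.reacquire()
                except LockNotOwnedError:
                    # situation 1: the failure is only logged, nothing else happens
                    logger.exception("lock is no longer owned, but nobody is told")

        extend_task = asyncio.create_task(_auto_extend())
        try:
            # situation 2: whatever runs in the `async with` body is outside this
            # context manager's control and cannot be stopped from here
            yield lock
        finally:
            extend_task.cancel()
            with suppress(LockError):
                await lock.release()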
Is there a way to reproduce it in a test?
Yes, apply the changes from changes.diff.zip to your repo.
Then run one of the following tests to make the issue appear:
- tests/test_redis_utils.py::test_possible_regression_lock_extension_fails_if_key_is_missing
- tests/test_redis_utils.py::test_possible_regression_first_extension_delayed_and_expires_key
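The actual tests are in changes.diff.zip; as a rough idea, such a regression test could look like this (hypothetical code, reusing the lock_context_sketch helper from above and assuming pytest-asyncio and a local Redis):

    import asyncio

    import pytest
    from redis.asyncio import Redis


    @pytest.mark.asyncio
    async def test_user_code_keeps_running_after_lock_key_is_missing() -> None:
        client = Redis.from_url("redis://localhost:6379")
        lock_name = "some-protected-resource"
        lock = client.lock(lock_name, timeout=2)

        async with lock_context_sketch(lock, extend_interval=1):
            # remove the key behind the lock's back
            await client.delete(lock_name)
            # give the extension task time to fail (and only log)
            await asyncio.sleep(5)
            # the body is still running even though the lock is gone: this is the bug
            assert not await client.exists(lock_name)

        await client.aclose()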
What was already tried?
To mitigate the issue, it was proposed to try to stop the user-defined code.
The only possible solution for a context manager is using signals (which are only available on Linux).
When running tests/test_redis_utils.py::test_context_manager_timing_out, the code hangs unexpectedly inside the signal handler function. This means the entire asyncio runtime is halted.
This approach cannot be used.
Running tests/test_redis_utils.py::test_context_manager_timing_out produces a very unexpected result:
def handler(signum, frame):
> raise RuntimeError(f"Operation timed out after {after} seconds")
E RuntimeError: Operation timed out after 10 seconds
tests/test_redis_utils.py:190: RuntimeError
> /home/silenthk/work/pr-osparc-redis-lock-issues/packages/service-library/tests/test_redis_utils.py(190)handler()
-> raise RuntimeError(f"Operation timed out after {after} seconds")
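For reference, the signal-based idea is roughly the following (a reconstruction, not the exact test code); the handler is only usable on Linux and, as shown above, ends up halting the asyncio runtime:

    import signal
    from contextlib import contextmanager


    @contextmanager
    def raise_after(after: float):
        # SIGALRM-based timeout: the handler interrupts whatever the main thread
        # is doing by raising; this is what misbehaves under asyncio
        def handler(signum, frame):
            raise RuntimeError(f"Operation timed out after {after} seconds")

        previous = signal.signal(signal.SIGALRM, handler)
        signal.setitimer(signal.ITIMER_REAL, after)
        try:
            yield
        finally:
            signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
            signal.signal(signal.SIGALRM, previous)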
Different proposals
Feel free to suggest more below.
- @GitHK: I would do the following. The locking mechanism should receive a task (which runs the user-defined code) which it can cancel if something goes wrong. This solution no longer uses a context manager, but relies on a battle-tested method of stopping the running user-defined code. A rough sketch follows below.
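For example (the function and exception names below are invented, not an existing servicelib API):

    import asyncio
    from collections.abc import Awaitable, Callable
    from contextlib import suppress

    from redis.asyncio.lock import Lock
    from redis.exceptions import LockError


    class LockLostError(RuntimeError):
        """Raised when the Redis lock could not be kept alive."""


    async def run_exclusively(
        lock: Lock,
        user_code: Callable[[], Awaitable[None]],
        extend_interval: float,
    ) -> None:
        await lock.acquire()
        user_task = asyncio.create_task(user_code())
        try:
            while True:
                done, _ = await asyncio.wait({user_task}, timeout=extend_interval)
                if done:
                    await user_task  # re-raise anything the user code raised
                    return
                try:
                    await lock.reacquire()
                except LockError as err:
                    # the lock was lost: cancel the user code instead of only logging
                    user_task.cancel()
                    raise LockLostError("lock lost, user code was cancelled") from err
        finally:
            user_task.cancel()
            with suppress(LockError):
                await lock.release()

Task cancellation is the standard asyncio way of stopping running code, which is what makes this approach more robust than signal-based interruption.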