
Coroutines may randomly hang #4531

@iseki0

Description


I'm sure this shouldn't happen, so it's a bug, but I can't reproduce it easily. It has been puzzling me for more than a year.

Describe the bug

Coroutines may randomly hang under heavy workload.

What happened? What should have happened instead?

I sometimes write code like the following (the real code is not this simple, which may be why I can't reproduce the issue easily):

import kotlinx.coroutines.*

runBlocking(Dispatchers.Default) { // or Dispatchers.IO
    coroutineScope {
        // Possibly deep inside a large coroutine tree
        val values = list.map {
            async {
                // Possibly a very complex CPU- or IO-bound operation,
                // such as a ReDoS-prone regex or a blocking IO call.
                // This function is NOT a suspend function.
                doSomethingMightBlockTheThreadForAVeryLongTime(it)
            }
        }.awaitAll()
    }
}
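Cancellation in kotlinx.coroutines is cooperative: a non-suspending blocking call inside async does not observe cancellation and keeps its thread until it returns. The sketch below is an illustration under stated assumptions, not the reporter's code: blockingWork, fanOut and timeToCancel are hypothetical names, Thread.sleep stands in for the real blocking call, and each call is wrapped in runInterruptible(Dispatchers.IO) so that cancelling the scope interrupts the blocked threads. This only helps with workers that are actually blocked; it would not explain workers that jstack reports as parked.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for the real blocking, non-suspending call.
// Thread.sleep throws InterruptedException if the thread is interrupted.
fun blockingWork(x: Int): Int {
    Thread.sleep(10_000)
    return x * 2
}

// Same fan-out shape as the report, but each blocking call runs inside
// runInterruptible, which interrupts the thread when the coroutine is cancelled.
suspend fun fanOut(items: List<Int>): List<Int> = coroutineScope {
    items.map { item ->
        async { runInterruptible(Dispatchers.IO) { blockingWork(item) } }
    }.awaitAll()
}

// Starts the fan-out, cancels it shortly afterwards, and returns how long
// the cancellation took to complete, in milliseconds.
fun timeToCancel(): Long = runBlocking(Dispatchers.Default) {
    val job = launch { fanOut(listOf(1, 2, 3)) }
    delay(200) // give the blocking calls time to start
    measureTimeMillis { job.cancelAndJoin() }
}

fun main() {
    println("cancelled after ${timeToCancel()} ms")
}
```

Without runInterruptible, cancelAndJoin would wait for every Thread.sleep to finish, because the async children cannot complete until their blocking calls return.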

The coroutines may randomly hang inside the async tree, and even cancellation does not make them recover.
From my investigation, it looks like the dispatchers simply ignore the dispatched continuations.
Using the kotlinx-coroutines debugger, I found the coroutines stuck in one of three states: SUSPENDED, CREATED, or CANCELLING (if the parent is being cancelled); the reported line number is null.
The jstack output shows all dispatcher threads (Default/IO) parked, waiting for dispatched continuations.
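Both views described above can also be captured from inside the running process. A minimal sketch, assuming the kotlinx-coroutines-debug artifact is on the classpath and that pool threads keep their default DefaultDispatcher-worker-N names (Dispatchers.Default and Dispatchers.IO share this pool): DebugProbes records the state of every coroutine created after install() (its State enum has CREATED, RUNNING and SUSPENDED), and Thread.getAllStackTraces() gives a jstack-like view of the worker threads. The helper names are hypothetical, and the delay(Long.MAX_VALUE) coroutine only stands in for a hung one.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.debug.DebugProbes
import kotlinx.coroutines.debug.State

// State of every coroutine-pool worker thread, similar to filtering jstack
// output by thread name. Assumes the default "DefaultDispatcher-worker" naming.
fun dispatcherThreadStates(): Map<String, Thread.State> =
    Thread.getAllStackTraces().keys
        .filter { it.name.startsWith("DefaultDispatcher-worker") }
        .associate { it.name to it.state }

// Debug-probe states of all tracked coroutines alive at the time of the call.
// DebugProbes.install() must run before the coroutines of interest are created.
fun coroutineStates(): List<State> =
    DebugProbes.dumpCoroutinesInfo().map { it.state }

fun main() {
    DebugProbes.install()
    try {
        runBlocking {
            // Stand-in for a coroutine that never resumes.
            val stuck = launch(Dispatchers.Default) { delay(Long.MAX_VALUE) }
            delay(100) // let it start and suspend
            DebugProbes.dumpCoroutines() // prints each coroutine with its state and stack trace
            println(coroutineStates())
            dispatcherThreadStates().forEach { (name, state) -> println("$name: $state") }
            stuck.cancel()
        }
    } finally {
        DebugProbes.uninstall()
    }
}
```

Comparing the two outputs at the moment of a hang shows whether the stuck coroutines are SUSPENDED while every worker thread is idle, which matches the situation described above.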

It seems to happen only when the server is under heavy load with many coroutines.
I can't share the code because the codebase is very large. Any debugging advice would be appreciated.

Other things I use: kotlinx-coroutines-slf4j with MDCContext().

Thank you.
