
Commit 9892dc2

Remove created threads restriction from BlockingCoroutineDispatcherThreadLimitStressTest (#4061)
* Better documentation
* Remove incorrect assertion from BlockingCoroutineDispatcherThreadLimitStressTest

Fixes #3712
1 parent 29e8213 commit 9892dc2

File tree: 3 files changed, +48 -46 lines


kotlinx-coroutines-core/jvm/src/scheduling/CoroutineScheduler.kt

Lines changed: 35 additions & 30 deletions
@@ -10,30 +10,33 @@ import kotlin.jvm.internal.Ref.ObjectRef
 import kotlin.math.*
 
 /**
- * Coroutine scheduler (pool of shared threads) which primary target is to distribute dispatched coroutines
- * over worker threads, including both CPU-intensive and blocking tasks, in the most efficient manner.
+ * Coroutine scheduler (pool of shared threads) with a primary target to distribute dispatched coroutines
+ * over worker threads, including both CPU-intensive and potentially blocking tasks, in the most efficient manner.
  *
- * Current scheduler implementation has two optimization targets:
- * - Efficiency in the face of communication patterns (e.g. actors communicating via channel)
- * - Dynamic resizing to support blocking calls without re-dispatching coroutine to separate "blocking" thread pool.
+ * The current scheduler implementation has two optimization targets:
+ * - Efficiency in the face of communication patterns (e.g. actors communicating via channel).
+ * - Dynamic thread state and resizing to schedule blocking calls without re-dispatching a coroutine to a separate "blocking" thread pool.
  *
  * ### Structural overview
  *
- * Scheduler consists of [corePoolSize] worker threads to execute CPU-bound tasks and up to
- * [maxPoolSize] lazily created threads to execute blocking tasks.
- * Every worker has a local queue in addition to a global scheduler queue
- * and the global queue has priority over local queue to avoid starvation of externally-submitted
- * (e.g. from Android UI thread) tasks.
- * Work-stealing is implemented on top of that queues to provide
- * even load distribution and illusion of centralized run queue.
+ * The scheduler consists of [corePoolSize] worker threads to execute CPU-bound tasks and up to
+ * [maxPoolSize] lazily created threads to execute blocking tasks.
+ * The scheduler has two global queues -- one for CPU tasks and one for blocking tasks.
+ * These queues are used for tasks that are submitted externally (from threads not belonging to the scheduler)
+ * and as overflow buffers for thread-local queues.
+ *
+ * Every worker has a local queue in addition to the global scheduler queues.
+ * The queue to pick the task from is selected randomly to avoid starvation of both local-queue and
+ * global-queue submitted tasks.
+ * Work-stealing is implemented on top of these queues to provide even load distribution and an illusion of a centralized run queue.
  *
  * ### Scheduling policy
  *
  * When a coroutine is dispatched from within a scheduler worker, it's placed into the head of worker run queue.
  * If the head is not empty, the task from the head is moved to the tail. Though it is an unfair scheduling policy,
  * it effectively couples communicating coroutines into one and eliminates scheduling latency
  * that arises from placing tasks to the end of the queue.
- * Placing former head to the tail is necessary to provide semi-FIFO order, otherwise, queue degenerates to stack.
+ * Placing the former head to the tail is necessary to provide semi-FIFO order; otherwise, the queue degenerates to a stack.
  * When a coroutine is dispatched from an external thread, it's put into the global queue.
  * The original idea with a single-slot LIFO buffer comes from Golang runtime scheduler by D. Vyukov.
  * It was proven to be "fair enough", performant and generally well accepted and initially was a significant inspiration
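To make the scheduling policy described in this hunk concrete, here is a minimal, deliberately single-threaded Kotlin sketch of the "single-slot LIFO buffer plus FIFO buffer" idea. The class and member names are illustrative assumptions only; the real WorkQueue is lock-free and substantially more involved.

import java.util.ArrayDeque

// Conceptual sketch only: not thread-safe and not the scheduler's actual WorkQueue.
class ConceptualWorkQueue<T : Any> {
    private var lastScheduled: T? = null   // single-slot LIFO buffer
    private val buffer = ArrayDeque<T>()   // FIFO part of the local queue

    // Dispatch from inside a worker: the new task takes the LIFO slot and the
    // previous head moves to the tail, preserving semi-FIFO order overall.
    fun addFromWorker(task: T) {
        val previous = lastScheduled
        lastScheduled = task
        if (previous != null) buffer.addLast(previous)
    }

    // The owning worker polls its LIFO slot first, then the FIFO buffer.
    fun poll(): T? {
        val head = lastScheduled
        if (head != null) {
            lastScheduled = null
            return head
        }
        return buffer.pollFirst()
    }
}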
@@ -45,39 +48,41 @@ import kotlin.math.*
  * before parking when his local queue is empty.
  * A non-standard solution is implemented to provide tasks affinity: a task from FIFO buffer may be stolen
  * only if it is stale enough based on the value of [WORK_STEALING_TIME_RESOLUTION_NS].
- * For this purpose, monotonic global clock is used, and every task has associated with its submission time.
+ * For this purpose, a monotonic global clock is used, and every task has a submission time associated with it.
  * This approach shows outstanding results when coroutines are cooperative,
- * but as downside scheduler now depends on a high-resolution global clock,
- * which may limit scalability on NUMA machines. Tasks from LIFO buffer can be stolen on a regular basis.
+ * but as a downside, the scheduler now depends on a high-resolution global clock,
+ * which may limit scalability on NUMA machines.
  *
  * ### Thread management
- * One of the hardest parts of the scheduler is decentralized management of the threads with the progress guarantees
+ *
+ * One of the hardest parts of the scheduler is decentralized management of the threads with progress guarantees
  * similar to the regular centralized executors.
  * The state of the threads consists of [controlState] and [parkedWorkersStack] fields.
- * The former field incorporates the amount of created threads, CPU-tokens and blocking tasks
- * that require a thread compensation,
- * while the latter represents intrusive versioned Treiber stack of idle workers.
- * When a worker cannot find any work, they first add themselves to the stack,
+ * The former field incorporates the number of created threads, CPU-tokens and blocking tasks
+ * that require thread compensation,
+ * while the latter represents an intrusive versioned Treiber stack of idle workers.
+ * When a worker cannot find any work, it first adds itself to the stack,
  * then re-scans the queue to avoid missing signals and then attempts to park
- * with additional rendezvous against unnecessary parking.
+ * with an additional rendezvous against unnecessary parking.
  * If a worker finds a task that it cannot yet steal due to time constraints, it stores this fact in its state
  * (to be uncounted when additional work is signalled) and parks for such duration.
  *
- * When a new task arrives in the scheduler (whether it is local or global queue),
+ * When a new task arrives at the scheduler (whether to a local or a global queue),
  * either an idle worker is being signalled, or a new worker is attempted to be created.
  * (Only [corePoolSize] workers can be created for regular CPU tasks)
  *
  * ### Support for blocking tasks
+ *
  * The scheduler also supports the notion of [blocking][TASK_PROBABLY_BLOCKING] tasks.
- * When executing or enqueuing blocking tasks, the scheduler notifies or creates one more worker in
- * addition to core pool size, so at any given moment, it has [corePoolSize] threads (potentially not yet created)
- * to serve CPU-bound tasks. To properly guarantee liveness, the scheduler maintains
- * "CPU permits" -- [corePoolSize] special tokens that permit an arbitrary worker to execute and steal CPU-bound tasks.
- * When worker encounters blocking tasks, it basically hands off its permit to another thread (not directly though) to
- * keep invariant "scheduler always has at least min(pending CPU tasks, core pool size)
+ * When executing or enqueuing blocking tasks, the scheduler notifies or creates an additional worker in
+ * addition to the core pool size, so at any given moment, it has [corePoolSize] threads (potentially not yet created)
+ * available to serve CPU-bound tasks. To properly guarantee liveness, the scheduler maintains
+ * "CPU permits" -- [corePoolSize] special tokens that allow an arbitrary worker to execute and steal CPU-bound tasks.
+ * When a worker encounters a blocking task, it releases its permit to the scheduler to
+ * keep an invariant "scheduler always has at least min(pending CPU tasks, core pool size)
  * and at most core pool size threads to execute CPU tasks".
  * To avoid overprovision, workers without CPU permit are allowed to scan [globalBlockingQueue]
- * and steal **only** blocking tasks from other workers.
+ * and steal **only** blocking tasks from other workers, which imposes non-trivial complexity on the queue management.
  *
  * The scheduler does not limit the count of pending blocking tasks, potentially creating up to [maxPoolSize] threads.
  * End users do not have access to the scheduler directly and can dispatch blocking tasks only with
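The "CPU permits" invariant above can be pictured with a deliberately simplified sketch. The real scheduler packs permits into the controlState field rather than using a Semaphore, so the class below is only an assumption-based illustration of the hand-off, not the actual implementation.

import java.util.concurrent.Semaphore

// Illustration only: corePoolSize tokens gate CPU work; a worker gives its token
// back before running a blocking task so another thread can keep serving CPU tasks.
class CpuPermitsSketch(corePoolSize: Int) {
    private val permits = Semaphore(corePoolSize)

    // A worker may execute or steal CPU-bound tasks only while holding a permit.
    fun tryAcquireCpuPermit(): Boolean = permits.tryAcquire()

    // Called before a worker starts a blocking task, freeing capacity for CPU work.
    fun releaseCpuPermit() = permits.release()
}

In the scheduler itself this hand-off is what keeps at least min(pending CPU tasks, core pool size) and at most core pool size threads on CPU work, while blocking tasks may grow the pool up to [maxPoolSize].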

kotlinx-coroutines-core/jvm/test/scheduling/BlockingCoroutineDispatcherThreadLimitStressTest.kt

Lines changed: 7 additions & 16 deletions
@@ -14,7 +14,7 @@ class BlockingCoroutineDispatcherThreadLimitStressTest : SchedulerTestBase() {
         corePoolSize = CORES_COUNT
     }
 
-    private val observedConcurrency = ConcurrentHashMap<Int, Boolean>()
+    private val observedParallelism = ConcurrentHashMap<Int, Boolean>().keySet(true)
     private val concurrentWorkers = AtomicInteger(0)
 
     @Test
@@ -27,41 +27,32 @@ class BlockingCoroutineDispatcherThreadLimitStressTest : SchedulerTestBase() {
                 async(limitingDispatcher) {
                     try {
                         val currentlyExecuting = concurrentWorkers.incrementAndGet()
-                        observedConcurrency[currentlyExecuting] = true
-                        assertTrue(currentlyExecuting <= CORES_COUNT)
+                        observedParallelism.add(currentlyExecuting)
                     } finally {
                         concurrentWorkers.decrementAndGet()
                     }
                 }
             }
-            tasks.forEach { it.await() }
-            for (i in CORES_COUNT + 1..CORES_COUNT * 2) {
-                require(i !in observedConcurrency.keys) { "Unexpected state: $observedConcurrency" }
-            }
-            checkPoolThreadsCreated(0..CORES_COUNT + 1)
+            tasks.awaitAll()
+            assertEquals(1, observedParallelism.single(), "Expected parallelism should be 1, had $observedParallelism")
         }
     }
 
     @Test
-    @Ignore
     fun testLimitParallelism() = runBlocking {
         val limitingDispatcher = blockingDispatcher(CORES_COUNT)
         val iterations = 50_000 * stressTestMultiplier
         val tasks = (1..iterations).map {
             async(limitingDispatcher) {
                 try {
                     val currentlyExecuting = concurrentWorkers.incrementAndGet()
-                    observedConcurrency[currentlyExecuting] = true
-                    assertTrue(currentlyExecuting <= CORES_COUNT)
+                    observedParallelism.add(currentlyExecuting)
                 } finally {
                     concurrentWorkers.decrementAndGet()
                 }
             }
         }
-        tasks.forEach { it.await() }
-        for (i in CORES_COUNT + 1..CORES_COUNT * 2) {
-            require(i !in observedConcurrency.keys) { "Unexpected state: $observedConcurrency" }
-        }
-        checkPoolThreadsCreated(CORES_COUNT..CORES_COUNT * 3)
+        tasks.awaitAll()
+        assertTrue(observedParallelism.max() <= CORES_COUNT, "Unexpected state: $observedParallelism")
     }
 }
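The assertion pattern the test now relies on -- record every observed concurrency level in a concurrent set, wait with awaitAll, and check only the observed parallelism rather than the number of created threads -- can be reproduced against the public API alone. The dispatcher, the limit of 4, and the iteration count below are illustrative assumptions, not values from the test.

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.*

@OptIn(ExperimentalCoroutinesApi::class)
fun main() = runBlocking {
    val limit = 4
    val limited = Dispatchers.Default.limitedParallelism(limit)
    val observedParallelism = ConcurrentHashMap.newKeySet<Int>()
    val activeWorkers = AtomicInteger(0)

    val tasks = (1..10_000).map {
        async(limited) {
            try {
                // Record how many tasks were running at this instant.
                observedParallelism.add(activeWorkers.incrementAndGet())
            } finally {
                activeWorkers.decrementAndGet()
            }
        }
    }
    tasks.awaitAll()
    check((observedParallelism.maxOrNull() ?: 0) <= limit) {
        "Unexpected parallelism: $observedParallelism"
    }
}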

kotlinx-coroutines-core/jvm/test/scheduling/SchedulerTestBase.kt

Lines changed: 6 additions & 0 deletions
@@ -95,6 +95,12 @@ abstract class SchedulerTestBase : TestBase() {
         }
     }
 
+    /**
+     * Implementation note:
+     * Our [Dispatchers.IO] is a [limitedParallelism][CoroutineDispatcher.limitedParallelism] dispatcher
+     * on top of an unbounded scheduler. We want to test this scenario, but on top of a non-singleton
+     * scheduler so we can control the number of threads, thus this method.
+     */
     internal fun SchedulerCoroutineDispatcher.blocking(parallelism: Int = 16): CoroutineDispatcher {
         return object : CoroutineDispatcher() {
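Since the new comment in SchedulerTestBase.kt notes that Dispatchers.IO is a limitedParallelism view on top of the unbounded scheduler, a short sketch of that public API may help. The limit of 64, the property name, and the function are illustrative assumptions.

import kotlinx.coroutines.*

// A user-side parallelism-limited view carved out of the same unbounded
// scheduler that backs Dispatchers.IO; it does not starve other IO users.
@OptIn(ExperimentalCoroutinesApi::class)
val databaseDispatcher = Dispatchers.IO.limitedParallelism(64)

suspend fun runBlockingQuery(): String = withContext(databaseDispatcher) {
    // A synchronous/blocking call would go here; at most 64 such calls
    // run in parallel on this view.
    "result"
}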