Workload Assignment
There are four critical components at play when a worker requests a workload.
-
Priority: Tests with a higher priority are generally completed before tests of lower priority. However, this does not strictly mean that the highest priority test overall will be returned to the worker. Instead, the worker is assigned the highest priority test that it is capable of completing. To accept a workload, a worker must have the proper Operating System, CPU Flags, Syzygy requirements, Thread counts, Compiler Versions (public engines), and Fine-grained Tokens (private engines). The list of all such highest priority tests that a worker can complete comprises the candidate assignments.
-
Throughput: Every test has a throughput value associated with it. By default, tests are created with a throughput of 1000. Suppose there is exactly one test running, with Throughput=1000. If we want to create another test, and have it get twice as many resources, we would create one with Throughput=2000. This means that ~2/3rds of all resources will go to the new test. If we wanted a third test, which got 1/2 of all resources, we would need to create one with Throughput=3000. This is because the total throughput is now 6000, and the test itself has 3000. When summing up the throughput values, we only consider tests in the candidate assignments.
-
Engine Balancing: The balance_engine_throughputs option in the main configuration, if enabled, will scale the throughput of all tests for an engine, by dividing by the number of tests that the engine has at once in the candidate assignments. For example, if Ethereal has two tests at Priority=1, and three tests at Priority=0, then the throughput will be scaled down by a factor of 2, not by a factor of 5, because only the two Priority=1 tests are in the candidate assignments.
-
Focus Enabled Workers: The client has an option, --focus ENGINE1 [ENGINE2 ...], which allows the user to specify a list of engines which they would prefer to have their machine contribute games towards. If a focused engine appears in the candidate assignments, then all non-focused engines will be excluded. If a machine is given an assignment whose dev engine appears in its focus list, we call this a focus-assigned machine. The balancing algorithm ignores focus-assigned machines when assigning workloads to workers that are not focus-assigned. This ensures that connecting your own machines for your own engine is a strict increase in the resources your tests get.
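The throughput arithmetic and the engine balancing rule described above can be illustrated with a minimal sketch. The function names, the tuple layout, and the test names are hypothetical and chosen only for this example; they are not taken from the server code.

```python
from collections import Counter


def resource_shares(throughputs):
    """Map each test to its share of resources: throughput / total throughput."""
    total = sum(throughputs.values())
    return {name: value / total for name, value in throughputs.items()}


def effective_throughputs(tests, balance_engine_throughputs=True):
    """Scale throughputs per engine.

    tests is a list of (name, engine, throughput) tuples, containing only
    tests that are in the candidate assignments (hypothetical layout).
    """
    if not balance_engine_throughputs:
        return {name: float(tp) for name, _, tp in tests}
    # Count how many candidate tests each engine has
    per_engine = Counter(engine for _, engine, _ in tests)
    return {name: tp / per_engine[engine] for name, engine, tp in tests}


# Three tests with throughputs 1000, 2000 and 3000: the third gets half
shares = resource_shares({"first": 1000, "second": 2000, "third": 3000})
print(shares["third"])  # 0.5

# Ethereal has two candidate tests, so each is scaled down by a factor of 2
candidates = [("eth-a", "Ethereal", 1000), ("eth-b", "Ethereal", 1000)]
print(effective_throughputs(candidates))  # {'eth-a': 500.0, 'eth-b': 500.0}
```

Lower priority tests never appear in the input, which is why the three Priority=0 tests from the Ethereal example do not affect the scaling.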
-
Identify all tests which the machine is capable of completing. Filter that list down such that only the highest priority tests remain. If the machine was run with --focus, and some of the focused engines still appear in the list, then remove all tests from engines which are not focused. These are the candidate assignments.
-
Determine how many threads are currently assigned to each of the candidate tests. If the machine is not going to be focus-assigned, then ignore the threads that other focus-assigned machines contribute to the candidate tests. The exclusion is conditional so that focus-assigned machines are still balanced amongst themselves.
-
Compute the effective-throughput for each test. If balance_engine_throughputs is not enabled, then the throughput is the effective-throughput. Otherwise, divide each individual test's throughput by the number of tests from the test's dev engine which appear in the candidate assignments.
-
Compute the resource ratios for each test. The ratio for a test is the number of assigned threads, divided by the effective-throughput. For example: Test #1 has 1000 effective-throughput and 32 threads. Test #2 has 2000 effective-throughput and 16 threads. The ratios for test 1 and 2 would be 0.032 and 0.008 respectively. This tells us that Test #1 is getting 4x the resources as Test #2. A 2x factor comes from the 1000 vs 2000 throughput, and another 2x factor comes from the 32 vs 16 threads.
-
Compute the fair resource ratio. This is the total number of threads on the candidate tests, divided by the sum of the effective throughputs. In our example, this would be 0.016. From this, we see that Test #1 is getting twice as many resources as would be fair, and Test #2 is only getting half the resources as would be fair.
-
For efficiency, we would like the machines to repeat the same workload multiple times, to reduce overhead on downloading and building engines. If the machine's most recent workload is in our candidates, and no candidate test is receiving less than 75% of the fair amount of resources, then repeat the same test.
-
If any of the candidate tests have a ratio of 0, select from all tests with a ratio of 0, using the throughput of each test for random weighting.
-
Otherwise, return the test which has the lowest ratio. If many tests have a shared lowest ratio, select from them at random, without throughput weighting.
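The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the server's actual code: the function names, the dictionary layouts, and the test ids are hypothetical, and the thread counts passed in are assumed to already exclude focus-assigned machines where that rule applies.

```python
import random


def candidate_assignments(tests, focus=()):
    """tests: dicts with 'id', 'engine', 'priority' that the machine can run."""
    if not tests:
        return []
    top = max(t["priority"] for t in tests)
    best = [t for t in tests if t["priority"] == top]
    # Keep only focused engines, if any of them survived the priority filter
    focused = [t for t in best if t["engine"] in focus]
    return focused or best


def select_test(candidates, threads, previous=None, rng=random):
    """Pick a test id.

    candidates: test id -> effective-throughput
    threads:    test id -> threads currently assigned
    previous:   id of the machine's most recent test, if any
    """
    ids = list(candidates)
    ratios = {t: threads.get(t, 0) / candidates[t] for t in ids}
    fair = sum(threads.get(t, 0) for t in ids) / sum(candidates.values())
    lowest = min(ratios.values())

    # Repeat the previous test while no test is below 75% of the fair ratio
    if previous in candidates and lowest >= 0.75 * fair:
        return previous

    # Tests with no threads at all: random pick, weighted by throughput
    starved = [t for t in ids if ratios[t] == 0]
    if starved:
        return rng.choices(starved, weights=[candidates[t] for t in starved])[0]

    # Otherwise the lowest ratio wins; ties are broken without weighting
    return rng.choice([t for t in ids if ratios[t] == lowest])


# The worked example: ratios 0.032 and 0.008, fair ratio 0.016
print(select_test({1: 1000, 2: 2000}, {1: 32, 2: 16}, previous=1))  # 2
```

In the example, Test #2 sits at half of the fair ratio, which is below the 75% threshold, so the machine is moved off Test #1 even though that was its previous workload.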
The server provides three critical values in the workload JSON response. These are cutechess-count, concurrency-per, and games-per-cutechess. These values are a function of the number of threads and sockets, as well as the nature of the test or tune. They are explained below.
-
cutechess-count indicates the number of cutechess copies that should be running at one time. For a typical workload, where each engine is playing with one thread, cutechess-count will be equal to the number of sockets on the worker, as provided via --nsockets or -N when starting the Client. If the workload is an SPSA tune, using the MULTIPLE method of distributing SPSA-points, then cutechess-count will be the maximum number of concurrent games divided by two. Finally, if the previous condition is not true, and the workload uses more than 1 thread for either engine, then cutechess-count will be set to 1.
-
concurrency-per indicates the number of concurrent games that will be played, for any particular cutechess copy that is running. If the workload is an SPSA tune, using the MULTIPLE method of distributing SPSA-points, then this value will be 2. Otherwise, it will be the maximum number of concurrent games, which is defined as (threads // cutechess-count) // max(dev_threads, base_threads).
-
games-per-cutechess is the total number of games to play on each particular cutechess copy that is running. Once again, SPSA tunes using the MULTIPLE method are a special case, and will play 2 * workload_size games, i.e. a game-pair for each workload_size. The general case will instead play 2 * workload_size * concurrency-per, i.e. a game-pair for each workload_size for each possible concurrent game.
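The three values can be tied together in a short sketch. The function name, its parameters, and the workload_size of 32 are hypothetical. One assumption is made explicit: for SPSA tunes using the MULTIPLE method, the "maximum number of concurrent games" is taken over the whole worker, i.e. threads // max(dev_threads, base_threads), since the text does not say which cutechess-count applies in that case.

```python
def workload_values(threads, sockets, dev_threads, base_threads,
                    workload_size, spsa_multiple=False):
    """Return the three workload values as a dict keyed by their JSON names."""
    engine_threads = max(dev_threads, base_threads)

    if spsa_multiple:
        # Assumption: concurrent games are counted across the whole worker
        return {
            "cutechess-count": (threads // engine_threads) // 2,
            "concurrency-per": 2,
            "games-per-cutechess": 2 * workload_size,
        }

    # More than one thread for either engine: a single cutechess copy
    count = 1 if engine_threads > 1 else sockets
    per = (threads // count) // engine_threads
    return {
        "cutechess-count": count,
        "concurrency-per": per,
        "games-per-cutechess": 2 * workload_size * per,
    }


# 16 threads on 2 sockets, single-threaded engines, workload_size of 32
print(workload_values(16, 2, 1, 1, 32))
# {'cutechess-count': 2, 'concurrency-per': 8, 'games-per-cutechess': 512}
```

With the same worker and engines using 4 threads each, the sketch yields one cutechess copy playing 4 concurrent games.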