reps/2023-05-10-actors-batch-remote-api.md
46 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -8,7 +8,7 @@ In distributed computing scenarios, such as big data computing、AI training and
8
8
For example, in a typical star-topology architecture with one master Actor and 400 worker Actors, the same computational request needs to be sent to 400 worker Actors.
9
9
However, the computing task of each worker Actor is very short. In such a scenario, the performance requirements for batch-submitting remote Actor task calls to a large number of Actors are very high.
10
10
Therefore, for the scenario of batch calling Actor tasks, I want to add a new optimization API, batch_remote(), to improve the performance of batch submission of Actor Task calls.
11
-
After my own performance testing and comparison, this API can achieve a performance improvement of 40% ~ 98%.
11
+
In my own performance tests and comparisons, this API improved performance by 2 to 100+ times.
12
12
13
13
Current situation of batch calling actor tasks:
14
14
```
@@ -87,7 +87,7 @@ The following are the performance comparison results.
87
87
**Table 1: Comparison of remote call time with varying parameter sizes and 400 Actors**
88
88
89
89
90
-
Parameter Size (byte) | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The improvement rate
90
+
Parameter Size (byte) | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The ratio of time reduction
91
91
-- | -- | -- | --
92
92
10 | 40.532 | 9.226 | 77.2%
93
93
409846 | 584.345 | 24.106 | 95.9%
@@ -119,7 +119,7 @@ Parameter Size (byte) | Time taken for foreach_remote(ms) | Time taken for batch
119
119
120
120
**Table 2: Comparison of remote call time with varying numbers of Actors and a fixed parameter size (1MB)**
121
121
122
-
actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The improvement rate
122
+
actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The ratio of time reduction
123
123
-- | -- | -- | --
124
124
50 | 95.889 | 4.657 | 95.1%
125
125
100 | 196.184 | 8.447 | 95.7%
@@ -151,7 +151,7 @@ The more actors, the greater the performance gain.
151
151
152
152
This test measures how much performance improves when the frequency of switching between the Python and C++ execution layers is reduced.
153
153
154
-
actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The improvement rate
154
+
actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The ratio of time reduction
155
155
-- | -- | -- | --
156
156
50 | 2.083 | 1.257 | 39.7%
157
157
100 | 4.005 | 2.314 | 42.2%
@@ -177,7 +177,36 @@ actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(m
177
177
178
178
179
179
**Conclusion:**
180
-
After comparison, in the scenario of 400 actors and remote calls without parameters, the performance is optimized by 40%~50%.
180
+
The comparison shows that, for remote calls without parameters, performance improves by 2+ times.
181
+
182
+
**Table 3: Comparison of remote call time with varying numbers of Actors and object ref parameters in remote calls**
183
+
184
+
actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The ratio of time reduction
185
+
-- | -- | -- | --
186
+
50 | 3.878 | 1.488 | 61.6%
187
+
100 | 8.383 | 2.405 | 71.3%
188
+
150 | 12.16 | 3.255 | 73.2%
189
+
200 | 16.835 | 4.913 | 70.8%
190
+
250 | 21.09 | 6.424 | 69.5%
191
+
300 | 24.674 | 8.272 | 66.5%
192
+
350 | 28.639 | 8.862 | 69.1%
193
+
400 | 33.42 | 10.352 | 69.0%
194
+
450 | 37.39 | 12.02 | 67.9%
195
+
500 | 39.944 | 13.288 | 66.7%
196
+
550 | 45.019 | 15.005 | 66.7%
197
+
600 | 48.237 | 15.349 | 68.2%
198
+
650 | 53.304 | 17.149 | 67.8%
199
+
700 | 56.961 | 18.124 | 68.2%
200
+
750 | 61.672 | 19.079 | 69.1%
201
+
800 | 66.185 | 20.485 | 69.0%
202
+
850 | 69.524 | 21.584 | 69.0%
203
+
900 | 74.754 | 22.304 | 70.2%
204
+
950 | 79.493 | 25.932 | 67.4%
205
+
206
+

207
+
208
+
**Conclusion:**
209
+
The comparison shows that, for remote calls with an object ref parameter, performance improves by 3~4 times.
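
The tables report a ratio of time reduction, while the conclusions are stated in "times". As a rough illustration of how the two relate, the sketch below recomputes both figures from the first two rows of Table 3. The helper names `time_reduction` and `speedup` are hypothetical and are not part of the proposal.

```python
def time_reduction(foreach_ms: float, batch_ms: float) -> float:
    """Fraction of time saved by batch_remote relative to foreach_remote."""
    return 1.0 - batch_ms / foreach_ms


def speedup(foreach_ms: float, batch_ms: float) -> float:
    """How many times faster batch_remote is than foreach_remote."""
    return foreach_ms / batch_ms


# (actor count, foreach_remote ms, batch_remote ms), copied from Table 3.
rows = [(50, 3.878, 1.488), (100, 8.383, 2.405)]

for actors, foreach_ms, batch_ms in rows:
    print(
        f"{actors} actors: "
        f"{time_reduction(foreach_ms, batch_ms):.1%} less time, "
        f"{speedup(foreach_ms, batch_ms):.2f}x speedup"
    )
```

In general, a reduction ratio `r` corresponds to a speedup of `1 / (1 - r)`, so a 50% reduction is a 2x speedup and a 61.6% reduction is roughly a 2.6x speedup.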
181
210
182
211
**Summary:**
183
212
The newly added Batch Remote API can improve performance when calling Actor tasks in batch. It can reduce performance costs such as parameter serialization, object store consumption, and switching between the Python and C++ execution layers, thereby improving the performance of the entire distributed computing system.
@@ -186,6 +215,18 @@ Especially in the following scenario:
186
215
2. a large number of Actors
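
To illustrate one of the savings named in the summary, the following sketch counts parameter serializations in a per-actor loop versus a batched call. It does not use Ray: `FakeActor`, `foreach_remote`, and `batch_remote` here are hypothetical stand-ins, and `pickle` stands in for the real serializer. It only models the assumption that a batched call can serialize shared arguments once and reuse the bytes.

```python
import pickle


class FakeActor:
    """Hypothetical stand-in for a worker Actor; it just stores the payload."""

    def __init__(self) -> None:
        self.inbox = []

    def receive(self, payload: bytes) -> None:
        self.inbox.append(payload)


def foreach_remote(actors, args):
    """Per-actor loop: the same arguments are serialized once per actor."""
    serializations = 0
    for actor in actors:
        payload = pickle.dumps(args)
        serializations += 1
        actor.receive(payload)
    return serializations


def batch_remote(actors, args):
    """Batched call: the arguments are serialized once and the bytes are reused."""
    payload = pickle.dumps(args)
    for actor in actors:
        actor.receive(payload)
    return 1


workers = [FakeActor() for _ in range(400)]
args = {"step": 1, "data": list(range(1000))}

print("foreach serializations:", foreach_remote(workers, args))
print("batch serializations:", batch_remote(workers, args))
```

With 400 workers the loop serializes the arguments 400 times and the batched version once, which is why the gap in the tables widens as the parameter size and the number of Actors grow.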
187
216
188
217
218
+
### Failure & Exception Scenarios
219
+
220
+
**1. Exceptions occur during parameter validation or preprocessing, before the batch submission of ActorTasks.**
221
+
Since these exceptions occur before any ActorTask is submitted, they can be handled by directly throwing the specific exception, as is done today.
222
+
223
+
**2. Some actors throw exceptions during the process of batch submitting ActorTasks.**
224
+
When traversing and submitting ActorTasks in a loop, if one of the Actors throws an exception during submission, the submission of the subsequent ActorTasks will be terminated immediately, and the exception will be thrown to the user.
225
+
226
+
Reasons:
227
+
1. Submitting an ActorTask normally does not throw any exception. If an error does occur, it is likely caused by a problem in the code, which will need to be fixed.
228
+
2. The exception behavior of this plan is the same as that of the current foreach remote approach.
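
The two failure scenarios above can be sketched as follows. This is a minimal illustration, not the proposed implementation: `submit_batch` and the per-actor `submit` callables are hypothetical names, and no Ray API is used. Validation errors are raised before anything is submitted, and the first submission error stops the loop and propagates to the caller.

```python
from typing import Callable, List, Sequence


def submit_batch(submitters: Sequence[Callable[[tuple], str]], args: tuple) -> List[str]:
    """Hypothetical batch submission with the fail-fast behavior described above."""
    # Scenario 1: validation happens before any task is submitted.
    if not isinstance(args, tuple):
        raise TypeError("args must be a tuple")

    refs = []
    for submit in submitters:
        # Scenario 2: an exception here is not caught, so the loop stops
        # and the remaining actors never receive the task.
        refs.append(submit(args))
    return refs


submitted = []


def ok(args: tuple) -> str:
    """Hypothetical submitter that succeeds and returns a fake reference."""
    submitted.append(args)
    return f"ref-{len(submitted)}"


def broken(args: tuple) -> str:
    """Hypothetical submitter that always fails."""
    raise RuntimeError("submission failed")


try:
    submit_batch([ok, broken, ok], (1, 2))
except RuntimeError as err:
    print(f"raised: {err}; tasks submitted before failure: {len(submitted)}")
```

Only the first task is submitted here. The task for the third actor is never sent, which mirrors the behavior of a plain per-actor loop.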