
Commit d5d23b1

Add failure & exception scenario
Signed-off-by: 稚鱼 <[email protected]>
1 parent a4002b6 commit d5d23b1

1 file changed, +46 -5 lines changed

reps/2023-05-10-actors-batch-remote-api.md

Lines changed: 46 additions & 5 deletions
@@ -8,7 +8,7 @@ In distributed computing scenarios, such as big data computing, AI training and
For example, in a typical star-topology architecture with one master Actor and 400 worker Actors, the same computational request needs to be sent to all 400 worker Actors.
However, the computing task of each worker Actor is very short. In such a scenario, the speed of remotely submitting a batch of Actor tasks to a large number of Actors becomes critical.
Therefore, for the scenario of calling Actor tasks in batches, I propose a new optimization API, batch_remote(), to improve the performance of submitting Actor tasks in batches.
- After my own performance testing and comparison, this API can achieve a performance improvement of 40% ~ 98%.
+ In my own performance tests and comparisons, this API can improve performance by 2 to 100+ times.

Current approach to calling Actor tasks in batches:
```
@@ -87,7 +87,7 @@ The following are the performance comparison results.
**Table 1: Comparison of remote call time with varying parameter sizes and 400 Actors**

- Parameter Size (byte) | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The improvement rate
+ Parameter size (bytes) | Time taken for foreach_remote (ms) | Time taken for batch_remote (ms) | Ratio of time reduction
-- | -- | -- | --
10 | 40.532 | 9.226 | 77.2%
409846 | 584.345 | 24.106 | 95.9%
@@ -119,7 +119,7 @@ Parameter Size (byte) | Time taken for foreach_remote(ms) | Time taken for batch
**Table 2: Comparison of remote call time with varying numbers of Actors and a fixed parameter size (1 MB)**

- actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The improvement rate
+ Actor count | Time taken for foreach_remote (ms) | Time taken for batch_remote (ms) | Ratio of time reduction
-- | -- | -- | --
50 | 95.889 | 4.657 | 95.1%
100 | 196.184 | 8.447 | 95.7%
@@ -151,7 +151,7 @@ The more actors, the greater the performance gain.
This test measures how much performance improves when the number of switches between the Python and C++ execution layers is reduced.

- actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(ms) | The improvement rate
+ Actor count | Time taken for foreach_remote (ms) | Time taken for batch_remote (ms) | Ratio of time reduction
-- | -- | -- | --
50 | 2.083 | 1.257 | 39.7%
100 | 4.005 | 2.314 | 42.2%
@@ -177,7 +177,36 @@ actor counts | Time taken for foreach_remote(ms) | Time taken for batch_remote(m
**Conclusion:**
- After comparison, in the scenario of 400 actors and remote calls without parameters, the performance is optimized by 40%~50%.
+ After comparison, for remote calls without parameters, performance improves by 2+ times.
+
+ **Table 4: Comparison of remote call time with varying numbers of Actors and object ref parameters in remote calls**
+
+ Actor count | Time taken for foreach_remote (ms) | Time taken for batch_remote (ms) | Ratio of time reduction
+ -- | -- | -- | --
+ 50 | 3.878 | 1.488 | 61.6%
+ 100 | 8.383 | 2.405 | 71.3%
+ 150 | 12.16 | 3.255 | 73.2%
+ 200 | 16.835 | 4.913 | 70.8%
+ 250 | 21.09 | 6.424 | 69.5%
+ 300 | 24.674 | 8.272 | 66.5%
+ 350 | 28.639 | 8.862 | 69.1%
+ 400 | 33.42 | 10.352 | 69.0%
+ 450 | 37.39 | 12.02 | 67.9%
+ 500 | 39.944 | 13.288 | 66.7%
+ 550 | 45.019 | 15.005 | 66.7%
+ 600 | 48.237 | 15.349 | 68.2%
+ 650 | 53.304 | 17.149 | 67.8%
+ 700 | 56.961 | 18.124 | 68.2%
+ 750 | 61.672 | 19.079 | 69.1%
+ 800 | 66.185 | 20.485 | 69.0%
+ 850 | 69.524 | 21.584 | 69.0%
+ 900 | 74.754 | 22.304 | 70.2%
+ 950 | 79.493 | 25.932 | 67.4%
+
+ ![Comparison of remote call time with varying numbers of Actors and object ref parameters in remote calls](https://github.com/ray-project/ray/assets/11072802/89a5a0c4-3dfe-4fae-b046-0e1c72790fe1)
+
+ **Conclusion:**
+ After comparison, for remote calls with object ref parameters, batch_remote is roughly 2.6 to 3.7 times faster than foreach_remote (a 61.6% to 73.2% reduction in time).
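The tables report a "ratio of time reduction", 1 - batch/foreach, while the conclusions quote multiples, foreach/batch. The two are linked by speedup = 1 / (1 - reduction), so a 69% reduction is roughly a 3.2x speedup. The short Python sketch below recomputes both for five rows copied from Table 4; the helper names are illustrative and not part of the proposed API.

```python
# Rows copied from Table 4: (actor count, foreach_remote ms, batch_remote ms).
TABLE4_ROWS = [
    (50, 3.878, 1.488),
    (100, 8.383, 2.405),
    (150, 12.16, 3.255),
    (400, 33.42, 10.352),
    (950, 79.493, 25.932),
]


def time_reduction_pct(foreach_ms: float, batch_ms: float) -> float:
    """'Ratio of time reduction' column: share of foreach time saved, in %."""
    return (1.0 - batch_ms / foreach_ms) * 100.0


def speedup(foreach_ms: float, batch_ms: float) -> float:
    """How many times faster batch_remote is than foreach_remote."""
    return foreach_ms / batch_ms


if __name__ == "__main__":
    for actors, foreach_ms, batch_ms in TABLE4_ROWS:
        print(
            f"{actors:>4} actors: "
            f"{time_reduction_pct(foreach_ms, batch_ms):.1f}% less time, "
            f"{speedup(foreach_ms, batch_ms):.2f}x faster"
        )
```

The same two formulas apply to the rows of Tables 1 to 3.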

**Summary:**
The new Batch Remote API improves performance when Actor tasks are called in batches. It reduces overheads such as parameter serialization, object store usage, and switching between the Python and C++ execution layers, thereby improving the performance of the distributed computing system as a whole.
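The Summary credits part of the gain to less parameter serialization. The sketch below is a toy cost model, not Ray code, and it assumes (this diff does not show it) that batch submission serializes the shared argument once and reuses the bytes for every Actor, whereas a foreach loop serializes it once per `.remote()` call. `foreach_cost` and `batch_cost` are hypothetical names; they only count serializations and serialized bytes.

```python
import pickle
from typing import Any, Tuple

# Toy cost model, NOT Ray code. Each function returns
# (submissions, serializations, bytes_serialized) for sending the same
# argument to `num_actors` actors.


def foreach_cost(num_actors: int, arg: Any) -> Tuple[int, int, int]:
    """foreach pattern: every per-actor call serializes the argument again."""
    serializations = 0
    bytes_serialized = 0
    for _ in range(num_actors):
        payload = pickle.dumps(arg)  # one serialization per actor
        serializations += 1
        bytes_serialized += len(payload)
    return num_actors, serializations, bytes_serialized


def batch_cost(num_actors: int, arg: Any) -> Tuple[int, int, int]:
    """Assumed batch pattern: serialize once, reuse the bytes for all actors."""
    payload = pickle.dumps(arg)  # a single serialization
    # All `num_actors` submissions would reference this same `payload`.
    return num_actors, 1, len(payload)


if __name__ == "__main__":
    one_mb = b"x" * (1024 * 1024)  # 1 MB argument, the size used in Table 2
    for n in (50, 400):
        print(f"{n} actors: foreach={foreach_cost(n, one_mb)} "
              f"batch={batch_cost(n, one_mb)}")
```

Under this assumption, serialized bytes grow with the number of Actors for foreach but stay constant for batch, which is consistent with the larger gains reported for 1 MB arguments (Table 2) than for calls without parameters (Table 3).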
@@ -186,6 +215,18 @@ Especially in the following scenario:
2. A large number of Actors

+ ### Failure and Exception Scenarios
+
+ **1. Exceptions raised during parameter validation or preprocessing, before ActorTasks are batch-submitted.**
+ Because these exceptions occur before any ActorTask is submitted, they can be handled as they are today, by directly raising the specific exception to the user.
+
+ **2. An exception is raised for one of the Actors while ActorTasks are being batch-submitted.**
+ ActorTasks are submitted to the Actors one at a time in a loop. If submitting to one Actor raises an exception, the loop stops immediately, the remaining ActorTasks are not submitted, and the exception is raised to the user.
+
+ Reasons:
+ 1. Submitting an ActorTask normally raises no exception. If one does occur, it most likely indicates a bug in the code that needs to be fixed.
+ 2. This exception behavior is the same as that of the current foreach remote approach.
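The two failure cases can be sketched in Python as follows. `batch_submit`, `submit`, and `SubmissionError` are hypothetical names, not Ray APIs: validation happens before any submission (case 1), and the submission loop has no `try`/`except`, so the first error stops it and reaches the caller (case 2).

```python
from typing import Callable, List, Sequence


class SubmissionError(Exception):
    """Hypothetical error for a failed submission to a single Actor."""


def batch_submit(actors: Sequence[str], arg: object,
                 submit: Callable[[str, object], str]) -> List[str]:
    """Sketch of the described failure semantics (all names hypothetical).

    `submit(actor, arg)` stands in for submitting one ActorTask; it returns
    a task id or raises.
    """
    # Case 1: validate/preprocess before anything is submitted, so an error
    # here reaches the user while no ActorTask has been submitted yet.
    if not actors:
        raise ValueError("batch_submit() needs at least one actor")

    # Case 2: submit one by one with deliberately no try/except, so the
    # first exception stops the loop and propagates to the caller; the
    # remaining Actors are never submitted to, as in a plain foreach loop.
    task_ids = []
    for actor in actors:
        task_ids.append(submit(actor, arg))
    return task_ids


if __name__ == "__main__":
    attempted = []

    def flaky_submit(actor: str, arg: object) -> str:
        attempted.append(actor)
        if actor == "actor-2":
            raise SubmissionError(f"submission to {actor} failed")
        return f"task-{actor}"

    try:
        batch_submit([f"actor-{i}" for i in range(5)], 42, flaky_submit)
    except SubmissionError as err:
        print("raised to user:", err)
    print("attempted:", attempted)  # actor-3 and actor-4 were never attempted
```

Leaving the exception uncaught inside the loop is what keeps this behavior identical to a plain Python loop of per-actor `.remote()` calls.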
+
## Compatibility, Deprecation, and Migration Plan
N/A