You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: reps/2022-03-10-plasma_alternative.md
+22-22Lines changed: 22 additions & 22 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,14 @@
1
1
2
2
## Summary - Plasma Alternative
3
3
### General Motivation
4
-
Ray is a general-purpose and powerful computing framework and makes it simple to scale any compute-intensive Python workload. By storing the objects in Plasma, data can be efficiently (0-copy) shared between Ray tasks. We propose to provide a plasma-like object store with more functionalities target for different situations (e.g. to better support [Kubernetes](https://kubernetes.io/)). In other words, we can simultaneously have multiple plasma object-store variants with different functionalities. (e.g. users can choose a variant at `ray.init()`).
4
+
Ray is a general-purpose and powerful computing framework and makes it simple to scale any compute-intensive Python workload. By storing the objects in Plasma, data can be efficiently (0-copy) shared between Ray tasks. We propose to provide a plasma-like object store with more functionalities targeted for different situations (e.g. to better support [Kubernetes](https://kubernetes.io/)). In other words, we can simultaneously have multiple plasma object-store variants with different functionalities. (e.g. users can choose a variant at `ray.init()`).
5
5
6
6
[**[Vineyard]**](https://github.com/v6d-io/v6d) is also an in-memory immutable data manager that provides out-of-the-box high-level abstraction and zero-copy in-memory sharing for distributed data in big data tasks. It aims to optimize the end-to-end performance of complex data analytics jobs on cloud platforms. It manages various data structures produced in the job, and cross-system data sharing can be conducted without serialization/deserialization overheads by decoupling metadata and data payloads.
7
7
8
8
**Opportunities**: Vineyard can enhance the ray object store with more functionalities for modern data-intensive applications. To be specific, the benefits are four-fold:
9
9
10
10
- O1: Sharing data with systems out of Ray. Vineyard is a daemon in cluster that any system can put/get objects into/from it.
11
-
- O2: Composability. Objects will not be sealed twice when we create a new object form a existing one(e.g. add a new column for dataframe).
11
+
- O2: Composability. Objects will not be sealed twice when we create a new object from a existing one(e.g. add a new column for dataframe).
12
12
- O3: Zero copy in put/get. Objects can be directly created in object store without copying or serialization.
13
13
- O4: Reusable Routines (IO/Chunking/Partitions).
14
14
@@ -28,9 +28,7 @@ The proposal will be open to the public, but please suggest a few experience Ray
### Shepherd of the Proposal (should be a senior committer)
31
-
To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.
The goal of lifecycle management in local plasma store is to drop some not-in-use objects (evict or spill) when we we ran out of memory. Both plasma and vineyard have thier own lifecycle management, and their behavior may be different. We think that raylet logic should not rely on the underlying lifecycle management strategies (e.g. different eviction policy). We propose to define a **protocol** here that both vineyard and plasma should agree. For example, the protocol will define when a object adds/remove its reference, when object can be evicted/spilled. In other word The protocol defines the impact on lifecycle management in server-side when calling the above `ClientImplInterface` API in client-side. Since most of concerns about the integration may happens in this part, we may need further discussions about these protocols.
119
+
The goal of lifecycle management in local plasma store is to drop some not-in-use objects (evict or spill) when we we ran out of memory. Both plasma and vineyard have thier own lifecycle management, and their behavior may be different. We think that raylet logic should not rely on the underlying lifecycle management strategies (e.g. different eviction policy). We propose to define a **Invariant** here that both vineyard and plasma should agree. For example, the invariant will define when a object adds/remove its reference, when object can be evicted/spilled. In other word The Invariant defines the impact on lifecycle management in server-side when calling the above `ClientImplInterface` API in client-side. Since most of concerns about the integration may happens in this part, we may need further discussions about these invariants.
122
120
123
121
#### Object Ref_count
124
122
125
123
The Ray keeps two type of "reference count", one is `ref_count` in plasma to track the local plasma object, and the other is the `reference_count` in core_worker to track the primary copy. In this proposal, we will not change the logic of `reference_count`, and the vineyard server will have its own `ref_count`.
126
124
127
-
**Protocol#1**: A object will increase its `ref_count` iff the `Get(...)` , `TryCreateImmediately(...)`or `CreateAndSpillIfNeeded(...)` is invoked. ([Create](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L198), [Get](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L110))
125
+
**Invariant#1**: A object will increase its `ref_count` iff the `Get(...)` , `TryCreateImmediately(...)`or `CreateAndSpillIfNeeded(...)` is invoked. ([Create](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L198), [Get](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L110))
128
126
129
-
**Protocol#2:** A object will decrease its `ref_count` iff the `release(...)` or `Disconnect(...)`is invoked. ([Release](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L277), [Disconnect](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L346))
127
+
**Invariant#2:** A object will decrease its `ref_count` iff the `release(...)` or `Disconnect(...)`is invoked. ([Release](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L277), [Disconnect](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L346))
130
128
131
129
#### Object eviction
132
130
133
131
As mentioned above, the eviction policy may be different in different store variants. Currently, the policies of Plasma and Vineyard are both based on LRU.
134
132
135
-
**Protocol#3:** Eviction only happens in an eviction request and create request. ([evict](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L451), [create](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/object_lifecycle_manager.cc#L191))
133
+
**Invariant#3:** Eviction only happens in an eviction request and create request. ([evict](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L451), [create](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/object_lifecycle_manager.cc#L191))
136
134
137
-
**Protocol#4:** Servers can only evict the objects that are not needed by clients, (a.k.a.reference count = 0). specially, pinned object should never be evcited. ([AddtoCache](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/object_lifecycle_manager.cc#L160))
135
+
**Invariant#4:** Servers can only evict the objects that are not needed by clients, (a.k.a.reference count = 0). specially, pinned object should never be evcited. ([AddtoCache](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/object_lifecycle_manager.cc#L160))
138
136
139
137
#### Object spilling
140
138
@@ -148,19 +146,19 @@ We put the `PlasmaProxy` on a thread in Raylet instead of the `PlasmaStoreRunner
148
146
149
147
The `PlasmaProxy` will share fate with raylet, if vineyard is running into issues, `PlasmaProxy` will complain that (we can use heartbeat here). I think ray can treat the `PlasmaProxy` as the third-party object store instance, it will throw exceptions just like the original plasma runner.
150
148
151
-
**Protocol#5:** The spilling only happens in a object creation.(we trigger spilling when memory usage > threshold([triggle](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L174))) ([spill](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/create_request_queue.cc#L97))
149
+
**Invariant#5:** The spilling only happens in a object creation.(we trigger spilling when memory usage > threshold([triggle](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/store.cc#L174))) ([spill](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/create_request_queue.cc#L97))
152
150
153
-
**Protocol#6:** The spilling only happens when eviction can not meet the requirement. ([create object will first trigger eviction](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/object_lifecycle_manager.cc#L191))
151
+
**Invariant#6:** The spilling only happens when eviction can not meet the requirement. ([create object will first trigger eviction](https://github.com/ray-project/ray/blob/bb4ff42eeca50a5b91dd569caafdd71bf771b66e/src/ray/object_manager/plasma/object_lifecycle_manager.cc#L191))
154
152
155
-
**Protocol#7:** The spilling only happens in primary objects.(the spill_objects_callback will handle the spillng).
153
+
**Invariant#7:** The spilling only happens in primary objects.(the spill_objects_callback will handle the spillng).
156
154
157
-
**Protocol#8:** Shouldn't unpin objects when spilling is initiated.
155
+
**Invariant#8:** Shouldn't unpin objects when spilling is initiated.
158
156
159
157
#### Object Delete
160
158
161
-
**Protocol#9:**`Delete()`a object only when (1) it exists, (2) has been sealed, (3) not used by any clients. or it will do nothing.
159
+
**Invariant#9:**`Delete()`a object only when (1) it exists, (2) has been sealed, (3) not used by any clients. or it will do nothing.
162
160
163
-
We believe that if a third-party object store client can follow the above protocol when implementation the `ClientImplInterface`. Then the ray logic can be seamlessly built on top of the third-party object store.
161
+
We believe that if a third-party object store client can follow the above invariants when implementation the `ClientImplInterface`. Then the ray logic can be seamlessly built on top of the third-party object store.
164
162
165
163
#### Dupilicate IDs in multiple raylet.
166
164
@@ -170,7 +168,7 @@ The third-party object store should not break the invariants within Ray. e.g whe
170
168
171
169
**[w/o visbility]** For other third-party object stores which do not provide such isolation control. we can add prefix to the duplicated object ids to distinguish them (this should be handled in `xxxClinetImpl` which inherits the `ClientImplInterface`).
172
170
173
-
**Protocol#10**: when the raylet is restarted, the storage should guarantee the object id from the previous raylet should be isolated to the new one.
171
+
**Invariant#10**: when the raylet is restarted, the storage should guarantee the object id from the previous raylet should be isolated to the new one.
174
172
175
173
#### Failure Mode
176
174
@@ -182,9 +180,11 @@ The third-party objects' failure mode should be consistent with plasma: which me
182
180
183
181
**[Situation 3]** When the `XXXPlasmaProxy` crashes, the third-party store will behave like the raylet crashes, it will close the connection, forbid the clients in the same session to get objects. the raylet will behave like the object store crashes, it will throw exception just like original plasma runner. This is a recoverable failure for vineyard, raylet can restart PlasmaProxy, then connect to the prevoius session.
184
182
185
-
#### Callback Overhead
183
+
#### Callback
184
+
185
+
**Invariant #11**: When an object is added the `object_added callback` should be called, `object_deleted callback` should be called when an sealed object is deleted. `object_spill_callback` should be called when spilling is triggered.
186
186
187
-
The cost of a callback depends on the underlying store implementation. The abstraction here can be compatible with different object store. For plasma, the `PlasmaProxy` will execute the callback via in-process function call, for standalone store service like vineyard, the `VineyardPlasmaProy` will execute the callback via IPC. Supporting the third-party object store will not incurs a performance hit for current coude path (plasma-version).
187
+
**[Overhead]**The cost of a callback depends on the underlying store implementation. The abstraction here can be compatible with different object store. For plasma, the `PlasmaProxy` will execute the callback via in-process function call, for standalone store service like vineyard, the `VineyardPlasmaProy` will execute the callback via IPC. Supporting the third-party object store will not incurs a performance hit for current coude path (plasma-version).
188
188
189
189
If the IPC of third-party object store really obstacles the performance of ray scheduler (e.g. obtain the memory usage of the plasma store, and today ray design adopts the direct function call is to avoid to obtain the stale memory usage). We can share meta between ray and third-party object store via SHM. For example, the third-party object store writes the memory usage to SHM after each creation/deletion, and the ray read the memory usage from SHM when required.
190
190
@@ -195,8 +195,8 @@ An important part of the proposal is to explicitly point out any compability imp
195
195
- (**Step 1**) Add a new option named `plasma_store_impl` to choose the underlying local object store.
196
196
- (**Step 1**) Add a new option named `plasma_store_impl` to not launch a plasma runner in third-party mode.
197
197
- (**Step 1**) Add a new virtual class named `ClientImplInterface` for third-party object store to implement.
198
-
- (**Step 2**) Add a new `VineyardClientImp` to implement the `ClientImplInterface` and follw the above protocols.
199
-
- (**Step 3**) Add a new `PlasmaRunnerProxy ` as a helper to execute the callback of raylet.
198
+
- (**Step 2**) Add a new `VineyardClientImp` to implement the `ClientImplInterface` and follw the above invariants.
199
+
- (**Step 3**) Add a new `PlasmaRunnerProxy ` as a helper to execute the callback of raylet, raylet can decide whether to launch `ObjectStoreRunner` or `PlasmaRunnerProxy` depending on the configuration.
200
200
- Ray API
201
201
- (**Step 4**) Handle the python/cpp/jave wrappers. provide demos, examples, and docs.
202
202
- Deprecation
@@ -206,7 +206,7 @@ An important part of the proposal is to explicitly point out any compability imp
206
206
## Test Plan and Acceptance Criteria
207
207
The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria including documentation and examples.
208
208
209
-
- Unit and integration test for the above protocols
209
+
- Unit and integration test for the above invariants
210
210
- Documentation with representative workload, covered by CI.
0 commit comments