Commit 9c6bb5b

Merge pull request #844 from bbernhar/webgpu_export
MLTensor explainer: replace "import buffer" with "export tensor"
2 parents 3aa67b3 + c1d5d15 commit 9c6bb5b

File tree

1 file changed: +18 −23 lines

mltensor-explainer.md

@@ -203,18 +203,18 @@ A privacy-conscious user wants to perform real-time selfie segmentation of a vid
 
 Currently, using WebNN for this task would require - for each frame - an expensive readback of `GPUBuffer` data to script, uploading the data to the ML context device (which may be the same GPU!), copying the result back to script, and then uploading the frame to be rendered back into a `GPUBuffer`.
 
-An `MLTensor` may be imported into WebGPU, minimizing the number of buffer copies required to render the results of some ML compute. Zero-copy buffer sharing between the two APIs may be supported in some cases.
+An `MLTensor` may be exported into WebGPU, minimizing the number of buffer copies required to render the results of some ML compute. Zero-copy buffer sharing between the two APIs may be supported in some cases.
 
 ```js
 // Create a couple MLTensors to be shared with WebGPU.
-const mlTensor1 = await mlContext.createTensor({..., importableToWebGPU: true});
-const mlTensor2 = await mlContext.createTensor({..., importableToWebGPU: true});
+const mlTensor1 = await mlContext.createTensor({..., exportableToGPU: true});
+const mlTensor2 = await mlContext.createTensor({..., exportableToGPU: true});
 
 const applyEffectToFrame = async () => {
   const gpuVideoTexture = gpuDevice.importExternalTexture({source: video});
 
   // Wait for all ML work involving `mlTensor1` to complete, then rent it out to WebGPU.
-  const tensorizedGpuBuffer = await gpuDevice.importExternalBuffer(mlTensor1);
+  const tensorizedGpuBuffer = await mlContext.exportToGPU(mlTensor1);
 
   // Create a bind group for `gpuVideoTexture`, create a command encoder, etc.
   // to "tensorize" `gpuVideoTexture` and store the result in `tensorizedGpuBuffer`
@@ -234,7 +234,7 @@ const applyEffectToFrame = async () => {
   );
 
   // Wait for all ML work involving `mlTensor2` to complete, then rent it out to WebGPU.
-  const tensorizedGpuBufferAfterInference = await gpuDevice.importExternalBuffer(mlTensor2);
+  const tensorizedGpuBufferAfterInference = await mlContext.exportToGPU(mlTensor2);
 
   // Create a bind group for `tensorizedGpuBufferAfterInference`,
   // create a command encoder, etc to feed `tensorizedGpuBufferAfterInference`
@@ -264,25 +264,25 @@ Specifying WebNN timelines is tracked in [#529](https://github.com/webmachinelea
 
 The WebNN API requires the developer to declare how an `MLTensor` will be used (via `MLTensorDescriptor`), which the user agent may use as a hint in deciding where to allocate the memory backing an `MLTensor`. Where the memory is ultimately allocated is up to the user agent.
 
-For example [an `MLContext` may be created with a `GPUDevice`](https://www.w3.org/TR/webnn/#dom-ml-createcontext-gpudevice), and creating an `MLTensor` from this context with the `MLTensorDescriptor.importableToWebGPU` flag expresses a clear intention to share the tensor with the given `GPUDevice`. However, there is no guarantee that sharing this tensor with WebGPU will be zero-copy.
+For example [an `MLContext` may be created with a `GPUDevice`](https://www.w3.org/TR/webnn/#dom-ml-createcontext-gpudevice), and creating an `MLTensor` from this context with the `MLTensorDescriptor.exportableToGPU` flag expresses a clear intention to share the tensor with the given `GPUDevice`. However, there is no guarantee that sharing this tensor with WebGPU will be zero-copy.
 
 The `MLTensorDescriptor.readable` and `MLTensorDescriptor.writable` flags likewise are hints to the user agent indicating that the underlying data will be read and written to, respectively, by script.
 
-### Importing an `MLTensor` to WebGPU
+### Exporting an `MLTensor` to WebGPU
 
-An `MLTensor` created with the `MLTensorDescriptor.importableToWebGPU` flag may be imported as a `GPUBuffer` to a `GPUDevice`. In the best case, this requires no data copies. If the underlying buffer backing the `MLTensor` is not accessible to the `GPUDevice`, this will require copying the contents of the `MLTensor` to a new buffer, then copying the contents of this buffer back to the `MLTensor` once WebGPU releases its handle to the buffer.
+An `MLTensor` created with the `MLTensorDescriptor.exportableToGPU` flag may be exported as a `GPUBuffer` to a `GPUDevice`. In the best case, this requires no data copies. If the underlying buffer backing the `MLTensor` is not accessible to the `GPUDevice`, this will require copying the contents of the `MLTensor` to a new buffer, then copying the contents of this buffer back to the `MLTensor` once WebGPU releases its handle to the buffer.
 
-While an `MLTensor` is rented to a `GPUDevice`, the `GPUDevice` has exclusive, read/write access to the imported buffer, which is created as a `GPUBuffer` with `GPUBufferUsageFlags.STORAGE`, `GPUBufferUsageFlags.COPY_SRC`, and `GPUBufferUsageFlags.COPY_DST`. All WebNN work depending - directly or indirectly - on the imported `MLTensor` is blocked until the `GPUDevice` returns the tensor.
+While an `MLTensor` is rented to a `GPUDevice`, the `GPUDevice` has exclusive, read/write access to the exported tensor, which is created as a `GPUBuffer` with `GPUBufferUsageFlags.STORAGE`, `GPUBufferUsageFlags.COPY_SRC`, and `GPUBufferUsageFlags.COPY_DST`. All WebNN work depending - directly or indirectly - on the exported `MLTensor` is blocked until the `GPUDevice` returns the tensor.
 
-The `GPUBuffer` can be accessed as an `array<T>` in WGSL - a 1D packed array of type `T` in GPU memory. The size of the array is determined by the number of bytes of the packed `MLTensor` and `T`. For example, an `MLTensor` with `{dataType: 'int8', shape: [2, 3, 4]}` may be imported as an `array<u32>` of length 6.
+The `GPUBuffer` can be accessed as an `array<T>` in WGSL - a 1D packed array of type `T` in GPU memory. The size of the array is determined by the number of bytes of the packed `MLTensor` and `T`. For example, an `MLTensor` with `{dataType: 'int8', shape: [2, 3, 4]}` may be exported as an `array<u32>` of length 6.
 
 ```
-// An example of how to declare the imported MLTensor as
+// An example of how to declare the exported MLTensor as
 // a GPUBuffer in a WGSL shader.
 @group(0) @binding(0) var<storage, read_write> tensor: array<f32>;
 ```
 
-Importing and returning the `MLTensor` are each points of synchronization between the respective WebNN and WebGPU [timelines](https://www.w3.org/TR/webgpu/#programming-model-timelines). The `importExternalBuffer()` method is asynchronous to allow the user agent to await completion of WebNN operations before posting WebGPU commands with the imported buffer. This is to avoid making WebGPU workloads - which may involve compositing - explicitly dependent on WebNN operations, which may be inefficient (e.g. if ML compute is not expressed in terms of GPU commands) or impossible (e.g. [some platforms don't support enqueuing GPU work that waits on a fence to be later signaled by the CPU](https://github.com/webmachinelearning/webnn/pull/754#discussion_r1740841364)) on some platforms.
+Exporting and returning the `MLTensor` are each points of synchronization between the respective WebNN and WebGPU [timelines](https://www.w3.org/TR/webgpu/#programming-model-timelines). The `exportToGPU()` method is asynchronous to allow the user agent to await completion of WebNN operations before posting WebGPU commands with the exported tensor. This is to avoid making WebGPU workloads - which may involve compositing - explicitly dependent on WebNN operations, which may be inefficient (e.g. if ML compute is not expressed in terms of GPU commands) or impossible (e.g. [some platforms don't support enqueuing GPU work that waits on a fence to be later signaled by the CPU](https://github.com/webmachinelearning/webnn/pull/754#discussion_r1740841364)) on some platforms.
 
 ### `compute()` vs. `dispatch()`
 
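The packed-size arithmetic in the `array<T>` paragraph above (an `int8` tensor of shape `[2, 3, 4]` occupies 24 bytes, which viewed as 4-byte `u32` elements gives a length of 6) can be sketched as a small JavaScript helper. The helper and its lookup tables are illustrative only; they are not part of the WebNN or WebGPU APIs.

```javascript
// Bytes per element for a few MLOperandDescriptor data types (illustrative subset).
const ML_BYTES = { int8: 1, uint8: 1, float16: 2, int32: 4, uint32: 4, float32: 4 };
// Bytes per element for WGSL scalar types usable in array<T>.
const WGSL_BYTES = { i32: 4, u32: 4, f32: 4 };

// Length of the WGSL array<T> view over the packed tensor bytes.
function wgslArrayLength({ dataType, shape }, wgslType) {
  const elementCount = shape.reduce((a, b) => a * b, 1);
  const packedBytes = elementCount * ML_BYTES[dataType];
  return packedBytes / WGSL_BYTES[wgslType];
}

// The example from the text: int8 tensor of shape [2, 3, 4] as array<u32>.
console.log(wgslArrayLength({ dataType: 'int8', shape: [2, 3, 4] }, 'u32')); // → 6
```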
@@ -296,8 +296,8 @@ It's possible `compute()` may have a performance advantage on some platforms for
 - *Update: [#778](https://github.com/webmachinelearning/webnn/issues/778) is a proposal for reporting non-fatal errors from the WebNN timeline*
 - Does the user agent have enough information to appropriately allocate an `MLTensor` if an `MLDeviceType` or `GPUDevice` is not used to create an `MLContext`? See [#350](https://github.com/webmachinelearning/webnn/issues/350) and [#749](https://github.com/webmachinelearning/webnn/issues/749)
 - Should the `dispatch()` method be a part of the `MLGraph` interface rather than `MLContext`? Should `readTensor()` and `writeTensor()` exist on an `MLTensor`? See [#697](https://github.com/webmachinelearning/webnn/issues/697).
-- Is a sync variant of the `importExternalBuffer()` method feasible (1) on platforms where completion of ML compute can be signaled on a GPU timeline, or (2) when blocking WebGPU workloads which do not themselves block compositing.
-- The requirement that an imported `GPUBuffer` may be represented as an `array<T>` in WGSL is very restrictive. Could we instead create a `GPUImportedTensor` type which abstracts away the layout of the underlying tensor?
+- Is a sync variant of the `exportToGPU()` method feasible (1) on platforms where completion of ML compute can be signaled on a GPU timeline, or (2) when blocking WebGPU workloads which do not themselves block compositing.
+- The requirement that an exported `GPUBuffer` may be represented as an `array<T>` in WGSL is very restrictive. Could we instead create a `GPUExportedTensor` type which abstracts away the layout of the underlying tensor?
 
 ## Considered Alternatives
 
@@ -382,7 +382,7 @@ Many thanks for valuable feedback and advice from:
 dictionary MLTensorDescriptor : MLOperandDescriptor {
   boolean readable = false;
   boolean writable = false;
-  boolean importableToWebGPU = false;
+  boolean exportableToGPU = false;
 };
 
 typedef record<DOMString, MLTensor> MLNamedTensors;
@@ -392,7 +392,7 @@ interface MLTensor {
   readonly attribute FrozenArray<unsigned long> shape;
   readonly attribute boolean readable;
   readonly attribute boolean writable;
-  readonly attribute boolean importableToWebGPU;
+  readonly attribute boolean exportableToGPU;
 
   void destroy();
 };
@@ -412,13 +412,8 @@ partial interface MLContext {
 
 // For WebGPU Interop
 
-dictionary GPUImportedTensorDescriptor
-    : GPUObjectDescriptorBase {
-  required MLTensor source;
-};
-
-partial interface GPUDevice {
-  Promise<GPUBuffer> importExternalBuffer(GPUImportedTensorDescriptor descriptor);
+partial interface MLContext {
+  Promise<GPUBuffer> exportToGPU(MLTensor source);
 }
 
 partial interface ML {
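The rent-out contract described in the diff (WebNN work on an exported `MLTensor` is blocked until WebGPU returns it, and `exportToGPU()` awaits pending ML work before resolving) can be modeled with a small mock. This is a sketch of the semantics only: `MockMLContext`, `pendingWork`, and `returnToML()` are invented names for illustration, and the excerpt does not specify how the tensor is actually returned to WebNN.

```javascript
// Mock of the exclusive-access contract; not the real WebNN/WebGPU APIs.
class MockMLContext {
  #rented = new Set();

  async exportToGPU(tensor) {
    // Synchronization point: all pending ML work on the tensor completes
    // before the "GPUBuffer" handle is handed to WebGPU.
    await tensor.pendingWork;
    this.#rented.add(tensor);
    return {
      usage: ['STORAGE', 'COPY_SRC', 'COPY_DST'], // usages named in the text
      // Hypothetical return path; the real mechanism is not shown here.
      returnToML: () => { this.#rented.delete(tensor); },
    };
  }

  dispatch(tensor) {
    // WebNN work depending on a rented tensor must wait (modeled as a throw).
    if (this.#rented.has(tensor)) {
      throw new Error('MLTensor is rented out to WebGPU');
    }
    return 'dispatched';
  }
}

// Usage: export, let WebGPU work, return, then dispatch ML work again.
(async () => {
  const ctx = new MockMLContext();
  const tensor = { pendingWork: Promise.resolve() };
  const buffer = await ctx.exportToGPU(tensor);
  // ctx.dispatch(tensor) would throw here; return the tensor first.
  buffer.returnToML();
  ctx.dispatch(tensor); // now succeeds
})();
```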
