<h1><a name="ml_example">World ml-example</a></h1>
<p><code>wasi-nn</code> is a WASI API for performing machine learning (ML) inference. The API is not (yet)
capable of performing ML training. WebAssembly programs that want to use a host's ML
capabilities can access these capabilities through <code>wasi-nn</code>'s core abstractions: <em>graphs</em> and
<em>tensors</em>. A user <a href="#load"><code>load</code></a>s a model -- instantiated as a <em>graph</em> -- to use in an ML <em>backend</em>.
Then, the user passes <em>tensor</em> inputs to the <em>graph</em>, computes the inference, and retrieves the
<em>tensor</em> outputs.</p>
<p>This example world shows how to use these primitives together.</p>
<ul>
<li>Imports:
<ul>
<li>interface <a href="#wasi:nn_tensor"><code>wasi:nn/tensor</code></a></li>
<li>interface <a href="#wasi:nn_errors"><code>wasi:nn/errors</code></a></li>
<li>interface <a href="#wasi:nn_graph"><code>wasi:nn/graph</code></a></li>
<li>interface <a href="#wasi:nn_inference"><code>wasi:nn/inference</code></a></li>
</ul>
</li>
</ul>
<h2><a name="wasi:nn_tensor">Import interface wasi:nn/tensor</a></h2>
<p>All inputs and outputs to an ML inference are represented as <a href="#tensor"><code>tensor</code></a>s.</p>
<hr />
<h3>Types</h3>
<h4><a name="tensor_type"><code>enum tensor-type</code></a></h4>
<p>The type of the elements in a tensor.</p>
<h5>Enum Cases</h5>
<ul>
<li><a name="tensor_type.fp16"><code>fp16</code></a></li>
<li><a name="tensor_type.fp32"><code>fp32</code></a></li>
<li><a name="tensor_type.bf16"><code>bf16</code></a></li>
<li><a name="tensor_type.u8"><code>u8</code></a></li>
<li><a name="tensor_type.i32"><code>i32</code></a></li>
</ul>
<h4><a name="tensor_dimensions"><code>type tensor-dimensions</code></a></h4>
<p><a href="#tensor_dimensions"><code>tensor-dimensions</code></a></p>
<p>The dimensions of a tensor.</p>
<p>The array length matches the tensor rank and each element in the array describes the size of
the corresponding dimension.</p>
<h4><a name="tensor_data"><code>type tensor-data</code></a></h4>
<p><a href="#tensor_data"><code>tensor-data</code></a></p>
<p>The tensor data.</p>
<p>Initially conceived as a sparse representation in which each empty cell is filled with zeros,
the array length must match the product of all of the dimensions and the number of bytes in
the type (e.g., a 2x2 tensor with 4-byte f32 elements would have a data array of length
16). Naturally, this representation requires some knowledge of how to lay out data in
memory--e.g., using row-major ordering--and could perhaps be improved.</p>
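<p>The length constraint described above is simple arithmetic: the byte length of the data array is the product of the dimensions times the element size. The short Python sketch below checks this; the bytes-per-element table is illustrative, keyed by the <code>tensor-type</code> cases.</p>

```python
from math import prod

# Illustrative bytes-per-element table for the tensor-type enum cases.
ELEMENT_SIZE = {"fp16": 2, "fp32": 4, "bf16": 2, "u8": 1, "i32": 4}

def expected_data_length(dimensions, tensor_type):
    # Product of all dimensions times the number of bytes in the type.
    return prod(dimensions) * ELEMENT_SIZE[tensor_type]

# A 2x2 tensor with 4-byte f32 elements has a data array of length 16.
assert expected_data_length([2, 2], "fp32") == 16
```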
<h4><a name="tensor"><code>record tensor</code></a></h4>
<h5>Record Fields</h5>
<ul>
<li><a name="tensor.dimensions"><code>dimensions</code></a>: <a href="#tensor_dimensions"><code>tensor-dimensions</code></a></li>
<li><a name="tensor.tensor_type"><code>tensor-type</code></a>: <a href="#tensor_type"><code>tensor-type</code></a></li>
<li><a name="tensor.data"><code>data</code></a>: <a href="#tensor_data"><code>tensor-data</code></a></li>
</ul>
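<p>A minimal Python stand-in for this record (hypothetical, not a real wasi-nn binding) makes the relationship between the three fields concrete:</p>

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Tensor:
    """Hypothetical stand-in for the tensor record."""
    dimensions: List[int]  # tensor-dimensions: one size per axis
    tensor_type: str       # a tensor-type enum case, e.g. "fp32"
    data: bytes            # tensor-data: flat bytes (e.g., row-major)

# A 1x4 fp32 tensor carries 4 elements x 4 bytes = 16 bytes of data.
t = Tensor(dimensions=[1, 4], tensor_type="fp32", data=bytes(16))
assert len(t.data) == 16
```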
<h2><a name="wasi:nn_errors">Import interface wasi:nn/errors</a></h2>
<p>TODO: create function-specific errors (https://github.com/WebAssembly/wasi-nn/issues/42)</p>
<hr />
<h3>Types</h3>
<h4><a name="error"><code>enum error</code></a></h4>
<h5>Enum Cases</h5>
<ul>
<li><a name="error.invalid_argument"><code>invalid-argument</code></a></li>
<li><a name="error.invalid_encoding"><code>invalid-encoding</code></a></li>
<li><a name="error.busy"><code>busy</code></a></li>
<li><a name="error.runtime_error"><code>runtime-error</code></a></li>
<li><a name="error.unsupported_operation"><code>unsupported-operation</code></a></li>
<li><a name="error.model_too_large"><code>model-too-large</code></a></li>
<li><a name="error.model_not_found"><code>model-not-found</code></a></li>
</ul>
<h2><a name="wasi:nn_graph">Import interface wasi:nn/graph</a></h2>
<p>A <a href="#graph"><code>graph</code></a> is a loaded instance of a specific ML model (e.g., MobileNet) for a specific ML
framework (e.g., TensorFlow):</p>
<hr />
<h3>Types</h3>
<h4><a name="error"><code>type error</code></a></h4>
<p><a href="#error"><code>error</code></a></p>
<h4><a name="tensor"><code>type tensor</code></a></h4>
<p><a href="#tensor"><code>tensor</code></a></p>
<h4><a name="graph_encoding"><code>enum graph-encoding</code></a></h4>
<p>Describes the encoding of the graph. This allows the API to be implemented by various
backends that encode (i.e., serialize) their graph IR with different formats.</p>
<h5>Enum Cases</h5>
<ul>
<li><a name="graph_encoding.openvino"><code>openvino</code></a></li>
<li><a name="graph_encoding.onnx"><code>onnx</code></a></li>
<li><a name="graph_encoding.tensorflow"><code>tensorflow</code></a></li>
<li><a name="graph_encoding.pytorch"><code>pytorch</code></a></li>
<li><a name="graph_encoding.tensorflowlite"><code>tensorflowlite</code></a></li>
<li><a name="graph_encoding.autodetect"><code>autodetect</code></a></li>
</ul>
<h4><a name="graph_builder"><code>type graph-builder</code></a></h4>
<p><a href="#graph_builder"><code>graph-builder</code></a></p>
<p>The graph initialization data.</p>
<p>This gets bundled up into an array of buffers because implementing backends may encode their
graph IR in parts (e.g., OpenVINO stores its IR and weights separately).</p>
<h4><a name="graph"><code>type graph</code></a></h4>
<p><code>u32</code></p>
<p>An execution graph for performing inference (i.e., a model).</p>
<p>TODO: replace with <code>resource</code> (https://github.com/WebAssembly/wasi-nn/issues/47).</p>
<h4><a name="execution_target"><code>enum execution-target</code></a></h4>
<p>Define where the graph should be executed.</p>
<h5>Enum Cases</h5>
<ul>
<li><a name="execution_target.cpu"><code>cpu</code></a></li>
<li><a name="execution_target.gpu"><code>gpu</code></a></li>
<li><a name="execution_target.tpu"><code>tpu</code></a></li>
</ul>
<hr />
<h3>Functions</h3>
<h4><a name="load"><code>load: func</code></a></h4>
<p>Load a <a href="#graph"><code>graph</code></a> from an opaque sequence of bytes to use for inference.</p>
<h5>Params</h5>
<ul>
<li><a name="load.builder"><code>builder</code></a>: list&lt;<a href="#graph_builder"><code>graph-builder</code></a>&gt;</li>
<li><a name="load.encoding"><code>encoding</code></a>: <a href="#graph_encoding"><code>graph-encoding</code></a></li>
<li><a name="load.target"><code>target</code></a>: <a href="#execution_target"><code>execution-target</code></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="load.0"></a> result&lt;<a href="#graph"><code>graph</code></a>, <a href="#error"><code>error</code></a>&gt;</li>
</ul>
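<p>As a sketch of how a host might satisfy this signature, the Python stub below returns a tagged ("ok"/"err") pair standing in for the result&lt;graph, error&gt; type; the function body and the fake u32 handle are hypothetical, not a real wasi-nn implementation.</p>

```python
# Hypothetical host-side stub of load; not a real wasi-nn implementation.
SUPPORTED_ENCODINGS = {"openvino", "onnx", "tensorflow", "pytorch",
                       "tensorflowlite", "autodetect"}

def load(builder, encoding, target):
    """Return ("ok", graph_handle) or ("err", error_code)."""
    if encoding not in SUPPORTED_ENCODINGS:
        return ("err", "invalid-encoding")
    if not builder:  # builder is a list of opaque byte buffers
        return ("err", "invalid-argument")
    # A real host hands the buffers to the chosen backend for target;
    # here we just mint a fake u32 graph handle.
    return ("ok", 1)

# OpenVINO-style backends split the IR from the weights, hence the list.
status, graph = load([b"model-ir", b"model-weights"], "openvino", "cpu")
assert (status, graph) == ("ok", 1)
```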
<h4><a name="load_named_model"><code>load-named-model: func</code></a></h4>
<p>Load a <a href="#graph"><code>graph</code></a> by name.</p>
<p>How the host expects the names to be passed and how it stores the graphs for retrieval via
this function is <strong>implementation-specific</strong>. This allows hosts to choose name schemes that
range from simple to complex (e.g., URLs?) and caching mechanisms of various kinds.</p>
<h5>Params</h5>
<ul>
<li><a name="load_named_model.name"><code>name</code></a>: <code>string</code></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="load_named_model.0"></a> result&lt;<a href="#graph"><code>graph</code></a>, <a href="#error"><code>error</code></a>&gt;</li>
</ul>
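<p>One plausible implementation-specific scheme is a simple in-memory registry mapping names to preloaded graphs; the model names, handles, and registry below are purely illustrative.</p>

```python
# Purely illustrative name scheme: real hosts may use files, URLs, or
# caches of various kinds. Maps name -> preloaded graph handle.
REGISTRY = {"mobilenet-v2": 1, "bert-base": 2}

def load_named_model(name):
    """Return ("ok", graph_handle) or ("err", error_code)."""
    if name not in REGISTRY:
        return ("err", "model-not-found")
    return ("ok", REGISTRY[name])

assert load_named_model("mobilenet-v2") == ("ok", 1)
assert load_named_model("missing") == ("err", "model-not-found")
```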
<h2><a name="wasi:nn_inference">Import interface wasi:nn/inference</a></h2>
<p>An inference "session" is encapsulated by a <a href="#graph_execution_context"><code>graph-execution-context</code></a>. This structure binds a
<a href="#graph"><code>graph</code></a> to input tensors before <a href="#compute"><code>compute</code></a>-ing an inference:</p>
<hr />
<h3>Types</h3>
<h4><a name="error"><code>type error</code></a></h4>
<p><a href="#error"><code>error</code></a></p>
<h4><a name="tensor"><code>type tensor</code></a></h4>
<p><a href="#tensor"><code>tensor</code></a></p>
<h4><a name="tensor_data"><code>type tensor-data</code></a></h4>
<p><a href="#tensor_data"><code>tensor-data</code></a></p>
<h4><a name="graph"><code>type graph</code></a></h4>
<p><a href="#graph"><code>graph</code></a></p>
<h4><a name="graph_execution_context"><code>type graph-execution-context</code></a></h4>
<p><code>u32</code></p>
<p>Bind a <a href="#graph"><code>graph</code></a> to the input and output tensors for an inference.</p>
<p>TODO: this is no longer necessary in WIT (https://github.com/WebAssembly/wasi-nn/issues/43)</p>
<hr />
| 156 | +<hr /> |
| 157 | +<h3>Functions</h3> |
| 158 | +<h4><a name="init_execution_context"><code>init-execution-context: func</code></a></h4> |
| 159 | +<p>Create an execution instance of a loaded graph.</p> |
| 160 | +<h5>Params</h5> |
| 161 | +<ul> |
| 162 | +<li><a name="init_execution_context.graph"><a href="#graph"><code>graph</code></a></a>: <a href="#graph"><a href="#graph"><code>graph</code></a></a></li> |
| 163 | +</ul> |
| 164 | +<h5>Return values</h5> |
| 165 | +<ul> |
| 166 | +<li><a name="init_execution_context.0"></a> result<<a href="#graph_execution_context"><a href="#graph_execution_context"><code>graph-execution-context</code></a></a>, <a href="#error"><a href="#error"><code>error</code></a></a>></li> |
| 167 | +</ul> |
<h4><a name="set_input"><code>set-input: func</code></a></h4>
<p>Define the inputs to use for inference.</p>
<h5>Params</h5>
<ul>
<li><a name="set_input.ctx"><code>ctx</code></a>: <a href="#graph_execution_context"><code>graph-execution-context</code></a></li>
<li><a name="set_input.index"><code>index</code></a>: <code>u32</code></li>
<li><a name="set_input.tensor"><code>tensor</code></a>: <a href="#tensor"><code>tensor</code></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="set_input.0"></a> result&lt;_, <a href="#error"><code>error</code></a>&gt;</li>
</ul>
<h4><a name="compute"><code>compute: func</code></a></h4>
<p>Compute the inference on the given inputs.</p>
<p>Note the expected sequence of calls: <a href="#set_input"><code>set-input</code></a>, <a href="#compute"><code>compute</code></a>, <a href="#get_output"><code>get-output</code></a>. TODO: this
expectation could be removed as a part of https://github.com/WebAssembly/wasi-nn/issues/43.</p>
<h5>Params</h5>
<ul>
<li><a name="compute.ctx"><code>ctx</code></a>: <a href="#graph_execution_context"><code>graph-execution-context</code></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="compute.0"></a> result&lt;_, <a href="#error"><code>error</code></a>&gt;</li>
</ul>
<h4><a name="get_output"><code>get-output: func</code></a></h4>
<p>Extract the outputs after inference.</p>
<h5>Params</h5>
<ul>
<li><a name="get_output.ctx"><code>ctx</code></a>: <a href="#graph_execution_context"><code>graph-execution-context</code></a></li>
<li><a name="get_output.index"><code>index</code></a>: <code>u32</code></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="get_output.0"></a> result&lt;<a href="#tensor_data"><code>tensor-data</code></a>, <a href="#error"><code>error</code></a>&gt;</li>
</ul>
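<p>Putting the inference interface together, the hypothetical Python stub below models a graph-execution-context and the expected set-input, compute, get-output ordering; the stand-in "inference" just copies inputs through unchanged, and none of this is a real wasi-nn binding.</p>

```python
# Hypothetical stub of graph-execution-context; shows call ordering only.
class ExecutionContext:
    def __init__(self, graph_handle):
        self.graph = graph_handle   # the loaded graph (a u32 handle)
        self.inputs = {}            # index -> tensor data
        self.computed = False

    def set_input(self, index, tensor_data):
        self.inputs[index] = tensor_data
        self.computed = False       # fresh inputs invalidate old outputs
        return ("ok", None)         # result<_, error>

    def compute(self):
        if not self.inputs:
            return ("err", "invalid-argument")
        # Stand-in "inference": copy each input tensor through unchanged.
        self.outputs = dict(self.inputs)
        self.computed = True
        return ("ok", None)

    def get_output(self, index):
        if not self.computed:
            return ("err", "runtime-error")  # compute must come first
        return ("ok", self.outputs[index])   # result<tensor-data, error>

# Expected sequence: init-execution-context, set-input, compute, get-output.
ctx = ExecutionContext(graph_handle=1)
ctx.set_input(0, bytes(16))
assert ctx.compute() == ("ok", None)
assert ctx.get_output(0) == ("ok", bytes(16))
```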