A separate `saveGraph()` API might introduce overhead on some native ML frameworks, such as ONNX Runtime, because the implementation may need to hold the source model in memory and recompile it when user code later calls `saveGraph()`.
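The retention cost can be sketched with a mock backend (the class and method names here are hypothetical illustrations, not the WebNN API): because `saveGraph()` may arrive arbitrarily late, the graph object must keep the source model alive so the backend can recompile it on demand.

```javascript
// Hypothetical sketch: a standalone saveGraph() forces the backend to
// retain the source model after the initial build, since serialization
// may require recompiling from it.
class MockBackend {
  compile(sourceModel) {
    // Compilation produces a device-specific artifact.
    return { compiledFrom: sourceModel.name };
  }
}

class MockGraph {
  constructor(backend, sourceModel) {
    this.backend = backend;
    // Retained solely so a later saveGraph() call can recompile.
    this.sourceModel = sourceModel;
    this.compiled = backend.compile(sourceModel);
  }
  saveGraph() {
    // Recompiles from the retained source to produce a savable artifact.
    return this.backend.compile(this.sourceModel);
  }
}

const graph = new MockGraph(new MockBackend(), { name: 'mobilenet' });
console.log(graph.saveGraph().compiledFrom); // 'mobilenet'
```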
An alternative is a `buildAndSave()` method: the implementation can compile the graph once and drop the source model after compilation.
However, a compliant implementation of `build()` could save the compiled model into a temporary file that is deleted unless `saveGraph()` is called later, rendering an explicit `buildAndSave()` unnecessary.
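The combined shape can be sketched as follows (again with hypothetical names): `buildAndSave()` persists the compiled artifact immediately, so nothing downstream ever needs the source model again.

```javascript
// Hypothetical sketch: buildAndSave() compiles once, persists the
// artifact right away, and never retains the source model.
function buildAndSave(sourceModel, storage) {
  const compiled = { blob: `compiled(${sourceModel.name})` };
  storage.set(sourceModel.name, compiled.blob); // persist immediately
  return compiled; // sourceModel can now be garbage-collected
}

const storage = new Map();
const graph = buildAndSave({ name: 'mobilenet' }, storage);
console.log(storage.get('mobilenet')); // 'compiled(mobilenet)'
```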
### Explicit vs implicit API
GPU shader caching is implicit; however, the difference is that a shader program is a small input, so it is cheap for the site to regenerate the shader and for the browser to hash it against the cache. ML models, on the other hand, are large because of their weights. Loading all the weights only to discover that a cached version of the model is available would waste time and resources. (via [comment](https://github.com/webmachinelearning/webnn/issues/807#issuecomment-2608135598))
Furthermore, an ML model can't be compiled without the weights because the implementation may perform device-specific constant folding and memory layout optimizations.
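A toy constant-folding pass makes this dependency concrete (the graph representation below is invented for illustration): the rewrite is only possible when the compiler can read the constant values, not just their shapes.

```javascript
// Toy constant folding: mul(mul(input, a), b) -> mul(input, a * b).
// The fold requires the actual constant values at compile time.
function foldConstants(graph) {
  if (graph.op === 'mul' && graph.lhs.op === 'mul' &&
      typeof graph.rhs === 'number' && typeof graph.lhs.rhs === 'number') {
    return { op: 'mul', lhs: graph.lhs.lhs, rhs: graph.lhs.rhs * graph.rhs };
  }
  return graph;
}

const g = { op: 'mul', lhs: { op: 'mul', lhs: 'input', rhs: 2 }, rhs: 3 };
console.log(foldConstants(g)); // { op: 'mul', lhs: 'input', rhs: 6 }
```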