# Refactor device selection: Rename to computePolicy, remove accelerated, and add fallback (#923)
```diff
@@ -980,15 +980,15 @@ WorkerNavigator includes NavigatorML;
 ## {{ML}} interface ## {#api-ml}
 <script type=idl>
-enum MLPowerPreference {
+enum MLComputePolicy {
   "default",
   "high-performance",
-  "low-power"
+  "low-power",
+  "fallback"
 };

 dictionary MLContextOptions {
-  MLPowerPreference powerPreference = "default";
-  boolean accelerated = true;
+  MLComputePolicy computePolicy = "default";
 };

 [SecureContext, Exposed=(Window, Worker)]
```
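To illustrate the rename, here is a small, hypothetical JavaScript helper (not part of the spec or this PR) that maps a desired `MLComputePolicy` value onto either the new single-option dictionary or the pre-PR `powerPreference`/`accelerated` pair. The legacy mapping, in particular treating `"fallback"` as roughly equivalent to `accelerated: false`, is an assumption for illustration only.

```javascript
// Hypothetical helper illustrating the option rename in this PR.
// If the implementation understands the new `computePolicy` member,
// pass it through; otherwise approximate it with the legacy
// `powerPreference` + `accelerated` pair (mapping is an assumption).
function buildContextOptions(policy, supportsComputePolicy) {
  if (supportsComputePolicy) {
    return { computePolicy: policy };
  }
  const legacy = { powerPreference: "default", accelerated: true };
  if (policy === "high-performance" || policy === "low-power") {
    legacy.powerPreference = policy;
  } else if (policy === "fallback") {
    // "fallback" prioritizes compatibility (typically CPU), which the
    // old options expressed as accelerated: false.
    legacy.accelerated = false;
  }
  return legacy;
}

// e.g. await navigator.ml.createContext(buildContextOptions("fallback", true));
```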
```diff
@@ -1002,18 +1002,18 @@ interface ML {
 Note: {{MLContextOptions}} is under active development, and the design is expected to change, informed by further implementation experience and new use cases from the wider web community. The Working Group is considering additional API controls to allow the definition of a fallback device, multiple devices in a preferred order, or an exclusion of a specific device. Other considerations under discussion include error handling, ultimate fallback, and quantized operators. Feedback is welcome on any of these design considerations from web developers, library authors, OS and hardware vendors, and other stakeholders via <a href="https://github.com/webmachinelearning/webnn/labels/device%20selection">GitHub</a>. See [[#privacy]] for additional discussion of fingerprinting considerations.

-The <dfn dfn-for=MLContextOptions dfn-type=dict-member>powerPreference</dfn> option is an <dfn dfn-type=enum>MLPowerPreference</dfn> and indicates the application's preference as related to power consumption. It is one of the following:
-<dl dfn-for="MLPowerPreference">
+The <dfn dfn-for=MLContextOptions dfn-type=dict-member>computePolicy</dfn> option is an <dfn dfn-type=enum>MLComputePolicy</dfn> and indicates the application's policy for the compute device. The policy is designed to be extensible. While the current policy values mainly cover power consumption and performance, future extensions may introduce additional execution policies. It is one of the following:
+<dl dfn-for="MLComputePolicy">
   <dt>"<dfn enum-value>default</dfn>"</dt>
   <dd>Let the user agent select the most suitable behavior.</dd>
   <dt>"<dfn enum-value>high-performance</dfn>"</dt>
-  <dd>Prioritizes execution speed over power consumption.</dd>
+  <dd>Prioritizes execution speed over other considerations such as power consumption.</dd>
   <dt>"<dfn enum-value>low-power</dfn>"</dt>
   <dd>Prioritizes power consumption over other considerations such as execution speed.</dd>
+  <dt>"<dfn enum-value>fallback</dfn>"</dt>
+  <dd>Prioritizes maximum compatibility over other considerations, typically running on a CPU. This is useful for testing a model's numeric behavior without utilizing parallel accelerators like GPUs or NPUs.</dd>
```
> **Review comment:** I think this wording is too implementation-specific; something more general, similar to the WebGPU wording where either "compatibility, more predictable behavior, or improved privacy" can be prioritized, would be more consistent with existing web specifications.
```diff
 </dl>

-The <dfn dfn-for=MLContextOptions dfn-type=dict-member>accelerated</dfn> option indicates the application's preference as related to massively parallel acceleration. This option has less priority than {{MLContextOptions/powerPreference}}. When set to `true` (by default), the underlying platform will attempt to use the available massively parallel accelerators, such as a GPU or NPU, also depending on the {{MLContextOptions/powerPreference}}. When set to `false`, the application indicates it prefers CPU inference. If there is contradictory input, for instance when {{MLContextOptions/powerPreference}} is {{MLPowerPreference/"high-performance"}} and {{MLContextOptions/accelerated}} is `false`, then the implementation will choose the best available match in the underlying platform (for instance a high performance CPU mode, or will ignore {{MLContextOptions/accelerated}} as it has less priority than {{MLContextOptions/powerPreference}}).

 ### {{ML/createContext()}} ### {#api-ml-createcontext}

 <div dfn-for="ML/createContext(options), ML/createContext(gpuDevice)" dfn-type=argument>
```
```diff
@@ -1030,15 +1030,12 @@ The <dfn dfn-for=MLContextOptions dfn-type=dict-member>accelerated</dfn> option
 1. Let |context| be a new {{MLContext}} in |realm|.
 1. If |options| is a {{GPUDevice}} object, then:
     1. Set |context|.{{MLContext/[[contextType]]}} to "[=context type/webgpu=]".
-    1. Set |context|.{{MLContext/[[powerPreference]]}} to {{MLPowerPreference/"default"}}.
-    1. Set |context|.{{MLContext/[[accelerated]]}} to `true`.
+    1. Set |context|.{{MLContext/[[computePolicy]]}} to {{MLComputePolicy/"default"}}.
 1. Otherwise:
     1. Set |context|.{{MLContext/[[contextType]]}} to "[=context type/default=]".
     1. Set |context|.{{MLContext/[[lost]]}} to [=a new promise=] in |realm|.
-    1. If |options|["{{MLContextOptions/powerPreference}}"] [=map/exists=], then set |context|.{{MLContext/[[powerPreference]]}} to |options|["{{MLContextOptions/powerPreference}}"].
-    1. Otherwise, set |context|.{{MLContext/[[powerPreference]]}} to {{MLPowerPreference/"default"}}.
-    1. If |options|["{{MLContextOptions/accelerated}}"] [=map/exists=], then set |context|.{{MLContext/[[accelerated]]}} to |options|["{{MLContextOptions/accelerated}}"].
-    1. Otherwise, set |context|.{{MLContext/[[accelerated]]}} to `true`.
+    1. If |options|["{{MLContextOptions/computePolicy}}"] [=map/exists=], then set |context|.{{MLContext/[[computePolicy]]}} to |options|["{{MLContextOptions/computePolicy}}"].
+    1. Otherwise, set |context|.{{MLContext/[[computePolicy]]}} to {{MLComputePolicy/"default"}}.
 1. If the user agent cannot support |context|.{{MLContext/[[contextType]]}}, then return failure.
 1. Return |context|.
 </details>
```
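The slot-initialization steps for the non-GPUDevice path can be sketched as plain logic. This is a non-normative sketch; the object and property names below simply mirror the spec's internal slots for readability.

```javascript
// Sketch (non-normative) of the createContext() slot-initialization
// steps for the MLContextOptions path.
function initContextSlots(options = {}) {
  const context = {
    contextType: "default",
    // MLComputePolicy defaults to "default" per the dictionary definition.
    computePolicy: "default",
  };
  // If the member exists in the options map, take its value;
  // otherwise keep the default.
  if ("computePolicy" in options) {
    context.computePolicy = options.computePolicy;
  }
  return context;
}
```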
```diff
@@ -1072,7 +1069,7 @@ The <dfn dfn-for=MLContextOptions dfn-type=dict-member>accelerated</dfn> option
 </details>

 ## {{MLContext}} interface ## {#api-mlcontext}
-The {{MLContext}} interface represents a global state of neural network compute workload and execution processes. Each {{MLContext}} object has associated [=context type=] and {{MLPowerPreference}}.
+The {{MLContext}} interface represents a global state of neural network compute workload and execution processes. Each {{MLContext}} object has associated [=context type=] and {{MLComputePolicy}}.

 <script type=idl>
 typedef record<USVString, MLTensor> MLNamedTensors;
```
```diff
@@ -1098,7 +1095,7 @@ interface MLContext {
   undefined destroy();

-  readonly attribute boolean accelerated;
+  readonly attribute MLComputePolicy computePolicy;
   readonly attribute Promise<MLContextLostInfo> lost;
 };
 </script>
```
```diff
@@ -1109,12 +1106,9 @@ interface MLContext {
 : <dfn>\[[contextType]]</dfn> of type [=context type=].
 ::
   The {{MLContext}}'s [=context type=].
-: <dfn>\[[powerPreference]]</dfn> of type {{MLPowerPreference}}.
-::
-  The {{MLContext}}'s {{MLPowerPreference}}.
-: <dfn>\[[accelerated]]</dfn> of type {{boolean}}.
+: <dfn>\[[computePolicy]]</dfn> of type {{MLComputePolicy}}.
 ::
-  The {{MLContext}}'s processing type (CPU or massively parallel processing).
+  The {{MLContext}}'s {{MLComputePolicy}}.
 : <dfn>\[[lost]]</dfn> of type {{Promise}}<{{MLContextLostInfo}}>.
 ::
   A {{Promise}} that is resolved when the {{MLContext}}'s underlying execution device is no longer available.
```
```diff
@@ -1135,7 +1129,7 @@ The <dfn>context type</dfn> is the type of the execution context that manages th
 </dl>

 <div algorithm>
-The <dfn attribute for=MLContext>accelerated</dfn> getter steps are to return [=this=].{{MLContext/[[accelerated]]}}.
+The <dfn attribute for=MLContext>computePolicy</dfn> getter steps are to return [=this=].{{MLContext/[[computePolicy]]}}.
 </div>

 <details open algorithm>
```
```diff
@@ -2199,7 +2193,7 @@ Build a composed graph up to a given output operand into a computational graph a
 1. Let |promise| be [=a new promise=] in |realm|.
 1. Enqueue the following steps to |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}:
     1. Run these steps, but [=/abort when=] |graph|.{{MLGraph/[[context]]}} [=MLContext/is lost=]:
-        1. Let |graphImpl| be the result of converting [=this=]'s [=MLGraphBuilder/graph=] with |operands|, |operators|, |inputs|, and |outputs|'s [=map/values=], as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[powerPreference]]}} and |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[accelerated]]}} into an [=implementation-defined=] format which can be interpreted by the underlying platform.
+        1. Let |graphImpl| be the result of converting [=this=]'s [=MLGraphBuilder/graph=] with |operands|, |operators|, |inputs|, and |outputs|'s [=map/values=], as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[computePolicy]]}} into an [=implementation-defined=] format which can be interpreted by the underlying platform.
         1. If the previous step failed, then [=queue an ML task=] with |global| to [=reject=] |promise| with an "{{OperationError}}" {{DOMException}}, and abort these steps.
         1. Set |graph|.{{MLGraph/[[implementation]]}} to |graphImpl|.
         1. [=Queue an ML task=] with |global| to [=resolve=] |promise| with |graph|.
```
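The spec deliberately leaves the conversion step implementation-defined. As a purely illustrative sketch of how an implementation might consult `[[computePolicy]]` when picking an execution device, consider the following hypothetical function (the device names and selection order are assumptions, not requirements of the spec):

```javascript
// Hypothetical device selection consulting [[computePolicy]] during graph
// conversion. The spec leaves this implementation-defined; this mapping
// is illustrative only.
function chooseBackend(computePolicy, available /* e.g. ["npu", "gpu", "cpu"] */) {
  switch (computePolicy) {
    case "fallback":
      // Prioritize compatibility: typically, but not necessarily, the CPU.
      return available.includes("cpu") ? "cpu" : available[available.length - 1];
    case "high-performance":
      return available.includes("gpu") ? "gpu" : available[0];
    case "low-power":
      return available.includes("npu") ? "npu" : available[available.length - 1];
    default:
      return available[0]; // user agent's choice
  }
}
```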
> **Review comment:** This policy is useful for audio processing that doesn't want to depend on accelerators due to latency reasons. Can the text be updated to mention this case?
> **Review comment (naming):** Reading the description of this enum ("Prioritizes maximum compatibility...", "useful for testing a model's numeric behavior..."), more immediately enlightening names to readers would be something like "compatible", "compatibility", "stable", or "precision". I would not have guessed that description from the word "fallback", which implies you fell back to a less capable device than the one you really wanted, when actually the CPU may have been exactly what you wanted (that is, it was the primary preference, not a fallback).
> **Reply:** Totally; as I recall, some teams in the past complained about GPU overhead for background audio filtering in chat apps, preferring to keep compute local on the CPU. Perhaps as a separate PR, we could add an explicit "low-latency" option too, which would be even clearer in intent for that scenario. Also, "precision" would be a useful preference, as some NPUs and GPUs chop off low bits (also a separate PR).
> **Review comment:** +1 for "compatible" (in this PR), adding "low-latency" in a next PR, and adding a preference/hint for "precision" (to MLContextOptions?).

> **Review comment:** Let me throw in "non-accelerated" here, or even "cpu" if we don't like "fallback". Reading "compatible" makes me mildly suspect an accelerator could still be in the mix :)
> **Review comment:** Not tied to "fallback", but it does align with existing WebGPU naming. Some opposition to "cpu", since it describes an implementation detail and not a policy; see the definition of fallback adapter in the WebGPU specification: https://www.w3.org/TR/webgpu/#fallback-adapter. Whatever name is chosen should help describe the behavior and not the implementation. The spec does not require CPU execution, so having that in the name would seem incorrect.