47 changes: 43 additions & 4 deletions index.bs
@@ -748,7 +748,7 @@ An {{MLContext}} interface represents a global state of neural network execution
When a GPU context executes a graph with a constant or an input in system memory as an {{ArrayBufferView}}, the input content is automatically uploaded from system memory to GPU memory, and downloaded back to the system memory of an {{ArrayBufferView}} output buffer at the end of the graph execution. These upload and download cycles only occur when the execution device requires the data to be copied out of and back into system memory, as in the case of the GPU; they do not occur when the device is a CPU. Additionally, the result of the graph execution is in a known layout format. While the execution may be optimized for a native memory access pattern in an intermediate result within the graph, the output of the last operation of the graph must convert the content back to a known layout format at the end of the graph in order to maintain the expected behavior from the caller's perspective.

<div class="note">
When an {{MLContext}} is created with {{MLContextOptions}}, the user agent selects and creates the underlying execution device by taking into account these options, currently only the {{MLPowerPreference}} option.
When an {{MLContext}} is created with {{MLContextOptions}}, the user agent selects and creates the underlying execution device by taking into account these options.

Depending on the underlying platform, the user agent <span class=allow-2119>may</span> select different combinations of CPU, NPU and GPU devices.
</div>
@@ -978,6 +978,7 @@ enum MLPowerPreference {

dictionary MLContextOptions {
MLPowerPreference powerPreference = "default";
boolean accelerated = true;
};

[SecureContext, Exposed=(Window, Worker)]
@@ -1001,6 +1002,8 @@ The <dfn dfn-for=MLContextOptions dfn-type=dict-member>powerPreference</dfn> opt
<dd>Prioritizes power consumption over other considerations such as execution speed.</dd>
</dl>

The <dfn dfn-for=MLContextOptions dfn-type=dict-member>accelerated</dfn> option indicates the application's preference with respect to massively parallel acceleration. When set to `true` (the default), the underlying platform will attempt to use the available massively parallel accelerators, such as a GPU or NPU, also taking the {{MLContextOptions/powerPreference}} into account. When set to `false`, the application hints that it prefers CPU inference.
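
<div class="example">
For example, an application could hint that CPU inference is preferred (a minimal sketch; the actual device selection remains up to the underlying platform):
<pre highlight="js">
// Hint that massively parallel accelerators (GPU/NPU) are not needed.
const context = await navigator.ml.createContext({ accelerated: false });
</pre>
</div>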

### {{ML/createContext()}} ### {#api-ml-createcontext}

<div dfn-for="ML/createContext(options), ML/createContext(gpuDevice)" dfn-type=argument>
@@ -1018,11 +1021,16 @@ The <dfn dfn-for=MLContextOptions dfn-type=dict-member>powerPreference</dfn> opt
1. If |options| is a {{GPUDevice}} object, then:
1. Set |context|.{{MLContext/[[contextType]]}} to "[=context type/webgpu=]".
1. Set |context|.{{MLContext/[[powerPreference]]}} to {{MLPowerPreference/"default"}}.
1. Set |context|.{{MLContext/[[accelerated]]}} to `true`.
1. Set |context|.{{MLContext/[[cpuFallbackActive]]}} to `undefined`.
1. Otherwise:
1. Set |context|.{{MLContext/[[contextType]]}} to "[=context type/default=]".
1. Set |context|.{{MLContext/[[lost]]}} to [=a new promise=] in |realm|.
1. If |options|["{{MLContextOptions/powerPreference}}"] [=map/exists=], then set |context|.{{MLContext/[[powerPreference]]}} to |options|["{{MLContextOptions/powerPreference}}"].
1. Otherwise, set |context|.{{MLContext/[[powerPreference]]}} to {{MLPowerPreference/"default"}}.
1. If |options|["{{MLContextOptions/accelerated}}"] [=map/exists=], then set |context|.{{MLContext/[[accelerated]]}} to |options|["{{MLContextOptions/accelerated}}"].
1. Otherwise, set |context|.{{MLContext/[[accelerated]]}} to `true`.
1. Set |context|.{{MLContext/[[cpuFallbackActive]]}} to `undefined`.
1. If the user agent cannot support |context|.{{MLContext/[[contextType]]}}, then return failure.
1. Return |context|.
</details>
@@ -1082,6 +1090,8 @@ interface MLContext {

undefined destroy();

readonly attribute boolean accelerated;
readonly attribute boolean cpuFallbackActive;
readonly attribute Promise<MLContextLostInfo> lost;
};
</script>
@@ -1095,6 +1105,12 @@ interface MLContext {
: <dfn>\[[powerPreference]]</dfn> of type {{MLPowerPreference}}.
::
The {{MLContext}}'s {{MLPowerPreference}}.
: <dfn>\[[accelerated]]</dfn> of type {{boolean}}.
::
Whether the {{MLContext}} prefers massively parallel execution (e.g. on a GPU or NPU) over CPU execution.
: <dfn>\[[cpuFallbackActive]]</dfn> of type {{boolean}}.
::
Whether the {{MLContext}}'s workload is currently falling back to CPU execution.
Contributor:

AFAIK, the major native ML runtimes, including Core ML, Windows ML (ONNX Runtime) and TFLite, enable CPU fallback by default. Some runtimes, e.g. ONNX Runtime, allow developers to disable CPU fallback explicitly through a session option (disable_cpu_ep_fallback). Without CPU fallback, model compilation may fail if the accelerator cannot execute all ops. The Chromium prototype has a switch for that, but only for debugging purposes. What are the other cases in which a WebNN implementation may set this to false?

Collaborator (author) @zolkis, Oct 26, 2025:

Setting the CPU fallback option to false is for when the application wants an (error) indication if massively parallel execution is not guaranteed with high likelihood (not an exact guarantee, but good enough among many competing options). The use case is laid out in issue #815, see e.g. the comment there and the following discussion.
(Feel free to suggest other solutions.)

EDIT (w.r.t. where to check for CPU fallback): this use case would prefer an early warning of CPU fallback likelihood (to be able to choose another inference path), so the checks indeed make more sense in the build steps.

Contributor:

> application wants to have an (error) indication if massively parallel execution is not guaranteed with high chance

How could an application indicate that? Should MLContextOptions add another property, something like boolean cpuFallback, defaulting to true? An application could set contextOptions.cpuFallback to false for this use case.

Collaborator (author) @zolkis:

That was discussed in earlier calls (in the explainer-related discussions): exposing a context option for setting CPU fallback to false hits some constraints and can be accomplished with the accelerated option, hence it was discarded as an approach.

In #884 there is a code example for this use case:

```js
// create a context that should use massively parallel processing (e.g. GPU/NPU)
context = await navigator.ml.createContext({accelerated: true});
if (context.accelerated) {
    // the context will mostly use GPU/NPU, but CPU fallback may happen
} else {
    // the platform likely cannot provide an NPU or GPU, so try something else
}

// create a context that should preferably use the NPU
context = await navigator.ml.createContext({accelerated: true, powerPreference: 'low-power'});
if (context.accelerated) {
    // the NPU is likely used -- further requirements could be set via opSupportLimitsPerDevice
} else {
    // the NPU is likely not available, and since the GPU needs high power, it is not used
}
```

Contributor:

Thanks for the code example. I understand an implementation should preferably use the GPU/NPU if the accelerated option is set to true. However, as I shared, CPU fallback is enabled by default by the major native ML runtimes. It's not clear to me how an implementation can tell that an application wants to disable the CPU fallback.

> could be accomplished with the accelerated option

Do you mean the implementation should disable CPU fallback if the accelerated option is set to true? Then how could an application indicate it is fine with CPU fallback while preferring GPU/NPU execution?

Member:

This sounds reasonable to me.

@zolkis if you agree and are available, please open a separate issue for cpuFallbackActive and seed it with your insights. If you also update this PR accordingly, we should be able to merge it by the end of the week.

Collaborator (author) @zolkis:

We have already removed the context option for preventing CPU fallback.

I'd like to understand the concerns with the cpuFallbackActive attribute. If it is because of the polling steps, I already removed calling them from graph.dispatch() and didn't include them in build(), so there is only the getter, which @handellm said would be good enough for Meet (instead of an event, which would present more issues).

Are there any further issues to be clarified, @huningxin, @reillyeon?

Contributor @huningxin, Oct 29, 2025:

According to the offline discussion, cpuFallbackActive seems to be a useful attribute of MLGraph (maybe coordinating with @philloooo's proposal #854) rather than MLContext. I'll let @reillyeon and @philloooo chime in and share more thoughts.

Collaborator (author) @zolkis:

> cpuFallbackActive seems to be a useful attribute of MLGraph

I agree, that makes a lot of sense. Moreover, a sub-graph or even individual ops might fall back to the CPU (as mentioned before, a context/graph should be associated with an execution plan, not only with underlying execution devices).

For this PR, exposing cpuFallbackActive on the context was chosen for the "simplicity" argument, also because a context is still associated with an underlying execution device -- for which we opened another issue in #897. Once we relax that and work with these terms, I think we should properly address CPU fallback as well.

This is a good development, so I will remove it from this PR.

Member:

I agree with @zolkis's comments above: an MLContext represents a preferred order of execution providers (determined by the power/acceleration preferences), and only when you construct an MLGraph do you know what the actual execution plan for a given graph will look like.

: <dfn>\[[lost]]</dfn> of type {{Promise}}<{{MLContextLostInfo}}>.
::
A {{Promise}} that is resolved when the {{MLContext}}'s underlying execution device is no longer available.
@@ -1114,6 +1130,28 @@ The <dfn>context type</dfn> is the type of the execution context that manages th
<dd>Context created from WebGPU device.</dd>
</dl>

<div algorithm>
The <dfn attribute for=MLContext>accelerated</dfn> getter steps are to return [=this=].{{MLContext/[[accelerated]]}}.
</div>

<div algorithm>
The <dfn attribute for=MLContext>cpuFallbackActive</dfn> getter steps are:
1. If [=this=].{{MLContext/[[cpuFallbackActive]]}} is `undefined`, then invoke [=poll CPU fallback status=].
1. Return [=this=].{{MLContext/[[cpuFallbackActive]]}}.
</div>

<details open algorithm>
<summary>
To <dfn>poll CPU fallback status</dfn>, run the following steps.
</summary>
1. If [=this=].{{MLContext/[[accelerated]]}} is `false`, then:
1. Set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `true` and return.
1. If the underlying execution device is available, then:
Contributor:

Is it worth adding a definition for "underlying execution device"?

Collaborator (author) @zolkis:

They are mentioned in the device selection section, though no formal definition is given.

If we wanted to give one, it's important to stress that it's not a single device, but the final, possibly heterogeneous execution plan that maps specific parts of the model graph to the best available combination of accelerators at the exact moment of inference.

During the build phase, we should not select a device, but define preferences (e.g. a prioritized list of execution providers/delegates), which the runtime / underlying platform uses for the actual decisions.

Collaborator (author) @zolkis:

This reopens the discussion on the relationship between a context and its underlying execution device(s). I think we should not refer to a single device here, in light of past discussions.

In general we should bind the context not to a device, but to the execution plan (the prioritized list of execution providers) mentioned above. Then, a separate concept (internal slot) would be the actual execution plan at the moment of inference. The text should not assume a single device per context, in order to support heterogeneous sub-graph execution on different devices.

I think we could track that in a separate issue. In this PR, I have just removed the text "currently only the {{MLPowerPreference}} option" in line 751, and used the term from the device selection section in this algorithm.

For this PR, I modify the text so that it's compatible with my explanation above.

1. Issue a request to check whether the device executes the workload on CPU. If yes, then set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `true` and return.
1. Otherwise, set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `false` and return.
1. Set [=this=].{{MLContext/[[cpuFallbackActive]]}} to `undefined`.
</details>
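
<div class="example">
A minimal sketch of how an application might combine the attributes defined above (error handling is elided; whether fallback actually occurs remains platform-dependent):
<pre highlight="js">
const context = await navigator.ml.createContext({ accelerated: true });
if (context.accelerated && !context.cpuFallbackActive) {
  // Massively parallel (GPU/NPU) execution is likely; keep this inference path.
} else {
  // CPU execution is likely; consider an alternative inference path.
}
</pre>
</div>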

<details open algorithm>
<summary>
To <dfn>validate buffer with descriptor</dfn> given {{AllowSharedBufferSource}} |bufferSource| and {{MLOperandDescriptor}} |descriptor|, run the following steps:
@@ -1178,7 +1216,8 @@ Note: `dispatch()` itself provides no signal that graph execution has completed.
1. If [=validating tensors with descriptors=] given |outputs| and |graph|.{{MLGraph/[[outputDescriptors]]}} returns false, then [=exception/throw=] a {{TypeError}}.
1. Enqueue the following steps to |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}:
1. Run these steps, but [=/abort when=] [=this=] [=MLContext/is lost=]:
1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|.
1. Issue a compute request to |graph|.{{MLGraph/[[implementation]]}} given |inputs| and |outputs|, as well as |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[powerPreference]]}} and |graph|.{{MLGraph/[[context]]}}.{{MLContext/[[accelerated]]}}.
Contributor:

I suppose the powerPreference and accelerated options should be used by the build steps rather than dispatch?

Collaborator (author) @zolkis:

These steps were meant for the dispatch phase, when the actual accelerators are selected.
If the underlying accelerators cannot be changed during dispatch, then yes, these could be in the build steps, for static preparation.
However, if supported, they should also be included in the dispatch steps, which is the final decision point in dynamic execution.
I guess for now we could just move it to the build phase.

Collaborator (author) @zolkis:

Done.

1. Run the steps to [=poll CPU fallback status=] for |graph|.{{MLGraph/[[context]]}}.
Contributor:

This step seems to be unnecessary because the cpuFallbackActive getter already runs it?

Collaborator (author) @zolkis:

Right, these would only be needed if there were an event (discussed earlier; it was agreed that polling is enough for now).


Issue(778): Add a mechanism for reporting errors during graph execution.

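<div class="example">
A sketch of polling the CPU fallback status after dispatch (the graph and tensor names are hypothetical; building the graph and creating the tensors are elided):
<pre highlight="js">
// Assume graph, inputTensor and outputTensor were created on this context.
context.dispatch(graph, { input: inputTensor }, { output: outputTensor });
// dispatch() provides no completion signal; the CPU fallback status can be
// polled at any time via the getter, which runs the polling steps on demand.
if (context.cpuFallbackActive) {
  // The workload is likely executing on the CPU; consider another path.
}
</pre>
</div>
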
@@ -1730,7 +1769,7 @@ typedef (bigint or unrestricted double) MLNumber;
: <dfn>\[[operator]]</dfn> of type [=operator=]
::
Reference to {{MLOperand}}'s corresponding [=operator=].

: <dfn>\[[constantTensor]]</dfn> of type {{MLTensor}}
::
The {{MLOperand}}'s tensor (only for constant operands).
@@ -2151,7 +2190,7 @@ Build a composed graph up to a given output operand into a computational graph a
1. If |name| is empty, then return [=a new promise=] in |realm| [=rejected=] with a {{TypeError}}.
1. If [=MLGraphBuilder/validating operand=] given [=this=] and |operand| returns false, then return [=a new promise=] in |realm| [=rejected=] with a {{TypeError}}.
1. If |operand| is in [=this=]'s [=MLGraphBuilder/graph=]'s [=computational graph/inputs=] or [=computational graph/constants=], then return [=a new promise=] in |realm| [=rejected=] with a {{TypeError}}.
1. If |operand|.{{MLOperand/[[constantTensor]]}} exists and |operand|.{{MLOperand/[[constantTensor]]}}.{{MLTensor/[[isDestroyed]]}} is true, then return [=a new promise=] in |realm| [=rejected=] with a {{TypeError}}.
1. Let |operands| be a new empty [=/set=].
1. Let |operators| be a new empty [=/set=].
1. Let |inputs| be a new empty [=/set=].