Skip to content

Simplify the operand layout support of conv2d and pooling 2d operations #324

@huningxin

Description

@huningxin

In the existing WebNN spec, conv2d supports two input operand layouts defined by MLInputOperandLayout and four filter operand layouts defined by MLConv2dFilterOperandLayout.

enum MLInputOperandLayout {
  "nchw",
  "nhwc"
};

enum MLConv2dFilterOperandLayout {
  "oihw",
  "hwio",
  "ohwi",
  "ihwo"
};

This may make the implementation more complicated especially if a native ML framework or OS API doesn't support some of these layouts. If one layout is unsupported, the implementation may need to insert the transpose operations into the graph around the conv2d operation that transposes the unsupported layout to supported one. This would easily lead to an inefficient graph representation that may have redundant transpose operations. Or the implementation may need to optimize the graph by techniques such as "transpose sink" which may require a more complex implementation. This issue was raised in Chromium CL review.

To simplify the implementation, the proposal is to reduce the supported operand layouts, for example, just keep the default one. Because WebNN supports transpose operation, the layout adaption and graph level optimization can be handled by ML frameworks that usually already support such functionalities.

Thanks @wacky6 for this idea.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions