@adrianlizarraga adrianlizarraga commented Sep 30, 2025

Description

This PR adds an initial set of C APIs necessary to support kernel registration for plugin EPs.

Example use

The example plugin EP implementation now registers MemcpyFromHost and MemcpyToHost operator kernels using the new APIs. New utilities in the example implementation make the process of defining operator kernels very similar to the existing process used by provider-bridge EPs.

First, the operator kernel class is defined:

// File: onnxruntime/test/autoep/library/kernels/memcpy.h
struct Memcpy : public OrtKernelImpl {
  static OrtStatus* Create(const OrtKernelInfo* info, void* state, /*out*/ std::unique_ptr<Memcpy>& kernel);

  Memcpy(const OrtKernelInfo* info, void* state);

  static OrtStatus* ORT_API_CALL ComputeImpl(OrtKernelImpl* this_ptr, OrtKernelContext* kernel_ctx) noexcept;
  static void ORT_API_CALL ReleaseImpl(OrtKernelImpl* this_ptr) noexcept;

  OrtStatus* DoCompute(OrtKernelContext* kernel_ctx) noexcept;

 private:
  const OrtKernelInfo* info_;
  void* state_;  // Custom state passed from OrtEp
};

Then, a macro defines a function that can be called to register the operator with the EP's kernel registry:

// File: onnxruntime/test/autoep/library/kernels/memcpy.cc
ONNX_OPERATOR_KERNEL_EX(
    MemcpyFromHost,
    kOnnxDomain,
    1,
    (Ort::KernelDefBuilder()
         .SetInputMemType(0, OrtMemType::OrtMemTypeCPUInput)
         .AddTypeConstraint("T", MLDataTypes::GetTensorType(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT))),
    Memcpy)

ONNX_OPERATOR_KERNEL_EX(
    MemcpyToHost,
    kOnnxDomain,
    1,
    (Ort::KernelDefBuilder()
         .SetOutputMemType(0, OrtMemType::OrtMemTypeCPUOutput)
         .AddTypeConstraint("T", MLDataTypes::GetTensorType(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT))),
    Memcpy)
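
For readers unfamiliar with this registration idiom, here is a minimal standalone sketch of how a macro of this kind could work. It is not the actual ONNX_OPERATOR_KERNEL_EX definition from the example's utils.h: DEMO_KERNEL_EX, DEMO_KERNEL_CLASS_NAME, DemoKernelCreateInfo, BuildDemoKernelCreateInfo, and kDemoDomain are hypothetical names, and the sketch omits the kernel definition builder and the kernel class argument entirely.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the "kernel creation information" produced per operator.
struct DemoKernelCreateInfo {
  std::string op_type;
  std::string domain;  // In this sketch, simply the stringified domain token.
  int since_version;
};

// Primary template: only declared. The macro below defines one specialization per kernel.
template <typename T>
DemoKernelCreateInfo BuildDemoKernelCreateInfo();

// Pastes (domain, version, op) into a unique tag class name,
// e.g. kDemoDomain_MemcpyFromHost_ver1.
#define DEMO_KERNEL_CLASS_NAME(domain, ver, op) domain##_##op##_ver##ver

// Declares the tag class and defines the matching builder specialization.
#define DEMO_KERNEL_EX(op, domain, ver)                                   \
  class DEMO_KERNEL_CLASS_NAME(domain, ver, op);                          \
  template <>                                                             \
  DemoKernelCreateInfo                                                    \
  BuildDemoKernelCreateInfo<DEMO_KERNEL_CLASS_NAME(domain, ver, op)>() {  \
    return DemoKernelCreateInfo{#op, #domain, ver};                       \
  }

DEMO_KERNEL_EX(MemcpyFromHost, kDemoDomain, 1)
DEMO_KERNEL_EX(MemcpyToHost, kDemoDomain, 1)
```

The tag class is never defined; it exists only so that each (domain, version, operator) triple selects a distinct specialization that other translation units can name after a forward declaration.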

Lastly, the functions defined by the above macro are entered into a table:

// File: onnxruntime/test/autoep/library/ep_kernel_registration.cc

// Include kernel files:
#include "kernels/memcpy.h"

// Forward declarations of kernel classes used as template args for BuildKernelCreateInfo
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyFromHost);
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyToHost);

// Table of BuildKernelCreateInfo functions for each operator
static const BuildKernelCreateInfoFn build_kernel_create_info_funcs[] = {
    BuildKernelCreateInfo<void>,  // Dummy to avoid table becoming empty.
    BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyFromHost)>,
    BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kOnnxDomain, 1, MemcpyToHost)>,
};

The example EP processes the entries in the above table to add information about the supported operator kernels to the EP's kernel registry (OrtKernelRegistry).
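
The table-processing step can be sketched roughly as follows. This is a simplified illustration that uses hypothetical stand-in types (DemoKernelCreateInfo, DemoKernelRegistry, DemoBuildFn) rather than the real OrtKernelDef and OrtKernelRegistry, and it assumes the placeholder entry is recognized by an empty operator type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the information produced by each table entry.
struct DemoKernelCreateInfo {
  std::string op_type;  // Left empty by the placeholder entry (an assumption of this sketch).
};

using DemoBuildFn = DemoKernelCreateInfo (*)();

// Hypothetical stand-in for the EP's kernel registry.
struct DemoKernelRegistry {
  std::vector<DemoKernelCreateInfo> entries;
};

DemoKernelCreateInfo BuildDummy() { return {}; }
DemoKernelCreateInfo BuildMemcpyFromHost() { return {"MemcpyFromHost"}; }
DemoKernelCreateInfo BuildMemcpyToHost() { return {"MemcpyToHost"}; }

static const DemoBuildFn kDemoBuildFuncs[] = {
    BuildDummy,  // Placeholder entry, mirroring BuildKernelCreateInfo<void>.
    BuildMemcpyFromHost,
    BuildMemcpyToHost,
};

// Calls every function in the table and registers each non-placeholder result.
// Returns the number of kernels added to the registry.
std::size_t RegisterDemoKernels(DemoKernelRegistry& registry) {
  std::size_t added = 0;
  for (DemoBuildFn build_fn : kDemoBuildFuncs) {
    DemoKernelCreateInfo info = build_fn();
    if (info.op_type.empty()) continue;  // Skip the placeholder.
    registry.entries.push_back(std::move(info));
    ++added;
  }
  return added;
}
```

In the real example EP, the body of the loop would instead hand each kernel definition and creation function to the registry through the EP API.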

Additionally, during the call to OrtEp::GetCapability, an EP can now look up registered kernel definitions via the new API EpGraphSupportInfo_LookUpKernel. Note that an EP would not normally look up kernels for MemcpyFromHost or MemcpyToHost, which are inserted by ORT. Instead, it would use this API to look up other registered operator kernels, such as Conv.

static OrtStatus* ORT_API_CALL GetCapabilityImpl(OrtEp* this_ptr, const OrtGraph* graph,
                                                 OrtEpGraphSupportInfo* graph_support_info) noexcept {
  // ... (setup elided: `this_ep` is the EP instance recovered from `this_ptr`; `nodes` holds the graph's nodes)

  for (const OrtNode* node : nodes) {
    const OrtKernelDef* kernel_def = nullptr;
    OrtStatus* status = this_ep->ep_api->EpGraphSupportInfo_LookUpKernel(graph_support_info, node, &kernel_def);

    if (status != nullptr) {
      return status;
    }

    if (kernel_def != nullptr) {  // Take node if this EP has a registered kernel for it.
      if (OrtStatus* st = this_ep->ep_api->EpGraphSupportInfo_AddSingleNode(graph_support_info, node);
          st != nullptr) {
        return st;
      }
    }
  }

  return nullptr;
}

EP implementation details

An EP instance (i.e., OrtEp) that needs to register operator kernels with ONNX Runtime must implement the following OrtEp::GetKernelRegistry() function:

Function: GetKernelRegistry

Signature:
  Returns: OrtStatus*
  Parameters:
    • OrtEp* this_ptr: The OrtEp instance.
    • const OrtKernelRegistry** kernel_registry: Output parameter set to the EP's kernel registry, which must remain valid throughout the lifetime of the EP.

Description:
  Gets the execution provider's kernel registry, if any.
  Remarks: A kernel registry contains kernel creation information for operator kernels supported by an EP.
  Note: Implementation of this function is optional. If set to NULL, ORT assumes the EP compiles nodes.

If defined by the EP, the OrtEp::GetKernelRegistry() function is called by ONNX Runtime after creating an instance of the OrtEp in order to retrieve the EP's kernel registry.
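
The optional-callback behavior can be illustrated with a small standalone sketch. DemoEp, DemoKernelRegistry, KernelBasedEp, and QueryRegistry are hypothetical stand-ins, the status is simplified to an int, and the host-side logic is an assumption about how such a check could look rather than ORT's actual implementation.

```cpp
#include <cassert>

struct DemoKernelRegistry {};  // Hypothetical stand-in for OrtKernelRegistry.

// Hypothetical stand-in for OrtEp: a C-style struct of function pointers.
struct DemoEp {
  // Returns 0 on success (the real API returns OrtStatus*, where nullptr means success).
  int (*GetKernelRegistry)(DemoEp* this_ptr, const DemoKernelRegistry** kernel_registry);
};

// A kernel-based EP that owns its registry for its whole lifetime.
struct KernelBasedEp : DemoEp {
  KernelBasedEp() : DemoEp{&GetKernelRegistryImpl} {}

  static int GetKernelRegistryImpl(DemoEp* this_ptr, const DemoKernelRegistry** kernel_registry) {
    *kernel_registry = &static_cast<KernelBasedEp*>(this_ptr)->registry_;
    return 0;
  }

 private:
  DemoKernelRegistry registry_;  // Outlives every pointer handed out above.
};

// Host-side view: a null function pointer is treated as "this EP compiles nodes".
// Returns nullptr when the EP provides no registry or the call fails.
const DemoKernelRegistry* QueryRegistry(DemoEp& ep) {
  if (ep.GetKernelRegistry == nullptr) return nullptr;
  const DemoKernelRegistry* registry = nullptr;
  if (ep.GetKernelRegistry(&ep, &registry) != 0) return nullptr;
  return registry;
}
```

Storing the registry as a member of the EP object is one simple way to satisfy the requirement that the registry remain valid throughout the EP's lifetime.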

APIs used by EP to add entries to kernel registry

An EP's kernel registry (OrtKernelRegistry) contains information necessary for the (later) creation of operator kernels supported by an EP. Conceptually, a kernel registry contains an array of "kernel creation information" elements, one per operator. Each such element consists of:

  • A kernel definition (OrtKernelDef), which specifies operator type, supported versions, type constraints, I/O memory types, etc.
  • A function of type OrtKernelCreateFunc that ORT calls to create an instance of the kernel (OrtKernelImpl).
  • Custom opaque state (provided by the OrtEp) that is passed to the OrtKernelCreateFunc.
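
As a conceptual illustration only, the three parts of each element could be modeled as below. All Demo* names are hypothetical, the creation callback is reduced to a single state argument, and the lookup by operator type is an assumption of this sketch, not a description of how OrtKernelRegistry is implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct DemoKernelDef {
  std::string op_type;
  int since_version;
};

struct DemoKernel {
  int device_id;
};

// Creation callback: receives the opaque state supplied at registration time.
using DemoKernelCreateFunc = DemoKernel* (*)(void* state);

// One "kernel creation information" element.
struct DemoRegistryEntry {
  DemoKernelDef def;
  DemoKernelCreateFunc create_func;
  void* state;  // May be null.
};

struct DemoKernelRegistry {
  std::vector<DemoRegistryEntry> entries;

  void AddKernel(DemoKernelDef def, DemoKernelCreateFunc create_func, void* state) {
    entries.push_back({std::move(def), create_func, state});
  }

  // Finds the entry for op_type and invokes its creation function.
  // Returns nullptr if no kernel is registered; the caller owns the result.
  DemoKernel* CreateKernel(const std::string& op_type) const {
    for (const DemoRegistryEntry& entry : entries) {
      if (entry.def.op_type == op_type) return entry.create_func(entry.state);
    }
    return nullptr;
  }
};

// Example creation function: interprets the state as a pointer to an int device id.
DemoKernel* CreateOnDevice(void* state) {
  int device_id = (state != nullptr) ? *static_cast<int*>(state) : -1;
  return new DemoKernel{device_id};
}
```

The point of the opaque state is that a single creation function can serve many registrations while still reaching EP-specific data at kernel creation time.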

An EP uses the following OrtEpApi::KernelRegistry_AddKernel() function to add an entry for one supported operator.

Function: KernelRegistry_AddKernel

Signature:
  Returns: OrtStatus*
  Parameters:
    • OrtKernelRegistry* kernel_registry: The OrtKernelRegistry instance.
    • const OrtKernelDef* kernel_def: The kernel definition, which includes operator type, version, EP name, type constraints, etc.
    • OrtKernelCreateFunc kernel_create_func: Function that creates an instance of the operator kernel as an OrtKernelImpl instance.
    • void* kernel_create_func_state: Custom state passed to the kernel creation function. Can be null.

Description:
  Adds kernel creation information for a supported operator kernel to the given kernel registry.
  Remarks: Refer to OrtEp::GetKernelRegistry, which returns an EP's kernel registry to ORT.

Building a kernel definition

An EP uses a kernel definition builder (OrtKernelDefBuilder) to create a kernel definition (OrtKernelDef). The following table lists some of the C APIs related to building a kernel definition. The above ONNX_OPERATOR_KERNEL_EX macro uses these APIs.

Function: KernelDefBuilder_SetOperatorType

Signature:
  Returns: OrtStatus*
  Parameters:
    • OrtKernelDefBuilder* kernel_def_builder: The OrtKernelDefBuilder instance.
    • const char* op_type: A null-terminated string representing the operator type.

Description:
  Sets the kernel's operator type.

Function: KernelDefBuilder_SetDomain

Signature:
  Returns: OrtStatus*
  Parameters:
    • OrtKernelDefBuilder* kernel_def_builder: The OrtKernelDefBuilder instance.
    • const char* domain: A null-terminated string representing the operator's domain.

Description:
  Sets the kernel's domain.

...

Function: KernelDefBuilder_Build

Signature:
  Returns: OrtStatus*
  Parameters:
    • OrtKernelDefBuilder* kernel_def_builder: The OrtKernelDefBuilder instance.
    • OrtKernelDef** kernel_def_out: The new OrtKernelDef instance.

Description:
  Creates an OrtKernelDef instance from the given kernel definition builder.

Defining a kernel implementation

An EP defines a kernel implementation by initializing an instance of OrtKernelImpl (shown below) with function pointers for computation, release, etc.

struct OrtKernelImpl {
  uint32_t ort_version_supported;  ///< Must be initialized to ORT_API_VERSION

  /** \brief Computation function called to execute the kernel on an EP.
   *
   * \param[in] this_ptr The OrtKernelImpl instance.
   * \param[in] context The OrtKernelContext instance that provides access to the inputs and outputs.
   *
   * \snippet{doc} snippets.dox OrtStatus Return Value
   *
   * \since Version 1.24.
   */
  ORT_API2_STATUS(Compute, _In_ OrtKernelImpl* this_ptr, _In_ OrtKernelContext* context);

  /** \brief Called by ORT to release the OrtKernelImpl instance and its resources.
   *
   * \param[in] this_ptr The OrtKernelImpl instance.
   *
   * \since Version 1.24.
   */
  ORT_API_T(void, Release, _In_ OrtKernelImpl* this_ptr);
};

As shown previously, the example EP creates a Memcpy class that inherits from OrtKernelImpl and implements the above functions.
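
The inheritance pattern can be shown in isolation with a simplified sketch. DemoKernelImpl and DemoCopyKernel are hypothetical stand-ins: the function-pointer struct is reduced to two members, statuses are plain ints, and an int copy stands in for a tensor copy.

```cpp
#include <cassert>

// Hypothetical stand-in for OrtKernelImpl: a plain struct of function pointers.
struct DemoKernelImpl {
  int (*Compute)(DemoKernelImpl* this_ptr, int input, int* output);  // 0 means success.
  void (*Release)(DemoKernelImpl* this_ptr);
};

struct DemoCopyKernel : DemoKernelImpl {
  DemoCopyKernel() : DemoKernelImpl{&ComputeImpl, &ReleaseImpl} {}

  // Static trampolines: recover the derived object, then forward to member code.
  static int ComputeImpl(DemoKernelImpl* this_ptr, int input, int* output) noexcept {
    return static_cast<DemoCopyKernel*>(this_ptr)->DoCompute(input, output);
  }

  static void ReleaseImpl(DemoKernelImpl* this_ptr) noexcept {
    delete static_cast<DemoCopyKernel*>(this_ptr);
  }

  int DoCompute(int input, int* output) noexcept {
    if (output == nullptr) return 1;  // Report an error instead of crashing.
    *output = input;                  // "Copy" the value, standing in for a tensor copy.
    ++calls;
    return 0;
  }

  int calls = 0;  // Per-kernel state reachable only through the derived type.
};
```

Because the caller only ever sees the base struct, the static functions are the single place where the pointer is cast back to the concrete kernel type, and Release is where the object allocated by the EP is destroyed.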

Defining a kernel creation function

An EP must provide a function of type OrtKernelCreateFunc that ORT can later call to create an instance of a kernel (OrtKernelImpl). The signature of the OrtKernelCreateFunc is shown below.

/** \brief Type definition for a function that creates an OrtKernelImpl instance for an operator kernel.
 *
 * \param[in] ctx Unused/reserved for future use.
 * \param[in] kernel_create_func_state Opaque state initially provided by the EP that registered the kernel.
 *                                     Refer to OrtEpApi::KernelRegistry_AddKernel(). May be null.
 * \param[in] info The OrtKernelInfo instance that provides access to the kernel's input and output characteristics.
 * \param[out] kernel_out Output parameter set to the new OrtKernelImpl instance.
 *
 * \snippet{doc} snippets.dox OrtStatus Return Value
 *
 * \since Version 1.24.
 */
typedef OrtStatus*(ORT_API_CALL* OrtKernelCreateFunc)(_In_ OrtKernelCreateContext* ctx,  // unused/reserved as of 1.24
                                                      _In_ void* kernel_create_func_state,
                                                      _In_ const OrtKernelInfo* info,
                                                      _Outptr_result_maybenull_ OrtKernelImpl** kernel_out);

The example EP declares kernel creation functions via use of the previously mentioned ONNX_OPERATOR_KERNEL_EX macro. If one were to expand the macro call, the kernel creation function for MemcpyFromHost would look similar to the following snippet:

OrtStatus* ORT_API_CALL CreateMemcpyKernel(OrtKernelCreateContext* /*ctx*/, void* kernel_create_func_state,
                                           const OrtKernelInfo* info, OrtKernelImpl** kernel_out) {
  *kernel_out = nullptr;

  std::unique_ptr<Memcpy> kernel;
  RETURN_IF_ERROR(Memcpy::Create(info, kernel_create_func_state, kernel));

  *kernel_out = kernel.release();
  return nullptr;
}
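
The ownership handoff in the snippet above (clear the output, return early on error, release the unique_ptr only on success) can be exercised in a standalone sketch. DemoKernel, CreateDemoKernelObject, and CreateDemoKernel are hypothetical, errors are modeled as static strings instead of OrtStatus*, and the rule that a null state is an error is an assumption made only so the failure path can be shown.

```cpp
#include <cassert>
#include <memory>

struct DemoKernel {
  explicit DemoKernel(int scale_in) : scale(scale_in) {}
  int scale;
};

// Hypothetical factory: returns nullptr on success or a static error message on failure.
const char* CreateDemoKernelObject(void* state, std::unique_ptr<DemoKernel>& kernel) {
  if (state == nullptr) return "missing state";  // Assumption made for this sketch only.
  kernel = std::make_unique<DemoKernel>(*static_cast<int*>(state));
  return nullptr;
}

// Mirrors the shape of the expanded creation function: clear the output first,
// return early on error, and hand ownership to the caller only on success.
const char* CreateDemoKernel(void* state, DemoKernel** kernel_out) {
  *kernel_out = nullptr;

  std::unique_ptr<DemoKernel> kernel;
  if (const char* error = CreateDemoKernelObject(state, kernel)) return error;

  *kernel_out = kernel.release();
  return nullptr;
}
```

Holding the new kernel in a unique_ptr until the last line means an early return cannot leak it, and clearing the output up front means the caller never sees a stale pointer after a failure.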

Motivation and Context

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This draft PR implements support for kernel-based execution providers (EPs) within the ONNX Runtime EP plugin architecture. The changes enable plugin EPs to register custom kernels directly with the ORT runtime, expanding beyond the current node-based computation model.

  • Adds comprehensive kernel registration infrastructure for plugin EPs
  • Implements memory copy kernels as examples (MemcpyFromHost/MemcpyToHost)
  • Extends the EP API with kernel definition and creation functionality

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| onnxruntime/test/framework/ep_plugin_provider_test.cc | Updates test to pass kernel registry parameter |
| onnxruntime/test/autoep/library/kernels/utils.h | Defines kernel creation utilities and macros |
| onnxruntime/test/autoep/library/kernels/memcpy.h | Declares example Memcpy kernel interface |
| onnxruntime/test/autoep/library/kernels/memcpy.cc | Implements example Memcpy kernel with registration |
| onnxruntime/test/autoep/library/kernels/data_types.h | Declares MLDataTypes singleton for type management |
| onnxruntime/test/autoep/library/kernels/data_types.cc | Implements MLDataTypes for tensor type retrieval |
| onnxruntime/test/autoep/library/ep_kernel_registration.h | Declares kernel registration functions |
| onnxruntime/test/autoep/library/ep_kernel_registration.cc | Implements kernel registration logic |
| onnxruntime/test/autoep/library/ep.h | Adds kernel creation method declarations to EP |
| onnxruntime/test/autoep/library/ep.cc | Implements kernel creation methods in example EP |
| onnxruntime/core/session/utils.h | Declares CopyTensors utility function |
| onnxruntime/core/session/utils.cc | Implements CopyTensors utility function |
| onnxruntime/core/session/provider_policy_context.cc | Updates EP creation to use new factory method |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.h | Extends PluginExecutionProvider with kernel registry support |
| onnxruntime/core/session/plugin_ep/ep_plugin_provider_interfaces.cc | Implements kernel registry initialization in plugin EP |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.h | Declares kernel registration infrastructure |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc | Implements plugin EP kernel wrapper and registration |
| onnxruntime/core/session/plugin_ep/ep_api.h | Declares new EP API functions for kernel support |
| onnxruntime/core/session/plugin_ep/ep_api.cc | Implements new EP API functions for kernel support |
| onnxruntime/core/session/onnxruntime_c_api.cc | Refactors CopyTensors to use shared utility |
| include/onnxruntime/core/session/onnxruntime_ep_c_api.h | Adds kernel-related types and API declarations |
| include/onnxruntime/core/session/onnxruntime_cxx_inline.h | Implements C++ wrapper methods for kernel APIs |
| include/onnxruntime/core/session/onnxruntime_cxx_api.h | Declares C++ KernelDefBuilder class |
| cmake/onnxruntime_unittests.cmake | Updates build to include kernel source files |

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.



@adrianlizarraga adrianlizarraga changed the title from "[EP ABI] [DRAFT] Support kernel-based EPs" to "[EP ABI] [DRAFT] Initial support for kernel-based EPs" on Oct 2, 2025

std::pair<int, int> GetSinceVersion() const;

///< Wraps OrtEpApi::KernelDef_GetExecutionProvider
const char* GetExecutionProvider() const;
Member

If any of the information for any getters is optional, suggest returning a status

Contributor Author

The underlying C API function OrtEpApi::KernelDef_GetExecutionProvider returns the const char* directly (doesn't return a status).

Sorry, I don't fully understand the comment. Is the request to make this return a status instead? What is meant by "information for any getters is optional"?

Member

This was a general comment not to be bound to throwing exceptions by default. In this case, the data is returned without error reporting.

return OrtApis::CreateStatus(ORT_INVALID_ARGUMENT, "Invalid arguments provided to CopyTensors.");
}

const OrtMemoryInfo* src_memory_info = nullptr;
Contributor Author

Note: moved this into a shared utility function that can be used by the new API KernelInfo_CopyTensors

*
* \since Version 1.24.
*/
typedef OrtStatus*(ORT_API_CALL* OrtKernelCreateFunc)(_In_ OrtKernelCreateContext* ctx, // unused/reserved as of 1.24
Contributor Author

@adrianlizarraga adrianlizarraga Oct 3, 2025

We can probably remove this ctx parameter. It is a stand-in for the FuncManager parameter in the related KernelCreateFn used by provider-bridge EPs:

using KernelCreateFn = std::function<Status(FuncManager& func_mgr, const OpKernelInfo& info, std::unique_ptr<OpKernel>& out)>;

It doesn't look like any EPs in the ORT code base use the FuncManager parameter at all, but I kept it here (with a more generic name) just in case we find a use for it in the future. Would appreciate opinions.

/// Singleton that returns sets of OrtMLDataType instances using the public C API.
/// Analogous to the internal utilities in include/onnxruntime/core/framework/data_types.h
/// </summary>
class MLDataTypes {
Contributor Author

Considering moving this to the public C++ API header (but not a singleton). Seems like all kernel-based plugin EPs would benefit from this.

@adrianlizarraga adrianlizarraga changed the title from "[EP ABI] [DRAFT] Initial support for kernel-based EPs" to "[EP ABI] Initial support for kernel-based EPs" on Oct 3, 2025

Comment on lines +750 to +751
ORT_API2_STATUS(GetTensorMLDataType, _In_ ONNXTensorElementDataType elem_type,
_Outptr_ const OrtMLDataType** out);
Contributor Author

I currently only added an API to get tensor data types. We would need to add similar APIs for sequences, maps, etc.

Also, I'm not too sure if we should keep using the term "ML data type". I kept it to remain consistent with the internal names, but perhaps we can rename?

@adrianlizarraga adrianlizarraga marked this pull request as ready for review October 3, 2025 19:12
*/
ORT_API2_STATUS(KernelDefBuilder_Build, _In_ OrtKernelDefBuilder* kernel_def_builder,
_Outptr_ OrtKernelDef** kernel_def_out);

Contributor Author

@adrianlizarraga adrianlizarraga Oct 4, 2025

This PR does not yet add all KernelDefBuilder functions. It is still missing aliasing and "may inplace" support. However, these are probably not commonly used and could be added later.

*/
ORT_API2_STATUS(KernelDef_GetOutputMemType, _In_ const OrtKernelDef* kernel_def,
_In_ size_t output_index, _Out_ OrtMemType* mem_type);

Contributor Author

@adrianlizarraga adrianlizarraga Oct 4, 2025

I also have not added all getters for KernelDef because they are not really used by EPs. An EP retrieves a kernel def during GetCapability to check if a kernel for a node has been registered. Notably, there is only one EP (ACL EP) that actually gets a property from a KernelDef returned by a lookup, and that property is the operator type, which it could instead get from the node.

#include "utils.h"

ONNX_OPERATOR_KERNEL_EX(
MemcpyFromHost,
Contributor

does this EP need to register its own memcpy kernels or can it use the generic ones from the CPU EP (added in #26088)?

Contributor Author

It doesn't need it. It was a way to test the kernel registration utilities. Perhaps it would be best to create a different kernel-based example EP and leave this one unchanged.

Contributor

if we use this as an example EP implementation for reference, it might be better to show EP authors that they can avoid implementing their own memcpy kernels unless they require some special behavior not provided by the generic ones.

also, the testing of the generic memcpy kernels was relying on this EP not providing its own, but we could update the test set up if needed. on a semi-related note, I don't know how well OpTester will work with an EP that has both a kernel registry and support for compiling nodes.

KernelDefBuilder& SetExecutionProvider(const char* ep_name);
KernelDefBuilder& SetInputMemType(size_t input_index, OrtMemType mem_type);
KernelDefBuilder& SetOutputMemType(size_t output_index, OrtMemType mem_type);
KernelDefBuilder& AddTypeConstraint(const char* arg_name, const OrtMLDataType* data_types);
Contributor

nit: it is one data type with this overload, right?

Suggested change
- KernelDefBuilder& AddTypeConstraint(const char* arg_name, const OrtMLDataType* data_types);
+ KernelDefBuilder& AddTypeConstraint(const char* arg_name, const OrtMLDataType* data_type);

*
* \since Version 1.24
*/
ORT_API2_STATUS(KernelInfo_CopyTensors, _In_ const OrtKernelInfo* info,
Contributor

is it possible to reuse the CopyTensors API or do we need a new one?

Contributor Author

The existing CopyTensors API takes an OrtEnv as input, which is not available to EPs (if I'm not mistaken)

const onnxruntime::KernelCreateInfo* create_info =
graph_support_info->kernel_lookup.LookUpKernel(ep_node->GetInternalNode());

*out_kernel_def = static_cast<const OrtKernelDef*>(create_info->kernel_def.get());
Contributor

should we check whether the lookup fails to find anything (create_info == nullptr)?

const OrtLogger& logger,
/*out*/ std::unique_ptr<PluginExecutionProvider>& plugin_ep);

explicit PluginExecutionProvider(UniqueOrtEp ep, const OrtSessionOptions& session_options, OrtEpFactory& ep_factory,
Contributor

when should one use PluginExecutionProvider::PluginExecutionProvider() vs. PluginExecutionProvider::Create()?


// Table of BuildKernelCreateInfo functions for each operator
static const BuildKernelCreateInfoFn build_kernel_create_info_funcs[] = {
BuildKernelCreateInfo<void>, // Dummy to avoid table becoming empty.
Contributor

I think the dummy entry was originally added to support reduced op builds for certain EPs. we probably don't need it in this example.

}

if (status != nullptr) {
ep_api.ReleaseKernelRegistry(*kernel_registry);
Contributor

should we add a C++ API type for OrtKernelRegistry?

}

static void CheckFileIsEmpty(const PathString& filename) {
std::ifstream ifs{filename};
Contributor

also check that the file was opened? ASSERT_TRUE(ifs)
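
A minimal sketch of the distinction this comment raises, using only the standard library: a stream that failed to open must not be mistaken for an empty file. ClassifyFile and FileState are hypothetical names and are not part of the test code under review, which would use ASSERT_TRUE(ifs) as suggested.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

enum class FileState { kNotOpened, kEmpty, kNotEmpty };

// Distinguishes "could not open" from "opened and empty".
FileState ClassifyFile(const std::string& filename) {
  std::ifstream ifs{filename};
  if (!ifs) return FileState::kNotOpened;  // Equivalent in spirit to ASSERT_TRUE(ifs).
  const bool is_empty = ifs.peek() == std::ifstream::traits_type::eof();
  return is_empty ? FileState::kEmpty : FileState::kNotEmpty;
}
```

Without the open check, reading from a stream that never opened also yields no data, so a missing file could pass an emptiness check silently.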
