@wooway777
Collaborator

resolves #884

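Iluvatar support depends on the changes in #633, but that PR does not yet avoid affecting older versions, and the isolation approach has not been confirmed.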

Iluvatar
image
image

Metax
image
image

Moore
image
image

Copilot AI left a comment

Pull request overview

This pull request adds support for the add_rms_norm operator on three additional device platforms: Iluvatar, Metax, and Moore. The implementation follows existing patterns from the NVIDIA implementation, adapting it for each platform's specific requirements (MUSA for Moore, HCCU/MACA for Metax).
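For readers unfamiliar with the operator, the following is a minimal CPU sketch of what a fused add + RMS norm typically computes. It is not taken from the PR: the function name `add_rms_norm_reference`, its signature, and the exact semantics (sum the two inputs, keep the sum as a residual output, then normalize the sum by its root mean square and scale by a weight) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, NOT the PR's kernel code.
// Assumed semantics, per row of length dim = weight.size():
//   residual_out = a + b
//   y = residual_out / sqrt(mean(residual_out^2) + eps) * weight
void add_rms_norm_reference(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>& weight,
                            float eps,
                            std::vector<float>& y,
                            std::vector<float>& residual_out) {
    const std::size_t dim = weight.size();
    assert(dim > 0 && a.size() == b.size() && a.size() % dim == 0);

    y.resize(a.size());
    residual_out.resize(a.size());

    for (std::size_t row = 0; row < a.size() / dim; ++row) {
        const std::size_t base = row * dim;

        // Pass 1: element-wise add and accumulate the sum of squares.
        double sum_sq = 0.0;
        for (std::size_t i = 0; i < dim; ++i) {
            const float s = a[base + i] + b[base + i];
            residual_out[base + i] = s;
            sum_sq += static_cast<double>(s) * s;
        }

        // Pass 2: scale by the inverse RMS and the per-column weight.
        const float mean_sq = static_cast<float>(sum_sq / static_cast<double>(dim));
        const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);
        for (std::size_t i = 0; i < dim; ++i) {
            y[base + i] = residual_out[base + i] * inv_rms * weight[i];
        }
    }
}
```

A device kernel would usually assign one thread block per row and replace the first loop with a block-wide reduction, which is where the block size of each platform comes into play.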

Key changes:

  • Enabled add_rms_norm operator for Moore and Metax by uncommenting includes and adding platform-specific registration
  • Added block size support (1024 and 2048) to the NVIDIA implementation so that it also covers Iluvatar devices
  • Implemented complete Moore and Metax backend support with platform-specific kernels and data type handling
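The block-size change in the list above can be pictured as a runtime-to-compile-time dispatch. The sketch below is plain C++ rather than CUDA so that it runs anywhere. The constant names follow the review summary, but their values, the helper `launch_kernel_stub`, and the function `dispatch_by_block_size` are hypothetical stand-ins, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Names follow the review summary; the numeric values are assumed.
constexpr std::size_t CUDA_BLOCK_SIZE_1024 = 1024;
constexpr std::size_t CUDA_BLOCK_SIZE_2048 = 2048;

// Hypothetical stand-in for a templated kernel launch. A real
// implementation would launch one block of BLOCK_SIZE threads per row;
// here we only report how many threads that would be.
template <std::size_t BLOCK_SIZE>
std::size_t launch_kernel_stub(std::size_t rows) {
    assert(rows > 0);
    return rows * BLOCK_SIZE;
}

// Map the device's runtime limit onto a compile-time template argument.
// Unsupported limits are rejected instead of silently falling back.
std::size_t dispatch_by_block_size(std::size_t max_threads_per_block,
                                   std::size_t rows) {
    switch (max_threads_per_block) {
    case CUDA_BLOCK_SIZE_1024:
        return launch_kernel_stub<CUDA_BLOCK_SIZE_1024>(rows);
    case CUDA_BLOCK_SIZE_2048:
        return launch_kernel_stub<CUDA_BLOCK_SIZE_2048>(rows);
    default:
        throw std::invalid_argument("unsupported block size");
    }
}
```

Keeping the block size as a template parameter lets reductions inside the kernel size their shared storage at compile time, at the cost of one `case` per supported device limit.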

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/infiniop/ops/add_rms_norm/operator.cc | Uncommented includes for metax/moore and registered them in CREATE/GET/CALCULATE/DESTROY macros |
| src/infiniop/ops/add_rms_norm/nvidia/add_rms_norm_nvidia.cu | Added support for CUDA_BLOCK_SIZE_1024 and CUDA_BLOCK_SIZE_2048 for Iluvatar compatibility |
| src/infiniop/ops/add_rms_norm/moore/add_rms_norm_moore.h | New header file defining Moore descriptor interface |
| src/infiniop/ops/add_rms_norm/moore/add_rms_norm_moore.mu | New implementation for Moore platform using MUSA API with kernel launch logic and type handling |
| src/infiniop/ops/add_rms_norm/metax/add_rms_norm_metax.cuh | New header file defining Metax descriptor interface |
| src/infiniop/ops/add_rms_norm/metax/add_rms_norm_metax.maca | New implementation for Metax platform using HCCU/MACA API with kernel launch logic and type handling |

Development

Successfully merging this pull request may close these issues.

[DEV] add rms norm for CUDA-like platforms

2 participants