
softmax can only reduce within a single block #392

@Johann356

Description

As far as I can tell, block_reduce_max_f32 and block_reduce_sum_f32 only give each thread the reduction result for its own block; no thread obtains the global reduction result. When the softmax reduction dimension is very large, however, the work has to be split across multiple blocks. Does this mean a kernel that performs the complete cross-block reduction has not been implemented? I would appreciate your thoughts on this.
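To make the question concrete, here is a minimal sketch of one common way to turn block-local results into a global one: each block emits a partial (max, sum of exp(x - max)) pair for its slice, and the partials are merged in a second step. It is written as plain host-side C++ so it runs without a GPU. The names Partial, merge, chunk_partial, and softmax_chunked are hypothetical and do not come from this repository, and each "chunk" only stands in for the slice one thread block would handle. It is not the author's kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk statistics: the chunk maximum and the sum of
// exp(x - max) over the chunk. This is what one block could write out.
struct Partial {
    float max;
    float sum;
};

// Merge two partials. Each sum is rescaled from its own max to the
// shared max before the two are added.
inline Partial merge(Partial a, Partial b) {
    float m = std::max(a.max, b.max);
    return {m, a.sum * std::exp(a.max - m) + b.sum * std::exp(b.max - m)};
}

// Stand-in for the block-level reduction over one slice of the row.
inline Partial chunk_partial(const float* x, std::size_t n) {
    float m = x[0];
    for (std::size_t i = 1; i < n; ++i) m = std::max(m, x[i]);
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += std::exp(x[i] - m);
    return {m, s};
}

// Softmax over a row that is processed in chunks of `chunk` elements.
inline std::vector<float> softmax_chunked(const std::vector<float>& x,
                                          std::size_t chunk) {
    assert(!x.empty() && chunk > 0);
    // Step 1: one partial per chunk, merged into global statistics.
    Partial total = chunk_partial(x.data(), std::min(chunk, x.size()));
    for (std::size_t start = chunk; start < x.size(); start += chunk) {
        std::size_t n = std::min(chunk, x.size() - start);
        total = merge(total, chunk_partial(x.data() + start, n));
    }
    // Step 2: normalize every element with the global max and sum.
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = std::exp(x[i] - total.max) / total.sum;
    return y;
}
```

Calling softmax_chunked(x, 2) and softmax_chunked(x, x.size()) on the same row should agree up to floating-point rounding. On a GPU, the merge step would typically run in a second kernel launch or after a grid-wide synchronization. Which of these fits this repository, if any, is for the author to say.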
