Converting to OM for deployment on Huawei Ascend chips #222

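**Does anyone have an ONNX export script? The ONNX model produced by my own export script fails when I convert it to OM: the NonMaxSuppressionV6Fusion pass (non-maximum suppression V6 fusion) fails because the shape of score comes out negative. Could anyone suggest how to fix this, or share working ONNX export code?**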
atc --model=faster_rcnn_600x600.onnx --framework=5 --output=model_rcnn --input_format=NCHW --soc_version=Ascend310B4 --input_shape="input:1,3,600,600"
ATC start working now, please wait for a moment.
...
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E20007: Failed to run graph fusion pass [NonMaxSuppressionV6Fusion]. The pass type is [built-in-ai-core-graph-pass]
        Solution: 1. If the pass code is custom, check the error log and the verification logic.  2. If the pass code is not custom, perform a complete or partial dump by using npucollect.sh and then send the dump to Huawei technical support for fault locating.
        TraceBack (most recent call last):
        The shape of score cannot be negative.[FUNC:IdxValueConstNode][FILE:non_max_suppression_fusion_pass.cc][LINE:121]
        generate const value of idx fail[FUNC:Fusion][FILE:non_max_suppression_fusion_pass.cc][LINE:211]
        Failed to run graph fusion pass [NonMaxSuppressionV6Fusion]. The pass type is [built-in-ai-core-graph-pass]
        [GraphOpt][FirstRoundFusion] Run graph fusion pass failed, pass name:NonMaxSuppressionV6Fusion, pass type:built-in-ai-core-graph-pass,                     return value is 4294967295.[FUNC:RunOnePassFusion][FILE:graph_fusion.cc][LINE:1170]
        [GraphOpt][FirstRoundFusion] MainGraph[model_rcnn]: RunGraphFusion not success.[FUNC:Fusion][FILE:graph_fusion.cc][LINE:99]
        [GraphOpt][AfterFusion]Failed to do graph fusion for graph model_rcnn. ErrNo is 4294967295.[FUNC:OptimizeOriginalGraph][FILE:fe_graph_optimizer.cc][LINE:340]
        Call OptimizeOriginalGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:model_rcnn[FUNC:OptimizeOriginalGraph][FILE:graph_optimize.cc][LINE:178]
        build graph failed, graph id:0, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
        GenerateOfflineModel execute failed.
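
To see at a glance where the failure originates, the traceback entries can be split into message, function, file and line. This is only a minimal sketch: it assumes the log keeps the `[FUNC:...][FILE:...][LINE:...]` suffix format shown above, and `parse_atc_traceback` is a hypothetical helper, not part of the CANN toolkit.

```python
import re

# Matches the trailing "[FUNC:name][FILE:file][LINE:n]" suffix seen in the ATC log above.
_FRAME_RE = re.compile(
    r"^(?P<msg>.*?)\[FUNC:(?P<func>[^\]]+)\]\[FILE:(?P<file>[^\]]+)\]\[LINE:(?P<line>\d+)\]\s*$"
)


def parse_atc_traceback(log_text):
    """Return one dict per log line that ends with a FUNC/FILE/LINE suffix (hypothetical helper)."""
    frames = []
    for raw in log_text.splitlines():
        match = _FRAME_RE.match(raw.strip())
        if match:
            frames.append({
                "message": match["msg"].strip(),
                "func": match["func"],
                "file": match["file"],
                "line": int(match["line"]),
            })
    return frames


# Two lines copied from the log above.
sample = (
    "The shape of score cannot be negative."
    "[FUNC:IdxValueConstNode][FILE:non_max_suppression_fusion_pass.cc][LINE:121]\n"
    "generate const value of idx fail"
    "[FUNC:Fusion][FILE:non_max_suppression_fusion_pass.cc][LINE:211]\n"
)
for frame in parse_atc_traceback(sample):
    print(f'{frame["file"]}:{frame["line"]} {frame["func"]}: {frame["message"]}')
```

In this log the first parsed entry is the one from `IdxValueConstNode` in `non_max_suppression_fusion_pass.cc`, which is where the "shape of score cannot be negative" message is raised.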

This is my ONNX export code:
```python
import torch
import numpy as np
from nets.frcnn import FasterRCNN  # the custom FasterRCNN class used for training
from utils.utils import get_classes, get_new_img_size, preprocess_input  # helper functions reused from the training code


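def export_frcnn_to_onnx():
    # -------------------------- 1. Required parameters (keep consistent with the training code) --------------------------
    # Core parameters copied from the training code; they must match the training setup exactly
    model_path = "logs/ep040-loss1.072-val_loss1.237.pth"  # trained weights file
    classes_path = "model_data/voc_classes.txt"  # classes file (same as in training)
    backbone = "resnet50"  # backbone network (resnet50 in training)
    anchors_size = [4, 16, 32]  # anchor sizes (anchors_size defined in the training code)
    input_shape = [600, 600]  # input size (input_shape in the training code)
    confidence = 0.5  # confidence threshold (does not affect the export; kept only to mirror the model's forward logic)
    nms_iou = 0.30  # NMS threshold (does not affect the export; kept only to mirror the model's forward logic)
    cuda = False  # exporting on CPU is recommended, to avoid GPU/ONNX compatibility issues
    onnx_save_path = "faster_rcnn_600x600.onnx"  # output ONNX file name
    opset_version = 11  # ONNX opset version (11+ is compatible with most deployment frameworks)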

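    # -------------------------- 2. Build the model (same structure as in training) --------------------------
    # 1. Get the number of classes (same as in training: object classes + 1 background class)
    class_names, num_classes = get_classes(classes_path)
    print(f"Number of classes: {num_classes} ({class_names})")

    # 2. Build the custom FasterRCNN model (mode "predict", same as the prediction code)
    model = FasterRCNN(
        num_classes=num_classes,
        mode="predict",  # must be "predict" so the forward pass follows the inference logic
        anchor_scales=anchors_size,  # anchor sizes (anchors_size from training)
        backbone=backbone  # backbone network (resnet50)
    )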

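    # 3. Load the trained weights (same loading logic as the prediction code)
    device = torch.device("cuda" if cuda and torch.cuda.is_available() else "cpu")
    model.load_state_dict(torch.load(model_path, map_location=device))
    print(f"Loaded weights: {model_path}")

    # 4. Switch to eval mode (disables training-only behaviour such as Dropout)
    #    and move the model to the same device as the dummy input.
    #    The plain module is exported; it is not wrapped in DataParallel.
    model = model.eval().to(device)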


    # -------------------------- 3. Prepare the input data (matching the model's input requirements) --------------------------
    # 1. Simulate a 600x600 RGB image (same size as the training input)
    dummy_image = np.ones((input_shape[0], input_shape[1], 3), dtype=np.float32)  # (H, W, C)
    # 2. Preprocess (exactly the same logic as in training/prediction):
    #    - normalisation (preprocess_input)
    #    - layout change (HWC -> CHW)
    #    - add the batch dimension (CHW -> BCHW)
    dummy_image = preprocess_input(dummy_image)  # reuse the training normalisation (e.g. subtract mean, divide by std)
    dummy_input = torch.from_numpy(np.transpose(dummy_image, (2, 0, 1))).unsqueeze(0)  # (1, 3, 600, 600)
    dummy_input = dummy_input.to(device)  # same device as the model


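    # -------------------------- 4. Export the ONNX model --------------------------
    # Dynamic-axes configuration: dynamic batch size (dim 0), height/width fixed at 600 (fixed input size in training).
    # Note: this dict is NOT passed to the export below (dynamic_axes=None), so the exported model is fully static.
    dynamic_axes = {
        "input": {0: "batch_size"},  # dynamic batch dimension of the input
        "roi_cls_locs": {0: "batch_size"},  # output 1: proposal box regression offsets
        "roi_scores": {0: "batch_size"},    # output 2: proposal box class scores
        "rois": {0: "batch_size"}           # output 3: proposal box coordinates
    }

    # Export to ONNX
    torch.onnx.export(
        model=model,
        args=dummy_input,  # example input
        f=onnx_save_path,  # output path
        verbose=False,     # no detailed log (set True for debugging)
        opset_version=opset_version,  # opset version
        training=torch.onnx.TrainingMode.EVAL,  # export in eval mode
        do_constant_folding=True,  # enable constant folding (optimises the ONNX model)
        input_names=["input"],  # input node name (easy to identify at deployment time)
        output_names=["roi_cls_locs", "roi_scores", "rois"],  # output node names (matching the model's forward outputs)
        dynamic_axes=None  # static export; pass the dynamic_axes dict here for a dynamic batch dimension
    )

    print(f"ONNX model saved to: {onnx_save_path}")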


if __name__ == "__main__":
    export_frcnn_to_onnx()
```
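
Since the script exports with `dynamic_axes=None`, the `--input_shape` given to `atc` has to agree with the dummy input shape and with the `input_names` entry. A minimal sketch that builds the command from those same values so they cannot drift apart; `build_atc_command` is a hypothetical helper, and the only flags used are the ones from the `atc` command in this post. It does not address the NonMaxSuppressionV6Fusion failure itself; it only keeps the export side and the conversion side consistent.

```python
import shlex


def build_atc_command(onnx_path, output_name, input_name, input_shape, soc_version):
    """Build the atc argument list from the export settings (hypothetical helper)."""
    # This sketch only handles static shapes, so every dimension must be a positive integer.
    if any(int(dim) <= 0 for dim in input_shape):
        raise ValueError(f"static input shape expected, got {tuple(input_shape)}")
    shape_str = ",".join(str(int(dim)) for dim in input_shape)
    return [
        "atc",
        f"--model={onnx_path}",
        "--framework=5",  # same value as in the atc command above
        f"--output={output_name}",
        "--input_format=NCHW",
        f"--soc_version={soc_version}",
        f"--input_shape={input_name}:{shape_str}",
    ]


# Values taken from the export script and the atc command in this post.
cmd = build_atc_command(
    onnx_path="faster_rcnn_600x600.onnx",
    output_name="model_rcnn",
    input_name="input",
    input_shape=(1, 3, 600, 600),
    soc_version="Ascend310B4",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where `atc` is on the `PATH`.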
