
Converting to OM for deployment on Huawei Ascend chips #222

@zhangsan728

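Does anyone have an ONNX export script? The ONNX model produced by my own script fails during conversion to OM at the NonMaxSuppressionV6Fusion pass (the non-maximum suppression V6 fusion pass), because the shape of score comes out negative. Could anyone suggest how to fix this, or share working ONNX export code?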
atc --model=faster_rcnn_600x600.onnx --framework=5 --output=model_rcnn --input_format=NCHW --soc_version=Ascend310B4 --input_shape="input:1,3,600,600"
ATC start working now, please wait for a moment.
...
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E20007: Failed to run graph fusion pass [NonMaxSuppressionV6Fusion]. The pass type is [built-in-ai-core-graph-pass]
Solution: 1. If the pass code is custom, check the error log and the verification logic. 2. If the pass code is not custom, perform a complete or partial dump by using npucollect.sh and then send the dump to Huawei technical support for fault locating.
TraceBack (most recent call last):
The shape of score cannot be negative.[FUNC:IdxValueConstNode][FILE:non_max_suppression_fusion_pass.cc][LINE:121]
generate const value of idx fail[FUNC:Fusion][FILE:non_max_suppression_fusion_pass.cc][LINE:211]
Failed to run graph fusion pass [NonMaxSuppressionV6Fusion]. The pass type is [built-in-ai-core-graph-pass]
[GraphOpt][FirstRoundFusion] Run graph fusion pass failed, pass name:NonMaxSuppressionV6Fusion, pass type:built-in-ai-core-graph-pass, return value is 4294967295.[FUNC:RunOnePassFusion][FILE:graph_fusion.cc][LINE:1170]
[GraphOpt][FirstRoundFusion] MainGraph[model_rcnn]: RunGraphFusion not success.[FUNC:Fusion][FILE:graph_fusion.cc][LINE:99]
[GraphOpt][AfterFusion]Failed to do graph fusion for graph model_rcnn. ErrNo is 4294967295.[FUNC:OptimizeOriginalGraph][FILE:fe_graph_optimizer.cc][LINE:340]
Call OptimizeOriginalGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:model_rcnn[FUNC:OptimizeOriginalGraph][FILE:graph_optimize.cc][LINE:178]
build graph failed, graph id:0, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
GenerateOfflineModel execute failed.
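
The traceback above is long, but each frame follows the same `message[FUNC:...][FILE:...][LINE:...]` layout. A minimal sketch, assuming that layout holds for the whole log, for pulling out the error code and the first frame (here the `IdxValueConstNode` check). The names `parse_atc_traceback`, `FRAME_RE`, and `SAMPLE_LOG` are hypothetical helpers, not part of ATC:

```python
import re

# Matches lines such as:
#   some message[FUNC:Name][FILE:file.cc][LINE:121]
FRAME_RE = re.compile(
    r"^(?P<msg>.*?)\[FUNC:(?P<func>[^\]]+)\]"
    r"\[FILE:(?P<file>[^\]]+)\]\[LINE:(?P<line>\d+)\]\s*$"
)

# A few lines copied from the log in this issue.
SAMPLE_LOG = """\
E20007: Failed to run graph fusion pass [NonMaxSuppressionV6Fusion]. The pass type is [built-in-ai-core-graph-pass]
TraceBack (most recent call last):
The shape of score cannot be negative.[FUNC:IdxValueConstNode][FILE:non_max_suppression_fusion_pass.cc][LINE:121]
generate const value of idx fail[FUNC:Fusion][FILE:non_max_suppression_fusion_pass.cc][LINE:211]
"""


def parse_atc_traceback(log_text):
    """Return (error_code, frames) extracted from ATC console output."""
    code = None
    frames = []
    for raw in log_text.splitlines():
        line = raw.strip()
        # The first "E<digits>:" prefix is taken as the error code.
        code_match = re.match(r"(E\d+):", line)
        if code_match and code is None:
            code = code_match.group(1)
        # Lines carrying FUNC/FILE/LINE tags are treated as traceback frames.
        frame_match = FRAME_RE.match(line)
        if frame_match:
            frames.append({
                "message": frame_match.group("msg").strip(),
                "func": frame_match.group("func"),
                "file": frame_match.group("file"),
                "line": int(frame_match.group("line")),
            })
    return code, frames


code, frames = parse_atc_traceback(SAMPLE_LOG)
print(code, frames[0]["func"], frames[0]["message"])
```

In this log the first frame is the one that names the actual check that failed; the later frames only report that the fusion pass and graph build did not succeed.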

This is my ONNX export code:

import torch
import numpy as np
from nets.frcnn import FasterRCNN  # the custom FasterRCNN class used for training
from utils.utils import get_classes, preprocess_input  # helper functions reused from the training code


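def export_frcnn_to_onnx():
    # -------------------------- 1. Required parameters (keep identical to the training code) --------------------------
    # Core parameters copied from the training code; they must match training exactly
    model_path = "logs/ep040-loss1.072-val_loss1.237.pth"  # trained weights file
    classes_path = "model_data/voc_classes.txt"  # class list file (same as in training)
    backbone = "resnet50"  # backbone network (resnet50 in training)
    anchors_size = [4, 16, 32]  # anchor sizes (anchors_size as defined in the training code)
    input_shape = [600, 600]  # input size (input_shape in the training code)
    confidence = 0.5  # confidence threshold (does not affect export; kept only to mirror the model's forward logic)
    nms_iou = 0.30  # NMS IoU threshold (does not affect export; kept only to mirror the model's forward logic)
    cuda = False  # exporting on CPU is recommended to avoid GPU/ONNX compatibility issues
    onnx_save_path = "faster_rcnn_600x600.onnx"  # output ONNX file name
    opset_version = 11  # ONNX opset version (11+ is compatible with most deployment frameworks)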

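    # -------------------------- 2. Build the model (same structure as in training) --------------------------
    # 1. Get the number of classes (same as in training: object classes + 1 background class)
    class_names, num_classes = get_classes(classes_path)
    print(f"Number of classes: {num_classes} ({class_names})")

    # 2. Build the custom FasterRCNN model (mode "predict", same as the prediction code)
    model = FasterRCNN(
        num_classes=num_classes,
        mode="predict",  # must be "predict" to match the forward-pass logic
        anchor_scales=anchors_size,  # anchor sizes (anchors_size from training)
        backbone=backbone  # backbone network (resnet50)
    )

    # 3. Load the trained weights (same loading logic as the prediction code)
    device = torch.device("cuda" if cuda and torch.cuda.is_available() else "cpu")
    model.load_state_dict(torch.load(model_path, map_location=device))
    print(f"Loaded weights: {model_path}")

    # 4. Switch to evaluation mode (disables training-only behaviour such as Dropout)
    model.eval()
    # Move the model to the same device as the dummy input. It is not wrapped in
    # DataParallel here: torch.onnx.export rejects DataParallel-wrapped models.
    model = model.to(device)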


    # -------------------------- 3. Prepare the input (must match what the model expects) --------------------------
    # 1. Simulate one 600x600 RGB image (same size as the training input)
    dummy_image = np.ones((input_shape[0], input_shape[1], 3), dtype=np.float32)  # (H, W, C)
    # 2. Preprocess (exactly the same steps as in training/prediction):
    #    - normalize (preprocess_input)
    #    - reorder dimensions (HWC -> CHW)
    #    - add the batch dimension (CHW -> BCHW)
    dummy_image = preprocess_input(dummy_image)  # normalization reused from the training code (e.g. subtract mean, divide by std)
    dummy_input = torch.from_numpy(np.transpose(dummy_image, (2, 0, 1))).unsqueeze(0)  # (1, 3, 600, 600)
    dummy_input = dummy_input.to(device)  # same device as the model


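    # -------------------------- 4. Export the ONNX model --------------------------
    # Dynamic-axes setting: batch_size (dim 0) dynamic, height/width fixed at 600 (fixed input size in training).
    # Note: this dict is defined but NOT passed to torch.onnx.export below (dynamic_axes=None),
    # so the exported graph is fully static, matching the fixed --input_shape given to atc.
    dynamic_axes = {
        "input": {0: "batch_size"},  # dynamic batch dimension on the input
        "roi_cls_locs": {0: "batch_size"},  # output 1: box regression offsets for the proposals
        "roi_scores": {0: "batch_size"},    # output 2: class scores for the proposals
        "rois": {0: "batch_size"}           # output 3: proposal box coordinates
    }

    # Export to ONNX
    torch.onnx.export(
        model=model,
        args=dummy_input,  # example input
        f=onnx_save_path,  # output path
        verbose=False,     # no detailed log (set True when debugging)
        opset_version=opset_version,  # opset version
        training=torch.onnx.TrainingMode.EVAL,  # export in evaluation mode
        do_constant_folding=True,  # enable constant folding (optimizes the ONNX model)
        input_names=["input"],  # input node name (easier to identify at deployment)
        output_names=["roi_cls_locs", "roi_scores", "rois"],  # output node names (match the model's forward outputs)
        dynamic_axes=None  # static shapes; pass the dynamic_axes dict here to make the batch dimension dynamic
    )

    print(f"ONNX model saved to: {onnx_save_path}")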


if __name__ == "__main__":
    export_frcnn_to_onnx()
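
One possible reading of "The shape of score cannot be negative" is that the scores input of a NonMaxSuppression node in the exported graph has a dimension that is not a fixed positive number at conversion time. That is an assumption; the log does not confirm it. A minimal sketch, assuming the `onnx` Python package is installed, for listing the inferred shape of each NonMaxSuppression node's scores input (input index 1 in the ONNX operator). The helper names `dynamic_dims` and `nms_score_shapes` are hypothetical:

```python
def dynamic_dims(shape):
    """Return the indices of dims that are not fixed positive integers."""
    return [i for i, d in enumerate(shape) if not (isinstance(d, int) and d > 0)]


def nms_score_shapes(onnx_path):
    """Map each NonMaxSuppression node to the inferred shape of its scores input.

    A shape of None means shape inference produced no shape for that tensor.
    """
    # Imported lazily so dynamic_dims() can be used without onnx installed.
    import onnx
    from onnx import shape_inference

    model = shape_inference.infer_shapes(onnx.load(onnx_path))
    graph = model.graph

    # Collect every tensor shape that is known after shape inference.
    shapes = {}
    for info in list(graph.input) + list(graph.value_info) + list(graph.output):
        tensor_type = info.type.tensor_type
        if not tensor_type.HasField("shape"):
            shapes[info.name] = None
            continue
        dims = []
        for dim in tensor_type.shape.dim:
            # Static dims carry dim_value; symbolic or unknown dims do not.
            if dim.HasField("dim_value"):
                dims.append(dim.dim_value)
            else:
                dims.append(dim.dim_param or None)
        shapes[info.name] = dims

    return {
        (node.name or node.output[0]): shapes.get(node.input[1])
        for node in graph.node
        if node.op_type == "NonMaxSuppression"
    }


# Pure-Python illustration of the check on hand-written shapes.
print(dynamic_dims([1, 21, 300]))          # all dims static
print(dynamic_dims([1, 21, "num_boxes"]))  # last dim symbolic
```

Calling `nms_score_shapes("faster_rcnn_600x600.onnx")` and passing each returned shape to `dynamic_dims` would show whether any NMS scores input still has a non-static dimension, even though the model input itself is exported with fixed shapes.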
