
TensorFlow multi-stream question #1

@zhaocc1106

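First, my apologies: the tensorflow repository does not accept issues, so I can only raise this here.
I would like to ask a question:
I built the .so from the r2.2_multistream branch and ported it into our service, and have run into a problem. Our model uses with tf.device('/GPU:0') and with tf.device('/GPU:1') to split its parallel part horizontally across two cards so that the two halves are computed in parallel. When the .so from this branch loads our model, the model is loaded onto only one card: during inference only one card shows any utilization, and the other stays at 0. Does the multi-stream technique not support multiple cards?
At startup, the session does appear to start multiple streams: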

 2025-03-31 17:00:29.328625: I tensorflow/core/common_runtime/direct_session.cc:190] New DirectSession, devices:
 2025-03-31 17:00:29.328636: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:CPU:0
 2025-03-31 17:00:29.328642: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:STREAM_GPU_0:1
 2025-03-31 17:00:29.328649: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:STREAM_GPU_0:0
 2025-03-31 17:00:29.328655: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:STREAM_GPU_1:1
 2025-03-31 17:00:29.328661: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:STREAM_GPU_1:0
 2025-03-31 17:00:29.328667: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:XLA_CPU:0
 2025-03-31 17:00:29.328674: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:XLA_GPU:0
 2025-03-31 17:00:29.328680: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:GPU:0
 2025-03-31 17:00:29.328686: I tensorflow/core/common_runtime/direct_session.cc:192] /job:localhost/replica:0/task:0/device:GPU:1
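
For easier comparison, the device list in the log above can be grouped by device type with a few lines of standard-library Python. This is only a rough sketch: the function name summarize_devices is made up for illustration, the device names are copied from the log, and nothing is assumed about what the two numbers in a name such as STREAM_GPU_0:1 mean.

```python
import re
from collections import defaultdict

# Device names copied from the DirectSession log above.
LOG = """\
/job:localhost/replica:0/task:0/device:CPU:0
/job:localhost/replica:0/task:0/device:STREAM_GPU_0:1
/job:localhost/replica:0/task:0/device:STREAM_GPU_0:0
/job:localhost/replica:0/task:0/device:STREAM_GPU_1:1
/job:localhost/replica:0/task:0/device:STREAM_GPU_1:0
/job:localhost/replica:0/task:0/device:XLA_CPU:0
/job:localhost/replica:0/task:0/device:XLA_GPU:0
/job:localhost/replica:0/task:0/device:GPU:0
/job:localhost/replica:0/task:0/device:GPU:1
"""

# Matches the trailing "<device type>:<index>" part of a device name.
DEVICE_RE = re.compile(r"/device:(\w+):(\d+)\s*$")


def summarize_devices(log_text):
    """Hypothetical helper: map each device type to its sorted indices."""
    by_type = defaultdict(list)
    for line in log_text.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            by_type[match.group(1)].append(int(match.group(2)))
    return {name: sorted(ids) for name, ids in by_type.items()}


summary = summarize_devices(LOG)
for name in sorted(summary):
    print(name, summary[name])
```

Grouped this way, the log lists GPU with indices 0 and 1 and two devices under each of STREAM_GPU_0 and STREAM_GPU_1, but XLA_GPU with index 0 only. Whether that is related to only one card being used is not something the log alone shows.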

I would be very grateful if you could take a look!
