GPU doesn't seem to work #29

@fengredrum

I've set use_gpu = True, but GPU usage stays close to zero while the code runs. TensorBoard shows that all operations are assigned to the CPU. After I removed sess_config = tf.ConfigProto(allow_soft_placement=True) to force everything onto the GPU, the console showed this error:
`INFO:tensorflow:Start a new run and write summaries and checkpoints to E:\Code\PythonScripts\DeepRL\BatchPPO\20180308T091941-pendulum.
WARNING:tensorflow:Number of agents should divide episodes per update.
2018-03-08 09:19:41.315004: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-03-08 09:19:41.595863: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.1775
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.64GiB
2018-03-08 09:19:41.596493: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
INFO:tensorflow:Graph contains 42003 trainable variables.
2018-03-08 09:19:57.811479: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
Traceback (most recent call last):
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1293, in _run_fn
self._extend_graph()
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1354, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'ppo_temporary/episodes/Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
VariableV2: CPU
Identity: CPU
Assign: CPU
RefSwitch: GPU CPU
ScatterUpdate: CPU
AssignAdd: CPU
[[Node: ppo_temporary/episodes/Variable = VariableV2container="", dtype=DT_INT32, shape=[10], shared_name="", _device="/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 163, in
tf.app.run()
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 145, in main
for score in train(config, FLAGS.env_processes):
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 127, in train
utility.initialize_variables(sess, saver, config.logdir)
File "E:\Code\PythonScripts\DeepRL\BatchPPO\agents\scripts\utility.py", line 116, in initialize_variables
tf.global_variables_initializer()))
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'ppo_temporary/episodes/Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
VariableV2: CPU
Identity: CPU
Assign: CPU
RefSwitch: GPU CPU
ScatterUpdate: CPU
AssignAdd: CPU
[[Node: ppo_temporary/episodes/Variable = VariableV2container="", dtype=DT_INT32, shape=[10], shared_name="", _device="/device:GPU:0"]]

Caused by op 'ppo_temporary/episodes/Variable', defined at:
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 163, in
tf.app.run()
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 145, in main
for score in train(config, FLAGS.env_processes):
File "E:/Code/PythonScripts/DeepRL/BatchPPO/agents/scripts/train.py", line 113, in train
batch_env, config.algorithm, config)
File "E:\Code\PythonScripts\DeepRL\BatchPPO\agents\scripts\utility.py", line 48, in define_simulation_graph
algo = algo_cls(batch_env, step, is_training, should_log, config)
File "E:\Code\PythonScripts\DeepRL\BatchPPO\agents\ppo\algorithm.py", line 78, in init
template, len(batch_env), config.max_length, 'episodes')
File "E:\Code\PythonScripts\DeepRL\BatchPPO\agents\ppo\memory.py", line 44, in init
self._length = tf.Variable(tf.zeros(capacity, tf.int32), False)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\ops\variables.py", line 213, in init
constraint=constraint)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\ops\variables.py", line 331, in _init_from_args
name=name)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\ops\state_ops.py", line 133, in variable_op_v2
shared_name=shared_name)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 926, in _variable_v2
shared_name=shared_name, name=name)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "D:\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'ppo_temporary/episodes/Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
VariableV2: CPU
Identity: CPU
Assign: CPU
RefSwitch: GPU CPU
ScatterUpdate: CPU
AssignAdd: CPU
[[Node: ppo_temporary/episodes/Variable = VariableV2container="", dtype=DT_INT32, shape=[10], shared_name="", _device="/device:GPU:0"]]`

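It seems that TensorFlow does not allow an int32 variable to be placed on the GPU: the failing node is a DT_INT32 VariableV2, and the colocation group lists VariableV2, Identity, Assign, ScatterUpdate, and AssignAdd as CPU only.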
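
To make that reading of the log easier to check, here is a small sketch in plain Python (no TensorFlow needed) that parses the "Colocation Debug Info" lines from the error above and lists the op types with no GPU entry. The helper names `parse_colocation_group` and `cpu_only_ops` are made up for this illustration and are not part of TensorFlow or this repository.

```python
import re

# Colocation lines copied from the error message above.
COLOCATION_LOG = """\
Colocation group had the following types and devices:
Switch: GPU CPU
VariableV2: CPU
Identity: CPU
Assign: CPU
RefSwitch: GPU CPU
ScatterUpdate: CPU
AssignAdd: CPU
"""

# Matches lines such as "Switch: GPU CPU" or "VariableV2: CPU".
_LINE = re.compile(r"^\s*(\w+):\s+((?:GPU|CPU)(?:\s+(?:GPU|CPU))*)\s*$")


def parse_colocation_group(text):
    """Map each op type to the set of device types listed for it."""
    devices = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            devices[match.group(1)] = set(match.group(2).split())
    return devices


def cpu_only_ops(text):
    """Return the op types in the group that list no GPU device."""
    group = parse_colocation_group(text)
    return sorted(op for op, devs in group.items() if "GPU" not in devs)


print(cpu_only_ops(COLOCATION_LOG))
# ['Assign', 'AssignAdd', 'Identity', 'ScatterUpdate', 'VariableV2']
```

Because every op in a colocation group has to share one device, a single CPU-only op is enough to make the explicit '/device:GPU:0' request unsatisfiable. If that is what is happening here, possible workarounds would be to keep allow_soft_placement=True or to create this variable under tf.device('/cpu:0'); I have not tested either in this code base.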
