
Enable XLA support for Tensorflow #122

@Randl

Description

XLA can significantly increase computation speed.

I tried to measure the speed-up, but unfortunately didn't manage to get significant results:

$ python3 benchmark_vgg.py --batch_size 4000
WARNING:tensorflow:From benchmark_vgg.py:184: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2017-02-20 17:14:06.476874: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-02-20 17:14:06.476908: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-02-20 17:14:08.569973: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Graphics Device
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-02-20 17:14:08.570458: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x3eccad0
2017-02-20 17:14:09.183220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Graphics Device
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:41:00.0
Total memory: 11.91GiB
Free memory: 11.63GiB
2017-02-20 17:14:09.183512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
2017-02-20 17:14:09.183570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
2017-02-20 17:14:09.183633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
2017-02-20 17:14:09.183658: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y N
2017-02-20 17:14:09.183675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   N Y
2017-02-20 17:14:09.183905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:04:00.0)
2017-02-20 17:14:09.184090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Graphics Device, pci bus id: 0000:41:00.0)
2017-02-20 17:14:09.749515: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 2 visible devices
2017-02-20 17:14:09.749669: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 96 visible devices
2017-02-20 17:14:09.794250: I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
2017-02-20 17:14:09.794375: I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): <undefined>, <undefined>
2017-02-20 17:14:09.794871: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 2 visible devices
2017-02-20 17:14:09.794890: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 96 visible devices
2017-02-20 17:14:09.826939: I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
2017-02-20 17:14:09.827028: I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): Graphics Device, Compute Capability 6.0
2017-02-20 17:14:09.827054: I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (1): Graphics Device, Compute Capability 6.0
2017-02-20 17:14:14.149286: step 0, duration = 0.000
2017-02-20 17:14:14.152526: step 10, duration = 0.000
2017-02-20 17:14:14.155913: step 20, duration = 0.000
2017-02-20 17:14:14.158968: step 30, duration = 0.000
2017-02-20 17:14:14.161953: step 40, duration = 0.000
2017-02-20 17:14:14.165289: step 50, duration = 0.001
2017-02-20 17:14:14.168046: step 60, duration = 0.000
2017-02-20 17:14:14.172249: step 70, duration = 0.000
2017-02-20 17:14:14.174981: step 80, duration = 0.000
2017-02-20 17:14:14.177259: step 90, duration = 0.000
2017-02-20 17:14:14.179223: Forward across 100 steps, 0.000 +/- 0.000 sec / batch
2017-02-20 17:14:15.127072: step 0, duration = 0.006
2017-02-20 17:14:15.193918: step 10, duration = 0.006
2017-02-20 17:14:15.258036: step 20, duration = 0.006
2017-02-20 17:14:15.311999: step 30, duration = 0.006
2017-02-20 17:14:15.364200: step 40, duration = 0.005
2017-02-20 17:14:15.416405: step 50, duration = 0.005
2017-02-20 17:14:15.470125: step 60, duration = 0.006
2017-02-20 17:14:15.508636: step 70, duration = 0.003
2017-02-20 17:14:15.542784: step 80, duration = 0.003
2017-02-20 17:14:15.576780: step 90, duration = 0.003
2017-02-20 17:14:15.607214: Forward-backward across 100 steps, 0.005 +/- 0.001 sec / batch

(I used a P100 for these measurements.)
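For reference, a minimal sketch of how XLA JIT compilation was enabled session-wide in TensorFlow 1.x of that era (the exact mechanism used in `benchmark_vgg.py` isn't shown here, so this is an assumption about the setup):

```python
import tensorflow as tf

# Turn on XLA JIT compilation for the whole session (TF 1.x API).
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

sess = tf.Session(config=config)
# ... build graph and run sess.run(...) as usual; eligible ops are
# clustered and compiled by XLA instead of executed one kernel at a time.
```

With this global JIT level set, the `platform CUDA present` / `XLA service executing computations` lines in the log above appear at session creation.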
