
Errors about copying buffers when running on GPU #6

@chrismwendt

I get errors about data being left on the device when running a Func on the GPU:

main :: IO ()
main = case gpuTarget of
  Nothing -> putStrLn "no GPU target found"
  Just gpu -> do
    runOnGpu <- compileForTarget gpu $ \(buffer "original" -> original) -> do
      y <- mkVar "y"
      x <- mkVar "x"
      c <- mkVar "c"
      brightened <- define "brightened" (c, x, y) $ original ! (c, x, y) + 50
      tileGpu brightened x y 16 16
      return brightened

    imgIn :: MutableImage RealWorld PixelRGB8 <- newMutableImage 256 256
    imgOut :: MutableImage RealWorld PixelRGB8 <- newMutableImage 256 256

    withHalideBuffer @3 @Word8 imgIn $ \imgInPtr -> do
      withHalideBuffer @3 @Word8 imgOut $ \imgOutPtr -> do
        runOnGpu imgInPtr imgOutPtr

    freezeImage imgOut >>= writePng "brightened.png"

tileGpu f x y w h = do
  xo <- mkVar "xo"
  yo <- mkVar "yo"
  xi <- mkVar "xi"
  yi <- mkVar "yi"
  split TailAuto x (xo, xi) w f
  split TailAuto y (yo, yi) h f
  reorder [xi, yi, xo, yo] f
  gpuBlocks DeviceDefaultGPU (xo, yo) f
  gpuThreads DeviceDefaultGPU (xi, yi) f
  return (xo, yo, xi, yi)

*** Exception: the Buffer still references data on the device; did you forget to call copyToHost?

Call copyToHost on the Func:

      tileGpu brightened x y 16 16
      copyToHost brightened
      return brightened

*** Exception: CppStdException e "Error: Func brightened$13 is scheduled as copy_to_host/device, but has value: ((uint8)original_im$13(c, x, y) + (uint8)50)\nExpected a single call to another Func with matching dimensionality and argument order.\n"(Just "std::runtime_error")

Drop copyToHost and return a wrapper Func that just calls brightened:

      tileGpu brightened x y 16 16
      wrapper <- define "wrapper" (c, x, y) $ brightened ! (c, x, y)
      return wrapper

*** Exception: CppStdException e "Error: Cannot parallelize dimension x.xi of function brightened$20 because the function is scheduled inline.\n"(Just "std::runtime_error")

Add computeRoot brightened, and call copyToHost on the wrapper:

      tileGpu brightened x y 16 16
      computeRoot brightened
      wrapper <- define "wrapper" (c, x, y) $ brightened ! (c, x, y)
      copyToHost wrapper
      return wrapper

*** Exception: the Buffer still references data on the device; did you forget to call copyToHost?

Same error as in the first attempt. Let's try withCopiedToHost, which operates on the buffer rather than on the Func:

    withHalideBuffer @3 @Word8 imgIn $ \imgInPtr -> do
      withHalideBuffer @3 @Word8 imgOut $ \imgOutPtr -> do
        runOnGpu imgInPtr imgOutPtr
        withCopiedToHost imgOutPtr $ return ()

*** Exception: the Buffer still references data on the device; did you forget to call copyToHost?

Same error as before.

Using realizeOnTarget does not throw an error, but the only way to get the data back seems to be via peekToList (slow):

    imgIn :: MutableImage RealWorld PixelRGB8 <- newMutableImage 256 256
    imgOut :: MutableImage RealWorld PixelRGB8 <- newMutableImage 256 256
    asBufferParam imgIn \original -> do
      y <- mkVar "y"
      x <- mkVar "x"
      c <- mkVar "c"
      brightened <- define "brightened" (c, x, y) $ original ! (c, x, y) + 100
      tileGpu brightened x y 16 16
      realizeOnTarget gpu brightened [3, 256, 256] $ \brightenedPtr -> do
        withCopiedToHost brightenedPtr do
          print =<< peekToList brightenedPtr
          -- could write list back to imgOut, but would be slow
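For checking whatever peekToList returns, a CPU reference of the same pipeline is handy. This is a minimal sketch using only base. It assumes the buffer is interleaved with c varying fastest (matching the (c, x, y) argument order and the [3, 256, 256] shape above) and that the addition wraps modulo 256, as the uint8 casts in the error message suggest. `flatIndex` and `brightenRef` are hypothetical names for illustration, not part of halide-haskell:

```haskell
import Data.Word (Word8)

-- | Hypothetical helper: flat offset of element (c, x, y) in a buffer of
-- the given width, ASSUMING 3 interleaved channels with c varying fastest.
flatIndex :: Int -> (Int, Int, Int) -> Int
flatIndex width (c, x, y) = c + 3 * (x + width * y)

-- | CPU reference for @original ! (c, x, y) + delta@ applied to every
-- element. Word8 addition wraps modulo 256, like uint8 arithmetic.
brightenRef :: Word8 -> [Word8] -> [Word8]
brightenRef delta = map (+ delta)

main :: IO ()
main = do
  -- A small all-zero test input standing in for a 4x4 RGB image.
  let input = replicate (3 * 4 * 4) 0
  print (take 3 (brightenRef 50 input))
  -- Wraparound case: 250 + 50 overflows a Word8.
  print (brightenRef 50 [250])
  -- Offset of channel 2 at pixel (1, 1) in a 256-wide image.
  print (flatIndex 256 (2, 1, 1))
```

Comparing `brightenRef 100` applied to the input bytes against the list from peekToList would at least confirm that the GPU result is correct before worrying about how to get it into imgOut efficiently.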

Here's the definition of realizeOnTarget, which calls allocaBuffer internally:

realizeOnTarget target func shape action =
  withFunc func $ \func' ->
    withCxxTarget target $ \target' ->
      allocaBuffer target shape $ \buf -> do
        let raw = castPtr buf
        [C.throwBlock| void {
          handle_halide_exceptions([=](){
            $(Halide::Func* func')->realize($(halide_buffer_t* raw), *$(const Halide::Target* target'));
          });
        } |]
        action buf

Perhaps there could be another function like realizeOnTargetGivenBuffer which takes a Ptr (HalideBuffer n a) to use instead of allocating one itself. I think that would enable writing directly to imgOut.

Am I going about this in the right way, or am I missing something fundamental?
