blockio: update package description, readme, add example

jorisdral · jorisdral · commit fd135cf4f47f · 2025-07-07T12:56:09.000+02:00
diff --git a/blockio/README.md b/blockio/README.md
@@ -1,8 +1,49 @@
 # blockio
 
-This packages defines an abstract interface for batched, asynchronous I\/O,
-for use with the abstract interface for file system I\/O defined by the
-[fs-api](https://hackage.haskell.org/package/fs-api) package.
+Perform batches of disk I/O operations. Batching disk I/O operations can
+improve performance compared to performing each operation individually, and
+performing batches *concurrently* can improve performance even further,
+depending on the implementation of batched I/O.
 
-The /sim/ sub-library of this package defines /simulated/ batched, asynchronous I\/O
-for use with the [fs-sim](https://hackage.haskell.org/package/fs-sim) package.
+The batched I/O functionality in the library is separated into an *abstract
+interface* and *implementations* of that abstract interface. The advantage of
+programming against an abstract interface is that code can be agnostic to the
+implementation of the interface, allowing implementations to be freely swapped
+out. The library provides multiple implementations of batched I/O:
+platform-dependent implementations using the *real* file system (with
+asynchronous I/O where the platform supports it), and a simulated
+implementation for testing purposes.
+
+See the `System.FS.BlockIO` module for an example of how to use the library.
+
+On Linux systems the *real* implementation is backed by
+[blockio-uring](https://hackage.haskell.org/package/blockio-uring), a library
+for asynchronous I/O that achieves good performance when performing batches
+concurrently. On Windows and MacOS systems the *real* implementation currently
+performs each I/O operation sequentially, which should give roughly the same
+performance as non-batched I/O; the library could be extended with
+asynchronous I/O implementations for these platforms as well. The simulated
+implementation also performs each I/O operation sequentially.
+
+Because code is written against the *abstract interface* rather than a
+particular *implementation*, the same code can, for example, run in production
+using the real file system and in a testing environment using a simulated file
+system. Code could even switch from a default implementation to a more
+performant one in production when the latter is available. Lastly, the
+abstract interface lets us program against the file system uniformly across
+platforms (i.e., operating systems).
+
+The `blockio` library defines the abstract interface for batched I/O. The
+library is an extension of the
+[fs-api](https://hackage.haskell.org/package/fs-api) library, which defines an
+abstract interface for (basic) file system I/O. Both `blockio` and `fs-api`
+provide an implementation of their interfaces using the real file system in
+`IO`.
+
+The `blockio:sim` sub-library defines an implementation of the abstract
+interface from `blockio` that *simulates* batched I/O. This sub-library is an
+extension of the [fs-sim](https://hackage.haskell.org/package/fs-sim) library,
+which defines an implementation of the abstract interface from `fs-api` that
+simulates (basic) file system I/O.
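The README describes a split between an abstract interface, its instances, and the implementations that produce them. The following is a minimal sketch of that record-of-functions pattern in plain Haskell, assuming nothing beyond `base`. Every name in it (`BatchIO`, `Op`, `submitBatch`, `newSimBatchIO`, `writeAt`, `roundTrip`) is hypothetical and much simpler than blockio's real `HasBlockIO` record and `submitIO` function; it only illustrates the shape of the technique.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.Word (Word8)

-- Hypothetical operations; NOT blockio's IOOp type.
data Op
  = WriteByte Int Word8 -- write a byte at a file offset
  | ReadByte Int        -- read the byte at a file offset

-- Hypothetical abstract interface: a record of functions, in the style of
-- (but much simpler than) blockio's HasBlockIO. One result per operation.
newtype BatchIO m = BatchIO
  { submitBatch :: [Op] -> m [Maybe Word8] }

-- An "implementation": a function that produces an instance of the interface.
-- It simulates a file as an in-memory list of bytes and performs the
-- operations of a batch one after another.
newSimBatchIO :: IO (BatchIO IO)
newSimBatchIO = do
  fileRef <- newIORef []
  let runOp (WriteByte off b) = do
        modifyIORef' fileRef (writeAt off b)
        pure Nothing
      runOp (ReadByte off) = do
        bytes <- readIORef fileRef
        pure (if off < length bytes then Just (bytes !! off) else Nothing)
  pure (BatchIO (mapM runOp))

-- Overwrite the byte at the given offset, padding the file with zeros first
-- if it is too short.
writeAt :: Int -> Word8 -> [Word8] -> [Word8]
writeAt off b bytes =
  let padded = bytes ++ replicate (off + 1 - length bytes) 0
  in  take off padded ++ [b] ++ drop (off + 1) padded

-- Client code only sees the record, so it is agnostic to the implementation.
roundTrip :: Monad m => BatchIO m -> [Word8] -> m [Maybe Word8]
roundTrip bio bytes = do
  _ <- submitBatch bio [WriteByte i b | (i, b) <- zip [0 ..] bytes]
  submitBatch bio [ReadByte i | i <- [0 .. length bytes - 1]]

main :: IO ()
main = do
  bio <- newSimBatchIO
  roundTrip bio [1, 2, 3, 4] >>= print
```

Running `main` prints `[Just 1,Just 2,Just 3,Just 4]`. Because `roundTrip` depends only on the `BatchIO` record, another implementation (for example one that submits the batch asynchronously) could be slotted in without changing it, which is the property the README attributes to `HasBlockIO`.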
diff --git a/blockio/blockio.cabal b/blockio/blockio.cabal
@@ -1,14 +1,23 @@
 cabal-version:   3.4
 name:            blockio
 version:         0.1.0.0
-synopsis:        Abstract interface for batched, asynchronous I/O
+synopsis:        Perform batches of disk I/O operations.
 description:
-  This packages defines an abstract interface for batched, asynchronous I\/O,
-  for use with the abstract interface for file system I\/O defined by the
-  [fs-api](https://hackage.haskell.org/package/fs-api) package.
-
-  The /sim/ sub-library of this package defines /simulated/ batched, asynchronous I\/O
-  for use with the [fs-sim](https://hackage.haskell.org/package/fs-sim) package.
+  Perform batches of disk I\/O operations. Batching disk I\/O operations can
+  improve performance compared to performing each operation individually, and
+  performing batches /concurrently/ can improve performance even further,
+  depending on the implementation of batched I\/O.
+
+  The batched I\/O functionality in the library is separated into an /abstract/
+  /interface/ and /implementations/ of that abstract interface. The advantage of
+  programming against an abstract interface is that code can be agnostic to the
+  implementation of the interface, allowing implementations to be freely swapped
+  out. The library provides multiple implementations of batched I\/O:
+  platform-dependent implementations using the /real/ file system (with
+  asynchronous I\/O where the platform supports it), and a simulated
+  implementation for testing purposes.
+
+  See the "System.FS.BlockIO" module for an example of how to use the library.
 
 license:         Apache-2.0
 license-files:
@@ -65,6 +74,7 @@ library
   import:          language, warnings
   hs-source-dirs:  src
   exposed-modules:
+    System.FS.BlockIO
     System.FS.BlockIO.API
     System.FS.BlockIO.IO
     System.FS.BlockIO.Serial.Internal
diff --git a/blockio/src-macos/System/FS/BlockIO/Internal.hs b/blockio/src-macos/System/FS/BlockIO/Internal.hs
@@ -14,7 +14,7 @@ import qualified System.Posix.Files as Unix
 import qualified System.Posix.Unistd as Unix
 
 -- | For now we use the portable serial implementation of HasBlockIO. If you
--- want to provide a proper async I/O implementation for OSX, then this is where
+-- want to provide a proper async I\/O implementation for OSX, then this is where
 -- you should put it.
 --
 -- The recommended choice would be to use the POSIX AIO API.
diff --git a/blockio/src-sim/System/FS/BlockIO/Sim.hs b/blockio/src-sim/System/FS/BlockIO/Sim.hs
@@ -92,7 +92,7 @@ unsafeFromHasFS hfs =
       hfs
   where
     -- TODO: It should be possible for the implementations and simulation to
-    -- throw an FsError when doing file I/O with misaligned byte arrays after
+    -- throw an FsError when doing file I\/O with misaligned byte arrays after
     -- hSetNoCache. Maybe they should? It might be nicest to move hSetNoCache
     -- into fs-api and fs-sim because we'd need access to the internals.
     hSetNoCache _h _b = pure ()
diff --git a/blockio/src-windows/System/FS/BlockIO/Internal.hs b/blockio/src-windows/System/FS/BlockIO/Internal.hs
@@ -17,7 +17,7 @@ import qualified System.Win32.File as Windows
 import qualified System.Win32.HardLink as Windows
 
 -- | For now we use the portable serial implementation of HasBlockIO. If you
--- want to provide a proper async I/O implementation for Windows, then this is
+-- want to provide a proper async I\/O implementation for Windows, then this is
 -- where you should put it.
 --
 -- The recommended choice would be to use the Win32 IOCP API.
diff --git a/blockio/src/System/FS/BlockIO.hs b/blockio/src/System/FS/BlockIO.hs
@@ -0,0 +1,195 @@
+module System.FS.BlockIO (
+    -- * Description
+    -- $description
+
+    -- * Re-exports
+    module System.FS.BlockIO.API
+  , module System.FS.BlockIO.IO
+
+    -- * Example
+    -- $example
+) where
+
+import           System.FS.BlockIO.API
+import           System.FS.BlockIO.IO
+
+{-------------------------------------------------------------------------------
+  Description and example
+-------------------------------------------------------------------------------}
+
+{- $description
+
+  The 'HasBlockIO' record type defines an /abstract interface/. A value of type
+  'HasBlockIO' is what we call an /instance/ of the abstract interface, and
+  an instance is produced by a function that we call an /implementation/. In
+  principle, we can have multiple instances of the same implementation.
+
+  There are currently two known implementations of the interface:
+
+  * An implementation using the real file system, which can be found in the
+    "System.FS.BlockIO.IO" module. This implementation is platform-dependent,
+    but has largely similar observable behaviour.
+
+  * An implementation using a simulated file system, which can be found in the
+    @System.FS.BlockIO.Sim@ module of the @blockio:sim@ sub-library. This
+    implementation is uniform across platforms.
+
+  The 'HasBlockIO' abstract interface is an extension of the 'HasFS' abstract
+  interface that is provided by the
+  [@fs-api@](https://hackage.haskell.org/package/fs-api) package. Whereas
+  'HasFS' defines many primitive functions, for example for opening a file, the
+  main feature of 'HasBlockIO' is to define a function for performing batched
+  I\/O. As such, users of @blockio@ will more often than not need to pass both a
+  'HasFS' and a 'HasBlockIO' instance to their functions.
+-}
+
+{- $example
+
+  >>> import Control.Monad
+  >>> import Control.Monad.Primitive
+  >>> import Data.Primitive.ByteArray
+  >>> import Data.Vector qualified as V
+  >>> import Data.Word
+  >>> import System.FS.API as FS
+  >>> import System.FS.BlockIO.IO
+  >>> import System.FS.BlockIO.API
+  >>> import System.FS.IO
+
+  The main feature of the 'HasBlockIO' abstract interface is the 'submitIO'
+  function, which performs a batch of I\/O operations. Depending on the
+  implementation of the interface, submitting I\/O operations in concurrent
+  batches using 'submitIO' can be much faster than performing each I\/O
+  operation sequentially. We will not go into detail about this performance
+  consideration here; more information can be found in the
+  "System.FS.BlockIO.IO" module. Instead, we show how 'submitIO' can be used in
+  your own projects.
+
+  We aim to build an example that writes some contents to a file using
+  'submitIO', and then reads the contents out again using 'submitIO'. The file
+  contents will simply be bytes.
+
+  >>> type Byte = Word8
+
+  The first part of the example writes bytes to a given file path using
+  'submitIO'. We define a @writeFile@ function that does just that. The file
+  must not exist yet, since it is opened with @MustBeNew@.
+
+  The bytes, given as a list, are first written into a buffer (a mutable byte
+  array). The buffer should be allocated in /pinned/ memory, so that the garbage
+  collector does not move it while the I\/O backend is reading from or writing
+  to it. For write operations, the buffer tells the backend which bytes should
+  be written to disk. For simplicity, we create a separate 'IOOpWrite'
+  instruction for each byte. This instruction requires information about the
+  write operation. In order of appearance these are: the file handle to write
+  bytes to, the offset into that file, the buffer, the offset into that buffer,
+  and the number of bytes to write. Finally, all instructions are batched
+  together and submitted in one go using 'submitIO'. For each instruction, an
+  'IOResult' is returned, which in this case describes the number of bytes
+  written. If any of the instructions fails, an error is thrown. We print the
+  'IOResult's to standard output.
+
+  In practice, it is much more efficient to aggregate the bytes into larger
+  chunks and to create one instruction per chunk. A sensible chunk size is the
+  disk page size (for example 4 KiB) or a multiple of it. The disk page size is
+  typically the smallest unit of data that can be written to or read from the
+  disk. In some cases it is also desirable, or even required, that the buffers
+  are aligned to the disk page size. For example, alignment is required when
+  using direct I\/O.
+
+  >>> :{
+  writeFile ::
+       HasFS IO HandleIO
+    -> HasBlockIO IO HandleIO
+    -> FsPath
+    -> [Byte]
+    -> IO ()
+  writeFile hasFS hasBlockIO file bytes = do
+      let numBytes = length bytes
+      FS.withFile hasFS file (FS.WriteMode FS.MustBeNew) $ \h -> do
+        buffer <- newPinnedByteArray numBytes
+        -- Copy each byte into the pinned buffer at the matching index
+        forM_ (zip [0..] bytes) $ \(i, byte) ->
+          writeByteArray @Byte buffer i byte
+        -- One single-byte write instruction per byte, at file offset i
+        results <- submitIO hasBlockIO $ V.fromList [
+            IOOpWrite h fileOffset buffer bufferOffset 1
+          | i <- [0 .. numBytes - 1]
+          , let fileOffset = fromIntegral i
+                bufferOffset = FS.BufferOffset i
+          ]
+        print results
+  :}
+
+  The second part of the example reads a given number of bytes from a given
+  file path using 'submitIO'. We define a @readFile@ function that follows the
+  same structure as @writeFile@, but reads bytes instead of writing them: each
+  'IOOpRead' instruction fills one byte of the buffer, and the bytes are then
+  copied out of the buffer.
+
+  >>> :{
+  readFile ::
+       HasFS IO HandleIO
+    -> HasBlockIO IO HandleIO
+    -> FsPath
+    -> Int
+    -> IO [Byte]
+  readFile hasFS hasBlockIO file numBytes = do
+      FS.withFile hasFS file FS.ReadMode $ \h -> do
+        buffer <- newPinnedByteArray numBytes
+        -- One single-byte read instruction per file offset; the range follows
+        -- the numBytes argument instead of a hard-coded [0..3]
+        results <- submitIO hasBlockIO $ V.fromList [
+            IOOpRead h fileOffset buffer bufferOffset 1
+          | i <- [0 .. numBytes - 1]
+          , let fileOffset = fromIntegral i
+                bufferOffset = FS.BufferOffset i
+          ]
+        print results
+        -- Copy the bytes out of the buffer that the batch filled in
+        forM [0 .. numBytes - 1] $ \i ->
+          readByteArray @Byte buffer i
+  :}
+
+  Now we can combine @writeFile@ and @readFile@ into a very small example called
+  @writeReadFile@, which does what we set out to do: write a few bytes to a
+  (temporary) file and read them out again using 'submitIO'. We also print the
+  bytes that were written and the bytes that were read, so that the user can
+  check by hand whether the bytes match.
+
+  >>> :{
+  writeReadFile :: HasFS IO HandleIO -> HasBlockIO IO HandleIO -> IO ()
+  writeReadFile hasFS hasBlockIO = do
+      let file = FS.mkFsPath ["simple_example.txt"]
+      let bytesIn = [1,2,3,4]
+      print bytesIn
+      writeFile hasFS hasBlockIO file bytesIn
+      bytesOut <- readFile hasFS hasBlockIO file 4
+      print bytesOut
+      FS.removeFile hasFS file
+  :}
+
+  To run @writeReadFile@, we need 'HasFS' and 'HasBlockIO' instances. This is
+  where the separation between interface and implementation shines:
+  @writeReadFile@ is agnostic to the implementations of the abstract
+  interfaces, so we can pick any implementations and slot them in. For this
+  example we use the /real/ implementation from "System.FS.BlockIO.IO", but we
+  could just as well have used the /simulated/ implementation from the
+  @blockio:sim@ sub-library. We define the @example@ function, which uses
+  'withIOHasBlockIO' to instantiate both a 'HasFS' and a 'HasBlockIO' instance,
+  and passes them to @writeReadFile@.
+
+  >>> :{
+  example :: IO ()
+  example =
+    withIOHasBlockIO (MountPoint "") defaultIOCtxParams $ \hasFS hasBlockIO ->
+      writeReadFile hasFS hasBlockIO
+  :}
+
+  Finally, we can run the example to produce some output. As we can see, the
+  input bytes match the output bytes.
+
+  >>> example
+  [1,2,3,4]
+  [IOResult 1,IOResult 1,IOResult 1,IOResult 1]
+  [IOResult 1,IOResult 1,IOResult 1,IOResult 1]
+  [1,2,3,4]
+-}
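The example above issues one instruction per byte and notes that, in practice, bytes should be grouped into chunks of the disk page size. The offset arithmetic behind that grouping can be sketched as follows. This assumes a fixed page size (4096 bytes here), ignores buffer alignment, and uses a hypothetical helper `pageChunks` that is not part of blockio. Each resulting `(offset, length)` pair would become the file offset, buffer offset, and byte count of one `IOOpRead` or `IOOpWrite`.

```haskell
-- Hypothetical helper (not part of blockio): split a buffer of 'total' bytes
-- into (offset, length) chunks of at most 'pageSize' bytes each. The last
-- chunk is shorter if 'total' is not a multiple of 'pageSize'.
pageChunks :: Int -> Int -> [(Int, Int)]
pageChunks pageSize total
  | pageSize <= 0 = error "pageChunks: page size must be positive"
  | otherwise =
      [ (off, min pageSize (total - off))
      | off <- [0, pageSize .. total - 1]
      ]

main :: IO ()
main =
  -- 10000 bytes with 4 KiB pages: two full chunks and one partial chunk
  print (pageChunks 4096 10000)
```

This prints `[(0,4096),(4096,4096),(8192,1808)]`: three instructions instead of the 10000 that a one-byte-per-instruction approach would submit.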
diff --git a/blockio/src/System/FS/BlockIO/API.hs b/blockio/src/System/FS/BlockIO/API.hs
@@ -65,24 +65,13 @@ import           Text.Printf
 -- again. Instead, the user should create a new instance of the interface.
 --
 -- Note: there are a bunch of functions in the interface that have nothing to do
--- with submitting large batches of I/O operations. In fact, only 'close' and
+-- with submitting large batches of I\/O operations. In fact, only 'close' and
 -- 'submitIO' are related to that. All other functions were put in this record
 -- for simplicity because the authors of the library needed them and it was more
 -- straightforward to add them here than to add them to @fs-api@. Still these
 -- unrelated functions could and should all be moved into @fs-api@ at some point
 -- in the future.
 --
--- === Implementations
---
--- There are currently two known implementations of the interface:
---
--- * An implementation using the real file system, which can be found in the
---   "System.FS.BlockIO.IO" module. This implementation is platform-dependent.
---
--- * An implementation using a simulated file system, which can be found in the
---   @System.FS.BlockIO.Sim@ module of the @blockio:sim@ sublibrary. This
---   implementation is uniform across platforms.
---
 data HasBlockIO m h = HasBlockIO {
     -- | (Idempotent) close the IO context that is required for running
     -- 'submitIO'.
@@ -206,6 +195,7 @@ ioopByteCount (IOOpWrite _ _ _ _ c) = c
 
 -- | Number of read/written bytes.
 newtype IOResult = IOResult ByteCount
+  deriving stock (Show, Eq)
   deriving newtype VP.Prim
 
 newtype instance VUM.MVector s IOResult = MV_IOResult (VP.MVector s IOResult)
diff --git a/blockio/src/System/FS/BlockIO/IO.hs b/blockio/src/System/FS/BlockIO/IO.hs
@@ -2,8 +2,9 @@
 --
 -- The implementation of the 'HasBlockIO' interface provided in this module is
 -- platform-dependent. Most importantly, on Linux, the implementation of
--- 'submitIO' is backed by @blockio-uring@: a library for asynchronous I/O. On
--- Windows and MacOS, the implementation of 'submitIO' only supports serial I/O.
+-- 'submitIO' is backed by @blockio-uring@: a library for asynchronous I\/O. On
+-- Windows and MacOS, the implementation of 'submitIO' only supports serial
+-- I\/O.
 module System.FS.BlockIO.IO (
     -- * Implementation details #impl#
     -- $impl
@@ -36,9 +37,9 @@ import           System.FS.IO (HandleIO, ioHasFS)
   reason, we include below some documentation about the effects of calling the
   interface functions on different platforms.
 
-  Note: if the @serialblockio@ Cabal flag is enabled, then the Linux implementation
-  uses a mocked context and serial I/O for 'close' and 'submitIO', just like the
-  MacOS and Windows implementations do.
+  Note: if the @serialblockio@ Cabal flag is enabled, then the Linux
+  implementation uses a mocked context and serial I\/O for 'close' and
+  'submitIO', just like the MacOS and Windows implementations do.
 
   [IO context]:  When an instance of the 'HasBlockIO' interface for Linux
     systems is initialised, an @io_uring@ context is created using the
@@ -57,11 +58,11 @@ import           System.FS.IO (HandleIO, ioHasFS)
     * MacOS: close the mocked context
     * Windows: close the mocked context
 
-  ['submitIO']: Submit a batch of I/O operations using:
+  ['submitIO']: Submit a batch of I\/O operations using:
 
     * Linux: the @submitIO@ function from the @blockio-uring@ package
-    * MacOS: serial I/O using a 'HasFS'
-    * Windows: serial I/O using a 'HasFS'
+    * MacOS: serial I\/O using a 'HasFS'
+    * Windows: serial I\/O using a 'HasFS'
 
   ['hSetNoCache']:
 
diff --git a/blockio/src/System/FS/BlockIO/Serial.hs b/blockio/src/System/FS/BlockIO/Serial.hs