Commit fd135cf

blockio: update package description, readme, add example
1 parent 254b95a commit fd135cf

9 files changed, +273 -36 lines changed

blockio/README.md

Lines changed: 46 additions & 5 deletions
@@ -1,8 +1,49 @@
 # blockio
 
-This packages defines an abstract interface for batched, asynchronous I\/O,
-for use with the abstract interface for file system I\/O defined by the
-[fs-api](https://hackage.haskell.org/package/fs-api) package.
+Perform batches of disk I/O operations. Performing batches of disk I/O can lead
+to performance improvements over performing each disk I/O operation
+individually. Performing batches of disk I/O *concurrently* can lead to an even
+bigger performance improvement depending on the implementation of batched I/O.
 
-The /sim/ sub-library of this package defines /simulated/ batched, asynchronous I\/O
-for use with the [fs-sim](https://hackage.haskell.org/package/fs-sim) package.
+The batched I/O functionality in the library is separated into an *abstract
+interface* and *implementations* of that abstract interface. The advantage of
+programming against an abstract interface is that code can be agnostic to the
+implementation of the interface, allowing implementations to be freely swapped
+out. The library provides multiple implementations of batched I/O:
+platform-dependent implementations using the *real* file system (with
+asynchronous I/O on Linux), and a simulated implementation for testing purposes.
+
+See the `System.FS.BlockIO` module for an example of how to use the library.
+
+On Linux systems the *real* implementation is backed by
+[blockio-uring](https://hackage.haskell.org/package/blockio-uring), a library
+for asynchronous I/O that achieves good performance when performing batches
+concurrently. On Windows and MacOS systems the *real* implementation currently
+simply performs each I/O operation sequentially, which should achieve about the
+same performance as using non-batched I/O, but the library could be extended
+with asynchronous I/O implementations for Windows and MacOS as well. The
+simulated implementation also performs each I/O operation sequentially.
+
+As mentioned before, the batched I/O functionality is separated into an
+*abstract interface* and *implementations* of that abstract interface. The
+advantage of programming against an abstract interface is that code can be
+agnostic to the implementation of the interface. For example, we could run code
+in production using the real file system, but we could also run the same code in
+a testing environment using a simulated file system. We could even switch from a
+default implementation to a more performant implementation in production if the
+performant implementation is available. Lastly, the abstract interface allows us
+to program against the file system in a uniform manner across different
+platforms, i.e., operating systems.
+
+The `blockio` library defines the abstract interface for batched I/O. The
+library is an extension of the
+[fs-api](https://hackage.haskell.org/package/fs-api) library, which defines an
+abstract interface for (basic) file system I/O. Both `blockio` and `fs-api`
+provide an implementation of their interfaces using the real file system in
+`IO`.
+
+The `blockio:sim` sub-library defines an implementation of the abstract
+interface from `blockio` that *simulates* batched I/O. This sub-library is an
+extension of the [fs-sim](https://hackage.haskell.org/package/fs-sim) library,
+which defines an implementation of the abstract interface from `fs-api` that
+simulates (basic) file system I/O.
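
The README's central design point is programming against an abstract interface whose implementations can be swapped. The sketch below is a base-only illustration of that record-of-functions pattern. It is not blockio's API: `BatchIO`, `Op`, `simBatchIO` and `writeThenRead` are hypothetical names, and the simulated "file" is just an association list in an `IORef`.

```haskell
module Main where

import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.Word (Word8)

-- | One instruction against a single in-memory "file" (hypothetical type).
data Op
  = WriteAt Int Word8 -- write one byte at a file offset
  | ReadAt Int        -- read the byte at a file offset
  deriving Show

-- | The abstract interface: a record of functions over some monad m.
-- A value of this type is an "instance" of the interface.
newtype BatchIO m = BatchIO
  { submit :: [Op] -> m [Maybe Word8] }

-- | A simulated implementation (hypothetical): the "file" is an association
-- list in an IORef, and a batch is performed one operation at a time.
-- Writes yield Nothing; reads yield the stored byte, if any.
simBatchIO :: IO (BatchIO IO)
simBatchIO = do
  ref <- newIORef []
  let runOp (WriteAt off b) = do
        modifyIORef' ref (\kvs -> (off, b) : filter ((/= off) . fst) kvs)
        pure Nothing
      runOp (ReadAt off) = lookup off <$> readIORef ref
  pure BatchIO { submit = mapM runOp }

-- | Client code, written once against the interface and agnostic to the
-- implementation it is handed.
writeThenRead :: Monad m => BatchIO m -> [Word8] -> m [Maybe Word8]
writeThenRead bio bytes = do
  _ <- submit bio [WriteAt i b | (i, b) <- zip [0 ..] bytes]
  submit bio [ReadAt i | i <- [0 .. length bytes - 1]]

main :: IO ()
main = do
  bio <- simBatchIO
  out <- writeThenRead bio [1, 2, 3, 4]
  print out
```

Running it prints `[Just 1,Just 2,Just 3,Just 4]`. A second implementation backed by the real file system would be another value of type `BatchIO IO`, and `writeThenRead` would not need to change. Like blockio's simulated implementation, this one simply performs the operations of a batch one after another.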

blockio/blockio.cabal

Lines changed: 17 additions & 7 deletions
@@ -1,14 +1,23 @@
 cabal-version: 3.4
 name: blockio
 version: 0.1.0.0
-synopsis: Abstract interface for batched, asynchronous I/O
+synopsis: Perform batches of disk I/O operations.
 description:
-This packages defines an abstract interface for batched, asynchronous I\/O,
-for use with the abstract interface for file system I\/O defined by the
-[fs-api](https://hackage.haskell.org/package/fs-api) package.
-
-The /sim/ sub-library of this package defines /simulated/ batched, asynchronous I\/O
-for use with the [fs-sim](https://hackage.haskell.org/package/fs-sim) package.
+Perform batches of disk I\/O operations. Performing batches of disk I\/O can
+lead to performance improvements over performing each disk I\/O operation
+individually. Performing batches of disk I\/O /concurrently/ can lead to an
+even bigger performance improvement depending on the implementation of batched
+I\/O.
+
+The batched I\/O functionality in the library is separated into an /abstract/
+/interface/ and /implementations/ of that abstract interface. The advantage of
+programming against an abstract interface is that code can be agnostic to the
+implementation of the interface, allowing implementations to be freely swapped
+out. The library provides multiple implementations of batched I\/O:
+platform-dependent implementations using the /real/ file system (with
+asynchronous I\/O on Linux), and a simulated implementation for testing purposes.
+
+See the "System.FS.BlockIO" module for an example of how to use the library.
 
 license: Apache-2.0
 license-files:
@@ -65,6 +74,7 @@ library
 import: language, warnings
 hs-source-dirs: src
 exposed-modules:
+System.FS.BlockIO
 System.FS.BlockIO.API
 System.FS.BlockIO.IO
 System.FS.BlockIO.Serial.Internal

blockio/src-macos/System/FS/BlockIO/Internal.hs

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ import qualified System.Posix.Files as Unix
 import qualified System.Posix.Unistd as Unix
 
 -- | For now we use the portable serial implementation of HasBlockIO. If you
--- want to provide a proper async I/O implementation for OSX, then this is where
+-- want to provide a proper async I\/O implementation for OSX, then this is where
 -- you should put it.
 --
 -- The recommended choice would be to use the POSIX AIO API.

blockio/src-sim/System/FS/BlockIO/Sim.hs

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ unsafeFromHasFS hfs =
 hfs
 where
 -- TODO: It should be possible for the implementations and simulation to
--- throw an FsError when doing file I/O with misaligned byte arrays after
+-- throw an FsError when doing file I\/O with misaligned byte arrays after
 -- hSetNoCache. Maybe they should? It might be nicest to move hSetNoCache
 -- into fs-api and fs-sim because we'd need access to the internals.
 hSetNoCache _h _b = pure ()

blockio/src-windows/System/FS/BlockIO/Internal.hs

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ import qualified System.Win32.File as Windows
 import qualified System.Win32.HardLink as Windows
 
 -- | For now we use the portable serial implementation of HasBlockIO. If you
--- want to provide a proper async I/O implementation for Windows, then this is
+-- want to provide a proper async I\/O implementation for Windows, then this is
 -- where you should put it.
 --
 -- The recommended choice would be to use the Win32 IOCP API.

blockio/src/System/FS/BlockIO.hs

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
+module System.FS.BlockIO (
+    -- * Description
+    -- $description
+
+    -- * Re-exports
+    module System.FS.BlockIO.API
+  , module System.FS.BlockIO.IO
+
+    -- * Example
+    -- $example
+  ) where
+
+import System.FS.BlockIO.API
+import System.FS.BlockIO.IO
+
+{-------------------------------------------------------------------------------
+  Examples
+-------------------------------------------------------------------------------}
+
+{- $description
+
+The 'HasBlockIO' record type defines an /abstract interface/. A value of a
+'HasBlockIO' type is what we call an /instance/ of the abstract interface, and
+an instance is produced by a function that we call an /implementation/. In
+principle, we can have multiple instances of the same implementation.
+
+There are currently two known implementations of the interface:
+
+* An implementation using the real file system, which can be found in the
+"System.FS.BlockIO.IO" module. This implementation is platform-dependent,
+but has largely similar observable behaviour.
+
+* An implementation using a simulated file system, which can be found in the
+@System.FS.BlockIO.Sim@ module of the @blockio:sim@ sublibrary. This
+implementation is uniform across platforms.
+
+The 'HasBlockIO' abstract interface is an extension of the 'HasFS' abstract
+interface that is provided by the
+[@fs-api@](https://hackage.haskell.org/package/fs-api) package. Whereas
+'HasFS' defines many primitive functions, for example for opening a file, the
+main feature of 'HasBlockIO' is to define a function for performing batched
+I\/O. As such, users of @blockio@ will more often than not need to pass both a
+'HasFS' and a 'HasBlockIO' instance to their functions.
+-}
+
+{- $example
+
+>>> import Control.Monad
+>>> import Control.Monad.Primitive
+>>> import Data.Primitive.ByteArray
+>>> import Data.Vector qualified as V
+>>> import Data.Word
+>>> import Debug.Trace
+>>> import System.FS.API as FS
+>>> import System.FS.BlockIO.IO
+>>> import System.FS.BlockIO.API
+>>> import System.FS.IO
+
+The main feature of the 'HasBlockIO' abstract interface is that it provides a
+function for performing batched I\/O using 'submitIO'. Depending on the
+implementation of the interface, performing I\/O in batches concurrently using
+'submitIO' can be much faster than performing each I\/O operation in a
+sequential order. We will not go into detail about this performance
+consideration here, but more information can be found in the
+"System.FS.BlockIO.IO" module. Instead, we will show an example of how
+'submitIO' can be used in your own projects.
+
+We aim to build an example that writes some contents to a file using
+'submitIO', and then reads the contents out again using 'submitIO'. The file
+contents will simply be bytes.
+
+>>> type Byte = Word8
+
+The first part of the example is to write out bytes to a given file path using
+'submitIO'. We define a @writeFile@ function that does just that. The file
+must not exist yet, since it is opened with @WriteMode MustBeNew@.
+
+The bytes, which are provided as separate bytes, are written into a buffer (a
+mutable byte array). Note that the buffer should be /pinned/ memory, so that
+the garbage collector cannot move it while the I\/O is in progress. For write
+operations, this buffer tells the backend which bytes should be written to disk.
+For simplicity, we create a separate 'IOOpWrite' instruction for each byte.
+This instruction requires information about the write operation. In order of
+appearance these are: the file handle to write bytes to, the offset into that
+file, the buffer, the offset into that buffer, and the number of bytes to
+write. Finally, all instructions are batched together and submitted in one go
+using 'submitIO'. For each instruction, an 'IOResult' is returned, which in
+this case describes the number of bytes written. If any of the instructions
+fails, an error is thrown. We print the 'IOResult's to
+standard output.
+
+Note that in real scenarios it would be much more performant to aggregate the
+bytes into larger chunks, and to create an instruction for each of those
+chunks. A sensible size for those chunks would be the disk page size (4 KiB,
+for example), or a multiple of that disk page size. The disk page size is
+typically the smallest unit of data that can be written to or read from the
+disk. In some cases it is also desirable or even required that the buffers are
+aligned to the disk page size. For example, alignment is required when using
+direct I\/O.
+
+>>> :{
+writeFile ::
+     HasFS IO HandleIO
+  -> HasBlockIO IO HandleIO
+  -> FsPath
+  -> [Byte]
+  -> IO ()
+writeFile hasFS hasBlockIO file bytes = do
+    let numBytes = length bytes
+    FS.withFile hasFS file (FS.WriteMode FS.MustBeNew) $ \h -> do
+      buffer <- newPinnedByteArray numBytes
+      forM_ (zip [0..] bytes) $ \(i, byte) ->
+        let bufferOffset = fromIntegral i
+        in  writeByteArray @Byte buffer bufferOffset byte
+      results <- submitIO hasBlockIO $ V.fromList [
+          IOOpWrite h fileOffset buffer bufferOffset 1
+        | i <- take numBytes [0..]
+        , let fileOffset = fromIntegral i
+              bufferOffset = FS.BufferOffset i
+        ]
+      print results
+:}
+
+The second part of the example is to read a given number of bytes from a given
+file path using 'submitIO'. We define a @readFile@ function that follows the
+same general structure and behaviour as @writeFile@, except that it reads
+bytes instead of writing them.
+
+>>> :{
+readFile ::
+     HasFS IO HandleIO
+  -> HasBlockIO IO HandleIO
+  -> FsPath
+  -> Int
+  -> IO [Byte]
+readFile hasFS hasBlockIO file numBytes = do
+    FS.withFile hasFS file FS.ReadMode $ \h -> do
+      buffer <- newPinnedByteArray numBytes
+      results <- submitIO hasBlockIO $ V.fromList [
+          -- one single-byte read instruction for each byte to be read
+          IOOpRead h fileOffset buffer bufferOffset 1
+        | i <- take numBytes [0..]
+        , let fileOffset = fromIntegral i
+              bufferOffset = FS.BufferOffset i
+        ]
+      print results
+      -- copy the bytes that were read out of the buffer
+      forM (take numBytes [0..]) $ \i ->
+        readByteArray @Byte buffer i
+:}
+
+Now we can combine @writeFile@ and @readFile@ into a very small example called
+@writeReadFile@, which does what we set out to do: write a few bytes to a
+(temporary) file and read them out again using 'submitIO'. We also print the
+bytes that were written and the bytes that were read, so that the user can
+check by hand whether the bytes match.
+
+>>> :{
+writeReadFile :: HasFS IO HandleIO -> HasBlockIO IO HandleIO -> IO ()
+writeReadFile hasFS hasBlockIO = do
+    let file = FS.mkFsPath ["simple_example.txt"]
+    let bytesIn = [1,2,3,4]
+    print bytesIn
+    writeFile hasFS hasBlockIO file bytesIn
+    bytesOut <- readFile hasFS hasBlockIO file 4
+    print bytesOut
+    FS.removeFile hasFS file
+:}
+
+In order to run @writeReadFile@, we will need 'HasFS' and 'HasBlockIO'
+instances. This is where the separation between interface and implementation
+shines: @writeReadFile@ is agnostic to the implementations of the abstract
+interfaces, so we could pick any implementations and slot them in. For this
+example we will use the /real/ implementation from "System.FS.BlockIO.IO", but
+we could have used the /simulated/ implementation from the @blockio:sim@
+sub-library just as well. We define the @example@ function, which uses
+'withIOHasBlockIO' to instantiate both a 'HasFS' and 'HasBlockIO' instance,
+which we pass to @writeReadFile@.
+
+>>> :{
+example :: IO ()
+example =
+    withIOHasBlockIO (MountPoint "") defaultIOCtxParams $ \hasFS hasBlockIO ->
+      writeReadFile hasFS hasBlockIO
+:}
+
+Finally, we can run the example to produce some output. As we can see, the
+input bytes match the output bytes.
+
+>>> example
+[1,2,3,4]
+[IOResult 1,IOResult 1,IOResult 1,IOResult 1]
+[IOResult 1,IOResult 1,IOResult 1,IOResult 1]
+[1,2,3,4]
+-}
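
The example's note recommends aggregating bytes into disk-page-sized chunks rather than issuing one instruction per byte. The sketch below covers only the offset arithmetic and assumes a 4096-byte page. `pageSize` and `chunkRanges` are hypothetical helpers, not part of blockio. Each `(file offset, byte count)` pair could back a single `IOOpWrite` or `IOOpRead` instruction.

```haskell
module Main where

-- | Assumed disk page size in bytes (4 KiB). The real value depends on the
-- device and file system; this constant is purely illustrative.
pageSize :: Int
pageSize = 4096

-- | Split the byte range [start, start + len) into chunks that never cross a
-- page boundary. Each element is (file offset, byte count), i.e. the two
-- numbers one read or write instruction would need.
chunkRanges :: Int -> Int -> [(Int, Int)]
chunkRanges start len
  | len <= 0 = []
  | otherwise = (start, n) : chunkRanges (start + n) (len - n)
  where
    -- bytes until the next page boundary, capped by the bytes still remaining
    n = min len (pageSize - start `mod` pageSize)

main :: IO ()
main = do
  print (chunkRanges 0 10000)  -- aligned start
  print (chunkRanges 100 8192) -- unaligned start
```

This prints `[(0,4096),(4096,4096),(8192,1808)]` followed by `[(100,3996),(4096,4096),(8192,100)]`. Because chunks never cross a page boundary, every chunk except possibly the first and the last covers a full page. If direct I/O also demands aligned buffers, the buffer offsets need the same care.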

blockio/src/System/FS/BlockIO/API.hs

Lines changed: 2 additions & 12 deletions
@@ -65,24 +65,13 @@ import Text.Printf
 -- again. Instead, the user should create a new instance of the interface.
 --
 -- Note: there are a bunch of functions in the interface that have nothing to do
--- with submitting large batches of I/O operations. In fact, only 'close' and
+-- with submitting large batches of I\/O operations. In fact, only 'close' and
 -- 'submitIO' are related to that. All other functions were put in this record
 -- for simplicity because the authors of the library needed them and it was more
 -- straightforward to add them here than to add them to @fs-api@. Still these
 -- unrelated functions could and should all be moved into @fs-api@ at some point
 -- in the future.
 --
--- === Implementations
---
--- There are currently two known implementations of the interface:
---
--- * An implementation using the real file system, which can be found in the
--- "System.FS.BlockIO.IO" module. This implementation is platform-dependent.
---
--- * An implementation using a simulated file system, which can be found in the
--- @System.FS.BlockIO.Sim@ module of the @blockio:sim@ sublibrary. This
--- implementation is uniform across platforms.
---
 data HasBlockIO m h = HasBlockIO {
 -- | (Idempotent) close the IO context that is required for running
 -- 'submitIO'.
@@ -206,6 +195,7 @@ ioopByteCount (IOOpWrite _ _ _ _ c) = c
 
 -- | Number of read/written bytes.
 newtype IOResult = IOResult ByteCount
+deriving stock (Show, Eq)
 deriving newtype VP.Prim
 
 newtype instance VUM.MVector s IOResult = MV_IOResult (VP.MVector s IOResult)
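
The commit derives `Show` for `IOResult` with the stock strategy, which is why the example's doctest output prints `IOResult 1` rather than a bare `1`. Below is a small stand-alone comparison of the two strategies. It uses hypothetical `StockResult` and `NewtypeResult` types, with a plain `Word64` standing in for `ByteCount`.

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

import Data.Word (Word64)

-- Hypothetical stand-ins for IOResult; Word64 plays the role of ByteCount.

-- | Stock deriving: 'show' includes the constructor name.
newtype StockResult = StockResult Word64
  deriving stock (Show, Eq)

-- | Newtype deriving: 'show' reuses the underlying Word64 instance.
newtype NewtypeResult = NewtypeResult Word64
  deriving newtype (Show, Eq)

main :: IO ()
main = do
  print [StockResult 1, StockResult 1]
  print [NewtypeResult 1, NewtypeResult 1]
```

It prints `[StockResult 1,StockResult 1]` and then `[1,1]`. Stock deriving shows the constructor, whereas newtype deriving reuses the instance of the wrapped type. The existing `deriving newtype VP.Prim` clause is unaffected, since strategies can be chosen per deriving clause.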

blockio/src/System/FS/BlockIO/IO.hs

Lines changed: 9 additions & 8 deletions
@@ -2,8 +2,9 @@
 --
 -- The implementation of the 'HasBlockIO' interface provided in this module is
 -- platform-dependent. Most importantly, on Linux, the implementation of
--- 'submitIO' is backed by @blockio-uring@: a library for asynchronous I/O. On
--- Windows and MacOS, the implementation of 'submitIO' only supports serial I/O.
+-- 'submitIO' is backed by @blockio-uring@: a library for asynchronous I\/O. On
+-- Windows and MacOS, the implementation of 'submitIO' only supports serial
+-- I\/O.
 module System.FS.BlockIO.IO (
 -- * Implementation details #impl#
 -- $impl
@@ -36,9 +37,9 @@ import System.FS.IO (HandleIO, ioHasFS)
 reason, we include below some documentation about the effects of calling the
 interface functions on different platforms.
 
-Note: if the @serialblockio@ Cabal flag is enabled, then the Linux implementation
-uses a mocked context and serial I/O for 'close' and 'submitIO', just like the
-MacOS and Windows implementations do.
+Note: if the @serialblockio@ Cabal flag is enabled, then the Linux
+implementation uses a mocked context and serial I\/O for 'close' and
+'submitIO', just like the MacOS and Windows implementations do.
 
 [IO context]: When an instance of the 'HasBlockIO' interface for Linux
 systems is initialised, an @io_uring@ context is created using the
@@ -57,11 +58,11 @@ import System.FS.IO (HandleIO, ioHasFS)
 * MacOS: close the mocked context
 * Windows: close the mocked context
 
-['submitIO']: Submit a batch of I/O operations using:
+['submitIO']: Submit a batch of I\/O operations using:
 
 * Linux: the @submitIO@ function from the @blockio-uring@ package
-* MacOS: serial I/O using a 'HasFS'
-* Windows: serial I/O using a 'HasFS'
+* MacOS: serial I\/O using a 'HasFS'
+* Windows: serial I\/O using a 'HasFS'
 
 ['hSetNoCache']:
 
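
The interface documents `close` as idempotent. The notes above describe it closing an IO context: an `io_uring` context on Linux and a mocked one on MacOS and Windows. One common way to get idempotence is sketched below under stated assumptions. It is not blockio's code, and `Ctx`, `newCtx` and `closeCtx` are hypothetical. The "open" flag is flipped atomically, and the release action runs only for the call that flipped it.

```haskell
module Main where

import Control.Monad (when)
import Data.IORef (IORef, atomicModifyIORef', modifyIORef', newIORef, readIORef)

-- | A hypothetical I/O context: 'release' frees the underlying resource and
-- must run at most once.
data Ctx = Ctx
  { ctxOpen :: IORef Bool
  , release :: IO ()
  }

newCtx :: IO () -> IO Ctx
newCtx rel = do
  openRef <- newIORef True
  pure (Ctx openRef rel)

-- | Idempotent close: atomically mark the context as closed, and run
-- 'release' only if this call is the one that flipped the flag.
closeCtx :: Ctx -> IO ()
closeCtx ctx = do
  wasOpen <- atomicModifyIORef' (ctxOpen ctx) (\open -> (False, open))
  when wasOpen (release ctx)

main :: IO ()
main = do
  releases <- newIORef (0 :: Int)
  ctx <- newCtx (modifyIORef' releases (+ 1))
  closeCtx ctx
  closeCtx ctx -- second call finds the flag cleared and does nothing
  readIORef releases >>= print
```

This prints `1`. Using `atomicModifyIORef'` instead of a separate read and write keeps the check safe when several threads race to close the same context.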