# AccessHandle Proposal

## Authors:

* Emanuel Krivoy
* Richard Stotz

## Participate

* [Issue tracker](https://github.com/WICG/file-system-access/issues)

## Table of Contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Introduction](#introduction)
- [Goals & Use Cases](#goals--use-cases)
- [Non-goals](#non-goals)
- [What makes the new surface fast?](#what-makes-the-new-surface-fast)
- [Proposed API](#proposed-api)
  - [New data access surface](#new-data-access-surface)
  - [Locking semantics](#locking-semantics)
- [Open Questions](#open-questions)
  - [Naming](#naming)
  - [Exposing AccessHandles on all filesystems](#exposing-accesshandles-on-all-filesystems)
  - [Assurances on non-awaited consistency](#assurances-on-non-awaited-consistency)
- [Appendix](#appendix)
  - [AccessHandle IDL](#accesshandle-idl)
- [References & acknowledgements](#references--acknowledgements)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Introduction

We propose augmenting the Origin Private File System (OPFS) with a new surface
that provides highly performant access to data. This new surface differs from
existing ones by offering in-place and exclusive write access to a file’s
content. This change, along with the ability to consistently read unflushed
modifications and the availability of a synchronous variant on dedicated
workers, significantly improves performance and unblocks new use cases for the
File System Access API.

More concretely, we would add a *createAccessHandle()* method to the
*FileSystemFileHandle* object. It would return an *AccessHandle* that contains
a [duplex stream](https://streams.spec.whatwg.org/#other-specs-duplex) and
auxiliary methods. The readable/writable pair in the duplex stream communicates
with the same backing file, allowing the user to read unflushed contents.
Another new method, *createSyncAccessHandle()*, would only be exposed on
dedicated Worker threads. This method would offer a more buffer-based surface
with synchronous reading and writing. Creating an AccessHandle also takes a
lock that prevents write access to the file from other handles, whether in
another execution context or in the same one.

This proposal is part of our effort to integrate [Storage Foundation
API](https://github.com/WICG/storage-foundation-api-explainer) into the File
System Access API. For more context on the origins of this proposal and the
alternatives considered, please see [Merging Storage Foundation API and the
Origin Private File
System](https://docs.google.com/document/d/121OZpRk7bKSF7qU3kQLqAEUVSNxqREnE98malHYwWec)
and [Recommendation for Augmented
OPFS](https://docs.google.com/document/d/1g7ZCqZ5NdiU7oqyCpsc2iZ7rRAY1ZXO-9VoG4LfP7fM).

## Goals & Use Cases

Our goal is to give developers flexibility by providing generic, simple, and
performant primitives upon which they can build higher-level storage
components. The new surface is particularly well suited for Wasm-based
libraries and applications that want to use custom storage algorithms to
fine-tune execution speed and memory usage.

A few examples of what could be done with *AccessHandles*:

* Distribute a performant Wasm port of SQLite. This gives developers the
  ability to use a persistent and fast SQL engine without having to rely on
  the deprecated WebSQL API.
* Allow a music production website to operate on large amounts of media by
  relying on the new surface's performance and direct buffered access to
  offload sound segments to disk instead of holding them in memory.
* Provide a fast and persistent [Emscripten](https://emscripten.org/)
  filesystem to act as generic and easily accessible storage for Wasm.

## Non-goals

This proposal is focused only on additions to the [Origin Private File
System](https://wicg.github.io/file-system-access/#sandboxed-filesystem), and
doesn't currently consider changes to the rest of the File System Access API or
to how files on the host machine are accessed.

## What makes the new surface fast?

There are a few design choices that primarily contribute to the performance of
AccessHandles:

* Write operations are not guaranteed to be immediately persistent; rather,
  persistence is achieved through calls to *flush()*. At the same time, data
  can be consistently read before flushing. This allows applications to
  schedule time-consuming flushes only when they are required for long-term
  data storage, and not as a precondition to operate on recently written data.
* The exclusive write lock held by the AccessHandle saves implementations
  from having to provide a central data access point across execution
  contexts. In multi-process browsers, such as Chrome, this helps avoid costly
  inter-process communication (IPC) between renderer and browser processes.
* Data copies are avoided when reading or writing. In the async surface this
  is achieved through SharedArrayBuffers and BYOB readers. In the sync
  surface, we rely on user-allocated buffers to hold the data.

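As a rough illustration of the first point, the sketch below models the intended read-before-flush semantics with a hypothetical `InMemoryFile` class (not part of the proposed API, and purely in-memory): writes are visible to later reads immediately, while only *flush()* updates the copy that stands in for durable storage.

```javascript
// Hypothetical in-memory model of write-back semantics (not the proposed API).
// `working` holds the current contents, including unflushed writes;
// `persisted` stands in for what has been durably stored.
class InMemoryFile {
  constructor() {
    this.working = new Uint8Array(0);
    this.persisted = new Uint8Array(0);
  }

  // Writes `data` at offset `at`, growing the file if needed.
  // Nothing becomes durable here.
  write(data, at = 0) {
    const end = at + data.length;
    if (end > this.working.length) {
      const grown = new Uint8Array(end); // new bytes are zero-filled
      grown.set(this.working);
      this.working = grown;
    }
    this.working.set(data, at);
    return data.length;
  }

  // Reads into `view` starting at offset `at`; unflushed writes are visible.
  read(view, at = 0) {
    const available = Math.max(0, this.working.length - at);
    const n = Math.min(view.length, available);
    view.set(this.working.subarray(at, at + n));
    return n;
  }

  // Makes the current contents durable (modeled here as a copy).
  flush() {
    this.persisted = this.working.slice();
  }
}

const f = new InMemoryFile();
f.write(new Uint8Array([1, 2, 3]));
const out = new Uint8Array(3);
f.read(out); // sees the write even though nothing was flushed
console.log(Array.from(out), f.persisted.length); // [ 1, 2, 3 ] 0
f.flush();
console.log(f.persisted.length); // 3
```

In a real implementation, *flush()* is where the costly work of making data durable would happen, so an application can perform many writes and reads between flushes.
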
For more information on what affects the performance of similar storage APIs,
see [Design considerations for the Storage Foundation
API](https://docs.google.com/document/d/1cOdnvuNIWWyJHz1uu8K_9DEgntMtedxfCzShI7d01cs).

## Proposed API

### New data access surface

```javascript
// In all contexts. Assumes `file` is a FileSystemFileHandle obtained from OPFS
// and `buffer` is an ArrayBufferView such as a Uint8Array.
// For details on the `mode` parameter, see "Exposing AccessHandles on all
// filesystems" below.
const handle = await file.createAccessHandle({ mode: 'in-place' });
const writer = handle.writable.getWriter();
await writer.write(buffer);
const reader = handle.readable.getReader({ mode: 'byob' });
// Assumes seekable streams and SharedArrayBuffer support are available.
await reader.read(buffer, { at: 1 });

// Only in a dedicated worker context. This is a separate example: while
// `handle` above is open, its lock would prevent creating another handle.
const syncHandle = await file.createSyncAccessHandle();
const writtenBytes = syncHandle.write(buffer);
const readBytes = syncHandle.read(buffer, { at: 1 });
```

As mentioned above, a new *createAccessHandle()* method would be added to
*FileSystemFileHandle*. Another method, *createSyncAccessHandle()*, would only
be exposed on dedicated Worker threads. An IDL description of the new
interfaces can be found in the [Appendix](#appendix).

The reason for offering a Worker-only synchronous interface is that consuming
asynchronous APIs from Wasm has severe performance implications (more details
[here](https://docs.google.com/document/d/1lsQhTsfcVIeOW80dr467Auud_VCeAUv2ZOkC63oSyKo)).
Since this overhead is most impactful on methods that are called often, we've
only made *read()* and *write()* synchronous. This keeps the mental model
simple (the sync and async handles are identical except for reading and
writing) and reduces the number of new sync methods, while avoiding the most
important performance penalties.

This proposal assumes that [seekable
streams](https://github.com/whatwg/streams/issues/1128) will be available. If
this doesn’t happen, we can emulate the seeking behavior by extending the
default reader and writer with a *seek()* method.

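If seekable streams do not materialize, the *seek()* fallback could look roughly like the sketch below. `SeekableReader` and `SeekableWriter` are hypothetical names, they are not real Streams API classes, and the backing store is an in-memory `Uint8Array` rather than a file; the point is only that each reader and writer tracks its own position, which *seek()* moves.

```javascript
// Hypothetical sketch (not a real Streams API): a reader and a writer share
// one backing byte store and each track their own position, which seek()
// can move. Methods are async to mirror stream readers and writers.
class SeekableCursor {
  constructor(store) {
    this.store = store; // { bytes: Uint8Array }, shared by reader and writer
    this.position = 0;
  }

  seek(offset) {
    if (!Number.isSafeInteger(offset) || offset < 0) {
      throw new RangeError(`invalid offset: ${offset}`);
    }
    this.position = offset;
  }
}

class SeekableReader extends SeekableCursor {
  // Fills `view` from the current position and advances past the bytes read.
  // Simplification: reports `done` only when no bytes remain.
  async read(view) {
    const { bytes } = this.store;
    const n = Math.max(0, Math.min(view.length, bytes.length - this.position));
    view.set(bytes.subarray(this.position, this.position + n));
    this.position += n;
    return { value: view.subarray(0, n), done: n === 0 };
  }
}

class SeekableWriter extends SeekableCursor {
  // Writes `data` at the current position, growing the store if needed,
  // then advances past the written bytes.
  async write(data) {
    const end = this.position + data.length;
    if (end > this.store.bytes.length) {
      const grown = new Uint8Array(end); // any gap is zero-filled
      grown.set(this.store.bytes);
      this.store.bytes = grown;
    }
    this.store.bytes.set(data, this.position);
    this.position = end;
  }
}

const store = { bytes: new Uint8Array(0) };
const writer = new SeekableWriter(store);
const reader = new SeekableReader(store);
(async () => {
  await writer.write(new Uint8Array([10, 20, 30, 40]));
  writer.seek(1);
  await writer.write(new Uint8Array([99])); // overwrites in place
  reader.seek(1);
  const { value } = await reader.read(new Uint8Array(2));
  console.log(Array.from(value)); // [ 99, 30 ]
})();
```
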
### Locking semantics

```javascript
const handle1 = await file.createAccessHandle({ mode: 'in-place' });
try {
  const handle2 = await file.createAccessHandle({ mode: 'in-place' });
} catch (e) {
  // This catch block always runs, since handle1 still holds the file's lock.
}
await handle1.close();
// Now a new access handle may be created.
```

*createAccessHandle()* would take an exclusive write lock on the file that
prevents the creation of any other access handles or *WritableFileStreams*.
Similarly, *createWritable()* would take a shared write lock that blocks the
creation of access handles, but not of other writable streams. This prevents a
file with an open access handle from being modified through any other handle
or context, while remaining backwards compatible with the current OPFS spec and
still supporting multiple *WritableFileStreams* at once.

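The rules above amount to a per-file shared/exclusive lock, where writables share the lock and access handles take it exclusively. The following is a minimal sketch of those rules using a hypothetical `FileLock` class and `onceOnly` helper (neither is part of the proposal); it models a single file and leaves out cross-context coordination, which a browser would implement internally.

```javascript
// Hypothetical model of the proposed per-file locking rules.
class FileLock {
  constructor() {
    this.exclusive = false;
    this.sharedCount = 0;
  }

  // Models createAccessHandle(): fails if any other lock is held.
  acquireExclusive() {
    if (this.exclusive || this.sharedCount > 0) {
      throw new Error("file is locked");
    }
    this.exclusive = true;
    return onceOnly(() => { this.exclusive = false; });
  }

  // Models createWritable(): fails only while an access handle is open.
  acquireShared() {
    if (this.exclusive) {
      throw new Error("file has an open access handle");
    }
    this.sharedCount += 1;
    return onceOnly(() => { this.sharedCount -= 1; });
  }
}

// Wraps a release callback so that calling it more than once is harmless.
function onceOnly(release) {
  let done = false;
  return () => {
    if (!done) {
      done = true;
      release();
    }
  };
}

const lock = new FileLock();
const releaseA = lock.acquireShared(); // like a first createWritable()
const releaseB = lock.acquireShared(); // a second writable is allowed
// lock.acquireExclusive() would throw here: writables are still open.
releaseA();
releaseB();
const releaseHandle = lock.acquireExclusive(); // like createAccessHandle()
// lock.acquireShared() would throw here: an access handle is open.
releaseHandle(); // like handle.close()
```
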
Creating a [File](https://www.w3.org/TR/FileAPI/#dfn-file) through *getFile()*
would still be possible while a lock is held. The returned File behaves as it
currently does in OPFS; i.e., it is invalidated if the file's contents are
changed after it was created. It is worth noting that these Files could be used
to observe changes made through the new API, even while a lock is still held.

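One way to picture this behavior is a version counter: each write bumps it, and a File snapshot stays readable only while the version it captured is current. The sketch below uses a hypothetical `VersionedFile` class, which is not how the proposal or any browser necessarily implements invalidation, and simplifies a write to replacing the whole contents.

```javascript
// Hypothetical model of File snapshots in OPFS. Each write bumps a version
// counter; a snapshot returned by getFile() can only be read while the
// version it captured is still current.
class VersionedFile {
  constructor() {
    this.bytes = new Uint8Array(0);
    this.version = 0;
  }

  // Simplification: a write replaces the whole contents (for example, a write
  // made through an open access handle).
  write(data) {
    this.bytes = Uint8Array.from(data);
    this.version += 1;
  }

  // Allowed even while an access handle holds the write lock.
  getFile() {
    const source = this;
    const capturedVersion = this.version;
    const contents = this.bytes.slice();
    return {
      read() {
        if (source.version !== capturedVersion) {
          throw new Error("snapshot is stale: file changed after getFile()");
        }
        return contents;
      },
    };
  }
}

const vf = new VersionedFile();
vf.write([1, 2]);
const before = vf.getFile();
vf.write([3, 4]);
const after = vf.getFile(); // observes the change even if a lock is held
console.log(Array.from(after.read())); // [ 3, 4 ]
// before.read() would now throw, because the file changed after it was created.
```
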
## Open Questions

### Naming

The exact names of the new methods haven’t been decided. The current
placeholders for data access are *createAccessHandle()* and
*createSyncAccessHandle()*. *createUnflushedStreams()* and
*createDuplexStream()* have been suggested as alternatives.

### Exposing AccessHandles on all filesystems

This proposal currently only considers additions to OPFS, but it would probably
be worthwhile to expand the new functionality to arbitrary file handles. While
the exact behavior of *AccessHandles* outside of OPFS would need to be defined
in detail, it's almost certain that the behavior described in this proposal
should not be the default. To avoid making it the default, we propose adding an
optional *mode* string parameter to *createAccessHandle()* and
*createSyncAccessHandle()*. Some possible values *mode* could take are:

* 'shared': The current behavior of the File System Access API in general.
  There is no locking, and modifications are atomic (meaning that they would
  only actually change the file when the *AccessHandle* is closed). This mode
  would be a safe choice as a default value.
* 'exclusive': An exclusive write lock is taken on the file, but modifications
  are still atomic. This is a useful mode for developers who want to
  coordinate multiple writing threads but still want "all or nothing" writes.
* 'in-place': The behavior described in this proposal, giving developers
  high-performance access to files at the cost of not having atomic writes.
  It's possible that this mode would only be allowed in OPFS.

Both the naming and semantics of the *mode* parameter have yet to be defined
more concretely.

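The difference between the atomic modes and 'in-place' can be sketched as "stage privately, publish on close" versus "write straight through". In the sketch below, `openHandle` and `writeAt` are hypothetical helpers and `disk` is a plain object standing in for the file; locking, which is what separates 'shared' from 'exclusive', is deliberately not modeled.

```javascript
// Writes `data` into `bytes` at offset `at`, returning a grown copy only when
// the write extends past the end; otherwise `bytes` is modified in place.
function writeAt(bytes, data, at) {
  const end = at + data.length;
  let out = bytes;
  if (end > bytes.length) {
    out = new Uint8Array(end); // any gap is zero-filled
    out.set(bytes);
  }
  out.set(data, at);
  return out;
}

// Hypothetical handle factory. `target` is { bytes: Uint8Array }.
// - 'in-place': writes modify the file immediately.
// - 'shared' / 'exclusive': writes go to a private copy that replaces the
//   file only on close(), so observers see all of the changes or none.
function openHandle(target, mode) {
  if (!["shared", "exclusive", "in-place"].includes(mode)) {
    throw new TypeError(`unknown mode: ${mode}`);
  }
  const atomic = mode !== "in-place";
  let staged = atomic ? target.bytes.slice() : null;
  return {
    write(data, at = 0) {
      if (atomic) {
        staged = writeAt(staged, data, at);
      } else {
        target.bytes = writeAt(target.bytes, data, at);
      }
    },
    close() {
      if (atomic) target.bytes = staged; // publish all changes at once
    },
  };
}

const disk = { bytes: new Uint8Array([0, 0, 0]) };
const atomicHandle = openHandle(disk, "exclusive");
atomicHandle.write(new Uint8Array([7]), 1);
console.log(Array.from(disk.bytes)); // [ 0, 0, 0 ] (not visible yet)
atomicHandle.close();
console.log(Array.from(disk.bytes)); // [ 0, 7, 0 ]

const inPlace = openHandle(disk, "in-place");
inPlace.write(new Uint8Array([9]), 0);
console.log(Array.from(disk.bytes)); // [ 9, 7, 0 ] (visible immediately)
```

The staged copy is what makes atomic writes safe, and it is also the extra work that 'in-place' avoids.
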
### Assurances on non-awaited consistency

The behavior of an async read operation issued immediately after a non-awaited
write operation could be clearly specified by serializing file operations (as
is currently done in Storage Foundation API). We should decide whether this is
desirable, both from a specification and a performance point of view.

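A minimal sketch of what serializing file operations could mean, assuming a simple promise queue per file: each operation starts only after the previous one settles, so a read issued right after a non-awaited write observes that write. `SerializedFile` and `enqueue` are hypothetical names, and the artificial delay only simulates I/O latency.

```javascript
// Hypothetical sketch: a per-file promise queue that runs operations one at
// a time, in the order they were issued.
class SerializedFile {
  constructor() {
    this.bytes = new Uint8Array(0);
    this.tail = Promise.resolve(); // settles when the last queued op finishes
  }

  enqueue(op) {
    const result = this.tail.then(() => op());
    // Keep the queue moving even if an operation fails.
    this.tail = result.catch(() => {});
    return result;
  }

  write(data) {
    return this.enqueue(async () => {
      await new Promise((resolve) => setTimeout(resolve, 10)); // simulated latency
      this.bytes = Uint8Array.from(data);
    });
  }

  read() {
    return this.enqueue(async () => this.bytes.slice());
  }
}

const sf = new SerializedFile();
sf.write([1, 2, 3]); // not awaited
sf.read().then((bytes) => {
  // Without the queue, this read would run first and see an empty file.
  console.log(Array.from(bytes)); // [ 1, 2, 3 ]
});
```

Serialization trades away concurrency between independent operations on the same file, which is the performance cost this open question asks about.
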
## Appendix

### AccessHandle IDL

```webidl
interface FileSystemFileHandle : FileSystemHandle {
  Promise<File> getFile();
  Promise<FileSystemWritableFileStream> createWritable(optional FileSystemCreateWritableOptions options = {});

  Promise<FileSystemAccessHandle> createAccessHandle(optional FileSystemFileHandleCreateAccessHandleOptions options = {});
  [Exposed=DedicatedWorker]
  Promise<FileSystemSyncAccessHandle> createSyncAccessHandle(optional FileSystemFileHandleCreateAccessHandleOptions options = {});
};

dictionary FileSystemFileHandleCreateAccessHandleOptions {
  AccessHandleMode mode;
};

// For more details and possible modes, see "Exposing AccessHandles on all
// filesystems" above.
enum AccessHandleMode { "in-place" };

interface FileSystemAccessHandle {
  // Assumes seekable streams are available. The
  // Seekable extended attribute is ad-hoc notation for this proposal.
  [Seekable] readonly attribute WritableStream writable;
  [Seekable] readonly attribute ReadableStream readable;

  // Resizes the file to be size bytes long. If size is larger than the current
  // size, the file is padded with null bytes; otherwise it is truncated.
  Promise<undefined> truncate([EnforceRange] unsigned long long size);
  // Returns the current size of the file.
  Promise<unsigned long long> getSize();
  // Persists to disk the changes that have been written through the handle.
  Promise<undefined> flush();
  // Flushes and closes the streams, then releases the lock on the file.
  Promise<undefined> close();
};

[Exposed=DedicatedWorker]
interface FileSystemSyncAccessHandle {
  unsigned long long read([AllowShared] BufferSource buffer,
                          optional FilesystemReadWriteOptions options = {});
  unsigned long long write([AllowShared] BufferSource buffer,
                           optional FilesystemReadWriteOptions options = {});

  Promise<undefined> truncate([EnforceRange] unsigned long long size);
  Promise<unsigned long long> getSize();
  Promise<undefined> flush();
  Promise<undefined> close();
};

dictionary FilesystemReadWriteOptions {
  [EnforceRange] unsigned long long at;
};
```

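To make the IDL above more concrete, here is a hedged, in-memory approximation of *FileSystemSyncAccessHandle*. `InMemorySyncAccessHandle` and `toBytes` are hypothetical helpers, not part of the proposal; the default offset of 0 when `at` is omitted and the error thrown after *close()* are assumptions made for the sketch.

```javascript
// Hypothetical in-memory stand-in for FileSystemSyncAccessHandle, following
// the IDL above: read()/write() are synchronous and take an optional { at }
// offset; truncate(), getSize(), flush() and close() return promises.
class InMemorySyncAccessHandle {
  #bytes = new Uint8Array(0);
  #closed = false;

  #checkOpen() {
    if (this.#closed) throw new Error("handle is closed");
  }

  // Copies bytes starting at `at` into `buffer`; returns the count read.
  read(buffer, { at = 0 } = {}) {
    this.#checkOpen();
    const dst = toBytes(buffer);
    const n = Math.max(0, Math.min(dst.length, this.#bytes.length - at));
    dst.set(this.#bytes.subarray(at, at + n));
    return n;
  }

  // Writes all of `buffer` at `at`, growing the file if needed.
  write(buffer, { at = 0 } = {}) {
    this.#checkOpen();
    const src = toBytes(buffer);
    const end = at + src.length;
    if (end > this.#bytes.length) {
      const grown = new Uint8Array(end);
      grown.set(this.#bytes);
      this.#bytes = grown;
    }
    this.#bytes.set(src, at);
    return src.length;
  }

  // Pads with null bytes when growing, cuts off the tail when shrinking.
  async truncate(size) {
    this.#checkOpen();
    const resized = new Uint8Array(size);
    resized.set(this.#bytes.subarray(0, size));
    this.#bytes = resized;
  }

  async getSize() {
    this.#checkOpen();
    return this.#bytes.length;
  }

  async flush() {
    this.#checkOpen(); // nothing to persist for an in-memory file
  }

  async close() {
    this.#closed = true;
  }
}

// Views a BufferSource (ArrayBuffer or ArrayBufferView) as bytes, no copy.
function toBytes(source) {
  if (ArrayBuffer.isView(source)) {
    return new Uint8Array(source.buffer, source.byteOffset, source.byteLength);
  }
  return new Uint8Array(source);
}

(async () => {
  const h = new InMemorySyncAccessHandle();
  h.write(new Uint8Array([1, 2, 3, 4])); // returns 4
  const out = new Uint8Array(2);
  h.read(out, { at: 1 }); // out is now [2, 3]
  await h.truncate(6); // pads with two null bytes
  console.log(await h.getSize()); // 6
  await h.close();
})();
```
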
## References & acknowledgements

Many thanks for valuable feedback and advice from:

Domenic Denicola, Marijn Kruisselbrink, Victor Costan