Commit 1c10cca

Merge pull request #213030 from jimmart-dev/jammart-blobfuse2-write-streaming
add blobfuse2 write streaming info
2 parents af9ae75 + b7bf77f commit 1c10cca

3 files changed: +64 -29 lines changed

articles/storage/blobs/blobfuse2-configuration.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ author: jimmart-dev
 ms.service: storage
 ms.subservice: blobs
 ms.topic: how-to
-ms.date: 08/02/2022
+ms.date: 09/29/2022
 ms.author: jammart
 ms.reviewer: tamram
 ---
@@ -43,7 +43,7 @@ Using a configuration file is the preferred method, but the other methods can be

 ## Configuration file

-Creating a configuration file is the preferred method of establishing settings for BlobFuse2. Once you have provided the desired settings in the file, reference the configuration file when using the `blobfuse2 mount` or other commands. Example:
+Creating a configuration file is the preferred method of establishing settings for BlobFuse2. Once you have specified the desired settings in the file, reference the configuration file when you run `blobfuse2 mount` or other commands. Example:

 ````bash
 blobfuse2 mount ./mount --config-file=./config.yaml

articles/storage/blobs/blobfuse2-how-to-deploy.md

Lines changed: 42 additions & 20 deletions
@@ -1,23 +1,23 @@
 ---
-title: How to mount an Azure blob storage container on Linux with BlobFuse2 (preview) | Microsoft Docs
+title: How to mount an Azure Blob Storage container on Linux with BlobFuse2 (preview) | Microsoft Docs
 titleSuffix: Azure Blob Storage
-description: How to mount an Azure blob storage container on Linux with BlobFuse2 (preview).
+description: How to mount an Azure Blob Storage container on Linux with BlobFuse2 (preview).
 author: jammart
 ms.service: storage
 ms.subservice: blobs
 ms.topic: how-to
-ms.date: 09/26/2022
+ms.date: 10/01/2022
 ms.author: jammart
 ms.reviewer: tamram
 ---

-# How to mount an Azure blob storage container on Linux with BlobFuse2 (preview)
+# How to mount an Azure Blob Storage container on Linux with BlobFuse2 (preview)

-[BlobFuse2](blobfuse2-what-is.md) is a virtual file system driver for Azure Blob storage. BlobFuse2 allows you to access your existing Azure block blob data in your storage account through the Linux file system. For more details see [What is BlobFuse2? (preview)](blobfuse2-what-is.md).
+[BlobFuse2](blobfuse2-what-is.md) is a virtual file system driver for Azure Blob Storage. BlobFuse2 allows you to access your existing Azure block blob data in your storage account through the Linux file system. For more details, see [What is BlobFuse2? (preview)](blobfuse2-what-is.md).

 > [!IMPORTANT]
 > BlobFuse2 is the next generation of BlobFuse and is currently in preview.
-> This preview version is provided without a service level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
+> This preview version is provided without a service level agreement, and might not be suitable for production workloads. Certain features might not be supported or might have constrained capabilities.
 > For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
 >
 > If you need to use BlobFuse in a production environment, BlobFuse v1 is generally available (GA). For information about the GA version, see:
@@ -36,10 +36,10 @@ This guide shows you how to install and configure BlobFuse2, mount an Azure blob

 There are 2 basic options for installing BlobFuse2:

-1. [Install BlobFuse2 Binary](#option-1-install-blobfuse2-binary-preferred)
+1. [Install the BlobFuse2 Binary](#option-1-install-the-blobfuse2-binary-preferred)
 1. [Build it from source](#option-2-build-from-source)

-### Option 1: Install BlobFuse2 Binary (preferred)
+### Option 1: Install the BlobFuse2 Binary (preferred)

 For supported distributions, see [the BlobFuse2 releases page](https://github.com/Azure/azure-storage-fuse/releases).
 For libfuse support information, refer to [the BlobFuse2 README](https://github.com/Azure/azure-storage-fuse/blob/main/README.md#distinctive-features-compared-to-blobfuse-v1x).
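The article's install steps identify the Linux distribution with `lsb_release -a` before you pick a package from the releases page. Some minimal images don't ship `lsb_release`, so the bash sketch below falls back to `/etc/os-release` and then to `uname -s`. The function name `detect_distro` is hypothetical, and the fallbacks are an assumption about your image rather than part of the BlobFuse2 instructions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<distro-id> <version>" to help match a package
# on the BlobFuse2 releases page.
detect_distro() {
  if command -v lsb_release >/dev/null 2>&1; then
    # -s prints the value only; -i is the distributor ID, -r the release number.
    echo "$(lsb_release -si) $(lsb_release -sr)"
  elif [ -r /etc/os-release ]; then
    # os-release uses shell-compatible KEY=value lines. Source it in a subshell
    # so its variables (ID, VERSION_ID, ...) don't leak into the caller.
    ( . /etc/os-release && echo "${ID:-unknown} ${VERSION_ID:-unknown}" )
  else
    # Last resort: report only the kernel name.
    uname -s
  fi
}

detect_distro
```

If the output doesn't match any package on the releases page, building from source is the remaining option.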
@@ -52,7 +52,7 @@ lsb_release -a

 If there are no binaries available for your distribution, you can [build the binaries from source code](#option-2-build-from-source).

-#### Install the BlobFuse2 binaries
+#### Install the BlobFuse2 binaries

 To install BlobFuse2:

@@ -107,15 +107,15 @@ To build the BlobFuse2 binaries from source:

 ## Configure BlobFuse2

-You can configure BlobFuse2 with a variety of settings. Some of the common settings used include:
+You can configure BlobFuse2 with a variety of settings. Typical settings include:

 - Logging location and options
-- Temporary cache file path
+- Temporary file path for caching
 - Information about the Azure storage account and blob container to be mounted

-The settings can be configured in a yaml configuration file, using environment variables, or as parameters passed to the BlobFuse2 commands. The preferred method is to use the yaml configuration file.
+The settings can be configured in a YAML configuration file, using environment variables, or as parameters passed to the BlobFuse2 commands. The preferred method is to use the configuration file.

-For details about all of the configuration parameters for BlobFuse2, consult the complete reference material for each:
+For details about each of the configuration parameters for BlobFuse2 and how to specify them, see the following references:

 - [Complete BlobFuse2 configuration reference (preview)](blobfuse2-configuration.md)
 - [Configuration file reference (preview)](blobfuse2-configuration.md#configuration-file)
@@ -124,21 +124,43 @@ For details about all of the configuration parameters for BlobFuse2, consult the

 The basic steps for configuring BlobFuse2 in preparation for mounting are:

-1. [Configure a temporary path for caching or streaming](#configure-a-temporary-path-for-caching)
+1. [Configure caching](#configure-caching)
 1. [Create an empty directory for mounting the blob container](#create-an-empty-directory-for-mounting-the-blob-container)
 1. [Authorize access to your storage account](#authorize-access-to-your-storage-account)

-### Configure a temporary path for caching
+### Configure caching

-BlobFuse2 provides native-like performance by requiring a temporary path in the file system to buffer and cache any open files. For this temporary path, choose the most performant disk available, or use a ramdisk for the best performance.
+BlobFuse2 provides native-like performance by using local file-caching techniques. The caching configuration and behavior vary depending on whether you're streaming large files or accessing smaller files.
+
+#### Configure caching for streaming large files
+
+BlobFuse2 supports streaming for both read and write operations as an alternative to disk caching for files. In streaming mode, BlobFuse2 caches blocks of large files in memory for both reading and writing. The settings that control caching for streaming are specified under `stream:` in your configuration file, as follows:
+
+```yml
+stream:
+  # Read-only mode: the size of each block to be cached in memory while streaming (in MB).
+  # Read/write mode: the size of newly created blocks (in MB).
+  block-size-mb: <size in MB>
+  max-buffers: <number of buffers>  # The total number of buffers to store blocks in
+  buffer-size-mb: <size in MB>      # The size of each buffer (in MB)
+```
+
+See [the sample streaming configuration file](https://github.com/Azure/azure-storage-fuse/blob/main/sampleStreamingConfig.yaml) to get started quickly with settings for a basic streaming scenario.
+
+#### Configure caching for smaller files
+
+Smaller files are cached to a temporary path specified under `file_cache:` in the configuration file, as follows:
+
+```yml
+file_cache:
+  path: <path to local disk cache>
+```

 > [!NOTE]
-> BlobFuse2 stores all open file contents in the temporary path. Make sure to have enough space to accommodate all open files.
+> BlobFuse2 stores all open file contents in the temporary path. Make sure there's enough space to hold all open files.
 >

-#### Choose a caching disk option
-
-There are 3 common options for configuring the temporary path for caching:
+There are 3 common options for configuring the temporary path for file caching:

 - [Use a local high-performing disk](#use-a-local-high-performing-disk)
 - [Use a ramdisk](#use-a-ramdisk)
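The `stream:` settings bound how much memory streaming can hold, and the note under `file_cache:` asks for enough disk space for all open files. The following bash sketch estimates both. It assumes the streaming memory ceiling is roughly `max-buffers` multiplied by `buffer-size-mb`, based only on the setting descriptions in this change (actual usage can differ). The function names, the `./blobfuse2-cache` path, and the example values are hypothetical, not BlobFuse2 defaults:

```bash
#!/usr/bin/env bash
# Hypothetical sizing helpers for the stream: and file_cache: settings.

# Rough upper bound (in MB) on memory held by streaming buffers, assuming
# max-buffers buffers of buffer-size-mb megabytes each.
stream_buffer_mb() {
  local max_buffers="$1" buffer_size_mb="$2"
  echo $(( max_buffers * buffer_size_mb ))
}

# Free space (in MB, where 1 MB = 1024 KB here) on the file system that holds
# the file_cache path. The directory is created if it doesn't exist yet.
cache_free_mb() {
  local cache_path="$1"
  mkdir -p "$cache_path"
  # -P: one POSIX-format line per file system; -k: 1024-byte blocks.
  # Field 4 of the data line is the available space.
  df -Pk "$cache_path" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Example values only, not BlobFuse2 defaults.
echo "Streaming buffers may use up to $(stream_buffer_mb 30 8) MB of memory"
echo "Cache path has $(cache_free_mb ./blobfuse2-cache) MB free"
```

Compare the first number with the memory available on the machine, and the second with the total size of the files you expect to have open at the same time.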

articles/storage/blobs/blobfuse2-what-is.md

Lines changed: 20 additions & 7 deletions
@@ -6,7 +6,7 @@ author: jammart
 ms.service: storage
 ms.subservice: blobs
 ms.topic: how-to
-ms.date: 09/26/2022
+ms.date: 10/01/2022
 ms.author: jammart
 ms.reviewer: tamram
 ---
@@ -47,8 +47,9 @@ A full list of BlobFuse2 features is in the [BlobFuse2 README](https://github.co

 - Mount an Azure storage blob container or Data Lake Storage Gen2 file system on Linux
 - Use basic file system operations, such as mkdir, opendir, readdir, rmdir, open, read, create, write, close, unlink, truncate, stat, and rename
-- Local caching to improve subsequent access times
+- Local file caching to improve subsequent access times
 - Streaming to support reading and writing large files
+- Gain insights into mount activities and resource usage using BlobFuse2 Health Monitor
 - Parallel downloads and uploads to improve access time for large files
 - Multiple mounts to the same container for read-only workloads

@@ -59,6 +60,7 @@ Blobfuse2 has more feature support and improved performance in multiple user sce
 - Improved caching
 - More management support through new Azure CLI commands
 - Additional logging support
+- Write-streaming for large files (read-streaming was previously supported)
 - Gain insights into mount activities and resource usage using BlobFuse2 Health Monitor
 - Compatibility and upgrade options for existing BlobFuse v1 users
 - Version checking and upgrade prompting
@@ -88,7 +90,7 @@ In many ways, BlobFuse2-mounted storage can be used just like the native Linux f

 However, there are some key differences in the way BlobFuse2 behaves:

-- **Readdir count of hardlinks**:
+- **Readdir count of hard links**:

 For performance reasons, BlobFuse2 does not correctly report the hard links inside a directory. The number of hard links for empty directories is returned as 2. The number for non-empty directories is always returned as 3, regardless of the actual number of hard links.

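Some scripts use a directory's link count as a shortcut (for example, to guess how many subdirectories it has). The following bash sketch prints the counts a mount reports so you can check that assumption. It uses GNU `stat`; `MOUNT_DIR` and `/path/to/mount` are placeholders. Given the behavior described above, expect 2 for empty directories and 3 for non-empty ones on a BlobFuse2 mount, so the count can't be used to infer directory contents there:

```bash
#!/usr/bin/env bash
# Print the hard link count (%h) that the file system reports for a path.
# Uses GNU coreutils stat; BSD and macOS stat take different options.
link_count() {
  stat -c '%h' "$1"
}

# Placeholder mount point: set MOUNT_DIR to your BlobFuse2 mount directory.
mount_dir="${MOUNT_DIR:-/path/to/mount}"
if [ -d "$mount_dir" ]; then
  for d in "$mount_dir"/*/; do
    # Skip the literal pattern that remains when there are no subdirectories.
    if [ -d "$d" ]; then
      echo "$(link_count "$d") $d"
    fi
  done
fi
```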
@@ -116,14 +118,25 @@ However, there are some key differences in the way BlobFuse2 behaves:

 BlobFuse2 doesn't support extended-attributes (x-attrs) operations.

+- **Write-streaming**:
+
+Concurrent streaming of read and write operations on large file data can produce unpredictable results. Simultaneously writing to the same blob from different threads isn't supported.
+
 ### Data integrity

-When a file is written to, the data is first persisted into cache on a local disk. The data is written to blob storage only after the file handle is closed. If there's an issue attempting to persist the data to blob storage, you receive an error message.
+The file caching behavior plays an important role in the integrity of the data read from and written to a Blob Storage file system mount. Streaming mode, which supports streaming for both read and write operations, is recommended for large files. BlobFuse2 caches blocks of streaming files in memory; for smaller files that don't consist of blocks, the entire file is stored in memory. File cache is the second mode, in which files are stored on disk in their entirety; it's recommended for workloads that don't contain large files.
+
+BlobFuse2 supports both read and write operations. Continuous synchronization of data written to storage by using other APIs or other mounts of BlobFuse2 isn't guaranteed. For data integrity, it's recommended that multiple sources don't modify the same blob, especially at the same time. If one or more applications attempt to write to the same file simultaneously, the results could be unexpected. Depending on the timing of multiple write operations and the freshness of the cache for each, the result could be that the last writer wins and previous writes are lost, or generally that the updated file isn't in the desired state.
+
+#### File caching on disk
+
+When a file is written to, the data is first persisted into cache on a local disk. The data is written to blob storage only after the file handle is closed. If there's an issue attempting to persist the data to blob storage, you receive an error message.
+
+#### Streaming

-BlobFuse2 supports both read and write operations. Continuous synchronization of data written to storage by using other APIs or other mounts of BlobFuse2 aren't guaranteed. For data integrity, it's recommended that multiple sources don't modify the same blob, especially at the same time. If one or more applications attempt to write to the same file simultaneously, the results can be unexpected. Depending on the timing of multiple write operations and the freshness of the cache for each, the result could be that the last writer wins and previous writes are lost, or generally that the updated file isn't in the desired state.
+For streaming during both read and write operations, blocks of data are cached in memory as they're read or updated. Updates are flushed to Azure Storage when a file is closed or when the buffer is filled with dirty blocks.

-> [!WARNING]
-> In cases where multiple file handles are open to the same file, simultaneous write operations could result in data loss.
+Reading the same blob from multiple simultaneous threads is supported. However, simultaneous write operations could result in unexpected file data outcomes, including data loss. Performing simultaneous read operations and a single write operation is supported, but the data being read from some threads might not be current.

 ### Permissions

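With file caching, data reaches Blob Storage only when the file handle is closed, so a failed upload shows up as an error from whatever closed the file. The following bash sketch relies on GNU `cp`, which checks the result of closing its destination file and exits with a non-zero status if that fails. The `upload_file` helper and the paths are hypothetical examples, and the sketch does nothing to coordinate multiple writers to the same blob:

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a local file into a BlobFuse2-mounted directory and
# report failure. With file caching, the upload happens when cp closes the
# destination file, so an upload error makes cp itself exit non-zero.
upload_file() {
  local src="$1" dest_dir="$2"
  if cp -- "$src" "$dest_dir/"; then
    echo "uploaded: $dest_dir/$(basename -- "$src")"
  else
    echo "upload failed for $src" >&2
    return 1
  fi
}

# Example usage with placeholder paths:
# upload_file ./report.csv /path/to/mount/reports
```

Checking the exit status of the writing command matters more than usual here, because the write calls themselves can succeed against the local cache while the upload at close time fails.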