
Commit 7341ec1

performance comparisons
1 parent 5591dc5 commit 7341ec1

7 files changed: +108 additions, -36 deletions

articles/storage/files/TOC.yml

Lines changed: 4 additions & 2 deletions
@@ -197,13 +197,15 @@
197197
href: files-monitoring-alerts.md
198198
- name: Migrate
199199
items:
200-
- name: Migrate to Azure file shares
200+
- name: Migrate to SMB Azure file shares
201201
href: storage-files-migration-overview.md
202+
- name: Migrate to NFS Azure file shares
203+
href: storage-files-migration-nfs.md
202204
- name: Migrate files between Azure file shares
203205
href: migrate-files-between-shares.md
204206
- name: Target a cloud-only deployment
205207
items:
206-
- name: RoboCopy to migrate to Azure file shares
208+
- name: RoboCopy to migrate to SMB Azure file shares
207209
href: storage-files-migration-robocopy.md
208210
- name: Migrate from an on-premises NAS to Azure file shares with DataBox
209211
href: storage-files-migration-nas-cloud-databox.md
4 image files: 14.6 KB, 13.9 KB, 14.9 KB, 14.9 KB

articles/storage/files/storage-files-migration-nfs.md

Lines changed: 98 additions & 28 deletions
@@ -1,54 +1,54 @@
11
---
22
title: Migrate to NFS Azure file shares
3-
description: Learn how to migrate from Linux file servers to NFS Azure file shares using open source file copy tools.
3+
description: Learn how to migrate from Linux file servers to NFS Azure file shares using open source file copy tools. Compare the performance of common file copy tools.
44
author: khdownie
55
ms.service: azure-file-storage
6-
ms.topic: conceptual
7-
ms.date: 12/11/2023
6+
ms.topic: how-to
7+
ms.date: 12/12/2023
88
ms.author: kendownie
99
---
1010

1111
# Migrate to NFS Azure file shares
1212

13-
This article covers the basic aspects of migrating from Linux file servers to NFS Azure file shares, which are only available as Premium file shares (FileStorage account kind). We'll also compare open source tools fpsync and rsync to understand how they perform in different situations when copying data to Azure file shares.
13+
This article covers the basic aspects of migrating from Linux file servers to NFS Azure file shares, which are only available as Premium file shares (FileStorage account kind). We'll also compare the open source file copy tools fpsync and rsync to understand how they perform in different situations when copying data to Azure file shares.
1414

1515
## Applies to
1616

1717
| File share type | SMB | NFS |
1818
|-|:-:|:-:|
19-
| Standard file shares (GPv2), LRS/ZRS | ![No](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
20-
| Standard file shares (GPv2), GRS/GZRS | ![No](../media/icons/yes-icon.png) | ![No](../media/icons/no-icon.png) |
21-
| Premium file shares (FileStorage), LRS/ZRS | ![No](../media/icons/yes-icon.png) | ![Yes](../media/icons/yes-icon.png) |
19+
| Standard file shares (GPv2), LRS/ZRS | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
20+
| Standard file shares (GPv2), GRS/GZRS | ![No](../media/icons/no-icon.png) | ![No](../media/icons/no-icon.png) |
21+
| Premium file shares (FileStorage), LRS/ZRS | ![No](../media/icons/no-icon.png) | ![Yes](../media/icons/yes-icon.png) |
2222

2323
## Prerequisites
2424

25-
You'll need at least one NFS Azure file share mounted to a Linux virtual machine (VM). To create one, see [Create an NFS Azure file share and mount it on a Linux VM](storage-files-quick-create-use-linux.md). We recommend mounting the share with nconnect to make use of multiple TCP connections. For more information, see [Improve NFS Azure file share performance](nfs-performance.md#nconnect).
25+
You'll need at least one NFS Azure file share mounted to a Linux virtual machine (VM). To create one, see [Create an NFS Azure file share and mount it on a Linux VM](storage-files-quick-create-use-linux.md). We recommend mounting the share with nconnect to use multiple TCP connections. For more information, see [Improve NFS Azure file share performance](nfs-performance.md#nconnect).
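To confirm that the share is actually mounted with nconnect, you can look for the option in the kernel's mount table. The following is a minimal sketch, not an official tool: the `uses_nconnect` function and the `/mnt/nfsshare` path are hypothetical, and it assumes a kernel that lists `nconnect=N` among the NFS mount options it reports in `/proc/mounts`.

```bash
#!/usr/bin/env bash
# Sketch: report whether a mount point is an NFS mount that uses nconnect.
# The share is typically mounted with something like (placeholders in <>):
#   sudo mount -t nfs <account>.file.core.windows.net:/<account>/<share> /mnt/<share> \
#     -o vers=4,minorversion=1,sec=sys,nconnect=4
# Assumption: the kernel reports nconnect=N in the mount options.
uses_nconnect() {
  local mount_point=$1 mounts_file=${2:-/proc/mounts}
  # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
  # Wrapping the options in commas makes the match position-independent.
  awk -v mp="$mount_point" '
    $2 == mp && $3 ~ /^nfs/ && ("," $4 ",") ~ /,nconnect=[0-9]+,/ { found = 1 }
    END { exit !found }
  ' "$mounts_file"
}

# Example with a hypothetical mount point:
if uses_nconnect /mnt/nfsshare; then
  echo "nconnect is enabled on /mnt/nfsshare"
else
  echo "nconnect not detected on /mnt/nfsshare (or it isn't mounted)"
fi
```

The function returns a nonzero exit status when the mount point is missing, isn't NFS, or has no `nconnect` option, so it can gate a copy script.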
2626

2727
## Migration basics
2828

2929
Many open source tools are available to transfer data to NFS file shares. However, not all of them are efficient with a distributed file system, whose performance considerations differ from those of on-premises setups. In a distributed file system, each network call involves a round trip to a server that might not be local. Therefore, minimizing the time spent on network calls is crucial for transferring data efficiently over the network.
3030

3131
## Using fpsync
3232

33-
In this article, we'll use the open source tool fpsync to copy the data. Designed to synchronize files between two locations, [fpsync](https://manpages.ubuntu.com/manpages/lunar/en/man1/fpsync.1.html) stands for File Parallel Synchronization. It comes as a part of the fpart filesystem partitioner.
33+
In this article, we'll use the open source tool fpsync to copy the data. [fpsync](https://manpages.ubuntu.com/manpages/lunar/en/man1/fpsync.1.html), which stands for File Parallel Synchronization, is a multi-threaded application designed to synchronize files between two locations. It's part of the fpart filesystem partitioner.
3434

35-
Internally, fpsync uses [rsync](https://linux.die.net/man/1/rsync) (default), [cpio](https://linux.die.net/man/1/cpio), or tar tools to copy. It computes subsets of `src_dir/` and spawns synchronization jobs to synchronize them to `dst_dir/`. Synchronization jobs are executed on the fly while fpsync crawls the file system, making it a useful tool for efficiently migrating large file systems and copying large datasets with multiple files.
35+
Internally, fpsync uses [rsync](https://linux.die.net/man/1/rsync) (default), [cpio](https://linux.die.net/man/1/cpio), or tar tools to copy. It computes subsets of the source directory `src_dir/` and spawns synchronization jobs to synchronize them to the destination directory `dst_dir/`. It executes synchronization jobs on the fly while crawling the file system, making it a useful tool for efficiently migrating large file systems and copying large datasets with many files.
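The idea of splitting a tree into subsets and copying them concurrently can be illustrated with standard tools. This is a toy sketch, not how fpsync is implemented (fpsync uses fpart to build the partitions and rsync, cpio, or tar to copy them). The `parallel_copy` function and its chunk-by-file-count strategy are assumptions for illustration only; it copies regular files only, so empty directories, symlinks, and ownership aren't handled.

```bash
#!/usr/bin/env bash
# Toy illustration only: NOT fpsync's implementation.
# Splits the file list under SRC into chunks of PER_JOB files and copies
# up to JOBS chunks at the same time, one cp loop per chunk.
parallel_copy() {
  local src=$1 dst=$2 jobs=${3:-4} per_job=${4:-100}
  mkdir -p "$dst"
  # List regular files relative to src, NUL-separated so unusual names survive.
  (cd "$src" && find . -type f -print0) |
    # xargs hands each chunk of paths to a new sh, running up to $jobs at once.
    xargs -0 -n "$per_job" -P "$jobs" sh -c '
      src=$1 dst=$2
      shift 2
      for f in "$@"; do
        mkdir -p "$dst/$(dirname "$f")"   # recreate the directory layout
        cp -p "$src/$f" "$dst/$f"         # -p keeps mode and timestamps
      done
    ' sh "$src" "$dst"
}

# Usage (hypothetical paths): 8 parallel jobs of 500 files each
# parallel_copy /data/src /mnt/nfsshare/dst 8 500
```

Because each job handles a different subset, several copies are in flight at once, which is what hides per-call network latency on a remote file system.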
3636

3737
### Install fpart
3838

3939
Install fpart on the Linux distribution of your choice. Once it's installed, you should see fpsync under `/usr/bin/`.
4040

4141
# [Ubuntu](#tab/ubuntu)
4242

43-
On Ubuntu, use the apt package manager.
43+
On Ubuntu, use the apt package manager to install fpart.
4444

4545
```bash
4646
sudo apt-get install fpart
4747
```
4848

4949
# [RHEL](#tab/rhel)
5050

51-
On Red Hat Enterprise Linux, use the yum package manager.
51+
On Red Hat Enterprise Linux, use the yum package manager to install fpart.
5252

5353
**Red Hat Enterprise Linux 7:**
5454

@@ -71,48 +71,118 @@ sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-
7171
sudo yum install fpart -y
7272
```
7373

74-
# [SUSE](#tab/suse)
74+
# [Other](#tab/other)
75+
76+
If a precompiled package isn't available for your operating system, you can [install fpart from source](https://www.fpart.org/#installing-from-source).
7577

76-
```bash
77-
git clone https://github.com/martymac/fpart
78-
./make_release.sh 
79-
```
8078
---
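After installing, you can confirm that fpsync and the tools it relies on are available. A minimal sketch: `check_tools` is a hypothetical helper, and the tool list reflects this article's description (fpsync, the fpart partitioner it ships with, and the rsync and cpio copy tools) rather than an official dependency list.

```bash
#!/usr/bin/env bash
# Sketch: check that each named tool is on PATH.
# Returns 1 if any tool is missing, 0 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools assumed necessary for the commands in this article.
check_tools fpsync fpart rsync cpio || echo "Install the missing tools before copying data." >&2
```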
8179

82-
## Copy data from source to target
80+
## Copy data from source to destination
8381

84-
Make sure your target Azure file share is mounted to a Linux VM, then copy your data in three phases:
82+
Make sure your destination (target) Azure file share is mounted to a Linux VM. See [Prerequisites](#prerequisites).
8583

86-
1. **Baseline copy:** Copy from source to target when no data exists on the target.
87-
1. **Incremental copy:** Copy only the incremental changes from source to target. This is often done multiple times.
88-
1. **Final pass:** A final pass is needed to delete any files on the target that don't exist at the source.
84+
If you're doing a full migration, you'll copy your data in three phases:
8985

90-
Copying the data always involves some version of this command:
86+
1. **Baseline copy:** Copy from source to destination when no data exists on the destination.
87+
1. **Incremental copy:** Copy only the incremental changes from source to destination. Repeat this step multiple times to capture all the changes.
88+
1. **Final pass:** A final pass is needed to delete any files on the destination that don't exist at the source.
89+
90+
Copying data with fpsync always involves some version of this command:
9191

9292
```bash
93-
fpsync -m <copy tool - rsync/cpio/tar> -n <parallel transfers> <absolute source path> <absolute target path>
93+
fpsync -m <specify copy tool - rsync/cpio/tar> -n <parallel transfers> <absolute source path> <absolute destination path>
9494
```
9595

9696
### Baseline copy
9797

98-
For baseline copy we recommend using fpsync with cpio as the copy tool, for example:
98+
For baseline copy, we recommend using fpsync with cpio as the copy tool.
9999

100100
```bash
101-
fpsync -m cpio –n <parallel transfers> <absolute source path> <absolute target path>
101+
fpsync -m cpio -n <parallel transfers> <absolute source path> <absolute destination path>
102102
```
103103

104104
For more information, see [Cpio and Tar support](http://www.fpart.org/fpsync/#cpio-and-tar-support).
105105

106106
### Incremental copy
107107

108-
For incremental sync, we recommend using fpsync with the default copy tool (rsync):
108+
For incremental sync, we recommend using fpsync with the default copy tool (rsync). To capture all the changes, run it several times.
109109

110110
```bash
111-
fpsync –n <parallel transfers> <absolute source path> <absolute target path>
111+
fpsync -n <parallel transfers> <absolute source path> <absolute destination path>
112112
```
113113

114114
By default, fpsync specifies the following rsync options: `-lptgoD -v --numeric-ids`. You can specify additional rsync options by adding `-o <options>` to the fpsync command.
115115

116+
### Final pass
117+
118+
After several incremental syncs, do a final pass to delete any files on the destination that don't exist at the source. You can either do this manually with `rsync --delete` to delete extra files from the destination directory (for example, `/data/dst/`), or you can use fpsync with the `-E` option. For details, see [The Final Pass](http://www.fpart.org/fpsync/#the-final-pass).
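The three phases can be chained into one script. This is a hedged sketch, not a prescribed procedure: the `migrate` and `run` helpers and the example paths are hypothetical, and the job counts (`-n 16` for cpio, `-n 64` for rsync) are the values that performed best in this article's tests, not general defaults. It assumes fpsync's `-E` option (per-directory mode, which enables rsync's `--delete`) for the final pass, and it only prints the commands unless you set `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: baseline copy, N incremental passes, then a final pass with fpsync.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run: print the command instead of executing it
  else
    "$@"
  fi
}

migrate() {
  local src=$1 dst=$2 passes=${3:-3}

  # 1. Baseline copy: cpio, while the destination is still empty.
  run fpsync -m cpio -n 16 "$src" "$dst"

  # 2. Incremental copies: default tool (rsync), repeated to capture changes.
  local i
  for ((i = 1; i <= passes; i++)); do
    run fpsync -n 64 "$src" "$dst"
  done

  # 3. Final pass: -E works per directory and enables rsync --delete, so
  #    files that no longer exist at the source are removed from the destination.
  run fpsync -n 64 -E "$src" "$dst"
}

# Example with hypothetical absolute paths (prints the commands only):
migrate /data/src/ /mnt/nfsshare/dst/ 2
```

To execute the commands instead of printing them, call it as `DRY_RUN=0 migrate <absolute source path>/ <absolute destination path>/ 3`.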
119+
120+
## Comparing rsync and fpsync with different datasets 
121+
122+
This section compares the performance of rsync and fpsync with different datasets.
123+
124+
Although it's single-threaded, rsync is a fast and versatile file copying tool. It can copy locally, to or from another host over any remote shell, or to or from a remote rsync daemon. It offers many options and flexible ways to specify the set of files to copy.
125+
126+
### Datasets and configuration
127+
128+
The following table lists the different datasets we used to compare copy tool performance under different workloads.
129+
130+
| **Config #** | **Copy type** | **File count** | **Directory count** | **File size** | **Total size** |
131+
|--------------|----------------------------|----------------|---------------------|---------------|----------------|
132+
| 1.1 | Baseline copy | 1 million | 1 | 0-32 KiB | 18 GiB |
133+
| 1.2 | Incremental (delta change) | 1 million | 1 | 0-32 KiB | 18 GiB |
134+
| 2 | Baseline copy | 191,345 | 3,906 | 0-32 KiB | 3 GiB |
135+
| 3 | Baseline copy | 5,000 | 1 | 10 MiB | 50 GiB |
136+
137+
The tests were performed on Azure Standard_D8s_v3 VMs with 8 vCPUs, 32 GiB of memory, and more than 1 TiB of disk space for large datasets. For the destination, we configured NFS Azure file shares with a provisioned size of more than 1 TiB.
138+
139+
### Experiments and results: rsync vs. fpsync+cpio
140+
141+
Based on our experiments with the configurations above, we observed that fpsync performed best with 64 threads when using rsync and 16 threads when using cpio, for an NFS Azure file share mounted with `nconnect=8`. Actual results will vary based on your configuration and datasets.
142+
143+
> [!NOTE]
144+
> Throughput for Azure Files can be much higher than represented in the following charts. Some of the experiments were deliberately conducted with small datasets for simplicity.
145+
146+
#### Configuration 1
147+
148+
For a single directory with 1 million small files totaling 18 GiB, we ran this test as both a baseline copy and an incremental copy.
149+
150+
We observed the following results doing a baseline copy from source to destination.
151+
152+
:::image type="content" source="media/storage-files-migration-nfs/configuration-1-baseline.png" alt-text="Chart showing the test results of configuration 1 for a baseline copy." border="false":::
153+
154+
We observed the following results doing an incremental copy (delta change).
155+
156+
:::image type="content" source="media/storage-files-migration-nfs/configuration-1-incremental.png" alt-text="Chart showing the test results of configuration 1 for an incremental copy." border="false":::
157+
158+
#### Configuration 2
159+
160+
We observed the following results doing a baseline copy of 191,345 small files in 3,906 directories with a total size of 3 GiB.
161+
162+
:::image type="content" source="media/storage-files-migration-nfs/configuration-2.png" alt-text="Chart showing the test results of configuration 2 for a baseline copy." border="false":::
163+
164+
#### Configuration 3
165+
166+
We observed the following results doing a baseline copy of 5,000 large files (10 MiB) in a single directory with a total size of 50 GiB.
167+
168+
:::image type="content" source="media/storage-files-migration-nfs/configuration-3.png" alt-text="Chart showing the test results of configuration 3 for a baseline copy." border="false":::
169+
170+
### Summary of results
171+
172+
Using multi-threaded applications like fpsync can improve throughput and IOPS when migrating to NFS Azure file shares compared to single-threaded copy tools like rsync. Our tests show that:
173+
174+
- Distributing data across multiple directories helps parallelize the migration process and improves performance.
175+
- Copying larger files yields higher throughput than copying smaller files.
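To see how your own data compares with the test datasets (file count, directory count, file size, total size), you can profile the source tree before migrating. A minimal sketch, assuming GNU find (for `-printf`); `profile_dataset` is a hypothetical helper. It counts regular files and directories (including the top-level directory, matching the datasets table, where a single flat directory counts as 1).

```bash
#!/usr/bin/env bash
# Sketch: summarize a source tree the way the datasets table does.
# Requires GNU find: %y prints the file type (f, d, l, ...), %s the size in bytes.
profile_dataset() {
  local dir=$1
  find "$dir" -printf '%y %s\n' | awk '
    $1 == "f" { files++; bytes += $2 }   # regular files: count and sum sizes
    $1 == "d" { dirs++ }                 # directories, including the root
    END {
      printf "files=%d dirs=%d total_bytes=%.0f avg_bytes=%.0f\n",
        files, dirs, bytes, (files ? bytes / files : 0)
    }'
}

# Usage (hypothetical path):
# profile_dataset /data/src
```

Many small files in few directories resemble configurations 1 and 2; large files resemble configuration 3.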
176+
177+
The following table summarizes the results:
178+
179+
| **Config #** | **File count** | **Directory count** | **File size** | **Total size** | **rsync duration** | **rsync throughput** | **fpsync duration** | **fpsync throughput** | **Throughput gain** |
180+
|-------------------|----------------|---------------------|---------------|----------------|--------------------|----------------------|---------------------|-----------------------|---------------------|
181+
| 1.1 (baseline) | 1 million | 1 | 0-32 KiB | 18 GiB | 837.06 mins | 0.33 MiB/s | 228.16 mins | 1.20 MiB/s | 267% |
182+
| 1.2 (incremental) | 1 million | 1 | 0-32 KiB | 18 GiB | 84.02 mins | 3.25 MiB/s | 7.5 mins | 36.41 MiB/s | 1,020% |
183+
| 2 (baseline) | 191,345 | 3,906 | 0-32 KiB | 3 GiB | 191.86 mins | 0.27 MiB/s | 8.47 mins | 6.04 MiB/s | 2,164% |
184+
| 3 (baseline) | 5,000 | 1 | 10 MiB | 50 GiB | 8.12 mins | 105.04 MiB/s | 2.76 mins | 308.90 MiB/s | 194% |
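The table doesn't state how *Throughput gain* is computed. Dividing the rsync duration by the fpsync duration and subtracting 1 reproduces the published values for configurations 1.1, 1.2, and 3; for configuration 2 it gives 2,165%, one percentage point above the table. A minimal sketch under that assumption; `throughput_gain` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: throughput gain (%) from two durations for the same amount of data.
# Assumption: gain = (rsync duration / fpsync duration - 1) * 100.
throughput_gain() {
  local rsync_min=$1 fpsync_min=$2
  awk -v r="$rsync_min" -v f="$fpsync_min" 'BEGIN { printf "%.0f\n", (r / f - 1) * 100 }'
}

throughput_gain 837.06 228.16   # configuration 1.1 (baseline)
throughput_gain 84.02 7.5       # configuration 1.2 (incremental)
throughput_gain 8.12 2.76       # configuration 3 (baseline)
```

Because the same amount of data is moved in each run, the ratio of durations equals the ratio of throughputs, so the gain can be computed from either pair.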
185+
116186
## Next steps
117187

118188
- [Improve NFS Azure file share performance](nfs-performance.md)

articles/storage/files/storage-files-migration-overview.md

Lines changed: 6 additions & 6 deletions
@@ -3,8 +3,8 @@ title: Migrate to SMB Azure file shares
33
description: Learn how to migrate to SMB Azure file shares and find your migration guide.
44
author: khdownie
55
ms.service: azure-file-storage
6-
ms.topic: conceptual
7-
ms.date: 12/11/2023
6+
ms.topic: how-to
7+
ms.date: 12/12/2023
88
ms.author: kendownie
99
---
1010

@@ -36,15 +36,15 @@ Here are the two basic components of a file:
3636
- **Data stream**: The data stream of a file stores the file content.
3737
- **File metadata**: Unlike object storage in Azure blobs, an Azure file share can natively store file metadata. General-purpose file data traditionally depends on file metadata. App data might not. The file metadata has these subcomponents:
3838
- File attributes like read-only
39-
- File permissions, which can be referred to as *NTFS permissions* or *file and folder ACLs*
39+
- File permissions, which are often referred to as *NTFS permissions* or *file and folder ACLs*
4040
- Timestamps, most notably the creation and last-modified timestamps
4141
- An alternate data stream, which is a space to store larger amounts of nonstandard properties. An alternate data stream can't be stored on a file in an Azure file share. It's preserved on-premises when Azure File Sync is used.
4242

4343
File fidelity in a migration can be defined as the ability to:
4444

4545
- Store all applicable file information on the source.
4646
- Transfer files with the migration tool.
47-
- Store files in the target storage of the migration. </br> The target for migration guides on this page is one or more Azure file shares. Consider this [list of features that SMB Azure file shares don't support](files-smb-protocol.md#limitations).
47+
- Store files in the target storage of the migration. </br> The target for migration guides in this article is one or more Azure file shares. Consider this [list of features that SMB Azure file shares don't support](files-smb-protocol.md#limitations).
4848

4949
To ensure your migration proceeds smoothly, identify [the best copy tool for your needs](#migration-toolbox) and match a storage target to your source.
5050

@@ -123,7 +123,7 @@ There are several file-copy tools available from Microsoft and others. To select
123123

124124
By mirroring a source to a target (as with **robocopy /MIR**), you can run the tool again on that same source and target. This second run is much faster because it needs to transport only source changes that happened after the previous run. Rerunning a copy tool this way can reduce downtime significantly.
125125

126-
The following table classifies Microsoft tools and their current suitability for Azure file shares:
126+
The following table classifies Microsoft tools and their current suitability for SMB Azure file shares:
127127

128128
| Recommended | Tool | Support for Azure file shares | Preservation of file fidelity |
129129
| :-: | :-- | :---- | :---- |
@@ -150,7 +150,7 @@ Azure Storage Mover is a relatively new, fully managed migration service that en
150150

151151
#### RoboCopy
152152

153-
Included in Windows, RoboCopy is one of the tools most applicable to file migrations. The main [RoboCopy documentation](/windows-server/administration/windows-commands/robocopy) is a helpful resource for this tool's many options.
153+
Included in Windows, RoboCopy is one of the tools most applicable to SMB file migrations. The main [RoboCopy documentation](/windows-server/administration/windows-commands/robocopy) is a helpful resource for this tool's many options.
154154

155155
#### Azure Storage Migration Program
156156
