description: Learn how to migrate from Linux file servers to NFS Azure file shares using open source file copy tools. Compare the performance of common file copy tools.
author: khdownie
ms.service: azure-file-storage
ms.topic: how-to
ms.date: 12/12/2023
ms.author: kendownie
---
# Migrate to NFS Azure file shares
This article covers the basic aspects of migrating from Linux file servers to NFS Azure file shares, which are only available as Premium file shares (FileStorage account kind). We'll also compare the open source file copy tools fpsync and rsync to understand how they perform in different situations when copying data to Azure file shares.
## Applies to
| File share type | SMB | NFS |
|-|:-:|:-:|
| Standard file shares (GPv2), LRS/ZRS |||
| Standard file shares (GPv2), GRS/GZRS |||
## Prerequisites
You'll need at least one NFS Azure file share mounted to a Linux virtual machine (VM). To create one, see [Create an NFS Azure file share and mount it on a Linux VM](storage-files-quick-create-use-linux.md). We recommend mounting the share with nconnect to use multiple TCP connections. For more information, see [Improve NFS Azure file share performance](nfs-performance.md#nconnect).
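
The following minimal sketch prints the `mount` command for an NFS Azure file share with `nconnect`. It assumes the NFS 4.1 mount options `vers=4,minorversion=1,sec=sys`; the function name `nfs_mount_cmd` and the storage account, share, and mount point names are hypothetical placeholders. Check the linked articles for the options that apply to your distribution and kernel.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) a mount command for an NFS Azure file share with nconnect.
# The account, share, and mount point values are hypothetical placeholders.
set -euo pipefail

nfs_mount_cmd() {
  local account="$1" share="$2" mount_point="$3" nconnect="${4:-4}"
  # Assumed export path format: <account>.file.core.windows.net:/<account>/<share>
  printf 'sudo mount -t nfs -o vers=4,minorversion=1,sec=sys,nconnect=%s %s.file.core.windows.net:/%s/%s %s\n' \
    "$nconnect" "$account" "$account" "$share" "$mount_point"
}

# Print the command with nconnect=8, the value used for the tests in this article.
nfs_mount_cmd mystorageaccount myshare /mnt/myshare 8
```

The sketch only prints the command so you can review it before running it with root privileges. The default of `nconnect=4` in the function is an assumption; adjust it for your workload.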
## Migration basics
Many open source tools are available to transfer data to NFS file shares. However, not all of them are efficient with a distributed file system, which has different performance characteristics than an on-premises file server. In a distributed file system, each network call involves a round trip to a server that might not be local. Therefore, minimizing the time spent on network calls is crucial to achieving good performance and efficient data transfer.
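
To see why round trips matter, here's an idealized back-of-the-envelope model, not a measurement: if every file costs a fixed number of network round trips, total time is roughly files × round trips per file × round-trip time, divided by the number of requests in flight. The inputs below (3 round trips per file, a 2 ms round-trip time) and the `estimate_seconds` helper are hypothetical.

```bash
#!/usr/bin/env bash
# Idealized model (an assumption, not a measurement):
#   total_seconds ~= files * round_trips_per_file * rtt_ms / 1000 / parallel_jobs
set -euo pipefail

estimate_seconds() {
  local files="$1" round_trips="$2" rtt_ms="$3" jobs="$4"
  awk -v f="$files" -v r="$round_trips" -v ms="$rtt_ms" -v j="$jobs" \
    'BEGIN { printf "%.2f\n", f * r * ms / 1000 / j }'
}

# Hypothetical inputs: 1 million files, 3 round trips per file, 2 ms round-trip time.
estimate_seconds 1000000 3 2 1    # one request at a time: 6000.00
estimate_seconds 1000000 3 2 64   # 64 concurrent jobs:    93.75
```

Real copies won't scale perfectly with the number of jobs because of bandwidth, server-side limits, and per-job overhead, but the model shows why keeping many requests in flight helps most when copying many small files.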
## Using fpsync
In this article, we'll use the open source tool [fpsync](https://manpages.ubuntu.com/manpages/lunar/en/man1/fpsync.1.html) to copy the data. fpsync, which stands for File Parallel Synchronization, is a multi-threaded application designed to synchronize files between two locations. It's part of the fpart filesystem partitioner.

Internally, fpsync uses [rsync](https://linux.die.net/man/1/rsync) (default), [cpio](https://linux.die.net/man/1/cpio), or tar to copy files. It computes subsets of the source directory `src_dir/` and spawns synchronization jobs to synchronize them to the destination directory `dst_dir/`. It runs these jobs on the fly while it's still crawling the file system, which makes it useful for migrating large file systems and copying large datasets that contain many files.
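
The partition-and-parallelize idea can be illustrated with standard tools. The following sketch is **not** how fpsync is implemented; it swaps in `find` and `xargs -P` to show the same pattern: the crawl streams file paths, each batch of `files_per_job` paths becomes one copy job, and up to `jobs` batches run at once. The `parallel_copy` function and its defaults are hypothetical, and it copies regular files only (no empty directories, symlinks, or deletions).

```bash
#!/usr/bin/env bash
# Illustration only: partition a file list into batches and copy the batches in
# parallel with xargs -P. This is NOT fpsync's implementation; it mimics the idea.
set -euo pipefail

parallel_copy() {
  local src="$1" dst="$2" files_per_job="${3:-1000}" jobs="${4:-4}"
  mkdir -p "$dst"
  dst="$(cd "$dst" && pwd)"   # absolute path, because we cd into the source below
  (
    cd "$src"
    # find streams paths while it crawls; xargs starts one job per batch of
    # files_per_job paths and keeps up to $jobs jobs running at the same time.
    find . -type f -print0 |
      xargs -0 -n "$files_per_job" -P "$jobs" \
        sh -c 'dst=$1; shift
               for f in "$@"; do
                 mkdir -p "$dst/$(dirname "$f")" && cp -p "$f" "$dst/$f"
               done' _ "$dst"
  )
}

# Example (hypothetical paths): parallel_copy /data/src /data/dst 1000 16
```

For real migrations, use fpsync, which hands each partition to rsync, cpio, or tar and preserves the metadata those tools support.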
### Install fpart
Install fpart on the Linux distribution of your choice. Once it's installed, you should see fpsync under `/usr/bin/`.
# [Ubuntu](#tab/ubuntu)
On Ubuntu, use the apt package manager to install fpart.
```bash
sudo apt-get install fpart
```
# [RHEL](#tab/rhel)
On Red Hat Enterprise Linux, use the yum package manager to install fpart.

If a precompiled package isn't available for your operating system, you can [install fpart from source](https://www.fpart.org/#installing-from-source).

---
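
After installing fpart, you can confirm that the tools used in this article are on your `PATH`. This small sketch (the `check_tools` name is hypothetical) prints each tool's location and returns a nonzero status if any are missing. You only need `cpio` if you plan to use fpsync's cpio mode.

```bash
#!/usr/bin/env bash
# Sketch: confirm the copy tools used in this article are installed and on the PATH.
set -euo pipefail

check_tools() {
  local tool path missing=0
  for tool in "$@"; do
    if path="$(command -v "$tool")"; then
      printf '%-8s %s\n' "$tool" "$path"
    else
      printf '%-8s MISSING\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_tools fpsync fpart rsync cpio || echo "Install the missing tools before copying data." >&2
```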
## Copy data from source to destination
Make sure your destination (target) Azure file share is mounted to a Linux VM. See [Prerequisites](#prerequisites).

If you're doing a full migration, you'll copy your data in three phases:

1. **Baseline copy:** Copy from source to destination when no data exists on the destination.
1. **Incremental copy:** Copy only the incremental changes from source to destination. Repeat this phase several times to capture all the changes.
1. **Final pass:** Delete any files on the destination that don't exist at the source.

Copying data with fpsync always involves some version of this command:

For more information, see [Cpio and Tar support](http://www.fpart.org/fpsync/#cpio-and-tar-support).
### Incremental copy
For incremental sync, we recommend using fpsync with the default copy tool (rsync). Run it several times to capture all the changes.

By default, fpsync passes the following options to rsync: `-lptgoD -v --numeric-ids`. To use different rsync options, add `-o "options"` to the fpsync command. Because `-o` overrides the default options, include any of the defaults you want to keep.
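
Because incremental copies should be repeated, it can help to script the passes. The following sketch prints repeated fpsync runs with the default rsync tool. It uses only fpsync's `-m` (copy tool) and `-n` (number of concurrent jobs) options, with 64 jobs to match the best rsync result reported later in this article. The `/data/src/` and `/data/dst/` paths, the `DRY_RUN` variable, and the function names are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch: build (and optionally run) repeated fpsync incremental passes using rsync.
# DRY_RUN, the pass count, and the /data paths are illustrative assumptions.
set -euo pipefail

# fpsync expects directory arguments with a trailing slash, so add one if missing.
fpsync_incremental_cmd() {
  local src="${1%/}/" dst="${2%/}/" jobs="${3:-64}"
  printf 'fpsync -m rsync -n %s %s %s\n' "$jobs" "$src" "$dst"
}

run_incremental_passes() {
  local passes="$1" src="$2" dst="$3" jobs="${4:-64}" i
  for ((i = 1; i <= passes; i++)); do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      fpsync_incremental_cmd "$src" "$dst" "$jobs"        # print only
    else
      fpsync -m rsync -n "$jobs" "${src%/}/" "${dst%/}/"  # run for real
    fi
  done
}

# Print three passes without running anything; set DRY_RUN=0 to execute them.
run_incremental_passes 3 /data/src /data/dst 64
```

Review the printed commands first; setting `DRY_RUN=0` runs fpsync against your real directories.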
### Final pass
After several incremental syncs, do a final pass to delete any files on the destination that don't exist at the source. You can either do this manually with `rsync --delete` to delete extra files from the `/data/dst/` directory, or use fpsync with the `-E` option. For details, see [The Final Pass](http://www.fpart.org/fpsync/#the-final-pass).
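
Before and after the final pass, it's useful to compare the two directory trees. This sketch (the helper names are hypothetical) lists relative paths that exist only on the destination, which the final pass would delete, and paths that exist only at the source. It compares names only, not contents or metadata, and assumes file names don't contain newlines.

```bash
#!/usr/bin/env bash
# Sketch: compare source and destination trees by relative file path.
# Names only (no contents or metadata); assumes no newlines in file names.
set -euo pipefail

list_files() {
  (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Files on the destination that don't exist at the source (candidates for deletion).
extra_files() {
  LC_ALL=C comm -13 <(list_files "$1") <(list_files "$2")
}

# Files at the source that haven't reached the destination yet.
missing_files() {
  LC_ALL=C comm -23 <(list_files "$1") <(list_files "$2")
}

# Example (illustrative paths): extra_files /data/src /data/dst
```

To preview deletions with rsync itself, `rsync -av --delete --dry-run /data/src/ /data/dst/` lists what it would transfer and delete without modifying anything.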
## Comparing rsync and fpsync with different datasets
This section compares the performance of rsync and fpsync with different datasets.

rsync is a fast and versatile file-copying tool, although a single rsync process copies files one at a time. It can copy locally, to or from another host over any remote shell, or to or from a remote rsync daemon. It offers many options and lets you flexibly specify the set of files to copy.
### Datasets and configuration
The following table lists the different datasets we used to compare copy tool performance under different workloads.

We ran the tests on Azure Standard_D8s_v3 VMs with 8 vCPUs, 32 GiB of memory, and more than 1 TiB of disk space for the large datasets. For the destination, we configured NFS Azure file shares with more than 1 TiB of provisioned size.
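
If you want to run similar comparisons on your own VMs, you can generate a synthetic dataset with a similar shape. This sketch doesn't reproduce the dataset used for these tests: it fills each file with random bytes from `/dev/urandom`, and the example arguments in the comments approximate each configuration with a uniform file size derived from the stated totals. `make_dataset` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: generate a synthetic dataset of random-content files for copy-tool tests.
# NOT the dataset used in this article; file sizes are uniform approximations.
set -euo pipefail

# make_dataset ROOT DIRS FILES_PER_DIR FILE_SIZE_BYTES
make_dataset() {
  local root="$1" dirs="$2" files_per_dir="$3" size="$4" d f
  for ((d = 0; d < dirs; d++)); do
    mkdir -p "$root/dir_$d"
    for ((f = 0; f < files_per_dir; f++)); do
      head -c "$size" /dev/urandom > "$root/dir_$d/file_$f"
    done
  done
}

# Approximate shapes (average sizes only; the real size mix isn't listed):
#   Configuration 1: make_dataset /data/src 1    1000000 19327      # ~18 GiB
#   Configuration 2: make_dataset /data/src 3906 49      16835      # ~3 GiB
#   Configuration 3: make_dataset /data/src 1    5000    10485760   # 10 MiB files
```

Spawning one `head` process per file is slow at a million files, so scale the counts down for quick experiments.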
### Experiments and results: rsync vs. fpsync+cpio
Based on our experiments with the preceding configurations, fpsync performed best with 64 threads when using rsync and 16 threads when using cpio, for an NFS Azure file share mounted with `nconnect=8`. Actual results will vary based on your configuration and datasets.
> [!NOTE]
> Throughput for Azure Files can be much higher than represented in the following charts. Some of the experiments were deliberately conducted with small datasets for simplicity.
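
To compare tools on your own data, you can time each run and convert the result into throughput. This sketch relies on GNU `date`'s `%N` format for sub-second timing; `time_cmd` and `throughput` are hypothetical helpers. The sample call converts 1 GiB and 1,000 files copied in 8 seconds.

```bash
#!/usr/bin/env bash
# Sketch: time a copy command and express the result as MiB/s and files/s.
# Requires GNU date (%N). Helper names are hypothetical.
set -euo pipefail

# time_cmd CMD [ARGS...]: runs the command (its stdout goes to stderr)
# and prints only the elapsed seconds on stdout.
time_cmd() {
  local start end
  start="$(date +%s.%N)"
  "$@" >&2
  end="$(date +%s.%N)"
  awk -v a="$start" -v b="$end" 'BEGIN { printf "%.3f\n", b - a }'
}

# throughput BYTES FILES SECONDS
throughput() {
  awk -v b="$1" -v f="$2" -v s="$3" \
    'BEGIN { printf "%.2f MiB/s, %.2f files/s\n", b / 1048576 / s, f / s }'
}

throughput 1073741824 1000 8   # 128.00 MiB/s, 125.00 files/s

# Example with illustrative paths (du -sb is GNU coreutils):
#   secs=$(time_cmd fpsync -m rsync -n 64 /data/src/ /data/dst/)
#   throughput "$(du -sb /data/src | cut -f1)" "$(find /data/src -type f | wc -l)" "$secs"
```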
#### Configuration 1
For a single directory with 1 million small files totaling 18 GiB, we ran this test as both a baseline copy and an incremental copy.

We observed the following results doing a baseline copy from source to destination.

:::image type="content" source="media/storage-files-migration-nfs/configuration-1-baseline.png" alt-text="Chart showing the test results of configuration 1 for a baseline copy." border="false":::

We observed the following results doing an incremental copy (delta change).

:::image type="content" source="media/storage-files-migration-nfs/configuration-1-incremental.png" alt-text="Chart showing the test results of configuration 1 for an incremental copy." border="false":::
#### Configuration 2
We observed the following results doing a baseline copy of 191,345 small files in 3,906 directories with a total size of 3 GiB.

:::image type="content" source="media/storage-files-migration-nfs/configuration-2.png" alt-text="Chart showing the test results of configuration 2 for a baseline copy." border="false":::
#### Configuration 3
We observed the following results doing a baseline copy of 5,000 large files (10 MiB each) in a single directory, totaling approximately 50 GiB.

:::image type="content" source="media/storage-files-migration-nfs/configuration-3.png" alt-text="Chart showing the test results of configuration 3 for a baseline copy." border="false":::
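
For context, dividing the stated totals by the file counts gives the average file size in each configuration. This assumes 1 GiB = 2^30 bytes; the actual file-size distributions aren't listed here.

$$
\begin{aligned}
\text{Configuration 1:}\quad & \frac{18 \times 2^{30}\ \text{bytes}}{1{,}000{,}000\ \text{files}} \approx 19{,}327\ \text{bytes} \approx 18.9\ \text{KiB per file} \\
\text{Configuration 2:}\quad & \frac{3 \times 2^{30}\ \text{bytes}}{191{,}345\ \text{files}} \approx 16{,}835\ \text{bytes} \approx 16.4\ \text{KiB per file}, \qquad \frac{191{,}345}{3{,}906} \approx 49\ \text{files per directory} \\
\text{Configuration 3:}\quad & 5{,}000 \times 10\ \text{MiB} = 50{,}000\ \text{MiB} \approx 48.8\ \text{GiB}
\end{aligned}
$$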
### Summary of results
Using multi-threaded applications like fpsync can improve throughput and IOPS when migrating to NFS Azure file shares compared to single-threaded copy tools like rsync. Our tests show that:

- Distributing data across multiple directories helps parallelize the migration process and thus achieves better performance.
- Copying larger files yields better performance than copying smaller files.
---
description: Learn how to migrate to SMB Azure file shares and find your migration guide.
author: khdownie
ms.service: azure-file-storage
ms.topic: how-to
ms.date: 12/12/2023
ms.author: kendownie
---
Here are the two basic components of a file:

- **Data stream**: The data stream of a file stores the file content.
- **File metadata**: Unlike object storage in Azure blobs, an Azure file share can natively store file metadata. General-purpose file data traditionally depends on file metadata. App data might not. The file metadata has these subcomponents:
    - File attributes like read-only
    - File permissions, which are often referred to as *NTFS permissions* or *file and folder ACLs*
    - Timestamps, most notably the creation and last-modified timestamps
    - An alternate data stream, which is a space to store larger amounts of nonstandard properties. This alternate data stream can't be stored on a file in an Azure file share. It's preserved on-premises when Azure File Sync is used.

File fidelity in a migration can be defined as the ability to:

- Store all applicable file information on the source.
- Transfer files with the migration tool.
- Store files in the target storage of the migration. </br> The target for migration guides in this article is one or more Azure file shares. Consider this [list of features that SMB Azure file shares don't support](files-smb-protocol.md#limitations).

To ensure your migration proceeds smoothly, identify [the best copy tool for your needs](#migration-toolbox) and match a storage target to your source.

There are several file-copy tools available from Microsoft and others.

By mirroring a source to a target (as with **robocopy /MIR**), you can run the tool again on that same source and target. This second run is much faster because it needs to transport only source changes that happened after the previous run. Rerunning a copy tool this way can reduce downtime significantly.

The following table classifies Microsoft tools and their current suitability for SMB Azure file shares:

| Recommended | Tool | Support for Azure file shares | Preservation of file fidelity |
| :-: | :-- | :---- | :---- |
#### RoboCopy
Included in Windows, RoboCopy is one of the tools most applicable to SMB file migrations. The main [RoboCopy documentation](/windows-server/administration/windows-commands/robocopy) is a helpful resource for this tool's many options.