Commit bf6ee24

Merge pull request #96933 from jswoodward/ekpgh-vfxt-add-rsync
avere-vfxt: add rsync method
2 parents 5e06154 + 1b21843 commit bf6ee24

articles/avere-vfxt/avere-vfxt-data-ingest.md

Lines changed: 68 additions & 42 deletions
---
description: How to add data to a new storage volume for use with the Avere vFXT
author: ekpgh
ms.service: avere-vfxt
ms.topic: conceptual
ms.date: 11/21/2019
ms.author: rohogue
---

# Moving data to the vFXT cluster - Parallel data ingest

After you've created a new vFXT cluster, your first task might be to move data onto its new storage volume. However, if your usual method of moving data is issuing a simple copy command from one client, you will likely see slow copy performance. Single-threaded copying is not a good option for copying data to the Avere vFXT cluster's backend storage.

Because the Avere vFXT cluster is a scalable multi-client cache, the fastest and most efficient way to copy data to it is with multiple clients. This technique parallelizes ingestion of the files and objects.

![Diagram showing multi-client, multi-threaded data movement: At the top left, an icon for on-premises hardware storage has multiple arrows coming from it. The arrows point to four client machines. From each client machine three arrows point toward the Avere vFXT. From the Avere vFXT, multiple arrows point to Blob storage.](media/avere-vfxt-parallel-ingest.png)

The ``cp`` or ``copy`` commands that are commonly used to transfer data from one storage system to another are single-threaded processes that copy only one file at a time. This means that the file server is ingesting only one file at a time, which wastes the cluster's resources.

This article explains strategies for creating a multi-client, multi-threaded file copying system to move data to the Avere vFXT cluster. It explains file transfer concepts and decision points that can be used for efficient data copying using multiple clients and simple copy commands.

It also explains some utilities that can help. The ``msrsync`` utility can be used to partially automate the process of dividing a dataset into buckets and using ``rsync`` commands. The ``parallelcp`` script is another utility that reads the source directory and issues copy commands automatically. Also, the ``rsync`` tool can be used in two phases to provide a quicker copy that still provides data consistency.

Click a link to jump to a section:

* [Manual copy example](#manual-copy-example) - A thorough explanation using copy commands
* [Two-phase rsync example](#use-a-two-phase-rsync-process)
* [Partially automated (msrsync) example](#use-the-msrsync-utility)
* [Parallel copy example](#use-the-parallel-copy-script)

## Data ingestor VM template

A Resource Manager template is available on GitHub to automatically create a VM with the parallel data ingestion tools mentioned in this article.

![diagram showing multiple arrows each from blob storage, hardware storage, and Azure file sources. The arrows point to a "data ingestor vm" and from there, multiple arrows point to the Avere vFXT](media/avere-vfxt-ingestor-vm.png)

When building a strategy to copy data in parallel, you should understand the trade-offs.

Each copy process has a throughput rate and a files-transferred rate, which can be measured by timing the length of the copy command and factoring the file size and file count. Explaining how to measure the rates is outside the scope of this document, but it is imperative to understand whether you'll be dealing with small or large files.

## Manual copy example

You can manually create a multi-threaded copy on a client by running more than one copy command at once in the background against predefined sets of files or paths.

For example: `cp /mnt/source/file1 /mnt/destination1/ & cp /mnt/source/file2 /mnt/destination1/ &`

After issuing this command, the `jobs` command will show that two threads are running.

### Predictable filename structure

If your filenames are predictable, you can use expressions to create parallel copy threads.

For example, if your directory contains 1000 files that are numbered sequentially from `0001` to `1000`, you can use the following expressions to create ten parallel threads that each copy 100 files:

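The block of expressions is elided from this excerpt. The following self-contained sketch shows the pattern, with temporary directories standing in for `/mnt/source` and `/mnt/destination1`; the two-digit glob prefixes and the separate command for the boundary file `file1000` are assumptions based on the four-digit numbering described above:

```bash
# Demo of prefix-partitioned parallel copies. In real use, replace
# $src and $dst with the NFS paths, e.g. /mnt/source and /mnt/destination1.
src=$(mktemp -d)
dst=$(mktemp -d)

# Create 1000 sequentially numbered sample files: file0001 .. file1000.
for i in $(seq -w 1 1000); do touch "$src/file$i"; done

# Ten backgrounded cp commands, each matching roughly 100 files by
# two-digit prefix; the boundary file1000 needs its own command.
for p in 00 01 02 03 04 05 06 07 08 09; do
    cp "$src"/file${p}* "$dst"/ &
done
cp "$src/file1000" "$dst/" &

wait   # block until every background copy finishes
```

Adjust the glob patterns to your actual naming scheme before using this against real mounts.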
### Unknown filename structure

If your file-naming structure is not predictable, you can group files by directory names.

This example collects entire directories to send to ``cp`` commands run as background tasks:

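The collection commands themselves are elided from this excerpt. This self-contained sketch shows the idea, assuming each top-level directory under the source is a reasonable unit of work (temporary directories stand in for the real mounts):

```bash
# Build a list of top-level source directories, then launch one
# backgrounded recursive copy per directory.
src=$(mktemp -d)    # stand-in for /mnt/source
dst=$(mktemp -d)    # stand-in for /mnt/destination
mkdir -p "$src"/dirA "$src"/dirB "$src"/dirC
touch "$src"/dirA/f1 "$src"/dirB/f2 "$src"/dirC/f3

# One cp -R per directory, run in the background.
for d in $(find "$src" -mindepth 1 -maxdepth 1 -type d); do
    cp -R "$d" "$dst/" &
done
wait
```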
After the files are collected, you can run parallel copy commands to recursively copy the subdirectories:

```bash
cp /mnt/source/* /mnt/destination/
mkdir -p /mnt/destination/dir1 && cp /mnt/source/dir1/* /mnt/destination/dir1/ &
cp -R /mnt/source/dir1/dir1a /mnt/destination/dir1/ &
cp -R /mnt/source/dir1/dir1b /mnt/destination/dir1/ &
cp -R /mnt/source/dir1/dir1c /mnt/destination/dir1/ & # this command copies dir1c1 via recursion
cp -R /mnt/source/dir1/dir1d /mnt/destination/dir1/ &
```

### When to add mount points

After you have enough parallel threads going against a single destination filesystem mount point, there will be a point where adding more threads does not give more throughput. (Throughput will be measured in files/second or bytes/second, depending on your type of data.) Or worse, over-threading can sometimes cause a throughput degradation.

When this happens, you can add client-side mount points to other vFXT cluster IP addresses, using the same remote filesystem mount path:

```
10.1.1.103:/nfs on /mnt/destination3 type nfs (rw,vers=3,proto=tcp,addr=10.1.1.103)
```

Adding client-side mount points lets you fork off additional copy commands to the additional `/mnt/destination[1-3]` mount points, achieving further parallelism.

For example, if your files are very large, you might define the copy commands to use distinct destination paths, sending out more commands in parallel from the client performing the copy.

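As a sketch of that fan-out (hypothetical file names; temporary directories stand in for the three NFS mount points):

```bash
# Spread large-file copies across three client-side mount points.
# In real use dst1..dst3 would be /mnt/destination1 .. /mnt/destination3,
# the same remote path mounted through different vFXT node addresses.
src=$(mktemp -d)
dst1=$(mktemp -d); dst2=$(mktemp -d); dst3=$(mktemp -d)

printf 'a%.0s' {1..4096} > "$src/big1"   # small stand-ins for large files
printf 'b%.0s' {1..4096} > "$src/big2"
printf 'c%.0s' {1..4096} > "$src/big3"

# Each backgrounded cp targets a different mount point, so traffic
# flows through several cluster addresses at once.
cp "$src/big1" "$dst1/" &
cp "$src/big2" "$dst2/" &
cp "$src/big3" "$dst3/" &
wait
```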
In the example above, all three destination mount points are being targeted by the client copy processes.

### When to add clients

Lastly, when you have reached the client's capabilities, adding more copy threads or additional mount points will not yield any additional files/sec or bytes/sec increases. In that situation, you can deploy another client with the same set of mount points that will be running its own sets of file copy processes.

Example:

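The example itself is elided from this excerpt. A self-contained sketch of the idea, with two subshells standing in for two separate client machines that mount the same destination:

```bash
# Each "client" runs its own set of backgrounded recursive copies.
# On real hardware, each parenthesized block would be a script run
# on a different client machine against the shared vFXT mount.
src=$(mktemp -d); dst=$(mktemp -d)
for d in dir1 dir2 dir3 dir4; do
    mkdir -p "$src/$d" && touch "$src/$d/f"
done

( # client 1's copy set
  cp -R "$src/dir1" "$dst/" &
  cp -R "$src/dir2" "$dst/" &
  wait
) &
( # client 2's copy set
  cp -R "$src/dir3" "$dst/" &
  cp -R "$src/dir4" "$dst/" &
  wait
) &
wait
```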
Redirect this result to a file: `find . -mindepth 4 -maxdepth 4 -type d > /tmp/foo`
Then you can iterate through the manifest, using Bash commands to count files and determine the sizes of the subdirectories:

```bash
ben@xlcycl1:/sps/internal/atj5b5ab44b7f > for i in $(cat /tmp/foo); do echo " `find ${i} |wc -l` `du -sh ${i}`"; done
244 3.5M ./atj5b5ab44b7f-02/support/gsi/2018-07-18T00:07:03EDT
9 172K ./atj5b5ab44b7f-02/support/gsi/stats_2018-07-18T05:01:00UTC
124 5.8M ./atj5b5ab44b7f-02/support/gsi/stats_2018-07-19T01:01:01UTC
...
33 2.8G ./atj5b5ab44b7f-03/support/trace/rolling
```

Lastly, you must craft the actual file copy commands to the clients.

If you have four clients, use this command:

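The four-client command is elided from this excerpt; by analogy with the six-client version, it presumably sends every fourth manifest line to each client file. A self-contained sketch (a generated sample manifest stands in for `/tmp/foo`; the `first~step` address is a GNU sed extension):

```bash
# Round-robin split of a directory manifest across four clients.
manifest=$(mktemp)                            # stand-in for /tmp/foo
printf 'dir%02d\n' $(seq 1 12) > "$manifest"  # 12 sample entries

for i in 1 2 3 4; do
    sed -n "${i}~4p" "$manifest" > "/tmp/client${i}"
done
```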
And for six... extrapolate as needed:

```bash
for i in 1 2 3 4 5 6; do sed -n ${i}~6p /tmp/foo > /tmp/client${i}; done
```

You will get *N* resulting files, one for each of your *N* clients that has the path names to the level-four directories obtained as part of the output from the `find` command.

Use each file to build the copy command:

```bash
for i in 1 2 3 4 5 6; do for j in $(cat /tmp/client${i}); do echo "cp -p -R /mnt/source/${j} /mnt/destination/${j}" >> /tmp/client${i}_copy_commands ; done; done
```

The above will give you *N* files, each with a copy command per line, that can be run as a Bash script on the client.

The goal is to run multiple threads of these scripts concurrently per client in parallel on multiple clients.

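That last step can be sketched as follows (self-contained; a generated file of `touch` commands stands in for `/tmp/client1_copy_commands`, and GNU `split -n l/4` is assumed to be available):

```bash
# Run one client's copy-command file as several concurrent shells.
cmds=$(mktemp)       # stand-in for /tmp/client1_copy_commands
out=$(mktemp -d)
for i in $(seq 1 8); do echo "touch $out/f$i" >> "$cmds"; done

# Split the command list into 4 line-based chunks, then execute
# each chunk in its own background shell.
split -n l/4 "$cmds" "${cmds}.part_"
for part in "${cmds}".part_*; do
    bash "$part" &
done
wait
```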
## Use a two-phase rsync process

The standard ``rsync`` utility does not work well for populating cloud storage through the Avere vFXT for Azure system because it generates a large number of file create and rename operations to guarantee data integrity. However, you can safely use the ``--inplace`` option with ``rsync`` to skip the more careful copying procedure if you follow that with a second run that checks file integrity.

A standard ``rsync`` copy operation creates a temporary file and fills it with data. If the data transfer completes successfully, the temporary file is renamed to the original filename. This method guarantees consistency even if the files are accessed during copy. But this method generates more write operations, which slows file movement through the cache.

The option ``--inplace`` writes the new file directly in its final location. Files are not guaranteed to be consistent during transfer, but that is not important if you are priming a storage system for use later.

The second ``rsync`` operation serves as a consistency check on the first operation. Because the files have already been copied, the second phase is a quick scan to ensure that the files on the destination match the files on the source. If any files don't match, they are recopied.

You can issue both phases together in one command:

```bash
rsync -azh --inplace <source> <destination> && rsync -azh <source> <destination>
```

This is a simple and time-effective method for datasets up to the number of files that the internal directory manager can handle. (This is typically 200 million files for a three-node cluster, 500 million files for a six-node cluster, and so on.)

## Use the msrsync utility

The ``msrsync`` tool can also be used to move data to a backend core filer for the Avere cluster. This tool is designed to optimize bandwidth usage by running multiple parallel ``rsync`` processes. It is available from GitHub at <https://github.com/jbd/msrsync>.

``msrsync`` breaks up the source directory into separate "buckets" and then runs individual ``rsync`` processes on each bucket.

Preliminary testing using a four-core VM showed best efficiency when using 64 processes. Use the ``msrsync`` option ``-p`` to set the number of processes to 64.

You can also use the ``--inplace`` argument with ``msrsync`` commands. If you use this option, consider running a second command (as with [rsync](#use-a-two-phase-rsync-process), described above) to ensure data integrity.

``msrsync`` can only write to and from local volumes. The source and destination must be accessible as local mounts in the cluster's virtual network.

To use ``msrsync`` to populate an Azure cloud volume with an Avere cluster, follow these instructions:

1. Install ``msrsync`` and its prerequisites (rsync and Python 2.6 or later)
1. Determine the total number of files and directories to be copied.

   For example, use the Avere utility ``prime.py`` with arguments ``prime.py --directory /path/to/some/directory`` (available at <https://github.com/Azure/Avere/blob/master/src/clientapps/dataingestor/prime.py>).

   If not using ``prime.py``, you can calculate the number of items with the GNU ``find`` tool as follows:

   ```bash
   find <path> -type f |wc -l # (counts files)
   ```

1. Divide the number of items by 64 to determine the number of items per process. Use this number with the ``-f`` option to set the size of the buckets when you run the command.

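For example, the bucket-size arithmetic for a hypothetical count of 11,000 items (the worked example later in this article uses 170 instead; the exact value only tunes bucket granularity):

```bash
total_items=11000   # hypothetical count from the find commands above
procs=64
bucket=$(( (total_items + procs - 1) / procs ))   # round up
echo "$bucket"      # prints 172
```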
1. Issue the ``msrsync`` command to copy files:

```bash
msrsync -P --stats -p 64 -f <ITEMS_DIV_64> --rsync "-ahv" <SOURCE_PATH> <DESTINATION_PATH>
```

If using ``--inplace``, add a second execution without the option to check that the data is correctly copied:

```bash
msrsync -P --stats -p 64 -f <ITEMS_DIV_64> --rsync "-ahv --inplace" <SOURCE_PATH> <DESTINATION_PATH> && msrsync -P --stats -p 64 -f <ITEMS_DIV_64> --rsync "-ahv" <SOURCE_PATH> <DESTINATION_PATH>
```

For example, this command is designed to move 11,000 files in 64 processes from /test/source-repository to /mnt/vfxt/repository:

``msrsync -P --stats -p 64 -f 170 --rsync "-ahv --inplace" /test/source-repository/ /mnt/vfxt/repository && msrsync -P --stats -p 64 -f 170 --rsync "-ahv" /test/source-repository/ /mnt/vfxt/repository``

## Use the parallel copy script

The ``parallelcp`` script also can be useful for moving data to your vFXT cluster's backend storage.

The script below will add the executable `parallelcp`. (This script is designed for Ubuntu; if using another distribution, you must install ``parallel`` separately.)

```bash
sudo touch /usr/bin/parallelcp && sudo chmod 755 /usr/bin/parallelcp && sudo sh -c "/bin/cat >/usr/bin/parallelcp" <<EOM
#!/bin/bash

display_usage() {
    echo -e "\nUsage: \$0 SOURCE_DIR DEST_DIR\n"
}

if [ \$# -le 1 ] ; then
    display_usage
    exit 1
fi

if [[ ( \$1 == "--help") || \$1 == "-h" ]] ; then
    display_usage
    exit 0
fi

SOURCE_DIR="\$1"
DEST_DIR="\$2"
EOM
```

### Parallel copy example

This example uses the parallel copy script to compile ``glibc`` using source files from the Avere cluster.

The source files are stored on the Avere cluster mount point, and the object files are stored on the local hard drive.
```bash
cd obj
/home/azureuser/avere/glibc-2.27/configure --prefix=/home/azureuser/usr
time make -j
```