articles/avere-vfxt/avere-vfxt-data-ingest.md
12 additions & 13 deletions
@@ -20,12 +20,12 @@ The ``cp`` or ``copy`` commands that are commonly used to using to transfer data

 This article explains strategies for creating a multi-client, multi-threaded file copying system to move data to the Avere vFXT cluster. It explains file transfer concepts and decision points that can be used for efficient data copying using multiple clients and simple copy commands.

-It also explains some utilities that can help. The ``msrsync`` utility can be used to partially automate the process of dividing a dataset into buckets and using rsync commands. The ``parallelcp`` script is another utility that reads the source directory and issues copy commands automatically.
+It also explains some utilities that can help. The ``msrsync`` utility can be used to partially automate the process of dividing a dataset into buckets and using ``rsync`` commands. The ``parallelcp`` script is another utility that reads the source directory and issues copy commands automatically. Also, the ``rsync`` tool can be used in two phases to provide a quicker copy that still provides data consistency.

 Click the link to jump to a section:

 *[Manual copy example](#manual-copy-example) - A thorough explanation using copy commands
@@ -253,11 +253,11 @@ The above will give you *N* files, each with a copy command per line, that can b

 The goal is to run multiple threads of these scripts concurrently per client in parallel on multiple clients.

-## Use a two-phase rsync process to populate cloud storage
+## Use a two-phase rsync process

-The standard ``rsync`` utility does not work well for populating cloud storage through the Avere vFXT for Azure system because it uses a large number of file create and rename operations to ensure data integrity. However, you can safely use the ``--inplace`` option to skip the more careful copying procedure and follow that with a second run that checks file integrity.
+The standard ``rsync`` utility does not work well for populating cloud storage through the Avere vFXT for Azure system because it generates a large number of file create and rename operations to guarantee data integrity. However, you can safely use the ``--inplace`` option with ``rsync``to skip the more careful copying procedure if you follow that with a second run that checks file integrity.

-A standard rsync copy operation creates a temporary file and fills it with data. If the data transfer completes successfully, the temporary file is renamed to the original filename. This method guarantees consistency even if the files are accessed during copy. But this method generates more write operations, which slows file movement through the cache.
+A standard ``rsync`` copy operation creates a temporary file and fills it with data. If the data transfer completes successfully, the temporary file is renamed to the original filename. This method guarantees consistency even if the files are accessed during copy. But this method generates more write operations, which slows file movement through the cache.

 The option ``--inplace`` writes the new file directly in its final location. Files are not guaranteed to be consistent during transfer, but that is not important if you are priming a storage system for use later.

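Reviewer note: the two-phase process described in this hunk can be sketched as a small Python wrapper. This is only an illustration under stated assumptions, not the article's own command. The function names and the example paths are hypothetical, and the second phase is assumed here to be a plain ``rsync -a`` run without ``--inplace``. A stricter check could add ``--checksum``, at the cost of reading every file again.

```python
import shlex
import subprocess


def two_phase_rsync_commands(source, destination):
    """Return two rsync argument lists: a fast in-place pass, then a verifying pass."""
    # Phase 1: write each file directly in its final location,
    # skipping the temporary-file-plus-rename procedure.
    phase1 = ["rsync", "-a", "--inplace", source, destination]
    # Phase 2 (assumption): a standard run that re-examines the tree and
    # re-copies anything that does not match the source.
    phase2 = ["rsync", "-a", source, destination]
    return [phase1, phase2]


def run_two_phase(source, destination, dry_run=True):
    """Run both phases in order and stop at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    for cmd in two_phase_rsync_commands(source, destination):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical mount points, for illustration only.
    run_two_phase("/mnt/source/", "/mnt/vfxt/destination/")
```

Running the phases in sequence and stopping at the first failure means the verifying pass only starts after the in-place pass has finished successfully.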
@@ -279,14 +279,13 @@ The ``msrsync`` tool also can be used to move data to a backend core filer for t

 Preliminary testing using a four-core VM showed best efficiency when using 64 processes. Use the ``msrsync`` option ``-p`` to set the number of processes to 64.

-You also can use the ``--inplace`` argument with msrsync commands. If you use this option, consider running a second command (as with [rsync](#use-a-two-phase-rsync-process-to-populate-cloud-storage
-), described above) to ensure data integrity.
+You also can use the ``--inplace`` argument with ``msrsync`` commands. If you use this option, consider running a second command (as with [rsync](#use-a-two-phase-rsync-process), described above) to ensure data integrity.

 Note that ``msrsync`` can only write to and from local volumes. The source and destination must be accessible as local mounts in the cluster’s virtual network.

-To use msrsync to populate an Azure cloud volume with an Avere cluster, follow these instructions:
+To use ``msrsync`` to populate an Azure cloud volume with an Avere cluster, follow these instructions:

-1. Install msrsync and its prerequisites (rsync and Python 2.6 or later)
+1. Install ``msrsync`` and its prerequisites (rsync and Python 2.6 or later)
 1. Determine the total number of files and directories to be copied.

 For example, use the Avere utility ``prime.py`` with arguments ```prime.py --directory /path/to/some/directory``` (available by downloading url <https://github.com/Azure/Avere/blob/master/src/clientapps/dataingestor/prime.py>).
@@ -301,21 +300,21 @@ To use msrsync to populate an Azure cloud volume with an Avere cluster, follow t

 1. Divide the number of items by 64 to determine the number of items per process. Use this number with the ``-f`` option to set the size of the buckets when you run the command.
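
Reviewer note: the bucket-size arithmetic in this step could be illustrated with a short Python sketch. The helper names and the example paths are hypothetical. Rounding up is an assumption, since the text only says to divide by 64. The ``--rsync`` pass-through option is assumed to be available in the installed ``msrsync`` version, while ``-p`` and ``-f`` come from the article text.

```python
import math
import shlex


def files_per_bucket(total_items, processes=64):
    """Divide the item count by the number of processes, rounding up (assumption)."""
    return math.ceil(total_items / processes)


def msrsync_command(total_items, source, destination, processes=64, inplace=False):
    """Build an msrsync argument list using -p (processes) and -f (bucket size)."""
    cmd = [
        "msrsync",
        "-p", str(processes),
        "-f", str(files_per_bucket(total_items, processes)),
    ]
    if inplace:
        # Assumption: rsync options are forwarded through msrsync's --rsync option.
        cmd += ["--rsync", "-a --inplace"]
    return cmd + [source, destination]


if __name__ == "__main__":
    # Hypothetical item count and mount points, for illustration only.
    print(shlex.join(msrsync_command(11000, "/mnt/source/", "/mnt/vfxt/destination/")))
```

For 11,000 items and 64 processes, the division gives 171.875, which this sketch rounds up to a bucket size of 172.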