Commit f8b4742

squash w/ parallel transfer
1 parent 9ba4f62 commit f8b4742

1 file changed: +43 -0 lines changed

docs/system_overview/data_objects.md

Lines changed: 43 additions & 0 deletions
@@ -332,6 +332,49 @@ Rename requires 2 inputs:
If the destination path exists in the logical namespace, rename is *allowed* only if the path refers to a data object and the client has specified a forced overwrite. Overwriting a collection with a data object is not allowed.

## Parallel Transfer

As of iRODS 4.3.4, parallel transfer via the high ports is deprecated. Future versions of iRODS will use the configured zone port for parallel transfer. This section therefore focuses on parallel transfer via the zone port.

### Why deprecate parallel transfer over the high ports?

This functionality is deprecated for several reasons:

- High ports required network administrators to open a wide range of ports
- The implementation required special code paths in the server, making it complex to maintain and reason about
- It was only available through the Put and Get API endpoints
- It did not use the same implementation for secure communication
- It added another point where client libraries could deviate from each other

### What benefits are there to parallel transfer over the zone port?

Because the iRODS API is POSIX-like, the iRODS Consortium investigated whether it could support the I/O patterns seen in POSIX environments. As it turns out, the iRODS API made it possible to write sparse files. This led to the development of Logical Locking and of libraries that guard against data corruption caused by uncoordinated write operations.

With this functionality in place, iRODS provides the following benefits:

- Parallel transfer no longer requires a special code path for secure communication
- High ports are no longer needed
- Special code paths for parallel transfer are no longer needed
- Reading and writing data objects is closer to what users expect from a POSIX-like system
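The sparse-write pattern mentioned above can be illustrated outside iRODS. The following is a minimal sketch that uses only a local file and the Python standard library, not the iRODS API. The function names (`write_chunk`, `parallel_write`, `demo`) are invented for this illustration. Each writer opens its own file handle and writes a byte range that does not overlap with any other.

```python
import os
import tempfile
import threading


def write_chunk(path, offset, payload):
    # Each writer opens its own unbuffered handle, so no file position is shared.
    with open(path, "r+b", buffering=0) as f:
        f.seek(offset)
        f.write(payload)


def parallel_write(path, chunks):
    """Write (offset, payload) pairs concurrently. Ranges are assumed not to overlap."""
    # Create (or truncate) the file so every writer can open it in read/write mode.
    with open(path, "wb"):
        pass
    threads = [
        threading.Thread(target=write_chunk, args=(path, offset, payload))
        for offset, payload in chunks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo():
    # The chunks are deliberately listed out of order; only the offsets matter.
    chunks = [(0, b"AAAA"), (8, b"CCCC"), (4, b"BBBB")]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        parallel_write(path, chunks)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because the byte ranges are disjoint, the writers need no ordering among themselves. The sketch does nothing to protect against overlapping or uncoordinated writes, which is the problem the text above attributes to Logical Locking and its supporting libraries.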

### How to perform a parallel write over the zone port

Writing a data object in parallel over the zone port requires some coordination between the client and the server. Parallel read operations do not require coordination.

The general steps are as follows:

1. Open a stream to the replica of interest. We'll refer to this stream as the **primary stream**.
1. Capture the replica access token and replica number from the **primary stream**.
1. Open the **secondary streams**. Each secondary stream must satisfy the following requirements:
    - It must not share a connection with any other stream
    - It must target the same replica as the **primary stream**
    - It must use the replica access token obtained from the **primary stream**
1. Use the streams to write to the replica.
1. When done, close all **secondary streams** without updating the catalog.
1. Close the **primary stream** normally. This finalizes the changes and updates the catalog information.

These are only the general steps. Details may vary across client libraries.
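The steps above can be sketched as a small, self-contained simulation. This is not the iRODS client API: `FakeReplicaServer`, `split_ranges`, `parallel_put`, and the `conn_id` parameter are hypothetical names that only model the coordination rules (one connection per stream, a shared replica access token, and a catalog update on the final close). A real client library would replace them with its own stream and connection types.

```python
import secrets
import threading


class FakeReplicaServer:
    """Hypothetical in-memory stand-in for one replica (not an iRODS API)."""

    def __init__(self, replica_number=0):
        self.replica_number = replica_number
        self.data = bytearray()
        self.catalog_size = None  # set only when the primary stream closes
        self._token = None
        self._open_conns = set()
        self._lock = threading.Lock()

    def open_primary(self, conn_id):
        with self._lock:
            if self._token is not None:
                raise RuntimeError("replica is already open for writing")
            self._token = secrets.token_hex(8)
            self._open_conns.add(conn_id)
            return self._token, self.replica_number

    def open_secondary(self, conn_id, token, replica_number):
        with self._lock:
            if conn_id in self._open_conns:
                raise RuntimeError("each stream needs its own connection")
            if self._token is None or token != self._token:
                raise PermissionError("invalid replica access token")
            if replica_number != self.replica_number:
                raise PermissionError("streams must target the same replica")
            self._open_conns.add(conn_id)

    def write(self, conn_id, offset, payload):
        with self._lock:
            if conn_id not in self._open_conns:
                raise RuntimeError("stream is not open")
            end = offset + len(payload)
            if end > len(self.data):
                self.data.extend(bytes(end - len(self.data)))  # zero-fill any gap
            self.data[offset:end] = payload

    def close(self, conn_id, update_catalog):
        with self._lock:
            if update_catalog and self._open_conns != {conn_id}:
                raise RuntimeError("close the secondary streams first")
            self._open_conns.remove(conn_id)
            if update_catalog:
                self.catalog_size = len(self.data)
                self._token = None


def split_ranges(size, n):
    """Split [0, size) into n contiguous (offset, length) ranges."""
    base, extra = divmod(size, n)
    ranges, offset = [], 0
    for i in range(n):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def parallel_put(server, payload, stream_count=4):
    ranges = split_ranges(len(payload), stream_count)
    # Steps 1 and 2: open the primary stream, capture token and replica number.
    token, replica_number = server.open_primary(conn_id=0)
    # Step 3: open secondary streams, each on its own (simulated) connection.
    for conn_id in range(1, stream_count):
        server.open_secondary(conn_id, token, replica_number)

    # Step 4: every stream writes its own byte range.
    def worker(conn_id, offset, length):
        server.write(conn_id, offset, payload[offset:offset + length])

    threads = [
        threading.Thread(target=worker, args=(conn_id, offset, length))
        for conn_id, (offset, length) in enumerate(ranges)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Step 5: close secondary streams without updating the catalog.
    for conn_id in range(1, stream_count):
        server.close(conn_id, update_catalog=False)
    # Step 6: close the primary stream last; this updates the catalog.
    server.close(0, update_catalog=True)
```

In this model, the catalog size stays unset until the primary stream closes, which mirrors the requirement that only the final close finalizes the changes.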

## `R_DATA_MAIN`

Data objects and replicas, as with any iRODS entity, are defined in the iRODS Catalog. The information is stored in the `R_DATA_MAIN` table. Each row in `R_DATA_MAIN` describes a single replica of a data object. Some system metadata is identical for each replica, such as data ID, associated collection, and name; these are defining features of the data object that maps to the replica. Other system metadata varies between replicas, such as host resource and replica number; these are defining features of the individual replicas.
