
Commit 663a4af

Review changes + small improvements.
1 parent 396633c commit 663a4af

1 file changed: +45, -31 lines


docs/src/further_topics/s3_io.rst

Lines changed: 45 additions & 31 deletions
@@ -61,21 +61,15 @@ Prior requirements
 
 Install "s3-fuse"
 ~~~~~~~~~~~~~~~~~
-The official
-`installation instructions <https://github.com/s3fs-fuse/s3fs-fuse/blob/master/README.md#installation>`_
-assume that you will perform a system installation with `apt`, `yum` or similar.
+The most reliable method is to install into your Linux O.S. See
+`installation instructions <https://github.com/s3fs-fuse/s3fs-fuse/blob/master/README.md#installation>`_ .
+This presumes that you perform a system installation with ``apt``, ``yum`` or similar.
 
-However, since you may well not have adequate 'sudo' or root access permissions
-for this, it is simpler to instead install it only into your Python environment.
+If you do not have necessary 'sudo' or root access permissions, we have found that it
+is sufficient to install only **into your Python environment**, using conda.
 Though not suggested, this appears to work on Unix systems where we have tried it.
 
-So, you can use conda or pip -- e.g.
-
-.. code-block:: bash
-
-    $ pip install s3-fuse
-
-or
+For this, you can use conda -- e.g.
 
 .. code-block:: bash
 
@@ -85,6 +79,12 @@ or
 use ``$ conda create --file ...``
 ).
 
+.. note::
+
+    It is **not** possible to install s3fs-fuse into a Python environment with ``pip``,
+    as it is not a Python package.
+
+
 Create an empty mount directory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You need an empty directory in your existing filesystem tree, that you will map your
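The truncated ``$ conda create --file ...`` hint above takes an environment spec file. As a hedged aside: ``conda create --file`` expects a plain list of package specs, whereas ``conda env create --file`` expects a YAML environment file. A minimal sketch of the YAML form -- the environment name, the conda-forge channel, and the inclusion of iris are illustrative assumptions, not from this commit:

```yaml
# Hypothetical environment file for "conda env create --file env.yml".
# Name and channel are illustrative; only s3fs-fuse is strictly needed here.
name: s3-sandbox
channels:
  - conda-forge
dependencies:
  - s3fs-fuse
  - iris
```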
@@ -94,8 +94,6 @@ S3 bucket **onto** -- e.g.
 
     $ mkdir /home/self.me/s3_root/testbucket_mountpoint
 
-The file system which this belongs to is presumably irrelevant, and will not affect
-performance.
 
 Setup AWS credentials
 ~~~~~~~~~~~~~~~~~~~~~
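For context on the "Setup AWS credentials" step: by AWS convention the credentials file lives at ``~/.aws/credentials``. A minimal sketch with placeholder values (substitute your own keys) is:

```ini
# ~/.aws/credentials -- placeholder values only
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```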
@@ -105,8 +103,8 @@ Provide S3 access credentials in an AWS credentials file, as described in
 
 Before use (before each Python invocation)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Activate your Python environment, which then gives access to the s3-fuse Linux
-command (note: somewhat confusingly, this is called "s3fs").
+Activate your Python environment, which then gives access to the **s3-fuse** Linux
+command -- which, somewhat confusingly, is called ``s3fs``.
 
 Map your S3 bucket "into" the chosen empty directory -- e.g.
 
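The mount command that follows "Map your S3 bucket ... -- e.g." falls outside this hunk. A hedged sketch, reusing the bucket and mountpoint names from the examples above -- the helper function and the ``-o profile=default`` option (which selects an entry in the AWS credentials file) are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical mount helper, not from the commit: refuses to mount over a
# missing or non-empty directory, then calls the s3fs command.
mount_bucket() {
    local bucket=$1 mountpoint=$2
    if [ ! -d "$mountpoint" ] || [ -n "$(ls -A "$mountpoint")" ]; then
        echo "need an empty directory at: $mountpoint" >&2
        return 1
    fi
    # "-o profile=default" selects the [default] entry in ~/.aws/credentials
    s3fs "$bucket" "$mountpoint" -o profile=default
}

# usage: mount_bucket testbucket /home/self.me/s3_root/testbucket_mountpoint
```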
@@ -142,15 +140,17 @@ You can now access objects at the remote S3 URL via the mount point on your loca
 
 After use (after Python exit)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-At some point, you should "forget" the mounted S3 filesystem by **unmounting** it -- e.g.
+When you have finished accessing the S3 objects in the mounted virtual filesystem, it
+is a good idea to **unmount** it. Before doing this, make sure that all file handles to
+the objects have been closed and there are no terminals open in that directory.
 
 .. code-block:: bash
 
     $ umount /home/self.me/s3_root/testbucket_mountpoint
 
 .. note::
 
-    The "umount" is a standard Unix command. It may not always succeed, in which case
+    The ``umount`` is a standard Unix command. It may not always succeed, in which case
     some kind of retry may be needed -- see detail notes below.
 
 The mount created will not survive a system reboot, nor does it function correctly
@@ -166,35 +166,43 @@ Some Pros and Cons of this approach
 PROs
 ^^^^
 
-* s3fs supports random access to "parts" of a file, allowing efficient handling of
+* **s3fs** supports random access to "parts" of a file, allowing efficient handling of
   datasets larger than memory without requiring the data to be explicitly sharded
   in storage.
 
-* s3-fuse is transparent to file access within Python, including Iris load+save or
+* **s3-fuse** is transparent to file access within Python, including Iris load+save or
   other files accessed via a Python 'open' : the S3 data appears to be files in a
   regular file-system.
 
 * the file-system virtualisation approach works for all file formats, since the
   mapping occurs in the O.S. rather than in Iris, or Python.
 
 * "mounting" avoids the need for the Python code to dynamically connect to /
-  disconnect from an S3 bucket
+  disconnect from an S3 bucket.
 
 * the "unmount problem" (see below) is managed at the level of the operating system,
   where it occurs, instead of trying to allow for it in Python code. This means it
   could be managed differently in different operating systems, if needed.
 
+* it does also work with many other cloud object-storage platforms, though with extra
+  required dependencies in some cases.
+  See the s3fs-fuse `Non-Amazon S3`_ docs page for details.
+
 CONs
 ^^^^
 
-* this solution is specific to S3 storage
+* only works on Unix-like O.S.
 
-* possibly the virtualisation is not perfect : some file-system operations might not
-  behave as expected, e.g. with regard to file permissions or system information
+* the file-system virtualisation may not be perfect : some file-system operations
+  might not behave as expected, e.g. with regard to file permissions or system
+  information.
 
-* it requires user actions *outside* the Python code
+* it requires user actions *outside* the Python code.
 
-* the user must manage the mount/umount context
+* the user must manage the mount/umount context.
+
+* some similar cloud object-storage platforms are *not* supported.
+  See the s3fs-fuse `Non-Amazon S3`_ docs page for details of those which are.
 
 
 Background Notes and Details
@@ -207,8 +215,9 @@ Background Notes and Details
   cannot create one from a regular Python "open" call -- still less
   when opening a file with an underlying file-format such as netCDF4 or HDF5
   (since these are usually implemented in other languages such as C).
+  Nor can you interrogate file paths or system metadata, e.g. permissions.
 
-  So, the key benefit offered by **s3-fuse** is that all the functions are mapped
+  So, the key benefit offered by **s3-fuse** is that all functions are mapped
   onto regular O.S. file-system calls -- so the file-format never needs to
   know that the data is not a "real" file.
 
@@ -220,13 +229,18 @@ Background Notes and Details
   copying the whole content. This is obviously essential for efficient use of large
   datasets, e.g. when larger than available memory.
 
-* It is also possible to use "s3-fuse" to establish the mounts *from within Python*.
+* It is also possible to use **s3-fuse** to establish the mounts *from within Python*.
   However, we have considered integrating this into Iris and rejected it because of
   unavoidable problems : namely, the "umount problem" (see below).
   For details, see : https://github.com/SciTools/iris/pull/6731
 
 * "Unmounting" must be done via a shell ``umount`` command, and there is no easy way to
-  guarantee that this succeeds, since it can often get a "target is busy" error, which
-  can only be resolved by delay + retry.
+  guarantee that this succeeds, since it can often get a "target is busy" error.
+
   This "umount problem" is a known problem in Unix generally : see
-  `here <https://stackoverflow.com/questions/tagged/linux%20umount>`_
+  `here <https://stackoverflow.com/questions/tagged/linux%20umount>`_ .
+
+  It can only be resolved by a delay + retry.
+
+
+.. _Non-Amazon S3: https://github.com/s3fs-fuse/s3fs-fuse/wiki/Non-Amazon-S3
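The "delay + retry" workaround the hunk above describes can be sketched as a small shell function. This is a hypothetical helper, not part of the commit; the mountpoint path, retry count, and delay are illustrative:

```shell
#!/usr/bin/env bash
# Hedged sketch of "delay + retry" for the umount problem:
# keep retrying "umount" until it succeeds or the attempts run out.
retry_umount() {
    local mountpoint=$1
    local tries=${2:-5} delay=${3:-2} i
    for ((i = 1; i <= tries; i++)); do
        if umount "$mountpoint"; then
            return 0
        fi
        # give whatever is holding the mount "busy" some time to let go
        sleep "$delay"
    done
    echo "still busy after $tries attempts: $mountpoint" >&2
    return 1
}

# usage: retry_umount /home/self.me/s3_root/testbucket_mountpoint
```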
