@@ -61,21 +61,15 @@ Prior requirements
 
 Install "s3-fuse"
 ~~~~~~~~~~~~~~~~~
-The official
-`installation instructions <https://github.com/s3fs-fuse/s3fs-fuse/blob/master/README.md#installation>`_
-assume that you will perform a system installation with `apt`, `yum` or similar.
+The most reliable method is to install it into your Linux O.S., following the official
+`installation instructions <https://github.com/s3fs-fuse/s3fs-fuse/blob/master/README.md#installation>`_.
+This presumes that you can perform a system installation with ``apt``, ``yum`` or similar.
 
-However, since you may well not have adequate 'sudo' or root access permissions
-for this, it is simpler to instead install it only into your Python environment.
+If you do not have the necessary 'sudo' or root access permissions, we have found that it
+is sufficient to install only **into your Python environment**, using conda.
 Though not suggested, this appears to work on Unix systems where we have tried it.
 
-So, you can use conda or pip -- e.g.
-
-.. code-block:: bash
-
-    $ pip install s3-fuse
-
-or
+For this, you can use conda -- e.g.
 
 .. code-block:: bash
 
 use ``$ conda create --file ...``
 ).
 
+.. note::
+
+    It is **not** possible to install s3fs-fuse into a Python environment with ``pip``,
+    as it is not a Python package.
+
+
 Create an empty mount directory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You need an empty directory in your existing filesystem tree, that you will map your
@@ -94,8 +94,6 @@ S3 bucket **onto** -- e.g.
 
     $ mkdir /home/self.me/s3_root/testbucket_mountpoint
 
-The file system which this belongs to is presumably irrelevant, and will not affect
-performance.
 
 Setup AWS credentials
 ~~~~~~~~~~~~~~~~~~~~~
@@ -105,8 +103,8 @@ Provide S3 access credentials in an AWS credentials file, as described in
 
 Before use (before each Python invocation)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Activate your Python environment, which then gives access to the s3-fuse Linux
-command (note: somewhat confusingly, this is called "s3fs").
+Activate your Python environment, which then gives access to the **s3-fuse** Linux
+command -- which, somewhat confusingly, is called ``s3fs``.
 
 Map your S3 bucket "into" the chosen empty directory -- e.g.
 
@@ -142,15 +140,17 @@ You can now access objects at the remote S3 URL via the mount point on your loca
 
 After use (after Python exit)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-At some point, you should "forget" the mounted S3 filesystem by **unmounting** it -- e.g.
+When you have finished accessing the S3 objects in the mounted virtual filesystem, it
+is a good idea to **unmount** it. Before doing this, make sure that all file handles on
+the objects have been closed and that no terminals are open in that directory.
 
 .. code-block:: bash
 
     $ umount /home/self.me/s3_root/testbucket_mountpoint
 
 .. note::
 
-    The "umount" is a standard Unix command. It may not always succeed, in which case
+    ``umount`` is a standard Unix command. It may not always succeed, in which case
     some kind of retry may be needed -- see detail notes below.
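Before retrying a failed unmount, it can help to see what is still holding the mount point busy. A sketch, assuming the standard ``fuser`` utility (from the ``psmisc`` package) is available, and re-using the example mount path from above:

```shell
# Hypothetical mount path, matching the earlier example.
MOUNTPOINT=/home/self.me/s3_root/testbucket_mountpoint

# List any processes that still hold files open under the mount point.
# fuser exits non-zero when nothing is using it, i.e. it is safe to unmount.
if fuser -vm "$MOUNTPOINT" 2>/dev/null; then
    echo "processes are still using $MOUNTPOINT -- close them before umount"
else
    echo "nothing is using $MOUNTPOINT -- safe to umount"
fi
```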
155155
156156 The mount created will not survive a system reboot, nor does it function correctly
@@ -166,35 +166,43 @@ Some Pros and Cons of this approach
 PROs
 ^^^^
 
-* s3fs supports random access to "parts" of a file, allowing efficient handling of
+* **s3fs** supports random access to "parts" of a file, allowing efficient handling of
   datasets larger than memory without requiring the data to be explicitly sharded
   in storage.
 
-* s3-fuse is transparent to file access within Python, including Iris load+save or
+* **s3-fuse** is transparent to file access within Python, including Iris load+save or
   other files accessed via a Python 'open' : the S3 data appears to be files in a
   regular file-system.
 
 * the file-system virtualisation approach works for all file formats, since the
   mapping occurs in the O.S. rather than in Iris, or Python.
 
 * "mounting" avoids the need for the Python code to dynamically connect to /
-  disconnect from an S3 bucket
+  disconnect from an S3 bucket.
 
 * the "unmount problem" (see below) is managed at the level of the operating system,
   where it occurs, instead of trying to allow for it in Python code. This means it
   could be managed differently in different operating systems, if needed.
 
+* it also works with many other cloud object-storage platforms, though with extra
+  required dependencies in some cases.
+  See the s3fs-fuse `Non-Amazon S3`_ docs page for details.
+
 CONs
 ^^^^
 
-* this solution is specific to S3 storage
+* it only works on a Unix-like O.S.
 
-* possibly the virtualisation is not perfect : some file-system operations might not
-  behave as expected, e.g. with regard to file permissions or system information
+* the file-system virtualisation may not be perfect : some file-system operations
+  might not behave as expected, e.g. with regard to file permissions or system
+  information.
 
-* it requires user actions *outside* the Python code
+* it requires user actions *outside* the Python code.
 
-* the user must manage the mount/umount context
+* the user must manage the mount/umount context.
+
+* some similar cloud object-storage platforms are *not* supported.
+  See the s3fs-fuse `Non-Amazon S3`_ docs page for details of those which are.
 
 
 Background Notes and Details
@@ -207,8 +215,9 @@ Background Notes and Details
   cannot create one from a regular Python "open" call -- still less
   when opening a file with an underlying file-format such as netCDF4 or HDF5
   (since these are usually implemented in other languages such as C).
+  Nor can you interrogate file paths or system metadata, e.g. permissions.
 
-  So, the key benefit offered by **s3-fuse** is that all the functions are mapped
+  So, the key benefit offered by **s3-fuse** is that all functions are mapped
   onto regular O.S. file-system calls -- so the file-format never needs to
   know that the data is not a "real" file.
 
@@ -220,13 +229,18 @@ Background Notes and Details
   copying the whole content. This is obviously essential for efficient use of large
   datasets, e.g. when larger than available memory.
 
-* It is also possible to use "s3-fuse" to establish the mounts *from within Python*.
+* It is also possible to use **s3-fuse** to establish the mounts *from within Python*.
   However, we have considered integrating this into Iris and rejected it because of
   unavoidable problems : namely, the "umount problem" (see below).
   For details, see : https://github.com/SciTools/iris/pull/6731
 
 * "Unmounting" must be done via a shell ``umount`` command, and there is no easy way to
-  guarantee that this succeeds, since it can often get a "target is busy" error, which
-  can only be resolved by delay + retry.
+  guarantee that this succeeds, since it can often get a "target is busy" error.
+
   This "umount problem" is a known problem in Unix generally : see
-  `here <https://stackoverflow.com/questions/tagged/linux%20umount>`_
+  `here <https://stackoverflow.com/questions/tagged/linux%20umount>`_.
+
+  It can only be resolved by a delay + retry.
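The delay + retry can be scripted in the shell -- a minimal sketch (the retry counts and delay are illustrative choices, not recommendations):

```shell
# Try to unmount a path, retrying a few times on "target is busy" failures.
retry_umount() {
    local mountpoint=$1
    local tries=${2:-5}
    local delay=${3:-2}
    local i
    for i in $(seq "$tries"); do
        if umount "$mountpoint" 2>/dev/null; then
            echo "unmounted $mountpoint"
            return 0
        fi
        sleep "$delay"
    done
    echo "could not unmount $mountpoint after $tries attempts" >&2
    return 1
}

# E.g.
#   retry_umount /home/self.me/s3_root/testbucket_mountpoint
```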
+
+
+.. _Non-Amazon S3: https://github.com/s3fs-fuse/s3fs-fuse/wiki/Non-Amazon-S3