Skip to content
Closed
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions Doc/library/shutil.rst
Original file line number Diff line number Diff line change
Expand Up @@ -877,6 +877,77 @@ Querying the size of the output terminal
The ``fallback`` values are also used if :func:`os.get_terminal_size`
returns zeroes.

.. _shutil-context-managers:

Context managers to modify process execution environments
---------------------------------------------------------

High-level :term:`context managers <context manager>` for changing a process's execution environment.

.. warning::

These may change process-wide states, and as such are not suitable for use
in most threaded or async contexts. They are also not suitable for most
non-linear code execution, like generators, where the program execution is
temporarily relinquished. Unless explicitly desired, you should not yield
within these context managers.

.. class:: umask_of(mask)

This is a simple wrapper around :func:`os.umask`. It changes the process's
umask upon entering and restores the old one on exit.

This context manager is :ref:`reentrant <reentrant-cms>`.

.. versionadded:: next

In this example, we use a :class:`umask_of` context manager, within which we
create a file that only the user can access:

.. code-block:: pycon

>>> from shutil import umask_of
>>> with umask_of(0o077):
... with open("my-secret-file", "w") as f:
... f.write("I ate all the cake!\n")

The file's permissions are empty for anyone but the user:

.. code-block:: shell-session

$ ls -l my-secret-file
-rw------- 1 guest guest 20 Jan 1 23:45 my-secret-file

Using :class:`umask_of` like this is better practice than first creating the file,
and later changing its permissions with :func:`~os.chmod`, between which a
period of time exists in which the file may have too lenient permissions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is lacking. It doesn't explain why os.open() with optional permission mode passed directly to the underlying C open(2) / openat(2) is a bad solution.

I would argue it's always better than a context manager since it is thread- and generator-safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to pass the permission mode to open means that has to be done everywhere in a block of code, whereas using a umask context manager allows the code inside the context to be written agnostically. Perhaps I should word that into the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by df59fdeacb4b4ba4fcd78fc15440bac4c9090a71

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don't really see how df59fde answers my question. It just asserts that umask is better, no discussion of tradeoffs despite the non-obvious nature of them.

Maybe it would help you understand where I'm coming from if I frame this as someone who wants to be convinced that he should, in fact, use this new function at all.

I was happy with os.open(). In the context of writing security-sensitive software, I know that os.open is guaranteed to do what it asserts to do. I don't see that surety with a umask-based context manager -- what am I missing?

It's important to note that the sole purpose of a umask context manager as described in this PR is for security hardening, since there's a race condition using the naive "first open() it, then os.chmod() it". But in the security domain, one tends to make different tradeoffs compared to other fields, and in particular, we want to be very explicit about the security properties of the code we are writing. Writing more lines of code is not a downside, not if it allows you to be more secure. With that in mind, I don't understand why it's a bad thing to "need to pass a custom mode keyword argument to :func:open every time.

If this is just about how much you have to type, I really don't see any valid reason to ever use this context manager. I can save a couple keystrokes at each open() site, but in return I have to do deep soul-searching about whether I expect the code to be callable from a thread (and decide whether that means my application is insecure under freethreading and needs to be rewritten using os.open() after all) and whether I need to avoid calling any function that recursively uses this context manager from a generator or async function. That's a pretty steep cost.

"Explicit is better than implicit" is a particularly good rule for security!

...

As a context manager it is also ill-suited to defense-in-depth, i.e. "this program is security-sensitive, better set the umask everywhere just in case we create files". If that property is desirable, I would want to call os.umask() once at program startup, not use a context manager at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So again: why should this context manager exist at all? My perspective is that this context manager is a bad idea and is dangerous for security and the mere fact of its existence will encourage people to write insecure software. This is perhaps a bit cynical of me, so the first thing I did when I saw the PR proposing it was to check the documentation to see whether it explained a use case. Maybe if I see a well-documented use case I will come around to the idea, yeah?

But at the moment I still don't see anything to persuade me that the function is a net benefit. So I would personally avoid it on the grounds that I believe it to be a dangerous footgun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note additionally that umask is entirely ignored if a default ACL exists which is more permissive than your umask. However, directly applying os.open(..., mode=0o600) will actually work, even with an ACL.

In many cases, you're not concerned about an ACL...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've bookmarked this discussion and I'll read through it later (head spinning with git {merge,rebase,cherry-pick} and I'm about to get off).

I'll re-mark this conversation as unresolved, and read through these comments in the morning; there's clearly intricacies to discuss. Just acknowledging I've seen this! 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for considering. :) It may be a moot point as @ncoghlan has now closed the linked issue: #128432 (comment)

(I think that my mention of ACLs was the deciding factor.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on the discussion thread has made me re-think and close the issue. The required functionality now needs two separate contexts: set_umask and increase_umask: "With nested use, I'd want the mask to be as restrictive as if all umasks had been applied, because each level of nesting had it's own reason to restrict."

My resolution is to make a separate PyPI package, that keeps these intricacies outside of the Python stdlib.

I mentioned systemd.exec's UMask functionality just to highlight that a 'global' (whole-runtime-long) umask for a service can be set outside of the program itself, instead of the program having to call umask at startup; it's more agnostic than shoving a umask 0022 on line 2 of every script. systemd does a lot of daemonization stuff for the user automatically these days, whereas other system service managers may differ (I didn't know which service manager Gentoo users use sorry). e.g.: I serve my Flask apps with gunicorn + systemd service units. If one wanted to daemonize gunicorn from the CLI instead there's a daemonize helper, which closes fds and sets the umask etc. So either way umask is handled outside of the Flask app itself.

I regard consolidating to a single entry/exit point the job of main()

Therefore a set_umask context manager wasn't about wrapping main, either via the service manager, or a server which daemonizes from the CLI, or setting umask ones' self directly with os.umask; it's about using it as a short-term context:

e.g.: sockets. One can't use open() with a custom mode to create a socket afaik. The only way to bind an AF_UNIX socket to a path is to call bind(), which respects the user's umask. Looking again at gunicorn's binding method they set the umask, do some stuff, then revert. One could bind() -> chmod() -> listen() instead, without fear of a too-loosely permissioned socket being accessed by someone else between bind() and chmod(), because listen() needs to be called before anyone can connect to the socket, but umask() -> bind()/open() -> umask() is an idiom that could apply to all file types, not just sockets (KISS). I guess it's up to personal taste.

I will die on this hill.

Security concerns are paramount, got it.

PEP 703

I don't know too much about the technical details of the GIL. My understanding is that with free-threading, each thread will be able to run simultaneously on the CPU without a GIL wrangling the threads' accessing of the heap / environment / interpreter? If you mean to suggest 'each thread should use open(mode=...) instead of messing with the process-wide umask' then yes you're quite right that's easier and safer.

ACL

I have only explicitly used ACL a couple of times, for allowing the http user and nobody else to access .htpasswd files stored safely in a directory. From what I can tell from setfacl(1) the mask is &-ed with the ACL permissions granted, restricting permissions further. In a non-ACL setting this is the behavior that the discuss.python.org commenter expected, warranting a separate increase_umask context. It's a good job you two made these comments which uphold each other. Many thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.: sockets. One can't use open() with a custom mode to create a socket afaik. The only way to bind an AF_UNIX socket to a path is to call bind(), which respects the user's umask. Looking again at gunicorn's binding method they set the umask, do some stuff, then revert. One could bind() -> chmod() -> listen() instead, without fear of a too-loosely permissioned socket being accessed by someone else between bind() and chmod(), because listen() needs to be called before anyone can connect to the socket, but umask() -> bind()/open() -> umask() is an idiom that could apply to all file types, not just sockets (KISS). I guess it's up to personal taste.

On linux, you should rather os.fchmod() after socket() and before bind().

I don't know too much about the technical details of the GIL. My understanding is that with free-threading, each thread will be able to run simultaneously on the CPU without a GIL wrangling the threads' accessing of the heap / environment / interpreter?

The threading module has historically been fairly unpopular in comparison to multiprocessing, since the GIL wrangles access by simply preventing python code from running at all.

Free-threading removes this limitation, so we can expect threading to become a lot more popular in python land. :D

I have only explicitly used ACL a couple of times, for allowing the http user and nobody else to access .htpasswd files stored safely in a directory. From what I can tell from setfacl(1) the mask is &-ed with the ACL permissions granted, restricting permissions further. In a non-ACL setting this is the behavior that the discuss.python.org commenter expected, warranting a separate increase_umask context. It's a good job you two made these comments which uphold each other. Many thanks!

ACLs can have:

  • special permission grants on a user by user basis
  • a "default ACL" which is essentially "umask but per directory instead of per process". When both a default-ACL and a umask exist, the umask is ignored, and the default-ACL is applied as a mask instead.

So if there is an ACL that is more restrictive than the default umask 0022, but more permissive than the secure umask you want to use in your application / library, your extra-secure umask is ignored and the default-ACL mask which isn't quite as good ends up winning, even though you conceptually applied the umask after the ACL.


It also allows you to write code that creates files, in a way that is agnostic
of permissions -- that is without the need to pass a custom ``mode`` keyword
argument to :func:`open` every time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to mention the reason I didn't even question the utility of the context manager once you proposed it: changing the process umask means that even files created by code in extension modules, third party dependencies, or by invoked subprocesses, will be affected by the umask (since changing the umask affects the actual process state, and is inherited by child processes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the touch_file example given below this paragraph is sufficient to indicate this

The mechanics of how umask is inherited by subprocesses would be more suited in Doc/library/os.rst than here


In this example we create files with a function ``touch_file`` which uses the
:func:`open` built-in without setting ``mode``. Permissions are managed by
:class:`!umask_of`:

.. code-block:: pycon

>>> from shutil import umask_of
>>>
>>> def touch_file(path):
... with open(path, "a"):
... pass
...
>>> touch_file("normal-file")
>>> with umask_of(0o077):
... touch_file("private-file")

.. code-block:: shell-session

$ ls -l normal-file private-file
-rw-r--r-- 1 guest guest 0 Jan 1 23:45 normal-file
-rw------- 1 guest guest 0 Jan 1 23:45 private-file

.. _`fcopyfile`:
http://www.manpagez.com/man/3/copyfile/

Expand Down
7 changes: 7 additions & 0 deletions Doc/whatsnew/3.14.rst
Original file line number Diff line number Diff line change
Expand Up @@ -588,6 +588,13 @@ pydoc
(Contributed by Jelle Zijlstra in :gh:`101552`.)


shutil
------

* Add a new context manager :class:`~shutil.umask_of`, within which a
process's umask is temporarily changed


ssl
---

Expand Down
18 changes: 17 additions & 1 deletion Lib/shutil.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import fnmatch
import collections
import errno
from contextlib import AbstractContextManager

try:
import zlib
Expand Down Expand Up @@ -61,7 +62,7 @@
"get_unpack_formats", "register_unpack_format",
"unregister_unpack_format", "unpack_archive",
"ignore_patterns", "chown", "which", "get_terminal_size",
"SameFileError"]
"SameFileError", "umask_of"]
# disk_usage is added later, if available on the platform

class Error(OSError):
Expand Down Expand Up @@ -1581,6 +1582,21 @@ def which(cmd, mode=os.F_OK | os.X_OK, path=None):
return name
return None


class umask_of(AbstractContextManager):
"""Non thread-safe context manager to change the process's umask."""

def __init__(self, mask):
self.mask = mask
self._old_mask = []

def __enter__(self):
self._old_mask.append(os.umask(self.mask))

def __exit__(self, *excinfo):
os.umask(self._old_mask.pop())


def __getattr__(name):
if name == "ExecError":
import warnings
Expand Down
56 changes: 54 additions & 2 deletions Lib/test/test_shutil.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@
get_archive_formats, Error, unpack_archive,
register_unpack_format, RegistryError,
unregister_unpack_format, get_unpack_formats,
SameFileError, _GiveupOnFastCopy)
SameFileError, _GiveupOnFastCopy, umask_of)
import tarfile
import zipfile
try:
Expand Down Expand Up @@ -3458,13 +3458,65 @@ def test_module_all_attribute(self):
'unregister_archive_format', 'get_unpack_formats',
'register_unpack_format', 'unregister_unpack_format',
'unpack_archive', 'ignore_patterns', 'chown', 'which',
'get_terminal_size', 'SameFileError']
'get_terminal_size', 'SameFileError', 'umask_of']
if hasattr(os, 'statvfs') or os.name == 'nt':
target_api.append('disk_usage')
self.assertEqual(set(shutil.__all__), set(target_api))
with self.assertWarns(DeprecationWarning):
from shutil import ExecError


@unittest.skipIf(os.name != "posix" or support.is_wasi or support.is_emscripten,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangent: we probably need a separate issue to update the os.umask docs with an explanation of the subset of its functionality that actually works on Windows (if any).

"need proper os.umask()")
class TestUmaskOf(unittest.TestCase):
# make target masks in here sufficiently exotic, away from 0o022

mask_private = 0o777
mask_public = 0o000

def get_mask(self):
os.umask(mask := os.umask(0))
return mask

def test_simple(self):
old_mask = self.get_mask()
target_mask = self.mask_private
self.assertNotEqual(old_mask, target_mask)

with umask_of(target_mask):
self.assertEqual(self.get_mask(), target_mask)
self.assertEqual(self.get_mask(), old_mask)

def test_reentrant(self):
old_mask = self.get_mask()
target_mask_1 = self.mask_private
target_mask_2 = self.mask_public
self.assertNotIn(old_mask, (target_mask_1, target_mask_2))
umask1, umask2 = umask_of(target_mask_1), umask_of(target_mask_2)

with umask1:
self.assertEqual(self.get_mask(), target_mask_1)
with umask2:
self.assertEqual(self.get_mask(), target_mask_2)
with umask1:
self.assertEqual(self.get_mask(), target_mask_1)
self.assertEqual(self.get_mask(), target_mask_2)
self.assertEqual(self.get_mask(), target_mask_1)
self.assertEqual(self.get_mask(), old_mask)

def test_exception(self):
old_mask = self.get_mask()
target_mask = self.mask_private
self.assertNotEqual(old_mask, target_mask)

try:
with umask_of(target_mask):
self.assertEqual(self.get_mask(), target_mask)
raise RuntimeError("boom")
except RuntimeError as re:
self.assertEqual(str(re), "boom")
self.assertEqual(self.get_mask(), old_mask)


if __name__ == '__main__':
unittest.main()
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
Add a :class:`~shutil.umask_of` context manager to :mod:`shutil`, to provide a
stdlib context manager for changing a process's umask
Loading