PEP: 778
Title: Supporting Symlinks in Wheels
Author: Emma Harper Smith <[email protected]>
Sponsor: Barry Warsaw <[email protected]>
PEP-Delegate: Paul Moore <[email protected]>
Discussions-To: https://discuss.python.org/t/pep-778-supporting-symlinks-in-wheels/53824
Status: Deferred
Type: Standards Track
Topic: Packaging
Requires: 777
Created: 18-May-2024
Post-History: `10-Oct-2024 <https://discuss.python.org/t/pep-778-supporting-symlinks-in-wheels/53824>`__

Abstract
========

Wheels currently do not handle symlinks well: when a wheel is installed, content is copied
instead of being recreated as symlinks. To properly support distributing libraries in wheels, we
propose a new ``LINKS`` metadata file that describes symlinks in a platform-portable manner. This
specification requires a new wheel major version, discussed in :pep:`777`.

PEP Deferral
============

This PEP has been deferred until a better compatibility story for major changes to the wheel
format is established. Once wheels have a way to introduce backwards-incompatible behavior
unobtrusively, the following points should be addressed in this PEP:

- Re-focus this topic to just symlinks for shared libraries on POSIX platforms, perhaps tied to
  platform tags?
- Should the symlinks be materialized as file attributes in the archive or in a ``LINKS`` file?
  Could they be encoded in ``RECORD``?
- Clarify that this PEP is insufficient to be useful for :pep:`660` editable installs, since it
  will no longer be cross-platform.
- Describe fallback behavior in instances where symlinks are unavailable on POSIX platforms.

Motivation
==========

Today, symlinks in wheels end up installed as copies of files, because `the zipfile module
<https://docs.python.org/3/library/zipfile.html>`_ in CPython `does not support handling symlinks
in-place <https://github.com/python/cpython/issues/82102>`_ for security reasons.

This `presents problems to projects that would like to ship large compiled libraries
<https://pypackaging-native.github.io/other_issues/#lack-of-support-for-symlinks-in-wheels>`_ in
wheels, as they must choose to either greatly increase the install size of the project on disk,
or omit the symlink and potentially break some downstream use cases.

To ship a library that can be properly loaded at runtime or linked against at build time on POSIX,
the library should follow the conventions of POSIX-style loader and linker search. The two main
file names used by the loader are the "soname" and the "real name". The "soname" is a file like
``libfoo.so.3``, where ``3`` is a number that is incremented when the interface of the library
changes. The "real name" is a file named like ``libfoo.so.3.1.4``, where the extra version
information lets the loader find a specific version of a library. Finally, when compiling code to
link against a library, the linker searches for a "linker name", named like ``libfoo.so``. A more
detailed description is available in `this Linux documentation on shared libraries
<https://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html>`_. To fully support all
runtime and build time use cases, a project needs to ship all three files. On POSIX platforms this
is normally handled with symlinks, so that the library is not duplicated on disk three times.
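
To illustrate the naming convention only, the following sketch derives the linker name and the
soname from a real name. It assumes real names of the form ``libNAME.so.MAJOR[.MINOR[.PATCH]]``.
The ``library_names`` helper is hypothetical and is not part of any packaging tool:

.. code-block:: python

    def library_names(real_name: str) -> tuple[str, str, str]:
        """Return (linker name, soname, real name) for a versioned shared library.

        Assumes names shaped like ``libNAME.so.MAJOR[.MINOR[.PATCH]]``.
        """
        base, sep, version = real_name.partition(".so.")
        if not sep or not version:
            raise ValueError(f"not a versioned shared library name: {real_name!r}")
        major = version.split(".")[0]
        linker_name = base + ".so"  # searched for by the linker at build time
        soname = f"{linker_name}.{major}"  # recorded in binaries, used by the loader
        return linker_name, soname, real_name

For example, ``library_names("libfoo.so.3.1.4")`` returns
``("libfoo.so", "libfoo.so.3", "libfoo.so.3.1.4")``.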

Returning to Python packaging, many popular projects ship binary libraries, such as ``numpy``,
``scipy``, and ``pyarrow``. Others, such as ``pytorch`` and ``jax``, ``dlopen`` libraries that
other wheels install into ``site-packages``. These projects currently rely on a single copy of
each library in the wheel, but this can cause the linker to find the wrong library if the system
provides libraries with a matching "real name" version.

Symlinks in wheels could also enable simpler editable installs, by placing a symlink in the
user's ``site-packages`` directory, but this PEP leaves that as an open question to be explored
in a future PEP.

Rationale
=========

To support the three main names of a library used for loading and linking on POSIX, we propose
adding support for symlinks in Python wheels. To track the symlinks created, and to potentially
support platforms that do not support POSIX symlinks directly, we propose a new wheel metadata
file, ``LINKS``, which will exist in the ``.dist-info`` directory alongside ``METADATA``,
``RECORD``, and other metadata files.

A ``LINKS`` file allows symlink-like links to be created in a more cross-platform way. On
Windows, creating symlinks requires either `a group policy allowing the user to make symlinks
<https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-10/security/threat-protection/security-policy-settings/create-symbolic-links>`_
(e.g. by enabling `Dev Mode
<https://learn.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development>`_)
or administrative permissions, so symlinks may be unavailable on some user systems. With a
``LINKS`` file, installers can potentially use other mechanisms, such as junctions on Windows,
where they would otherwise have to fail.
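
The fallback described above can be sketched as a small decision function. This is a
hypothetical helper, not part of any installer. It takes the platform's capabilities as plain
booleans because detecting them is platform-specific, and it only chooses a link kind rather than
creating one. It relies on the fact that junctions can only point to directories:

.. code-block:: python

    def choose_link_kind(target_is_dir: bool, can_symlink: bool, can_junction: bool) -> str:
        """Decide how one LINKS entry could be materialized on this platform.

        A real symlink is preferred. A junction is only a valid substitute for a
        directory target, because junctions cannot point to files. If neither
        works, the installer has to fail instead of silently copying.
        """
        if can_symlink:
            return "symlink"
        if target_is_dir and can_junction:
            return "junction"
        raise OSError("this platform cannot create a link to the requested target")

For example, on a Windows system without the symlink privilege, ``choose_link_kind(True, False,
True)`` returns ``"junction"``, while ``choose_link_kind(False, False, True)`` raises ``OSError``.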

This PEP also describes checks that installers must perform when installing a wheel that
contains symlinks. These checks address the security risks of allowing wheels to install
symlinks. For more information on why these checks are important, see `Security Implications`_.

Specification
=============

Wheel Major Version Bump
------------------------

This PEP requires a wheel major version bump: the ``Wheel-Version`` of wheels generated with
``LINKS`` **MUST** be at least ``2.0``, so that older installers do not silently fail to install
symlinks and break user environments. For more details, see :pep:`777`.
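
A build backend or wheel validator could enforce this rule with a check like the following. This
is a minimal sketch. It assumes the ``WHEEL`` file keeps the email-header ``Key: value`` format
used by wheel 1.x, and the function names are hypothetical:

.. code-block:: python

    from email.parser import Parser


    def wheel_version(wheel_file_text: str) -> tuple[int, int]:
        """Return (major, minor) from the Wheel-Version header of a WHEEL file."""
        headers = Parser().parsestr(wheel_file_text, headersonly=True)
        value = headers.get("Wheel-Version")
        if value is None:
            raise ValueError("WHEEL file has no Wheel-Version header")
        major, _, minor = value.strip().partition(".")
        return int(major), int(minor or 0)


    def links_allowed(wheel_file_text: str) -> bool:
        """A wheel that ships a LINKS file must declare Wheel-Version >= 2.0."""
        return wheel_version(wheel_file_text) >= (2, 0)

Under this check, a ``WHEEL`` file declaring ``Wheel-Version: 1.0`` fails, so a backend would
refuse to emit ``LINKS`` into such a wheel.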

New ``LINKS`` Metadata File
---------------------------

To enable cross-platform symlinks, this PEP introduces a new wheel metadata file, ``LINKS``. An
example of a ``LINKS`` file is below::

    my_package/libfoo.so.3,my_package/libfoo.so.3.1.4
    my_package/libfoo.so,my_package/libfoo.so.3

The format of ``LINKS``, as can be seen above, is ``source_path,target_path``, with one link per
line. ``source_path`` is the path of the link to create, relative to the root of the wheel and
located inside a package or namespace in the wheel. ``target_path`` is the path the link points
to. It must be a *non-dangling* path (i.e. a path that exists on the filesystem after the wheel's
contents are extracted) in a package or namespace of any package in the wheel. This means that if
a wheel contains multiple packages, any path in any of those packages is an acceptable
``target_path``.
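
The following sketch parses a ``LINKS`` file into ``(source_path, target_path)`` pairs. Since the
format is derived from ``RECORD``, it assumes the same CSV conventions as ``RECORD`` (the ``csv``
module's defaults, UTF-8 text). ``parse_links`` is a hypothetical helper, and path safety is
deliberately left to the installer checks specified below:

.. code-block:: python

    import csv
    import io


    def parse_links(text: str) -> list[tuple[str, str]]:
        """Parse LINKS content into (source_path, target_path) pairs.

        Each row is ``source_path,target_path``: the link to create, then the
        path it points to, both relative to the root of the wheel.
        """
        pairs = []
        for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
            if not row:
                continue  # csv.reader yields [] for blank lines
            if len(row) != 2 or not row[0] or not row[1]:
                raise ValueError(
                    f"LINKS row {row_number}: expected 'source_path,target_path', got {row!r}"
                )
            pairs.append((row[0], row[1]))
        return pairs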

Installer Behavior Specification
--------------------------------

Installers **MUST** resolve the paths of any link contained in the ``LINKS`` file *before*
deciding whether any ``source_path`` or ``target_path`` is valid. Installers **MUST** verify that
``source_path`` and ``target_path`` are located inside a namespace or package coming from the
wheel. Installers **MUST** reject cyclic symlinks in wheels. Installers **MAY** raise an error if
a chain of symlinks (symlinks pointing to symlinks, repeated many times) exceeds a length limit
set by the installer.
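
A minimal sketch of these checks, run on the ``LINKS`` entries before anything is created on
disk. It assumes paths use ``/`` separators relative to the wheel root, and that the caller knows
the top-level package and namespace directories (``roots``) the wheel provides. ``resolve_link``,
``LinkError``, and the depth limit of 40 are hypothetical choices, since the PEP leaves the limit
to the installer:

.. code-block:: python

    import posixpath

    MAX_LINK_DEPTH = 40  # hypothetical; the PEP leaves the limit to the installer


    class LinkError(Exception):
        """Raised when a LINKS entry is unsafe or cannot be resolved."""


    def _normalize(path: str) -> str:
        """Normalize a wheel-relative path, rejecting absolute or escaping paths."""
        if posixpath.isabs(path):
            raise LinkError(f"absolute path not allowed: {path!r}")
        norm = posixpath.normpath(path)
        if norm == ".." or norm.startswith("../"):
            raise LinkError(f"path escapes the wheel: {path!r}")
        return norm


    def resolve_link(source: str, links: dict[str, str], roots: set[str],
                     max_depth: int = MAX_LINK_DEPTH) -> str:
        """Follow LINKS entries from ``source`` and return the final non-link path.

        Raises LinkError for cycles, chains longer than ``max_depth``, or any
        path in the chain that lies outside the wheel's top-level ``roots``.
        """
        table = {_normalize(src): _normalize(tgt) for src, tgt in links.items()}
        chain = [_normalize(source)]
        while chain[-1] in table:
            nxt = table[chain[-1]]
            if nxt in chain:
                raise LinkError("cyclic link: " + " -> ".join(chain + [nxt]))
            if len(chain) > max_depth:
                raise LinkError(f"link chain from {source!r} exceeds {max_depth} links")
            chain.append(nxt)
        for path in chain:
            if path.split("/", 1)[0] not in roots:
                raise LinkError(f"{path!r} is outside the wheel's packages")
        return chain[-1]

Because this check only reads the ``LINKS`` entries, an installer can run it before creating any
link. Whether the final target actually exists is checked separately once the files are
extracted.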

Installers **MUST** follow these steps when handling a wheel with symlinks:

1. Check for the existence of a ``LINKS`` file in the ``.dist-info`` directory. If it does not
   exist, no further steps are required.
2. Extract all files in the wheel's packages and data directory, as in wheel 1.x.
3. Verify that, for each ``source_path`` and ``target_path`` pair, the ``target_path`` exists in
   one of the package namespaces just extracted.
4. Next, check that the installer can create some kind of link for each pair in the site
   directory. If the installer cannot create a link to the file or folder ``target_path`` on the
   current platform, an error **MUST** be raised. An example of this failure mode is a POSIX
   symlink to a file target on Windows, where the installer cannot create symlinks but can create
   junctions. Because junctions cannot point to files, the installer **MUST** raise an error in
   this case.
5. Finally, the installer **MUST** create a platform-relevant link between ``source_path`` and
   ``target_path``.
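
The steps above could be sketched for POSIX-style symlinks as follows. This is a minimal sketch,
not a reference implementation. It assumes that step 2 (extraction) has already happened, that
the path-safety checks required at the start of this section have already been applied to every
entry, and that links are created with targets relative to the link's directory so the
installation can be relocated. ``install_links``, ``can_symlink``, and ``LinkInstallError`` are
hypothetical names:

.. code-block:: python

    import csv
    import os
    import tempfile


    class LinkInstallError(Exception):
        """Raised when a LINKS entry cannot be installed."""


    def can_symlink(directory: str) -> bool:
        """Step 4 helper: probe whether symlinks can be created in ``directory``."""
        with tempfile.TemporaryDirectory(dir=directory) as probe_dir:
            try:
                os.symlink("probe-target", os.path.join(probe_dir, "probe-link"))
            except (OSError, NotImplementedError):
                return False
        return True


    def install_links(site_dir: str, dist_info: str) -> list[str]:
        """Apply steps 1 and 3 to 5 to an already extracted wheel; return created links."""
        links_file = os.path.join(site_dir, dist_info, "LINKS")
        # Step 1: without a LINKS file there is nothing more to do.
        if not os.path.isfile(links_file):
            return []
        with open(links_file, newline="", encoding="utf-8") as f:
            pairs = [(row[0], row[1]) for row in csv.reader(f) if row]
        sources = {src for src, _ in pairs}

        def on_disk(rel_path: str) -> str:
            return os.path.join(site_dir, *rel_path.split("/"))

        # Step 3: each target must have been extracted, or be another link in LINKS.
        for src, tgt in pairs:
            if tgt not in sources and not os.path.exists(on_disk(tgt)):
                raise LinkInstallError(f"{src} -> {tgt}: target does not exist")
        # Step 4: make sure links can be created at all before creating any.
        if pairs and not can_symlink(site_dir):
            raise LinkInstallError("this platform or user cannot create symlinks")
        # Step 5: create each link with a target relative to the link's directory.
        created = []
        for src, tgt in pairs:
            link_path, target_path = on_disk(src), on_disk(tgt)
            relative = os.path.relpath(target_path, os.path.dirname(link_path))
            os.symlink(relative, link_path, target_is_directory=os.path.isdir(target_path))
            created.append(link_path)
        return created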

By default, installers **MUST NOT** copy files in place of the symlinks listed in ``LINKS``.
Installers **MAY** offer copying behavior under an alternate configuration option or
command-line flag.
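
A sketch of how an installer might keep copying strictly opt-in. The ``copy_fallback`` keyword
stands in for the alternate configuration option or command-line flag. Both it and
``materialize_link`` are hypothetical names. The sketch never overwrites an existing path, so a
name clash is still an error even when copying is allowed:

.. code-block:: python

    import os
    import shutil


    def materialize_link(link_path: str, target_path: str, *, copy_fallback: bool = False) -> str:
        """Create ``link_path`` pointing at ``target_path``; return "symlink" or "copy"."""
        relative = os.path.relpath(target_path, os.path.dirname(link_path))
        try:
            os.symlink(relative, link_path, target_is_directory=os.path.isdir(target_path))
            return "symlink"
        except FileExistsError:
            raise  # never replace an existing path, even when copying is allowed
        except OSError:
            if not copy_fallback:
                raise  # default behavior: failing to create a symlink is an error
        # Only reached when the user explicitly opted in to copying.
        if os.path.isdir(target_path):
            shutil.copytree(target_path, link_path)
        else:
            shutil.copy2(target_path, link_path)
        return "copy"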

Build Backend Specification
---------------------------

When creating a wheel, build backends **MUST** treat a symlink the same way as its target when
deciding whether to include the symlink in a wheel. Build backends **MUST** verify that there are
no dangling symlinks in the ``LINKS`` file. Build backends **SHOULD** recognize platform-relevant
symlinks that would be included in builds. On POSIX systems these are typically symlinks, and on
Windows they include symlinks and junctions.
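
On the backend side, the following sketch walks a package directory and turns each symlink it
finds into a ``LINKS`` row. It rejects dangling links and links that leave the package. It only
handles POSIX symlinks, records one hop per row (so chains stay chains), and writes rows with the
``csv`` module. ``collect_links`` and ``render_links`` are hypothetical helpers, and the
inclusion rules a real backend applies are not shown:

.. code-block:: python

    import csv
    import io
    import os


    def collect_links(package_root: str) -> list[tuple[str, str]]:
        """Return LINKS rows for every symlink under ``package_root``.

        Rows are relative to the directory containing the package, e.g.
        ``("my_package/libfoo.so", "my_package/libfoo.so.3")``.
        """
        package_root = os.path.abspath(package_root)
        wheel_root = os.path.dirname(package_root)
        rows = []
        # os.walk does not descend into symlinked directories by default.
        for dirpath, dirnames, filenames in os.walk(package_root):
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    continue
                if not os.path.exists(path):  # exists() follows the whole chain
                    raise ValueError(f"dangling symlink: {path}")
                target = os.path.normpath(os.path.join(dirpath, os.readlink(path)))
                if os.path.commonpath([target, package_root]) != package_root:
                    raise ValueError(f"symlink leaves the package: {path} -> {target}")
                rows.append(tuple(
                    os.path.relpath(p, wheel_root).replace(os.sep, "/") for p in (path, target)
                ))
        return rows


    def render_links(rows: list[tuple[str, str]]) -> str:
        """Serialize rows in the ``source_path,target_path`` LINKS format."""
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        return buffer.getvalue()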

Backwards Compatibility
=======================

Introducing symlinks would require incrementing the wheel format major version. Older installers
would then raise an error on wheels that use the new format, per the
`wheel specification
<https://packaging.python.org/en/latest/specifications/binary-distribution-format/#file-contents>`_.

Please see :pep:`777` on "Wheel 2.0".
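
On the installer side, the wheel specification's version rule can be sketched as follows: a newer
major version is a hard error, while a newer minor version only produces a warning.
``check_wheel_version`` and ``UnsupportedWheelVersion`` are hypothetical names, and the sketch
assumes an installer that supports wheel 1.0:

.. code-block:: python

    import warnings


    class UnsupportedWheelVersion(Exception):
        """Raised when a wheel's major version is newer than the installer supports."""


    def check_wheel_version(version: tuple[int, int], supported: tuple[int, int] = (1, 0)) -> None:
        """Fail on a newer major wheel version; warn on a newer minor version."""
        if version[0] > supported[0]:
            raise UnsupportedWheelVersion(
                f"Wheel-Version {version[0]}.{version[1]} is not supported; "
                f"this installer only understands {supported[0]}.x wheels"
            )
        if version > supported:
            warnings.warn(
                f"Wheel-Version {version[0]}.{version[1]} is newer than "
                f"{supported[0]}.{supported[1]}; installing anyway"
            )

Under this rule, an installer that only supports wheel 1.x stops at a ``2.0`` wheel instead of
installing it without its symlinks.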

Security Implications
=====================

Symlinks can be dangerous if not handled carefully. For example, if a user ran
``sudo pip install malicious`` and there were no protections, the malicious package could use a
symlink to overwrite ``/etc/shadow`` and replace the password hash on the system, allowing
malicious logins.

This PEP requires installers to run several checks on symlinks in wheels to ensure that attacks
like the one described above cannot happen. It is therefore **critical** that installers carefully
implement these safeguards to prevent malicious use during package installation.

In particular, the following checks **MUST** be made by installers:

1. That the symlinks do not point outside of any packages or namespaces coming from the wheel
2. That the symlinks are not dangling (the target exists at install time)
3. That the symlinks are not cyclical, stopping after a certain depth of checking to avoid
   denial-of-service attacks
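
After links have been created, an installer could re-check these three requirements against the
filesystem itself. This sketch is POSIX-oriented. Instead of counting links, it relies on the
operating system's own loop limit (reported as ``ELOOP``) for the depth bound.
``verify_installed_link`` is a hypothetical name, and ``package_dirs`` is assumed to list the
top-level package and namespace directories the wheel installed:

.. code-block:: python

    import errno
    from pathlib import Path


    def verify_installed_link(link: Path, package_dirs: list[Path]) -> Path:
        """Check one installed link against the three required checks.

        Returns the fully resolved target, or raises ValueError.
        """
        try:
            resolved = link.resolve(strict=True)
        except FileNotFoundError:  # check 2: the chain ends at a missing path
            raise ValueError(f"dangling symlink: {link}") from None
        except RuntimeError:  # check 3: symlink loop (Python 3.12 and earlier)
            raise ValueError(f"cyclic symlink: {link}") from None
        except OSError as exc:
            if exc.errno == errno.ELOOP:  # check 3: symlink loop (Python 3.13+)
                raise ValueError(f"cyclic symlink: {link}") from None
            raise
        # Check 1: the final target must live inside one of the wheel's packages.
        if not any(resolved.is_relative_to(pkg.resolve()) for pkg in package_dirs):
            raise ValueError(f"{link} resolves outside the wheel's packages: {resolved}")
        return resolved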

Installers must not follow symlinks when removing files, for example when uninstalling a package.
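
A sketch of removal that never follows links, assuming POSIX symlinks (Windows junctions would
need separate handling). ``remove_installed_path`` is a hypothetical helper. It deletes the link
itself and leaves whatever the link points to untouched:

.. code-block:: python

    import os
    import shutil


    def remove_installed_path(path: str) -> None:
        """Remove one installed path without following symlinks."""
        if os.path.islink(path):
            os.unlink(path)  # removes the link, never its target
        elif os.path.isdir(path):
            shutil.rmtree(path)  # rmtree unlinks symlinks inside the tree instead of following them
        elif os.path.lexists(path):
            os.remove(path)

Checking ``os.path.islink`` first matters because ``os.path.isdir`` follows symlinks, so a
symlink to a directory would otherwise be handed to ``shutil.rmtree``, which refuses to operate on
a symlink.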

How to Teach This
=================

Once the changes have propagated through the ecosystem, end users should experience the benefits
of symlinks in wheels transparently. It is important for installers to give clear error messages
if symlinks are unsupported on the platform, explaining why installation has failed.

For people building libraries, documentation on ``packaging.python.org`` should describe the use
cases and caveats (especially platform support) of symlinks in wheels. Otherwise, build backends
should handle symlinks transparently, in the same way as any normal file.

Reference Implementation
========================

TODO

Rejected Ideas
==============

Just Use POSIX Symlinks Everywhere
----------------------------------

This PEP aims to allow ``LINKS`` to be used for a potential future :pep:`660` editable
installation mechanism. That future PEP should support Windows, so it may need to use junctions.

Don't Use Junctions in ``LINKS``
--------------------------------

Junctions are a limited form of directory link on Windows: they can point to folders but not to
files. This PEP allows junctions because users may wish to link only folders to a different
location, and future :pep:`660` implementations may need to rely on this feature.

Put symlinks in the ``RECORD`` Metadata File
--------------------------------------------

While this could be done, it would clutter the ``RECORD`` file. Furthermore, the most
straightforward implementation would append the link target to the end of each link's record
line, making it harder to scan a line and visually see which symlinks exist in the wheel.

Library Maintainers Should Use Python to Locate Libraries
---------------------------------------------------------

Using Python to locate libraries would be much easier. However, some libraries, such as
``libtorch``, are used by extension modules and themselves load further dependencies. Such
compiled libraries cannot use Python to find their loader dependencies.

Include Support for Hardlinks
-----------------------------

This PEP intentionally does not specify any behavior around hardlinks; hardlink support is left
as an extension for a future PEP.

Open Issues
===========

PEP 660 and Deferring Editable Installation Support
---------------------------------------------------

This PEP leaves the specification and implementation of a :pep:`660` editable installation
mechanism unresolved, for a later PEP. Should it be specified in this PEP instead?

Security
--------

This PEP needs to be reviewed to make sure it would not allow new security vulnerabilities. Are
there other restrictions we should place on the source or target of symlinks to protect users?

Allow inter-package symlinks
----------------------------

This could be useful for projects that want to shard dependencies, such as large libraries,
across wheels but make them available in the main parent wheel.

The Format of ``LINKS``
-----------------------

Currently, the format is derived from ``RECORD``, but a better format may exist.

Previous Discussion
===================

https://discuss.python.org/t/symbolic-links-in-wheels/1945/25


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.