Skip to content

Comments

Do not overwrite if two packages provide the same file#423

Open
ahmadsharif1 wants to merge 8 commits intoconda:mainfrom
ahmadsharif1:ahmads_fix
Open

Do not overwrite if two packages provide the same file#423
ahmadsharif1 wants to merge 8 commits intoconda:mainfrom
ahmadsharif1:ahmads_fix

Conversation

@ahmadsharif1
Copy link

@ahmadsharif1 ahmadsharif1 commented Aug 11, 2025

This should fix issue: #422.

Description

  1. I fixed the silent failure issue in NoArchive where Exceptions were suppressed. In __exit__, it was returning self which is a truthy value and it would suppress exceptions. That explains why conda-pack was silently failing. Now we pass which returns None and propagates exceptions up the stack.
  2. I changed _add() in NoArchive to now try os.link and fallback to shutil.copy2 if that fails. Also if a target exists, it will delete it before running the link or copy, and thus overwrite it similar to other formats.
  3. This approach fixes another issue that I discovered where adding the conda-unpack script was throwing an error because we would create a tempfile on a different filesystem as the filesystem of the dir where we were creating the packed output. So now NoArchive checks for an exception for os.link on every _add() call instead of assuming all files live on the same filesystem.

Because of (3) above, now it prints this message when 2 packages provide the same file:

Packing environment at '/home/ahmads/conda_envs/myenv' to '/tmp/foo2'
[#########                               ] | 22% Completed |  2.7s
Traceback (most recent call last):
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/site-packages/conda_pack/cli.py", line 153, in main
    pack(name=args.name,
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/site-packages/conda_pack/core.py", line 606, in pack
    return env.pack(
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/site-packages/conda_pack/core.py", line 432, in pack
    packer.add(f)
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/site-packages/conda_pack/core.py", line 1164, in add
    self.archive.add(file.source, file.target)
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/site-packages/conda_pack/formats.py", line 245, in add
    self._add(source, target)
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/site-packages/conda_pack/formats.py", line 511, in _add
    self.copy_func(source, target_abspath)
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/shutil.py", line 434, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/ahmads/conda_envs/myenv/lib/python3.10/shutil.py", line 252, in copyfile
    os.symlink(os.readlink(src), dst)
FileExistsError: [Errno 17] File exists: 'libexpat.so.1.9.2' -> '/tmp/foo2/lib/libexpat.so.1'

@conda-bot
Copy link
Contributor

We require contributors to sign our Contributor License Agreement and we don't have one on file for @ahmadsharif1.

In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#1188), and ping the bot to refresh the PR.

self.copy_func = partial(os.link, follow_symlinks=False)
else:
self.copy_func = partial(shutil.copy2, follow_symlinks=False)
if os.lstat(source).st_dev == os.lstat(os.path.dirname(target_abspath)).st_dev:
Copy link

@kiukchung kiukchung Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there seem to be cases where os.lstat(path).st_dev returns the same device id for both source and target but os.link(source, target) fails with OSError: [Errno 18] Invalid cross-device link:. So this check is not enough.

A more robust check is to catch the invalid cross-device link OSError and fall back to shutil.copy2. Since you don't want to do this for every file, it suffices to do this once between the first source (safe to assume that the the whole conda_prefix dir is on a single device) and the directory of target.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I found was archive.add is called from here as well:

def _write_text_file(self, fpath, ftext, executable=False):

which is called from here:

https://github.com/conda/conda-pack/blob/9a27fbbd57bf0abfd3dc0765ce7fc1b27dfdf960/conda_pack/core.py#L1290C18-L1290C34

So although the whole conda env is from a single dir, this script is created in /tmp which could be on a different device which triggers the error. I haven't benchmarked checking once vs. every time, but I have to assume that reading the device is much cheaper than doing the link or copy?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimistically calling os.link() is a fine strategy when the source and target are on the same device. But when they are not, we'd end up raising and catching an exception for every single file in the conda env. We should at least warnings.warn(...) to alert the user to use an --output directory on the same device as the conda env.

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Aug 19, 2025
@ahmadsharif1
Copy link
Author

Thanks for the feedback @kiukchung . Please take another look.

@ahmadsharif1
Copy link
Author

@xhochy can you please take a look at the diff to see if it is acceptable to merge?

@xhochy
Copy link
Contributor

xhochy commented Sep 23, 2025

pre-commit.ci fix

try:
os.link(source, target_abspath, follow_symlinks=False)
except OSError as e:
if not self.printed_warning:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of only remembering that you printed the warning, could you please rewrite it so that on subsequant tries shutil.copy2 is directly used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't quite understand the request.

We always call shutil.copy2. It's just that we guard the warning with the boolean because the warning text is different every time and warnings.warn will keep printing them as the file paths are different in every message.

Do you want me to have an if/else block with 2 shutil.copy2 function calls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have something along the lines of:

if link_failed_previously:
  shutil.
else:
  try os.link

Sorry for only pseudocode, but I'm on mobile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed [bot] added once the contributor has signed the CLA

Projects

Status: 🆕 New

Development

Successfully merging this pull request may close these issues.

4 participants