Do not overwrite if two packages provide the same file#423
ahmadsharif1 wants to merge 8 commits into conda:main
This should fix issue: conda#422.
We require contributors to sign our Contributor License Agreement and we don't have one on file for @ahmadsharif1. In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#1188), and ping the bot to refresh the PR.
conda_pack/formats.py
    self.copy_func = partial(os.link, follow_symlinks=False)
else:
    self.copy_func = partial(shutil.copy2, follow_symlinks=False)
if os.lstat(source).st_dev == os.lstat(os.path.dirname(target_abspath)).st_dev:
There seem to be cases where os.lstat(path).st_dev returns the same device ID for both source and target, but os.link(source, target) still fails with OSError: [Errno 18] Invalid cross-device link. So this check is not enough.
A more robust check is to catch the invalid cross-device link OSError and fall back to shutil.copy2. Since you don't want to do this for every file, it suffices to do it once, between the first source (it is safe to assume that the whole conda_prefix dir is on a single device) and the directory of target.
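To illustrate the probe-once idea above, here is a minimal sketch. The helper name choose_copy_func and the probe file name are hypothetical, not part of conda-pack. It assumes that a failed hard link from one representative source into the target directory predicts failure for the rest, and it falls back on any OSError, not only Errno 18.

```python
import os
import shutil
import uuid
from functools import partial


def choose_copy_func(probe_source, target_dir):
    """Pick a copy function by attempting one real hard link.

    Hypothetical helper for illustration only; not conda-pack API.
    """
    # Unique probe name so an existing file cannot cause a false failure.
    probe_target = os.path.join(target_dir, ".link_probe_" + uuid.uuid4().hex)
    try:
        os.link(probe_source, probe_target)
    except OSError:
        # Covers errno.EXDEV (cross-device link) as well as filesystems
        # that do not support hard links at all.
        return partial(shutil.copy2, follow_symlinks=False)
    # The link worked, so remove the probe and keep linking.
    os.unlink(probe_target)
    return os.link
```

The returned callable takes (source, target) in both cases, so the caller does not need to know which strategy was chosen.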
What I found was archive.add is called from here as well:
Line 1220 in 9a27fbb
which is called from here:
So although the whole conda env lives in a single directory, this script is created in /tmp, which could be on a different device, and that triggers the error. I haven't benchmarked checking once vs. every time, but I assume that reading the device ID is much cheaper than doing the link or copy?
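For reference, the per-file device check discussed here can be sketched as below. The function name on_same_device is hypothetical. As noted earlier in this thread, a matching st_dev does not guarantee that os.link will succeed, so this is only a cheap first filter.

```python
import os


def on_same_device(source, target_abspath):
    """Return True if source and the target's parent directory report
    the same device ID. Hypothetical helper for illustration only."""
    target_dir = os.path.dirname(target_abspath)
    # lstat does not follow symlinks, matching the check in the diff.
    return os.lstat(source).st_dev == os.lstat(target_dir).st_dev
```

Each call costs two lstat system calls and creates nothing on disk.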
Optimistically calling os.link() is a fine strategy when the source and target are on the same device. But when they are not, we'd end up raising and catching an exception for every single file in the conda env. We should at least call warnings.warn(...) to alert the user to use an --output directory on the same device as the conda env.
Force-pushed from 1c430ca to d0d52bc.
Thanks for the feedback, @kiukchung. Please take another look.
@xhochy can you please take a look at the diff to see if it is acceptable to merge?
pre-commit.ci fix
try:
    os.link(source, target_abspath, follow_symlinks=False)
except OSError as e:
    if not self.printed_warning:
Instead of only remembering that you printed the warning, could you please rewrite it so that on subsequent tries shutil.copy2 is used directly?
Sorry, I don't quite understand the request.
We always call shutil.copy2. It's just that we guard the warning with the boolean because the warning text is different every time and warnings.warn will keep printing them as the file paths are different in every message.
Do you want me to have an if/else block with 2 shutil.copy2 function calls?
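The boolean guard described in this comment can be sketched as follows. The class name, method name, and message text are hypothetical; only the printed_warning attribute name comes from the diff. The point is that each message embeds different paths, so a flag is what keeps the warning to a single occurrence.

```python
import warnings


class LinkWarner:
    """Emit the cross-device warning at most once per instance.

    Hypothetical illustration; not the code in this PR.
    """

    def __init__(self):
        self.printed_warning = False

    def warn_once(self, source, target, error):
        if self.printed_warning:
            return
        self.printed_warning = True
        # The text differs per file, so the flag above does the deduplication.
        warnings.warn(
            f"Could not hard-link {source} to {target} ({error}); "
            "copying instead."
        )
```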
I would like to have something along the lines of:
if link_failed_previously:
    shutil.
else:
    try os.link
Sorry for only pseudocode, but I'm on mobile.
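One way to read the pseudocode above as runnable Python is sketched below. The class name LinkOrCopy and its add method are hypothetical stand-ins, not the PR's NoArchive._add, and the completion of the truncated shutil. line as shutil.copy2 is an assumption based on the rest of this thread.

```python
import os
import shutil
import warnings


class LinkOrCopy:
    """Try os.link until it fails once, then always copy.

    Hypothetical illustration; not the code in this PR.
    """

    def __init__(self):
        self.link_failed_previously = False

    def add(self, source, target_abspath):
        if self.link_failed_previously:
            # Skip the doomed link attempt and its exception overhead.
            shutil.copy2(source, target_abspath, follow_symlinks=False)
            return
        try:
            os.link(source, target_abspath)
        except OSError as exc:
            self.link_failed_previously = True
            warnings.warn(f"Hard link failed ({exc}); copying files instead.")
            shutil.copy2(source, target_abspath, follow_symlinks=False)
```

The trade-off is that a single failure switches every later file to copying, including files that could still have been linked.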
This should fix issue: #422.
Description
1. __exit__ was returning self, which is a truthy value, so it would suppress exceptions. That explains why conda-pack was silently failing. Now it uses pass, which returns None and lets exceptions propagate up the stack.
2. _add() in NoArchive now tries os.link and falls back to shutil.copy2 if that fails. Also, if a target exists, it is deleted before the link or copy runs, so it is overwritten, similar to other formats.
3. The conda-unpack script was throwing an error because we would create a tempfile on a different filesystem from that of the directory where we were creating the packed output. So now NoArchive checks for an exception from os.link on every _add() call instead of assuming all files live on the same filesystem.

Because of (3) above, it now prints this message when 2 packages provide the same file: