Description
Originally posted by @cscutcher in #494
For me the big selling point of the library is accepting UPaths but being completely agnostic about what backend is in use. It's probably not possible to be 100% agnostic, but I think a good starting place would be if there were at least clear definitions documented for the meaning of UPath(""), UPath("/"), UPath(".") and UPath("..").
All 4 examples you provide will return PosixUPath or WindowsUPath instances, depending on your operating system. This is because the first argument provided is a non-URI-like path and the protocol keyword parameter is unset.
Both PosixUPath and WindowsUPath can basically be thought of as a pathlib.Path subclass with the additional attributes/methods that UPath provides.
In all four cases, before I started thinking about it in the context of UPath, I would have naively said I had a clear understanding of what those paths mean. However, once I started thinking about specifics, especially in the case of UPath's filesystem agnostic approach, I realised I was a bit clueless!
For context, in case it's helpful in your design considerations, I was using the memory backend primarily for testing, so in my case I really wanted behaviour to be as similar as possible to the local filesystem. In the end I awkwardly subclassed MemoryFileSystem and MemoryPath to work around this issue, but also to implement symlink support, which I believe is missing in MemoryFileSystem. I imagine it's possible that my choice of using MemoryFileSystem as a mock local filesystem goes against the original intent for it, so maybe I was doomed from the start!
The fsspec MemoryFileSystem is indeed most commonly used as a testing filesystem. So your intuition was right here. In general I would avoid symlinks if you want cross-filesystem compatible interactions.
This is because symlinks don't exist on object stores and many of the other filesystems. On some, like HTTP filesystems, you could interpret redirects as symlinks, but once you go into the details it's non-trivial.
Another small comment, but how to handle relative paths in general seems an interesting challenge. I'm sure there are good reasons why this isn't the case, but it seems to me that relative paths shouldn't necessarily be tied to any protocol or backend. I can see why making UPath("foo/bar") implicitly a path relative to cwd on the local file system would be necessary to make .open etc. work as a user might expect, but it would be nice to be able to have an explicitly relative path.
To me, in a backend agnostic world, a path like foo/bar should only get tied to a specific backend when it's combined with some absolute path object, but on its own it only states "the subdirectory bar, which is the subdirectory of foo", which should be possible to apply to any filesystem backend equally, if that makes sense.
Unfortunately, relative paths can't fully be decoupled from their filesystem implementations. This all stems from the fact that (1) fsspec paths are always absolute and (2) they actually have no strict definition of what these paths can be. So a relative path like foo/../bar or foo//bar would mean something different on a local filesystem vs. an S3 bucket.
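To make the foo/../bar and foo//bar point concrete, here is a minimal sketch using only the standard library. The dict is a hypothetical stand-in for an object store that treats keys as opaque strings (it is not how any particular fsspec backend is implemented), and posix_view is an illustrative helper name. Note that posixpath.normpath is purely lexical, so it only approximates what a real local filesystem does when symlinks are involved.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical stand-in for an object store: keys are opaque strings,
# so every spelling below addresses a different object.
object_store = {
    "foo//bar": b"double slash",
    "foo/bar": b"single slash",
    "foo/../bar": b"literal dot-dot key",
    "bar": b"top-level bar",
}


def posix_view(key: str) -> str:
    """Lexically normalize a key the way a POSIX-style path would read it."""
    return posixpath.normpath(key)


# pathlib collapses the doubled separator as soon as the path is parsed.
collapsed = str(PurePosixPath("foo//bar"))

# Lexical normalization folds ".." into the parent segment.
folded = posix_view("foo/../bar")

# The store keeps all four keys apart; the POSIX view merges them in pairs.
distinct_keys = len(object_store)
distinct_posix = len({posix_view(key) for key in object_store})
```

Under these assumptions the same four strings name four objects in the store but only two locations once they are read as POSIX paths, which is why a relative path cannot be interpreted without knowing the backend.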
All that being said, you have a few options to get what you want:
- make a relative UPath: (while not supported directly from the constructor, you can make one via relative_to)

      >>> from upath import UPath
      >>> UPath("s3://bucket/foo/bar").relative_to(UPath("s3://bucket/"))
      <relative S3Path 'foo/bar'>

- consistently use resolve() before file access to ensure . and .. are handled in a pathlib like interpretation:

      >>> from upath import UPath
      >>> UPath("bucket/foo/bar", protocol="s3").joinpath("../abc").resolve()
      S3Path('bucket/foo/abc', protocol='s3')

- in internal projects that require loads of path traversals, I usually tend to define all relative locations as PurePosixPath instances, and allow a base UPath to be provided to determine the root of the absolute filesystem location.
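
The last option could be sketched roughly as follows. This is only an illustration of the idea, not code from the library: locate, RAW_DATA and REPORT are made-up names, and the base here is a PurePosixPath stand-in so the snippet runs without any remote filesystem. In practice the base would be a UPath, which is assumed to offer the same pathlib-style joinpath.

```python
from pathlib import PurePosixPath

# Relative locations, declared once and independent of any backend.
RAW_DATA = PurePosixPath("data/raw")
REPORT = PurePosixPath("reports/summary.txt")


def locate(base, relative: PurePosixPath):
    """Attach a backend-agnostic relative location to an absolute base.

    `base` is assumed to be any path object with a pathlib-style
    `joinpath` (a UPath in the setup described above).
    """
    # Reject absolute paths and ".." so the result always stays below `base`.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"expected a plain relative path, got {relative!s}")
    return base.joinpath(*relative.parts)


# Stand-in base for illustration; swap in a UPath to target a real backend.
base = PurePosixPath("/srv/project")
report_path = locate(base, REPORT)
raw_data_path = locate(base, RAW_DATA)
```

Keeping the relative parts as plain segments and rejecting ".." up front sidesteps the backend-specific normalization questions, since only simple child joins ever reach the filesystem-aware path object.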