@severo severo commented Nov 29, 2024

This PR implements the concept of filesystem and source.

A source is a URL (https://hyperparam.blob.core.windows.net/hyperparam/starcoderdata-js-00000-of-00065.parquet), or a path (folder/file.txt or even an empty path: ``).

A filesystem can check if it supports a given source, and if so, it provides features to handle it: determine if it's a file or a directory, split the source to populate the breadcrumb navigation, get the list of files in case of a directory, get the resolvable URL in case of a file, etc.
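To make the idea concrete, here is a minimal sketch of what such a contract could look like. The class and method names (`FileSystemSketch`, `supports`, `breadcrumbs`, `resolveUrl`, `UrlOnlyFileSystem`, `pickFileSystem`) are hypothetical and chosen for illustration; they are not necessarily the names used in the PR.

```typescript
// Hypothetical contract: every filesystem answers the same questions about a source.
abstract class FileSystemSketch {
  abstract supports(source: string): boolean
  abstract isDirectory(source: string): boolean
  abstract breadcrumbs(source: string): string[]
  abstract resolveUrl(source: string): string
}

// Illustrative implementation: accepts http(s) URLs and treats each one as a file.
class UrlOnlyFileSystem extends FileSystemSketch {
  supports(source: string): boolean {
    return /^https?:\/\//.test(source)
  }
  isDirectory(_source: string): boolean {
    return false
  }
  breadcrumbs(source: string): string[] {
    // A URL is shown as a single breadcrumb in this sketch.
    return [source]
  }
  resolveUrl(source: string): string {
    return source
  }
}

// Return the first filesystem that supports the source, or undefined if none does.
function pickFileSystem(
  source: string,
  fileSystems: FileSystemSketch[]
): FileSystemSketch | undefined {
  return fileSystems.find((fs) => fs.supports(source))
}
```

A component would only call methods of the filesystem returned by `pickFileSystem`, without knowing which implementation it got.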

I implemented:

  • HttpFileSystem: supports any URL, and assumes it's a file
  • HyparamFileSystem: supports any path like folder1/text.txt, and handles files and folders

but it's extensible to other file systems.

It would also help handle multiple authentications depending on the file system.

The PR is quite big, sorry about that! I think it should help provide a better abstraction than parsedKey (#27) and help decouple the app and the components, since we use dependency inversion to create the URLs, for example (so: no more hardcoded /files?key=, which fixes #22).
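As a rough illustration of the dependency inversion mentioned above: the component receives a function that builds the route URL instead of hardcoding it. The names `GetSourceRouteUrl`, `LinkProps`, `buildFileLink`, `filesRoute`, and `browseRoute` are assumptions made for this sketch, not the PR's actual API.

```typescript
// Hypothetical type: a function, supplied by the app, that maps a source to a route URL.
type GetSourceRouteUrl = (source: string) => string

interface LinkProps {
  href: string
  label: string
}

// Component side: never builds the URL itself, only calls the injected function.
function buildFileLink(source: string, getSourceRouteUrl: GetSourceRouteUrl): LinkProps {
  return { href: getSourceRouteUrl(source), label: source }
}

// App side: one app keeps the /files?key= scheme...
const filesRoute: GetSourceRouteUrl = (source) =>
  `/files?key=${encodeURIComponent(source)}`

// ...while another app could inject a different scheme without touching the component.
const browseRoute: GetSourceRouteUrl = (source) =>
  `/browse/${encodeURIComponent(source)}`
```

With this shape, the `/files?key=` string lives only in the app that chooses it.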

severo commented Nov 29, 2024

I will add more tests, but I wanted to share as soon as possible to get comments about the concepts.

@platypii platypii left a comment

Overall I like the refactor. The prop changes are good. The routing changes are good. 👍

The only part I don't really like is using classes for filesystem and source. I would prefer a more functional-programming style where the Source is a type union, and then a bunch of util functions operate on that, similar to the file operations in utils.ts. It's mostly an aesthetic choice, and it might be a little awkward here. But using serializable, immutable objects, and then pure functions operating on those objects, often makes life easier, whether with web workers, with React, or in unit tests. I could be convinced here though.
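Something like the following is what I have in mind, as a rough sketch only. The `kind` discriminant, the field names, and the trailing-slash rule for directories are assumptions made for the example; query strings in URLs are ignored.

```typescript
// Hypothetical union: a source is plain, serializable data with a discriminant.
type Source =
  | { kind: 'url'; url: string }
  | { kind: 'path'; path: string }

// Pure function: last non-empty segment of the URL or path.
function getFileName(source: Source): string {
  const raw = source.kind === 'url' ? source.url : source.path
  return raw.split('/').filter((part) => part !== '').pop() ?? ''
}

// Pure function: URLs are assumed to be files; the empty path and paths
// ending in "/" are assumed to be directories (an illustrative rule).
function isDirectory(source: Source): boolean {
  switch (source.kind) {
    case 'url':
      return false
    case 'path':
      return source.path === '' || source.path.endsWith('/')
  }
}

// Plain objects survive a JSON round-trip unchanged.
function cloneSource(source: Source): Source {
  return JSON.parse(JSON.stringify(source)) as Source
}
```

The trade-off is that adding a new kind of source means editing the union and each function that switches on it.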

severo commented Dec 2, 2024

The main reason I used classes is the abstract concept. I can define a pattern that any FileSystem (and any Source) must fulfill, and the components can use filesystems and sources knowing that they will consistently implement a set of methods. The file systems are independent, and through inheritance, an app can implement a new file system, and the components can use it directly. If we define FileSystem only as the union of the file systems we have implemented, we lose the ability for a client to create new ones.

If we decide to only support a given set of filesystems and not allow extending it, then indeed we could switch to a type union and pure functions. Otherwise, I'm not sure how to implement it easily. Should we have a global registry of filesystems (i.e. a "service locator") and provide a method to register a new file system? It seems complex.

What do you think?

@platypii platypii left a comment

I'm still not a fan of the abstract classes. They will become a problem if an instance needs to be sent to a web worker, since classes aren't serializable. I'd lean toward an interface with the methods (getFileName, etc.) and then have implementations of the interface. This is how I implemented the local/S3 filesystems in the other repo.
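A small sketch of both points, under stated assumptions: only `getFileName` comes from the discussion, while `canHandle`, `pathFileSystem`, `ClassSource`, and `roundTrip` are hypothetical names. The JSON round-trip is a stand-in for the copy made when a value is posted to a worker; in both cases the prototype, and therefore the methods, are not carried over.

```typescript
// Hypothetical interface: the contract is a set of methods, with no base class.
interface FileSystem {
  canHandle(source: string): boolean
  getFileName(source: string): string
}

// An implementation is any object with the right shape; a client app can add its own.
const pathFileSystem: FileSystem = {
  canHandle: (source) => source.indexOf('://') === -1,
  getFileName: (source) => source.split('/').pop() ?? '',
}

// Class-based source, for comparison.
class ClassSource {
  path: string
  constructor(path: string) {
    this.path = path
  }
  getFileName(): string {
    return this.path.split('/').pop() ?? ''
  }
}

// Stand-in for sending a value to a worker: only own data properties survive.
function roundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value))
}

const copied = roundTrip(new ClassSource('folder/file.txt')) as {
  path: string
  getFileName?: unknown
}

// The data arrives, but the method defined on the class prototype does not.
const methodSurvived = typeof copied.getFileName === 'function'

// With the source kept as plain data, the filesystem on the receiving side still works.
const nameFromCopy = pathFileSystem.getFileName(copied.path)
```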

All that being said, I say we merge this. These are good changes, and we can make smaller refactors later if we want.

severo commented Dec 2, 2024

OK! So, let's move from classes to interfaces in another PR.

@severo severo merged commit 2c17bb1 into master Dec 2, 2024
4 checks passed
@severo severo deleted the remove-hardcoded-paths branch December 2, 2024 21:25
Development

Successfully merging this pull request may close these issues.

[components] remove hardcoded backend routes

2 participants