[SVCS-552] folder_file_op cut down revalidate calls.#311
[SVCS-552] folder_file_op cut down revalidate calls.#311Johnetordoff wants to merge 5 commits intoCenterForOpenScience:developfrom
Conversation
cslzchen
left a comment
There was a problem hiding this comment.
Looks good and I will take over the ticket/PR. Two minor issues:
- Need to update Google Cloud as well which has been added recently.
- This change affects all path-based provider and will cause conflicts for sure. Proceed with caution 🚔 .
|
|
||
| @property | ||
| @abc.abstractmethod | ||
| def id(self) -> str: |
There was a problem hiding this comment.
Please update the DocStr to mention what this id is for both types.
| folder = await dest_provider.create_folder(dest_path, folder_precheck=False) | ||
|
|
||
| dest_path = await dest_provider.revalidate_path(dest_path.parent, dest_path.name, folder=dest_path.is_dir) | ||
| folder_path = dest_provider.path_from_metadata(dest_path.parent, folder) |
There was a problem hiding this comment.
Rename folder to folder_metadata (including the assign statement two lines above).
|
|
||
| @property | ||
| def id(self): | ||
| return self.build_path() |
There was a problem hiding this comment.
Since id is exactly the same, why not using return self.path? This is what other providers do.
- This id consolidates both path-based ID and id- based ID under a single property. - It is identical to the path object identifier which is used to revalidate paths.
- In addititon, remove superfluous condition statement `if not futures: continue`
6732325 to
a27b568
Compare
- `self.id` is identical to `self.path` since GCS is a path-based provider
- Intermediate commit. Please rebase or ammend. - Discuss the todos and comments, fix issues if any. Update docstring, comments and remove todos. - Need travis ci.
8adfb5c to
b61b2ec
Compare
cslzchen
left a comment
There was a problem hiding this comment.
@Johnetordoff and CC @felliott
I rebased and updated the code with extra comments for each provider affected. Need some further discussion with you before I can continue. Thanks.
| return 'googlecloud' | ||
|
|
||
| @property | ||
| def id(self) -> str: |
There was a problem hiding this comment.
Add GoogleCloud to pass tests.
|
|
||
| @property | ||
| @abc.abstractmethod | ||
| def id(self) -> str: |
| return 'googlecloud' | ||
|
|
||
| # GoogleCloud is a path-based provider. However, it never calls ``revalidate_path()`` since it | ||
| # is the backend storage provider for OSFStorage. |
There was a problem hiding this comment.
Similar to CloudFiles (when it was the storage provider), GoogleCloud never uses revalidate_path(). We should set the id to be None or empty so we know it is never used accidentally or cause any confusion in the further.
| return 'github' | ||
|
|
||
| # GitHub is a path-based provider. However, it has its own ``revalidate_path()``. | ||
| # TODO: Investigate if this is an issue. |
There was a problem hiding this comment.
GitHub may not work because your fix calls (functionally, not literally) the default revalidate_path() instead of the customized one.
| return 'dropbox' | ||
|
|
||
| # TODO: Is Dropbox path-based instead of id-based? | ||
| # TODO: Dropbox does not have a ``revalidate_path()``, should we use path instead of id? |
There was a problem hiding this comment.
There is no revalidate_path() for dropbox which makes guess that it is a path based (or at lease both)?
|
|
||
| # TODO: Should this be in ``DataverseFileMetadata``? | ||
| # TODO: Does ``self.raw['id']`` exists for ``DataverseDatasetMetadata``? | ||
| # TODO: DataVerse's ``revalidate_path()`` looks very different. Is this an issue? |
There was a problem hiding this comment.
Similar to GitHub, your fix calls (functionally, not literally) revalidate_path() differently than the original way. This might be an issue.
|
|
||
| # CloudFiles is a path-based provider. However, it never called ``revalidate_path()`` even when | ||
| # it was the backend storage provider for OSFStorage. The provider is no longer active until it | ||
| # is upgraded to an addon provider for the users. |
Ticket
https://openscience.atlassian.net/browse/SVCS-552
This PR replaces #287
Purpose
folder_file_op makes roughly a trillion metadata calls every time it moves a folder's children. This fix makes it easy to get a path from it's metadata object by giving the metadata an id which corresponds with the path identifier. Since we can now convert a metadata object into a path easily we no longer need to re-validate each time.
Changes
Side effects
None that I know of.
QA Notes
Copy/moves between providers should be unchanged or slightly faster.
Deployment Notes
None.