-
Notifications
You must be signed in to change notification settings - Fork 72
[SYNPY-1351, SYNPY-1613] Implement 'Wiki2' model into OOP #1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
@BryanFauble @thomasyu888 I put this draft PR to show tentative interface of the wiki2 service. Protocol, tutorials and test scripts are wip. Feel free to leave comments if you envision it differently. |
A tutorial should be created showing how to use the Wiki objects: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start to the changes!
Feel free to reach out to me with questions, and re-request review when you'd like me to take a look at this PR again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition of the protocols
markdown: Optional[str] = None | ||
"""The markdown content of this page.""" | ||
|
||
attachments: List[Dict[str, Any]] = field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of returning a Dict here, could we instead return back a WikiAttachment dataclass? If one doesn't exist you'd need to create it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the attachments just a list of file path. For attachment-related endpoints, only url is given. Do you still think this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the Rest API Docs for the V2WikiPage
: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/v2/wiki/V2WikiPage.html
On these lines you are setting:
markdown
attachments
owner_id
wiki_version
But - These fields don't exist on the Rest API docs. When you call the fill_from_dict
, these would always be wiped out and replaced with None
or []
, since Synapse would never return these values.
I understand how these fields can be used during the store operations - But, how would a user read for these values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, V2WikiPage doesn't have the attributes you listed having them in the fill_from_dict would not be needed. I was using these attributes to assist user to list out parameters that can be used for api call, such as GET /entity/{ownerId}/wiki2, the user specifies owner_id when constructing the wikipage object and call get to get the wikipage for the owner_id.
file_path = self._to_gzip_file( | ||
wiki_content=attachment, synapse_client=client | ||
) | ||
file_handle = await upload_file_handle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an improvement that can be made here. Suppose you have 10 attachments. This code as written will upload all 10 attachments sequentially one after the other.
Using asyncio.as_completed, you can execute the upload process in parallel for all 10 attachments. A significant performance gain.
There are examples In the code to show how it can be used.
for completed_task in asyncio.as_completed(tasks): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added gather
function to run tasks in parallel since I would like to get the file handle the same order as the given file path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, using the gather should work fine. My only request is that you wrap the call to upload_file_handle
in this context manager:
async with synapse_client._get_parallel_file_transfer_semaphore(
asyncio_event_loop=asyncio.get_running_loop()
):
file_handle = await upload_file_handle(
syn=synapse_client,
parent_entity_id=self.owner_id,
path=file_path,
)
The reason why can be found in the method docstring:
synapsePythonClient/synapseclient/client.py
Lines 562 to 599 in e56540e
def _get_parallel_file_transfer_semaphore( | |
self, asyncio_event_loop: asyncio.AbstractEventLoop | |
) -> asyncio.Semaphore: | |
""" | |
Retrieve the semaphore for the Synapse client. Or create a new one if it does | |
not exist. This semaphore is used to limit the number of files that can actively | |
enter the uploading/downloading process. | |
This is expected to be called from within an AsyncIO loop. | |
By default the number of files that can enter the "uploading" state will be | |
limited to 2 * max_threads. This is to ensure that the files that are entering | |
into the "uploading" state will have priority to finish. Additionally, it means | |
that there should be a good spread of files getting up to the "uploading" | |
state, entering the "uploading" state, and finishing the "uploading" state. | |
If we break these states down into large components they would look like: | |
- Before "uploading" state: HTTP rest calls to retrieve what data Synapse has | |
- Entering "uploading" state: MD5 calculation and HTTP rest calls to determine | |
how/where to upload a file to. | |
- During "uploading" state: Uploading the file to a storage provider. | |
- After "uploading" state: HTTP rest calls to finalize the upload. | |
This has not yet been applied to parallel file downloads. That will take place | |
later on. | |
""" | |
if ( | |
hasattr(self, "_parallel_file_transfer_semaphore") | |
and asyncio_event_loop in self._parallel_file_transfer_semaphore | |
and self._parallel_file_transfer_semaphore[asyncio_event_loop] is not None | |
): | |
return self._parallel_file_transfer_semaphore[asyncio_event_loop] | |
self._parallel_file_transfer_semaphore.update( | |
{asyncio_event_loop: asyncio.Semaphore(max(self.max_threads * 2, 1))} | |
) | |
return self._parallel_file_transfer_semaphore[asyncio_event_loop] |
synapseclient/models/wiki.py
Outdated
The created wiki page. | ||
|
||
Raises: | ||
ValueError: If owner_id is not provided or if required fields are missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of the method for both the async and non-async functions should have at least 1 example to show how to use and call the function, along with the required imports. Ideally one could copy/paste the code, minimally change something, and run it.
raise ValueError("Must provide download_location to download a file.") | ||
|
||
# construct PresignedUrlInfo for downloading | ||
presigned_url_info = PresignedUrlInfo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem like you need to construct this object because you turn around and directly use the attributes off the object to pass to subsequent function calls. You could just directly pass those values to the function calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress! Please re-request review when you are ready for another round.
…nd split it int small functions
Problem:
Solution:
Testing:
(wip)