[SYNPY-1351, SYNPY-1613] Implement 'Wiki2' model into OOP #1206

danlu1 · 2025-06-11T20:55:17Z

Problem:

WikiPage Services 2 is not implemented in the client.
No asynchronous get_paginated function in client module.
download_from_url and download_from_url_multi_threaded are not supporting download directly via presigned url.

Solution:

Add a new Wiki2 service to Python Client
Add rest_get_paginated_async function to client module.
Update download_from_url and download_from_url_multi_threaded functions to allow file download directly via presigned url.

Testing:

(wip)

danlu1 · 2025-06-11T21:03:30Z

@BryanFauble @thomasyu888 I put this draft PR to show tentative interface of the wiki2 service. Protocol, tutorials and test scripts are wip. Feel free to leave comments if you envision it differently.

synapseclient/models/wiki.py

synapseclient/api/wiki_service.py

BryanFauble · 2025-06-11T21:19:23Z

A tutorial should be created showing how to use the Wiki objects:
https://sagebionetworks.jira.com/wiki/spaces/SYNPY/pages/3864690727/SYNPY-1549+Python+Client+New+Service+Implementation+SOP#Tutorials

synapseclient/models/wiki.py

BryanFauble

This is a great start to the changes!

Feel free to reach out to me with questions, and re-request review when you'd like me to take a look at this PR again.

synapseclient/api/wiki_service.py

thomasyu888 · 2025-06-13T14:41:14Z

synapseclient/models/protocols/wikipage_protocol.py

Great addition of the protocols

synapseclient/models/wiki.py

BryanFauble · 2025-06-25T17:30:32Z

synapseclient/models/wiki.py

+    markdown: Optional[str] = None
+    """The markdown content of this page."""
+
+    attachments: List[Dict[str, Any]] = field(default_factory=list)


Instead of returning a Dict here, could we instead return back a WikiAttachment dataclass? If one doesn't exist you'd need to create it

Here the attachments just a list of file path. For attachment-related endpoints, only url is given. Do you still think this is needed?

Looking at the Rest API Docs for the V2WikiPage: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/v2/wiki/V2WikiPage.html

On these lines you are setting:

markdown

attachments

owner_id

wiki_version

But - These fields don't exist on the Rest API docs. When you call the fill_from_dict, these would always be wiped out and replaced with None or [], since Synapse would never return these values.

I understand how these fields can be used during the store operations - But, how would a user read for these values?

You are right, V2WikiPage doesn't have the attributes you listed having them in the fill_from_dict would not be needed. I was using these attributes to assist user to list out parameters that can be used for api call, such as GET /entity/{ownerId}/wiki2, the user specifies owner_id when constructing the wikipage object and call get to get the wikipage for the owner_id.

synapseclient/models/wiki.py

BryanFauble · 2025-06-25T17:37:07Z

synapseclient/models/wiki.py

+                    file_path = self._to_gzip_file(
+                        wiki_content=attachment, synapse_client=client
+                    )
+                    file_handle = await upload_file_handle(


There is an improvement that can be made here. Suppose you have 10 attachments. This code as written will upload all 10 attachments sequentially one after the other.

Using asyncio.as_completed, you can execute the upload process in parallel for all 10 attachments. A significant performance gain.

There are examples In the code to show how it can be used.

synapsePythonClient/synapseclient/models/mixins/access_control.py

Line 94 in 186552f

for completed_task in asyncio.as_completed(tasks):

Added gather function to run tasks in parallel since I would like to get the file handle the same order as the given file path.

Gotcha, using the gather should work fine. My only request is that you wrap the call to upload_file_handle in this context manager:

async with synapse_client._get_parallel_file_transfer_semaphore( asyncio_event_loop=asyncio.get_running_loop() ): file_handle = await upload_file_handle( syn=synapse_client, parent_entity_id=self.owner_id, path=file_path, )

The reason why can be found in the method docstring:

synapsePythonClient/synapseclient/client.py

Lines 562 to 599 in e56540e

def _get_parallel_file_transfer_semaphore(

self, asyncio_event_loop: asyncio.AbstractEventLoop

) -> asyncio.Semaphore:

"""

Retrieve the semaphore for the Synapse client. Or create a new one if it does

not exist. This semaphore is used to limit the number of files that can actively

enter the uploading/downloading process.

This is expected to be called from within an AsyncIO loop.

By default the number of files that can enter the "uploading" state will be

limited to 2 * max_threads. This is to ensure that the files that are entering

into the "uploading" state will have priority to finish. Additionally, it means

that there should be a good spread of files getting up to the "uploading"

state, entering the "uploading" state, and finishing the "uploading" state.

If we break these states down into large components they would look like:

- Before "uploading" state: HTTP rest calls to retrieve what data Synapse has

- Entering "uploading" state: MD5 calculation and HTTP rest calls to determine

how/where to upload a file to.

- During "uploading" state: Uploading the file to a storage provider.

- After "uploading" state: HTTP rest calls to finalize the upload.

This has not yet been applied to parallel file downloads. That will take place

later on.

"""

if (

hasattr(self, "_parallel_file_transfer_semaphore")

and asyncio_event_loop in self._parallel_file_transfer_semaphore

and self._parallel_file_transfer_semaphore[asyncio_event_loop] is not None

):

return self._parallel_file_transfer_semaphore[asyncio_event_loop]

self._parallel_file_transfer_semaphore.update(

{asyncio_event_loop: asyncio.Semaphore(max(self.max_threads * 2, 1))}

)

return self._parallel_file_transfer_semaphore[asyncio_event_loop]

synapseclient/models/wiki.py

BryanFauble · 2025-06-25T17:45:54Z

synapseclient/models/wiki.py

+            The created wiki page.
+
+        Raises:
+            ValueError: If owner_id is not provided or if required fields are missing.


Each of the method for both the async and non-async functions should have at least 1 example to show how to use and call the function, along with the required imports. Ideally one could copy/paste the code, minimally change something, and run it.

synapseclient/models/wiki.py

BryanFauble · 2025-06-25T17:51:08Z

synapseclient/models/wiki.py

+                raise ValueError("Must provide download_location to download a file.")
+
+            # construct PresignedUrlInfo for downloading
+            presigned_url_info = PresignedUrlInfo(


It doesn't seem like you need to construct this object because you turn around and directly use the attributes off the object to pass to subsequent function calls. You could just directly pass those values to the function calls.

synapseclient/models/wiki.py

BryanFauble

Great progress! Please re-request review when you are ready for another round.

…nd split it int small functions

synapseclient/models/wiki.py

danlu1 added 3 commits June 11, 2025 13:44

add Low-level Functionality to interact with wikipage2 endpoints

fef882b

remove unused module

977413b

resort

b2a5840

danlu1 requested a review from a team as a code owner June 11, 2025 20:55

danlu1 marked this pull request as draft June 11, 2025 20:56

BryanFauble reviewed Jun 11, 2025

View reviewed changes

synapseclient/models/wiki.py Outdated Show resolved Hide resolved

BryanFauble reviewed Jun 11, 2025

View reviewed changes

synapseclient/api/wiki_service.py Outdated Show resolved Hide resolved