Implement remote table registration calls#28
Conversation
kkoz
left a comment
There was a problem hiding this comment.
In the README, we should decide on whether or not we're capitalizing TileDB
README.md
Outdated
| validation process to ensure that the table seen by the server is the same one that | ||
| the user has requested the API to register. This is achieved by writing a "SecretToken" | ||
| key to the tiledb array metadata. The tiledb file seen by the server must have a key | ||
| matching the one provided in the registration call managed by omero2pandas. |
There was a problem hiding this comment.
The point of the SecretToken is not really validation - it's to ensure the client attempting to register a file has read access to that file.
| write_path = str(write_path) | ||
| output_path = str(output_path) | ||
| # Use a default chunk size if not set | ||
| chunk_size = chunk_size or 1000 |
There was a problem hiding this comment.
Is the default chunk_size in the function definition sufficient here?
There was a problem hiding this comment.
Functions above this one permit explicitly supplying chunk_size=None which is interpreted as "determine automatically". For most modes this process involves inspecting the input table to figure out how many rows the API can process at once, but this is largely irrelevant for local TileDB conversion.
It'd be possible to resolve this in the calls to create_tiledb by not passing chunk_size at all if it was None, but having to unpack this created more mess than just having a fallback here.
| content = _check_response(result) | ||
| ann_id = content["data"]["file_annotation"] | ||
| LOGGER.info(f"Registered table successfully as FileAnnotation {ann_id}") | ||
| return ann_id |
There was a problem hiding this comment.
Is there a reason we return the annotation ID instead of the Original File ID? In most places we use Original File ID for table identification.
There was a problem hiding this comment.
We use the annotation ID in omero table register
There was a problem hiding this comment.
We also do this elsewhere in the library. This was chosen on the basis that getting an OriginalFile given the FileAnnotation is easier than figuring out the FileAnnotation starting from the OriginalFile.
Co-authored-by: Kevin Kozlowski <kevin@glencoesoftware.com>
|
Testing: First I attempted to run the following: client = omero.client('localhost', 4064)
df = pd.read_csv(csv_path)
res = omero2pandas.upload_table(
df, 'o2p Test Table',
local_path='/home/kevin/testdata/tables/o2ptest.tiledb',
omero_connector=client,
username='kevin', password='mypass')
print(res)And got Seems supplying your own I then attempted the following: And got the following error: This is almost certainly because my local install is not configured for https. Do we want to have an option to support this or are we comfortable only supporting https-enabled servers? I proceeded with my testing by replacing However, I then got the following error: Not sure what's happening here - I'll look into it a bit more. There was also some unexpected naming behavior I encountered during testing. |
|
It looks like the |
This may be an issue with the csv being formatted for import with |
|
Confirmed this was the issue. I don't think most users would expect this tool to work with the |
Should be fixed now, handling the different client objects we may get is tricky so I'm glad I put a check in there 😄
Path parsing in Python without adding dependencies is more clunky than I'd like, so for now I'd hard-coded in https as a requirement. If we want to support http we could stick in an extra argument or something? Happy to hear ideas, especially those that don't involve urllib.
Nice catch, I missed that TileDBs are seen as directories by the os. For convenience I'd wanted to let the user supply folder names instead of full paths to dump the TileDB based on the table name instead. Pushed a tweak that should catch the TileDB nesting issue, but if that's not sufficient we can require absolute paths to .tiledb files as the inputs. Support for an
Fixed
👍 |
kkoz
left a comment
There was a problem hiding this comment.
Error handling and functionality working as expected.
|
I'm getting that same df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
omero2pandas.upload_table(df, "Table", parent_type='Image', parent_id=36445,
local_path="table.tiledb",
remote_path="/data/marc/tables/table1.tiledb")I've tried adding Auth here is through |
@mabruce It looks like you're trying to save to |
|
Yes, my assumption was that this would work like with |
|
Yes, you'd have to perform the steps individually to achieve that workflow. The wrapper is primarily intended for scenarios where storage is mounted on both the client and server machines. |

This PR adds the second stage of remote table registration intended for use with the omero-plus API.
Changes as follows:
omero2pandas.remote.create_tiledb(wasregister_table)omero-pyalready requiresrequestsso no new dependencies were needed