SNOW-2057867 refactor BindUploadAgent to make it work for Python sprocs #2303
- Relax the condition for using the stage solution for array bind (so we will use bind_size >= threshold instead of bind_size > threshold)
- Use _upload_stream to implement BindUploadAgent

### Tests

- The new test_direct_file_operation_utils.py to validate parse_file_operation for _upload and _upload_stream
- Existing test test_bindings.py to make sure the change does not break existing bind upload logic
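The relaxed threshold check from the description can be sketched as below. This is only an illustration of the boundary case: the function and parameter names are hypothetical and not taken from the connector, and any additional guards the real code applies are not shown.

```python
def should_use_stage_bind(bind_size: int, threshold: int) -> bool:
    """Decide whether an array bind should go through the stage upload path.

    Hypothetical helper for illustration only; the connector's real check
    may include further conditions that are not modeled here.
    """
    # Relaxed condition: a bind whose size equals the threshold now qualifies.
    # The previous condition would have been `bind_size > threshold`.
    return bind_size >= threshold


# Boundary case: size equal to the threshold now uses the stage path.
print(should_use_stage_bind(5, 5))
print(should_use_stage_bind(4, 5))
```

The only behavioral difference from the stricter comparison is the case where bind_size equals threshold exactly.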
Update: the following content is now obsolete, because we simplified this PR and removed these premature optimizations in the latest commit. (@sfc-gh-sfan's comment, copied from the old PR #2300 (comment)):
Context: For the public connector, we have the same interface and we want to keep the same user journey. Note that the server is not able to parse and derive the base file name for us, so we do need to do the split.
(@sfc-gh-sfan's comment, copied from the old PR #2300 (comment)) Yes, this ( ) is the existing public connector implementation. Note that file_stream=f is passed to execute, which does stream uploading instead of file uploading.
(@sfc-gh-sfan's comment, copied from the old PR #2300 (comment)) Removed
stage_location, unprefixed_local_file_name = stage_location.rsplit(
    "/", maxsplit=1
For the file name splitting: it is existing behavior for sprocs and we want to keep the same behavior here. (I provided more context below.)
Is this path used in stored proc? I somehow had the impression we override parse_file_operation
Your impression is right, we don't use this code in sproc execution. But we will still rsplit in the public connector, because the interface of _upload_stream looks like
def _upload_stream(
    self,
    input_stream: IO[bytes],
    stage_location: str,
    options: dict[str, Any],
    _do_reset: bool = True,
) -> None:

We need to rsplit stage_location to get the file name.
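As a minimal sketch of that split, assuming stage_location ends with the target file name after the last "/": the helper name and the example stage path below are hypothetical, chosen only to illustrate the rsplit call quoted in the diff.

```python
def split_stage_location(stage_location: str) -> tuple[str, str]:
    """Split a stage location into (stage prefix, file name).

    Hypothetical helper for illustration; it mirrors the rsplit call from
    the diff but is not the connector's actual function.
    """
    # Split only once, from the right, so "/" inside the prefix is preserved.
    stage_prefix, file_name = stage_location.rsplit("/", maxsplit=1)
    return stage_prefix, file_name


print(split_stage_location("@my_stage/some/prefix/data.csv"))
# → ('@my_stage/some/prefix', 'data.csv')
```

Note that this sketch raises ValueError when the input contains no "/", since rsplit then returns a single element; how the real code treats that case is not shown in this thread.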
@sfc-gh-sfan FYI, I have removed the premature optimization to simplify this change.
- We use TemporaryDirectory in this test. Its output path uses single backslashes and could contain special characters, but the `_upload` API expects a normalized path. So we should normalize and quote the path before passing it to _upload in the test.
- This has no effect for regular / non-sproc use cases because it is the default option
- Sprocs require it to be explicitly present, so we need it here (we will have a future server-side change to make it optional for sprocs as well)
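The path normalization mentioned for the TemporaryDirectory test might look roughly like the sketch below. It assumes "normalized" means forward slashes and "quoted" means wrapped in single quotes; the helper name is hypothetical, and the actual test may normalize differently or also escape embedded quotes, which this sketch does not do.

```python
def normalize_and_quote(local_path: str) -> str:
    """Convert backslashes to forward slashes and wrap the path in single quotes.

    Hypothetical helper for illustration only; embedded single quotes in the
    path are not escaped here.
    """
    # Windows temp paths use backslashes; replace them so the path is uniform.
    normalized = local_path.replace("\\", "/")
    # Quote the path so spaces and other special characters stay together.
    return f"'{normalized}'"


print(normalize_and_quote("C:\\Users\\me\\tmp dir\\data.csv"))
# → 'C:/Users/me/tmp dir/data.csv'
```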
Hi @sfc-gh-sfan, commit a3c66cf is the extra fix that I mentioned earlier. Could you please take a look and let me know if it looks good to you?
LGTM
sfc-gh-mmishchenko
left a comment
Looks good, except that the naming convention for private methods/fields is now violated.
with self._connection.cursor() as cursor:
    # Send constructed SQL to server and get back parsing result.
    processed_params = cursor._connection._process_params_qmarks(params, cursor)
Both the _connection field and the _process_params_qmarks method have so far followed the private _ naming convention. Some sort of name refactoring would probably be good to have here.
I see. I guess your suggestion is to rename, e.g. _process_params_qmarks -> process_params_qmarks? I am a bit worried that making _process_params_qmarks public would confuse external customers, because normally they directly use execute to run SQL without worrying about these bind-parameter utilities.
And here we are more or less using private / internal methods to implement it. Would it be less awkward, if I change to something like
processed_params = self._connection._process_params_qmarks(params, cursor)
(Such that it is more like we are accessing some internal fields of this current class, and therefore more natural)
The only failures in merge gate of "continuous integration" are
Those are irrelevant to this change.