-
Notifications
You must be signed in to change notification settings - Fork 1
feat(zb-write): Support write mode in Zonal File #16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: internal-main
Are you sure you want to change the base?
Changes from 11 commits
cce7e99
1a855e0
c38f1ee
c104f3d
8427749
64f2513
feea4d3
0c719f5
ca976d6
866d61e
9fdcbe0
81fb615
5112276
0c7ae3f
83581b2
bed5bc1
57ca20a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1864,6 +1864,7 @@ def __init__( | |
| self.bucket = bucket | ||
| self.key = key | ||
| self.acl = acl | ||
| self.consistency = consistency | ||
| self.checker = get_consistency_checker(consistency) | ||
|
|
||
| if "a" in self.mode: | ||
|
|
@@ -2073,6 +2074,18 @@ def _convert_fixed_key_metadata(metadata, *, from_google=False): | |
|
|
||
|
|
||
| async def upload_chunk(fs, location, data, offset, size, content_type): | ||
| from google.cloud.storage._experimental.asyncio.async_appendable_object_writer import ( | ||
| AsyncAppendableObjectWriter, | ||
| ) | ||
|
|
||
| from .extended_gcsfs import ExtendedGcsFileSystem | ||
| from .extended_gcsfs import upload_chunk as ext_upload_chunk | ||
|
|
||
| if isinstance(fs, ExtendedGcsFileSystem) and isinstance( | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lets add clear documentation here whats the new behaviour, condition is added instead of overriding functionality in ExtendedFileSystem to avoid change in imports etc. Same for similar methods
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added. |
||
| location, AsyncAppendableObjectWriter | ||
| ): | ||
|
|
||
| return await ext_upload_chunk(fs, location, data, offset, size, content_type) | ||
| head = {} | ||
| l = len(data) | ||
| range = "bytes %i-%i/%i" % (offset, offset + l - 1, size) | ||
|
|
@@ -2101,6 +2114,22 @@ async def initiate_upload( | |
| mode="overwrite", | ||
| kms_key_name=None, | ||
| ): | ||
| from .extended_gcsfs import ExtendedGcsFileSystem | ||
| from .extended_gcsfs import initiate_upload as ext_initiate_upload | ||
|
|
||
| if isinstance(fs, ExtendedGcsFileSystem) and await fs._is_zonal_bucket(bucket): | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why isinstance needed here? isnt is_zonal check sufficient to trigger new functionality
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is_zonal_bucket method belongs to ExtendedGcsFileSystem. So the first check is there to make sure is_zonal_bucket method is found in the class. |
||
|
|
||
| return await ext_initiate_upload( | ||
| fs, | ||
| bucket, | ||
| key, | ||
| content_type, | ||
| metadata, | ||
| fixed_key_metadata, | ||
| mode, | ||
| kms_key_name, | ||
| ) | ||
|
|
||
| j = {"name": key} | ||
| if metadata: | ||
| j["metadata"] = metadata | ||
|
|
@@ -2135,6 +2164,24 @@ async def simple_upload( | |
| mode="overwrite", | ||
| kms_key_name=None, | ||
| ): | ||
| from .extended_gcsfs import ExtendedGcsFileSystem | ||
| from .extended_gcsfs import simple_upload as ext_simple_upload | ||
|
|
||
| if isinstance(fs, ExtendedGcsFileSystem) and await fs._is_zonal_bucket(bucket): | ||
|
|
||
| return await ext_simple_upload( | ||
| fs, | ||
| bucket, | ||
| key, | ||
| datain, | ||
| metadatain, | ||
| consistency, | ||
| content_type, | ||
| fixed_key_metadata, | ||
| mode, | ||
| kms_key_name, | ||
| ) | ||
|
|
||
| checker = get_consistency_checker(consistency) | ||
| path = f"{fs._location}/upload/storage/v1/b/{quote(bucket)}/o" | ||
| metadata = {"name": key} | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,6 +5,7 @@ | |
| from google.api_core import exceptions as api_exceptions | ||
| from google.api_core import gapic_v1 | ||
| from google.api_core.client_info import ClientInfo | ||
| from google.auth.credentials import AnonymousCredentials | ||
| from google.cloud import storage_control_v2 | ||
| from google.cloud.storage._experimental.asyncio.async_grpc_client import AsyncGrpcClient | ||
| from google.cloud.storage._experimental.asyncio.async_multi_range_downloader import ( | ||
|
|
@@ -48,6 +49,9 @@ def __init__(self, *args, **kwargs): | |
| super().__init__(*args, **kwargs) | ||
| self.grpc_client = None | ||
| self.storage_control_client = None | ||
| self.credential = self.credentials.credentials | ||
| if self.credentials.token == "anon": | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lets add documentation so reader is clear why only anon has to be added differently, something like anon is used tests to bypass credentials
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since this is separate from writes, does it make sense to seggregate the PRs so this can be merged easily. Also verify if all existing also tests run with experimental true, AFAIK creds was the only issue ? |
||
| self.credential = AnonymousCredentials() | ||
| # initializing grpc and storage control client for Hierarchical and | ||
| # zonal bucket operations | ||
| self.grpc_client = asyn.sync(self.loop, self._create_grpc_client) | ||
|
|
@@ -59,6 +63,7 @@ def __init__(self, *args, **kwargs): | |
| async def _create_grpc_client(self): | ||
| if self.grpc_client is None: | ||
| return AsyncGrpcClient( | ||
| credentials=self.credential, | ||
| client_info=ClientInfo(user_agent=f"{USER_AGENT}/{version}"), | ||
| ).grpc_client | ||
| else: | ||
|
|
@@ -71,7 +76,7 @@ async def _create_control_plane_client(self): | |
| user_agent=f"{USER_AGENT}/{version}" | ||
| ) | ||
| return storage_control_v2.StorageControlAsyncClient( | ||
| credentials=self.credentials.credentials, client_info=client_info | ||
| credentials=self.credential, client_info=client_info | ||
| ) | ||
|
|
||
| async def _lookup_bucket_type(self, bucket): | ||
|
|
@@ -243,3 +248,41 @@ async def _cat_file(self, path, start=None, end=None, mrd=None, **kwargs): | |
| # Explicit cleanup if we created the MRD | ||
| if mrd_created: | ||
| await mrd.close() | ||
|
|
||
|
|
||
| async def upload_chunk(fs, location, data, offset, size, content_type): | ||
| raise NotImplementedError( | ||
| "upload_chunk is not implemented yet for ExtendedGcsFileSystem. Please use write() instead." | ||
|
||
| ) | ||
|
|
||
|
|
||
| async def initiate_upload( | ||
| fs, | ||
| bucket, | ||
| key, | ||
| content_type="application/octet-stream", | ||
| metadata=None, | ||
| fixed_key_metadata=None, | ||
| mode="overwrite", | ||
| kms_key_name=None, | ||
| ): | ||
| raise NotImplementedError( | ||
| "initiate_upload is not implemented yet for ExtendedGcsFileSystem. Please use write() instead." | ||
| ) | ||
|
|
||
|
|
||
| async def simple_upload( | ||
| fs, | ||
| bucket, | ||
| key, | ||
| datain, | ||
| metadatain=None, | ||
| consistency=None, | ||
| content_type="application/octet-stream", | ||
| fixed_key_metadata=None, | ||
| mode="overwrite", | ||
| kms_key_name=None, | ||
| ): | ||
| raise NotImplementedError( | ||
| "simple_upload is not implemented yet for ExtendedGcsFileSystem. Please use write() instead." | ||
| ) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding ignore on CI, please do the same in tests itself as added in main
reason: fsspec also run all gcsfs tests instead of adding --ignore in all ci rules, should be contained in test itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Added check for experimental env variable in the test file