feat: update BigQueryClient methods #2273
Changes from all commits
@@ -16,6 +16,7 @@

```python
import os
from typing import (
    Dict,
    Optional,
    Sequence,
    Tuple,
```
@@ -34,6 +35,7 @@

```python
# Import types modules (to access *Requests classes)
from google.cloud.bigquery_v2.types import (
    dataset,
    dataset_reference,
    job,
    model,
)
```
@@ -43,139 +45,200 @@

```python
from google.api_core import retry as retries
from google.auth import credentials as auth_credentials

# Create type aliases
try:
    OptionalRetry = Union[retries.Retry, gapic_v1.method._MethodDefault, None]
except AttributeError:  # pragma: NO COVER
    OptionalRetry = Union[retries.Retry, object, None]  # type: ignore

DatasetIdentifier = Union[str, dataset_reference.DatasetReference]

# TODO: This variable is here to simplify prototyping, etc.
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
```
Contributor:
Note: When using

Author:
That upgrade is handled in this separate PR.
```python
DEFAULT_RETRY: OptionalRetry = gapic_v1.method.DEFAULT
DEFAULT_TIMEOUT: Union[float, object] = gapic_v1.method.DEFAULT
DEFAULT_METADATA: Sequence[Tuple[str, Union[str, bytes]]] = ()


# Create Centralized Client
class BigQueryClient:
    """A centralized client for BigQuery API."""

    def __init__(
        self,
        *,
        credentials: Optional[auth_credentials.Credentials] = None,
```
Contributor:
In the past,
Likewise,
Speaking of commonly used arguments:
Fun fact, the Kaggle team was (is?) using

Author:
Thank you for these insights into what customers commonly pass into the client constructor.

Author:
For those who are reading, this may provide some good context: Right now, if you ran the

As part of this alpha, we are trying to enable one basic transmogrification: allow a user to continue to be able to supply a "project_id_value.dataset_id_value" string to the method (if this proves useful and universally generatable, other convenience transformers will follow). This is done by injecting several helper functions that can invisibly accept the string and create a

Author:
Including

For the moment, I have not made any attempts to also process
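For readers following along, a hedged sketch of what that convenience looks like at the call site, using the `get_dataset` signature that appears later in this diff; the identifier values are placeholders and this snippet is not part of the PR:

```python
from google.cloud.bigquery_v2.types import dataset_reference

client = BigQueryClient()

# Convenience form: a "project_id.dataset_id" string is parsed internally
# by the helper functions in this diff.
ds = client.get_dataset(dataset_id="project_id_value.dataset_id_value")

# Explicit form: pass a DatasetReference message directly.
ref = dataset_reference.DatasetReference(
    project_id="project_id_value", dataset_id="dataset_id_value"
)
ds = client.get_dataset(dataset_id=ref)
```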
```python
        client_options: Optional[Union[client_options_lib.ClientOptions, dict]] = None,
    ):
        """
        Initializes the BigQueryClient.

        Args:
            credentials:
                The credentials to use for authentication. If not provided, the
                client will attempt to use the default credentials.
            client_options:
                A dictionary of client options to pass to the underlying
                service clients.
```
Comment on lines +79 to +81

Contributor:
The type above also allows for a single

Author:
For the alpha, I may leave out the
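For context, both accepted forms look roughly like this; a minimal sketch assuming the standard `google.api_core` `ClientOptions` type, not part of this diff:

```python
from google.api_core import client_options as client_options_lib

# As a plain dict:
client = BigQueryClient(
    client_options={"api_endpoint": "bigquery.googleapis.com"}
)

# As a ClientOptions instance:
opts = client_options_lib.ClientOptions(api_endpoint="bigquery.googleapis.com")
client = BigQueryClient(client_options=opts)
```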
```python
        """

        self._clients: Dict[str, object] = {}
        self._credentials = credentials
```
Contributor:
We'll want to call

Author:
That upgrade is handled in this separate PR.
```python
        self._client_options = client_options
        self.project = PROJECT_ID
```
Contributor:
Might want to upgrade this to a

Author:
That upgrade is handled in this separate PR.
```python
    # --- HELPER METHODS ---
    def _parse_dataset_path(self, dataset_path: str) -> Tuple[Optional[str], str]:
        """
        Helper to parse project_id and/or dataset_id from a string identifier.

        Args:
            dataset_path: A string in the format 'project_id.dataset_id' or
                'dataset_id'.

        Returns:
            A tuple of (project_id, dataset_id).
        """
        if "." in dataset_path:
            # Use rsplit to handle legacy paths like `google.com:my-project.my_dataset`.
```
Contributor:
For my education, is
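For reference, a quick illustration of the `rsplit` behavior discussed above (illustrative only): legacy domain-scoped project IDs contain a dot of their own, so only a split on the *last* dot keeps them intact.

```python
# Legacy project IDs can contain a dot, e.g. "google.com:my-project",
# so splitting on the last dot preserves the full project ID.
path = "google.com:my-project.my_dataset"
print(path.rsplit(".", 1))  # ['google.com:my-project', 'my_dataset']
print(path.split(".", 1))   # ['google', 'com:my-project.my_dataset'] (wrong)
```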
```python
            project_id, dataset_id = dataset_path.rsplit(".", 1)
            return project_id, dataset_id
        return self.project, dataset_path

    def _parse_dataset_id_to_dict(self, dataset_id: DatasetIdentifier) -> dict:
        """
        Helper to create a dictionary from a project_id and dataset_id to pass
        internally between helper functions.

        Args:
            dataset_id: A string or DatasetReference.

        Returns:
            A dict of {"project_id": project_id, "dataset_id": dataset_id_str}.
        """
        if isinstance(dataset_id, str):
            project_id, dataset_id_str = self._parse_dataset_path(dataset_id)
            return {"project_id": project_id, "dataset_id": dataset_id_str}
```
Contributor:
Curious to see

Author:
This is not information that is sent directly to the API. I will include a note to this effect in the docstring of the helper funcs.

Author:
Done.
```python
        elif isinstance(dataset_id, dataset_reference.DatasetReference):
```
Contributor:
In this case, can't we use the https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToDict or the proto-plus equivalent?

Author:
This is not intended to be sent to the API. It is internal use only.
```python
            return {
                "project_id": dataset_id.project_id,
                "dataset_id": dataset_id.dataset_id,
            }
        else:
            raise TypeError(f"Invalid type for dataset_id: {type(dataset_id)}")

    def _parse_project_id_to_dict(self, project_id: Optional[str] = None) -> dict:
        """Helper to create a request dictionary from a project_id."""
        final_project_id = project_id or self.project
        return {"project_id": final_project_id}
```
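A hypothetical walkthrough of what these helpers produce, assuming the client's default project resolved to `my-default-project`; the values are placeholders, not part of the diff:

```python
# Assuming client.project == "my-default-project":
client._parse_dataset_path("other-project.my_dataset")
# -> ("other-project", "my_dataset")
client._parse_dataset_path("my_dataset")
# -> ("my-default-project", "my_dataset")  # falls back to the default project

client._parse_dataset_id_to_dict("other-project.my_dataset")
# -> {"project_id": "other-project", "dataset_id": "my_dataset"}
client._parse_project_id_to_dict()
# -> {"project_id": "my-default-project"}
```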
```python
    # --- *SERVICECLIENT ATTRIBUTES ---
    @property
    def dataset_service_client(self):
        if "dataset" not in self._clients:
            from google.cloud.bigquery_v2.services import dataset_service

            self._clients["dataset"] = dataset_service.DatasetServiceClient(
                credentials=self._credentials, client_options=self._client_options
            )
        return self._clients["dataset"]

    @dataset_service_client.setter
    def dataset_service_client(self, value):
        # Imported locally (as in the property above) so the isinstance
        # check below has the symbol in scope.
        from google.cloud.bigquery_v2.services import dataset_service

        # Check for the methods the centralized client exposes (to allow duck-typing)
        required_methods = [
            "get_dataset",
            "insert_dataset",
            "patch_dataset",
            "update_dataset",
            "delete_dataset",
            "list_datasets",
            "undelete_dataset",
        ]
        for method in required_methods:
            if not hasattr(value, method) or not callable(getattr(value, method)):
                raise AttributeError(
                    f"Object assigned to dataset_service_client is missing a callable '{method}' method."
                )
        if not isinstance(value, dataset_service.DatasetServiceClient):
            raise TypeError(
                "Expected an instance of dataset_service.DatasetServiceClient."
            )
        self._clients["dataset"] = value
```
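One plausible way to satisfy this setter in tests: an autospec'd mock passes both the callable-method checks and the `isinstance` check, since `create_autospec(..., instance=True)` reports the spec'd class as its type. A sketch under that assumption, not part of the PR:

```python
from unittest import mock

from google.cloud.bigquery_v2.services import dataset_service

# The autospec'd stub exposes the real client's methods and passes
# isinstance(stub, DatasetServiceClient), so the setter accepts it.
stub = mock.create_autospec(dataset_service.DatasetServiceClient, instance=True)

client = BigQueryClient()
client.dataset_service_client = stub  # passes the setter's validation
client.get_dataset(dataset_id="my-project.my_dataset")
stub.get_dataset.assert_called_once()
```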
```python
    @property
    def job_service_client(self):
        if "job" not in self._clients:
            from google.cloud.bigquery_v2.services import job_service

            self._clients["job"] = job_service.JobServiceClient(
                credentials=self._credentials, client_options=self._client_options
            )
        return self._clients["job"]

    @job_service_client.setter
    def job_service_client(self, value):
        # Imported locally so the isinstance check below has the symbol in scope.
        from google.cloud.bigquery_v2.services import job_service

        required_methods = [
            "get_job",
            "insert_job",
            "cancel_job",
            "delete_job",
            "list_jobs",
        ]
        for method in required_methods:
            if not hasattr(value, method) or not callable(getattr(value, method)):
                raise AttributeError(
                    f"Object assigned to job_service_client is missing a callable '{method}' method."
                )
        if not isinstance(value, job_service.JobServiceClient):
            raise TypeError("Expected an instance of job_service.JobServiceClient.")
        self._clients["job"] = value

    @property
    def model_service_client(self):
        if "model" not in self._clients:
            from google.cloud.bigquery_v2.services import model_service

            self._clients["model"] = model_service.ModelServiceClient(
                credentials=self._credentials, client_options=self._client_options
            )
        return self._clients["model"]

    @model_service_client.setter
    def model_service_client(self, value):
        # Imported locally so the isinstance check below has the symbol in scope.
        from google.cloud.bigquery_v2.services import model_service

        required_methods = [
            "get_model",
            "delete_model",
            "patch_model",
            "list_models",
        ]
        for method in required_methods:
            if not hasattr(value, method) or not callable(getattr(value, method)):
                raise AttributeError(
                    f"Object assigned to model_service_client is missing a callable '{method}' method."
                )
        if not isinstance(value, model_service.ModelServiceClient):
            raise TypeError("Expected an instance of model_service.ModelServiceClient.")
        self._clients["model"] = value
```
```python
    # --- *SERVICECLIENT METHODS ---
    def get_dataset(
        self,
        dataset_id: Optional[DatasetIdentifier] = None,
        *,
        request: Optional["dataset.GetDatasetRequest"] = None,
        retry: OptionalRetry = DEFAULT_RETRY,
        timeout: Union[float, object] = DEFAULT_TIMEOUT,
        metadata: Sequence[Tuple[str, Union[str, bytes]]] = DEFAULT_METADATA,
    ) -> "dataset.Dataset":
        """
        TODO: Docstring is purposefully blank. microgenerator will add automatically.
        """
        final_request = _helpers._make_request(
            request_class=dataset.GetDatasetRequest,
            user_request=request,
            identifier_value=dataset_id,
            identifier_name="dataset_id",
            parser=self._parse_dataset_id_to_dict,
            identifier_required=True,
        )

        return self.dataset_service_client.get_dataset(
            request=final_request,
            retry=retry,
            timeout=timeout,
            metadata=metadata,
        )
```
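`_helpers._make_request` itself is not shown in this diff. Judging only from the call sites above, its logic is presumably along these lines; this is a hypothetical reconstruction for readers, not the actual helper:

```python
# Guessed from the call sites; the real helper lives in _helpers and may differ.
def _make_request(
    request_class,
    user_request=None,
    identifier_value=None,
    identifier_name="",
    parser=None,
    identifier_required=False,
):
    if user_request is not None:
        return user_request  # an explicit request object always wins
    if identifier_value is None and identifier_required:
        raise ValueError(f"Either 'request' or '{identifier_name}' must be provided.")
    # The parser maps the identifier to request-constructor kwargs,
    # e.g. {"project_id": ..., "dataset_id": ...}.
    return request_class(**parser(identifier_value))
```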
```python
    def list_datasets(
        self,
        project_id: Optional[str] = None,
```
Contributor:
Why is
| *, | ||
| request: Optional["dataset.ListDatasetsRequest"] = None, | ||
| retry: OptionalRetry = DEFAULT_RETRY, | ||
| timeout: Union[float, object] = DEFAULT_TIMEOUT, | ||
| metadata: Sequence[Tuple[str, Union[str, bytes]]] = DEFAULT_METADATA, | ||
| ): | ||
| """ | ||
| TODO: Docstring is purposefully blank. microgenerator will add automatically. | ||
| """ | ||
| kwargs = _helpers._drop_self_key(locals()) | ||
| return self.dataset_service_client.list_datasets(**kwargs) | ||
| final_request = _helpers._make_request( | ||
| request_class=dataset.ListDatasetsRequest, | ||
| user_request=request, | ||
| identifier_value=project_id, | ||
| identifier_name="project_id", | ||
| parser=self._parse_project_id_to_dict, | ||
| identifier_required=False, | ||
| ) | ||
|
|
||
| return self.dataset_service_client.list_datasets( | ||
| request=final_request, | ||
| retry=retry, | ||
| timeout=timeout, | ||
| metadata=metadata, | ||
| ) | ||
|
|
||
| # ============================================================================ | ||
| # TODO: HERE THERE BE DRAGONS. Once the above changes have been approved the | ||
| # methods below this comment will be updated to look and function similar | ||
| # to the above. | ||
| # NOT YET READY FOR REVIEW. | ||
| # ============================================================================ | ||
|
|
||
| def list_jobs( | ||
| self, | ||
|
|
||
Contributor:
IIRC, there are cases where we merge these two in the existing hand-written client. For example, load jobs can take a string as a destination but merge the job config object into the final request:

python-bigquery/google/cloud/bigquery/client.py, lines 2577 to 2590 in ef2740a

I actually haven't thought too much about how the non-query jobs fit into this design, though. I suppose the user needs to specify more than just an identifier for all of the job types, so this method wouldn't apply?

Note: In addition to query jobs, load jobs using the jobs.insert REST API will need a bit of handwritten magic to support loads from local data via "resumable media uploads" (https://cloud.google.com/bigquery/docs/reference/api-uploads). I imagine we're planning on providing a separate hand-written helper for this, similar to queries? Actually, do we know if the GAPICs even support the resumable upload API? AFAIK, it's only used in the BigQuery, Cloud Storage, and Google Drive APIs. CC @parthea
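For context, the merge behavior referenced above looks roughly like this from the caller's side of the existing hand-written client; the bucket, project, and table names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)
# The destination may be a plain "project.dataset.table" string; the client
# parses it into a table reference and folds it into the job config before
# building the final jobs.insert request.
load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",
    "my-project.my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # wait for the load to complete
```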
Author:
How to handle the Query experience is being designed by someone else and is not fully fleshed out for the python libraries.
I would like to defer this as out of scope for the alpha release.