BigQuery DataFrames
===================

BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API
powered by the BigQuery engine.

* ``bigframes.pandas`` provides a pandas-compatible API for analytics.
* ``bigframes.ml`` provides a scikit-learn-like API for ML.

Documentation
-------------

* `BigQuery DataFrames sample notebooks <https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks>`_
* `BigQuery DataFrames API reference <https://cloud.google.com/python/docs/reference/bigframes/latest>`_
* `BigQuery documentation <https://cloud.google.com/bigquery/docs/>`_


Quickstart
----------

Prerequisites
^^^^^^^^^^^^^

* Install the ``bigframes`` package.
* Create a Google Cloud project and billing account.
* When running locally, authenticate with application default credentials. See
  the `gcloud auth application-default login
  <https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login>`_
  reference.
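
The installation and authentication steps above can be sketched on the command
line as follows (creating the project and billing account still happens in the
Google Cloud console):

.. code-block:: shell

    # Install the BigQuery DataFrames package.
    pip install bigframes

    # When running locally, authenticate with application default credentials.
    gcloud auth application-default login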

Code sample
^^^^^^^^^^^

Import ``bigframes.pandas`` for a pandas-like interface. The ``read_gbq``
method accepts either a fully-qualified table ID or a SQL query.

.. code-block:: python

    import bigframes.pandas as bpd

    df1 = bpd.read_gbq("project.dataset.table")
    df2 = bpd.read_gbq("SELECT a, b, c FROM `project.dataset.table`")

* `More code samples <https://github.com/googleapis/python-bigquery-dataframes/tree/main/samples/snippets>`_


Locations
---------

BigQuery DataFrames uses a
`BigQuery session <https://cloud.google.com/bigquery/docs/sessions-intro>`_
internally to manage metadata on the service side. This session is tied to a
`location <https://cloud.google.com/bigquery/docs/locations>`_.
BigQuery DataFrames uses the US multi-region as the default location, but you
can use ``session_options.location`` to set a different location. Every query
in a session is executed in the location where the session was created.

If you want to change the location of the DataFrame or Series objects you
create, you can reset the session by executing
``bigframes.pandas.reset_session()``. After that, you can use
``bigframes.pandas.options.bigquery.location`` to specify another location.
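
As a sketch, switching locations before the next query might look like the
following; the region name is only an example:

.. code-block:: python

    import bigframes.pandas as bpd

    # Discard the current session, and with it the previous location.
    bpd.reset_session()

    # Choose the location used by the next session.
    bpd.options.bigquery.location = "europe-west4"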


``read_gbq()`` requires you to specify a location if the dataset you are
querying is not in the US multi-region. If you try to read a table from another
location, you get a ``NotFound`` exception.


ML locations
------------

``bigframes.ml`` supports the same locations as BigQuery ML. BigQuery ML model
prediction and other ML functions are supported in all BigQuery regions. Support
for model training varies by region. For more information, see
`BigQuery ML locations <https://cloud.google.com/bigquery/docs/locations#bqml-loc>`_.


Data types
----------

BigQuery DataFrames supports the following NumPy and pandas dtypes:

* ``numpy.dtype("O")``
* ``pandas.BooleanDtype()``
* ``pandas.Float64Dtype()``
* ``pandas.Int64Dtype()``
* ``pandas.StringDtype(storage="pyarrow")``
* ``pandas.ArrowDtype(pa.date32())``
* ``pandas.ArrowDtype(pa.time64("us"))``
* ``pandas.ArrowDtype(pa.timestamp("us"))``
* ``pandas.ArrowDtype(pa.timestamp("us", tz="UTC"))``

BigQuery DataFrames doesn't support the following BigQuery data types:

* ``ARRAY``
* ``NUMERIC``
* ``BIGNUMERIC``
* ``INTERVAL``
* ``STRUCT``
* ``JSON``

All other BigQuery data types display as the object type.
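
The dtypes listed above come from plain pandas and PyArrow, so they can be
inspected without touching BigQuery. A small sketch:

.. code-block:: python

    import pandas as pd
    import pyarrow as pa

    # Nullable integers, corresponding to BigQuery INT64.
    ints = pd.Series([1, 2, None], dtype=pd.Int64Dtype())

    # Microsecond-precision timestamps (tz-naive here).
    stamps = pd.Series(
        [pd.Timestamp("2023-01-01 12:00:00")],
        dtype=pd.ArrowDtype(pa.timestamp("us")),
    )

    print(ints.dtype)    # Int64
    print(stamps.dtype)  # timestamp[us][pyarrow]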


Remote functions
----------------

BigQuery DataFrames gives you the ability to turn your custom scalar functions
into `BigQuery remote functions
<https://cloud.google.com/bigquery/docs/remote-functions>`_. Creating a remote
function in BigQuery DataFrames creates a BigQuery remote function, a `BigQuery
connection
<https://cloud.google.com/bigquery/docs/create-cloud-resource-connection>`_,
and a `Cloud Functions (2nd gen) function
<https://cloud.google.com/functions/docs/concepts/overview>`_.
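
As an illustrative sketch only (the decorator's exact signature varies by
version, and deploying it requires the APIs and IAM roles listed under
Requirements below), turning a scalar function into a remote function might
look like this; the table and column names are hypothetical:

.. code-block:: python

    import bigframes.pandas as bpd

    # Deploying this creates a BigQuery remote function, a BigQuery
    # connection, and a Cloud Functions (2nd gen) function.
    @bpd.remote_function([float], float)
    def celsius_to_fahrenheit(celsius: float) -> float:
        return celsius * 9.0 / 5.0 + 32.0

    df = bpd.read_gbq("project.dataset.weather")
    df["temp_f"] = df["temp_c"].apply(celsius_to_fahrenheit)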

BigQuery connections are created in the same location as the BigQuery
DataFrames session, using the name you provide in the custom function
definition. To view and manage connections, do the following:

1. Go to `BigQuery Studio <https://console.cloud.google.com/bigquery>`__.
2. Select the project in which you created the remote function.
3. In the Explorer pane, expand that project and then expand External
   connections.

BigQuery remote functions are created in the dataset you specify, or
in a dataset with the name ``bigframes_temp_location``, where ``location``
is the location used by the BigQuery DataFrames session. For example,
``bigframes_temp_us_central1``. To view and manage remote functions, do
the following:

1. Go to `BigQuery Studio <https://console.cloud.google.com/bigquery>`__.
2. Select the project in which you created the remote function.
3. In the Explorer pane, expand that project, expand the dataset in which you
   created the remote function, and then expand Routines.

To view and manage Cloud Functions functions, use the
`Functions <https://console.cloud.google.com/functions/list?env=gen2>`_
page and use the project picker to select the project in which you
created the function. For easy identification, the names of the functions
created by BigQuery DataFrames are prefixed by ``bigframes-``.

**Requirements**

BigQuery DataFrames uses the ``gcloud`` command-line interface internally,
so you must run ``gcloud auth login`` before using remote functions.

To use BigQuery DataFrames remote functions, you must enable the following APIs:

* The BigQuery API (bigquery.googleapis.com)
* The BigQuery Connection API (bigqueryconnection.googleapis.com)
* The Cloud Functions API (cloudfunctions.googleapis.com)
* The Cloud Run API (run.googleapis.com)
* The Artifact Registry API (artifactregistry.googleapis.com)
* The Cloud Build API (cloudbuild.googleapis.com)
* The Cloud Resource Manager API (cloudresourcemanager.googleapis.com)
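
The APIs above can be enabled in one step with the ``gcloud`` CLI (this
assumes the target project is the active ``gcloud`` project):

.. code-block:: shell

    gcloud services enable \
        bigquery.googleapis.com \
        bigqueryconnection.googleapis.com \
        cloudfunctions.googleapis.com \
        run.googleapis.com \
        artifactregistry.googleapis.com \
        cloudbuild.googleapis.com \
        cloudresourcemanager.googleapis.com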

To use BigQuery DataFrames remote functions, you must be granted the
following IAM roles:

* BigQuery Data Editor (roles/bigquery.dataEditor)
* BigQuery Connection Admin (roles/bigquery.connectionAdmin)
* Cloud Functions Developer (roles/cloudfunctions.developer)
* Service Account User (roles/iam.serviceAccountUser)
* Storage Object Viewer (roles/storage.objectViewer)
* Project IAM Admin (roles/resourcemanager.projectIamAdmin)

**Limitations**

* Remote functions take about 90 seconds to become available when you first
  create them.
* Trivial changes in the notebook, such as inserting a new cell or renaming a
  variable, might cause the remote function to be re-created, even if these
  changes are unrelated to the remote function code.
* BigQuery DataFrames does not differentiate any personal data you include in
  the remote function code. The remote function code is serialized as an
  opaque box to deploy it as a Cloud Functions function.
* The Cloud Functions (2nd gen) functions, BigQuery connections, and BigQuery
  remote functions created by BigQuery DataFrames persist in Google Cloud. If
  you don't want to keep these resources, you must delete them separately
  using an appropriate Cloud Functions or BigQuery interface.
* A project can have up to 1,000 Cloud Functions (2nd gen) functions at a
  time. See `Cloud Functions quotas
  <https://cloud.google.com/functions/quotas>`_ for all the limits.


Quotas and limits
-----------------

BigQuery DataFrames is subject to `BigQuery quotas
<https://cloud.google.com/bigquery/quotas>`_, including hardware, software,
and network components.


Session termination
-------------------

Each BigQuery DataFrames DataFrame or Series object is tied to a BigQuery
DataFrames session, which is in turn based on a BigQuery session. BigQuery
sessions
`auto-terminate <https://cloud.google.com/bigquery/docs/sessions-terminating#auto-terminate_a_session>`_;
when this happens, you can't use previously
created DataFrame or Series objects and must re-create them using a new
BigQuery DataFrames session. You can do this by running
``bigframes.pandas.reset_session()`` and then re-running the BigQuery
DataFrames expressions.
202
+
203
+
204
+ Data processing location
205
+ ------------------------
206
+
207
+ BigQuery DataFrames is designed for scale, which it achieves by keeping data
208
+ and processing on the BigQuery service. However, you can bring data into the
209
+ memory of your client machine by calling ``.execute() `` on a DataFrame or Series
210
+ object. If you choose to do this, the memory limitation of your client machine
211
+ applies.


License
-------

BigQuery DataFrames is distributed with the `Apache-2.0 license
<https://github.com/googleapis/python-bigquery-dataframes/blob/main/LICENSE>`_.

It also contains code derived from the following third-party packages:

* `Ibis <https://ibis-project.org/>`_
* `pandas <https://pandas.pydata.org/>`_
* `Python <https://www.python.org/>`_
* `scikit-learn <https://scikit-learn.org/>`_
* `XGBoost <https://xgboost.readthedocs.io/en/stable/>`_

For details, see the `third_party
<https://github.com/googleapis/python-bigquery-dataframes/tree/main/third_party/bigframes_vendored>`_
directory.
231
+
232
+
233
+ Contact Us
234
+ ----------
235
+
236
+ For further help and provide feedback, you can email us at `
[email protected] <
https://mail.google.com/mail/?view=cm&fs=1&tf=1&[email protected] >`_.
0 commit comments