Commit 264142e

chore: release 0.1.0 (#4)
Syncs to internal commit 7927948
Change-Id: Ib799a9c5e5a18d9b471756410aa5c87cb4932fe8
1 parent b6caad6 commit 264142e

163 files changed: +15353 −4913 lines changed

.kokoro/continuous/e2e.cfg

Lines changed: 1 addition & 1 deletion

@@ -3,5 +3,5 @@
 # Only run this nox session.
 env_vars: {
     key: "NOX_SESSION"
-    value: "system_prerelease system_noextras e2e notebook samples"
+    value: "system_noextras e2e notebook samples"
 }

.kokoro/continuous/nightly.cfg

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@

 env_vars: {
     key: "NOX_SESSION"
-    value: "unit unit_prerelease system system_prerelease cover lint lint_setup_py mypy format docs e2e notebook"
+    value: "unit system cover lint lint_setup_py mypy format docs e2e notebook"
 }

 build_file: "bigframes/.kokoro/release-nightly.sh"

.kokoro/presubmit/e2e.cfg

Lines changed: 1 addition & 1 deletion

@@ -3,5 +3,5 @@
 # Only run this nox session.
 env_vars: {
     key: "NOX_SESSION"
-    value: "system_prerelease system_noextras e2e notebook samples"
+    value: "system_noextras e2e notebook samples"
 }

.kokoro/release-nightly.sh

Lines changed: 5 additions & 0 deletions

@@ -211,3 +211,8 @@ gcs_docs () {
 }

 gcs_docs
+
+if ! [ ${DRY_RUN} ]; then
+    # Copy docs and wheels to Google Drive
+    python3.10 scripts/upload_to_google_drive.py
+fi

.repo-metadata.json

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+{
+    "name": "bigframes",
+    "name_pretty": "A unified Python API in BigQuery",
+    "product_documentation": "https://cloud.google.com/bigquery",
+    "client_documentation": "https://cloud.google.com/python/docs/reference/bigframes/latest",
+    "issue_tracker": "https://github.com/googleapis/python-bigquery-dataframes/issues",
+    "release_level": "preview",
+    "language": "python",
+    "library_type": "INTEGRATION",
+    "repo": "googleapis/python-bigquery-dataframes",
+    "distribution_name": "bigframes",
+    "api_id": "bigquery.googleapis.com",
+    "default_version": "",
+    "codeowner_team": "@googleapis/api-bigquery-dataframe",
+    "api_shortname": "bigquery"
+}

CHANGELOG.md

Lines changed: 2 additions & 2 deletions

@@ -4,14 +4,14 @@

 [1]: https://pypi.org/project/bigframes/#history

-## 0.1.0 (TBD)
+## 0.1.0 (2023-08-11)

 ### Features

 * Add `bigframes.pandas` package with an API compatible with
   [pandas](https://pandas.pydata.org/). Supported data sources include:
   BigQuery SQL queries, BigQuery tables, CSV (local and GCS), Parquet (local
-  and GCS), and more.
+  and Cloud Storage), and more.
 * Add `bigframes.ml` package with an API inspired by
   [scikit-learn](https://scikit-learn.org/stable/). Train machine learning
   models and run batch prediction, powered by [BigQuery

README.rst

Lines changed: 230 additions & 2 deletions

@@ -4,5 +4,233 @@ BigQuery DataFrames
 BigQuery DataFrames provides a Pythonic DataFrame and machine learning (ML) API
 powered by the BigQuery engine.

-* ``bigframes.pandas`` provides a pandas-like API for analytics.
-* ``bigframes.ml`` provides a Scikit-Learn-like API for ML.
+* ``bigframes.pandas`` provides a pandas-compatible API for analytics.
+* ``bigframes.ml`` provides a scikit-learn-like API for ML.
+
+Documentation
+-------------
+
+* `BigQuery DataFrames sample notebooks <https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks>`_
+* `BigQuery DataFrames API reference <https://cloud.google.com/python/docs/reference/bigframes/latest>`_
+* `BigQuery documentation <https://cloud.google.com/bigquery/docs/>`_
+
+
+Quickstart
+----------
+
+Prerequisites
+^^^^^^^^^^^^^
+
+* Install the ``bigframes`` package.
+* Create a Google Cloud project and billing account.
+* When running locally, authenticate with application default credentials. See
+  the `gcloud auth application-default login
+  <https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login>`_
+  reference.
+
+Code sample
+^^^^^^^^^^^
+
+Import ``bigframes.pandas`` for a pandas-like interface. The ``read_gbq``
+method accepts either a fully-qualified table ID or a SQL query.
+
+.. code-block:: python
+
+    import bigframes.pandas as bpd
+
+    df1 = bpd.read_gbq("project.dataset.table")
+    df2 = bpd.read_gbq("SELECT a, b, c, FROM `project.dataset.table`")
+
+* `More code samples <https://github.com/googleapis/python-bigquery-dataframes/tree/main/samples/snippets>`_
+
+
+Locations
+---------
+BigQuery DataFrames uses a
+`BigQuery session <https://cloud.google.com/bigquery/docs/sessions-intro>`_
+internally to manage metadata on the service side. This session is tied to a
+`location <https://cloud.google.com/bigquery/docs/locations>`_.
+BigQuery DataFrames uses the US multi-region as the default location, but you
+can use ``session_options.location`` to set a different location. Every query
+in a session is executed in the location where the session was created.
+
+If you want to reset the location of the created DataFrame or Series objects,
+you can reset the session by executing ``bigframes.pandas.reset_session()``.
+After that, you can use ``bigframes.pandas.options.bigquery.location`` to
+specify another location.
+
+
+``read_gbq()`` requires you to specify a location if the dataset you are
+querying is not in the US multi-region. If you try to read a table from another
+location, you get a ``NotFound`` exception.
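
For example, to query data outside the US multi-region, you could reset the
session and pin a location before the next query runs. A minimal sketch using
only the calls named above; the region and table ID are placeholders:

    import bigframes.pandas as bpd

    # Start a fresh session, then pin its location before the first query.
    bpd.reset_session()
    bpd.options.bigquery.location = "europe-west2"  # placeholder region
    df = bpd.read_gbq("project.dataset.europe_table")  # placeholder table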
+
+
+ML locations
+------------
+
+``bigframes.ml`` supports the same locations as BigQuery ML. BigQuery ML model
+prediction and other ML functions are supported in all BigQuery regions. Support
+for model training varies by region. For more information, see
+`BigQuery ML locations <https://cloud.google.com/bigquery/docs/locations#bqml-loc>`_.
+
+
+Data types
+----------
+
+BigQuery DataFrames supports the following numpy and pandas dtypes:
+
+* ``numpy.dtype("O")``
+* ``pandas.BooleanDtype()``
+* ``pandas.Float64Dtype()``
+* ``pandas.Int64Dtype()``
+* ``pandas.StringDtype(storage="pyarrow")``
+* ``pandas.ArrowDtype(pa.date32())``
+* ``pandas.ArrowDtype(pa.time64("us"))``
+* ``pandas.ArrowDtype(pa.timestamp("us"))``
+* ``pandas.ArrowDtype(pa.timestamp("us", tz="UTC"))``
+
+BigQuery DataFrames doesn't support the following BigQuery data types:
+
+* ``ARRAY``
+* ``NUMERIC``
+* ``BIGNUMERIC``
+* ``INTERVAL``
+* ``STRUCT``
+* ``JSON``
+
+All other BigQuery data types display as the object type.
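
As a plain pandas/pyarrow illustration of the dtype list above (this sketch
does not touch BigQuery; it only builds columns with the supported dtypes):

    import pandas as pd
    import pyarrow as pa

    # One column per family of supported dtypes from the list above.
    df = pd.DataFrame(
        {
            "flag": pd.array([True, None], dtype=pd.BooleanDtype()),
            "value": pd.array([1.5, 2.5], dtype=pd.Float64Dtype()),
            "count": pd.array([1, 2], dtype=pd.Int64Dtype()),
            "name": pd.array(["a", "b"], dtype=pd.StringDtype(storage="pyarrow")),
            "when": pd.array(
                [pd.Timestamp("2023-08-11"), pd.Timestamp("2023-08-12")],
                dtype=pd.ArrowDtype(pa.timestamp("us")),
            ),
        }
    )
    print(df.dtypes)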
+
+
+Remote functions
+----------------
+
+BigQuery DataFrames gives you the ability to turn your custom scalar functions
+into `BigQuery remote functions
+<https://cloud.google.com/bigquery/docs/remote-functions>`_. Creating a remote
+function in BigQuery DataFrames creates a BigQuery remote function, a `BigQuery
+connection
+<https://cloud.google.com/bigquery/docs/create-cloud-resource-connection>`_,
+and a `Cloud Functions (2nd gen) function
+<https://cloud.google.com/functions/docs/concepts/overview>`_.
+
+BigQuery connections are created in the same location as the BigQuery
+DataFrames session, using the name you provide in the custom function
+definition. To view and manage connections, do the following:
+
+1. Go to `BigQuery Studio <https://console.cloud.google.com/bigquery>`__.
+2. Select the project in which you created the remote function.
+3. In the Explorer pane, expand that project and then expand External connections.
+
+BigQuery remote functions are created in the dataset you specify, or
+in a dataset with the name ``bigframes_temp_location``, where ``location`` is
+the location used by the BigQuery DataFrames session. For example,
+``bigframes_temp_us_central1``. To view and manage remote functions, do
+the following:
+
+1. Go to `BigQuery Studio <https://console.cloud.google.com/bigquery>`__.
+2. Select the project in which you created the remote function.
+3. In the Explorer pane, expand that project, expand the dataset in which you
+   created the remote function, and then expand Routines.
+
+To view and manage Cloud Functions functions, use the
+`Functions <https://console.cloud.google.com/functions/list?env=gen2>`_
+page and use the project picker to select the project in which you
+created the function. For easy identification, the names of the functions
+created by BigQuery DataFrames are prefixed by ``bigframes-``.
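
A rough sketch of what defining a remote function might look like. The
``remote_function`` decorator name, its signature, the connection name, and the
table and column names are assumptions for illustration, not details confirmed
by this commit:

    import bigframes.pandas as bpd

    # Assumed decorator: deploys the body as a Cloud Functions (2nd gen)
    # function and wires up a BigQuery remote function plus a connection.
    @bpd.remote_function([float], float, bigquery_connection="bigframes-rf-conn")
    def fahrenheit_to_celsius(x: float) -> float:
        return (x - 32.0) * 5.0 / 9.0

    df = bpd.read_gbq("project.dataset.weather")  # hypothetical table
    df["temp_c"] = df["temp_f"].apply(fahrenheit_to_celsius)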
+
+**Requirements**
+
+BigQuery DataFrames uses the ``gcloud`` command-line interface internally,
+so you must run ``gcloud auth login`` before using remote functions.
+
+To use BigQuery DataFrames remote functions, you must enable the following APIs:
+
+* The BigQuery API (bigquery.googleapis.com)
+* The BigQuery Connection API (bigqueryconnection.googleapis.com)
+* The Cloud Functions API (cloudfunctions.googleapis.com)
+* The Cloud Run API (run.googleapis.com)
+* The Artifact Registry API (artifactregistry.googleapis.com)
+* The Cloud Build API (cloudbuild.googleapis.com)
+* The Cloud Resource Manager API (cloudresourcemanager.googleapis.com)
+
+To use BigQuery DataFrames remote functions, you must be granted the
+following IAM roles:
+
+* BigQuery Data Editor (roles/bigquery.dataEditor)
+* BigQuery Connection Admin (roles/bigquery.connectionAdmin)
+* Cloud Functions Developer (roles/cloudfunctions.developer)
+* Service Account User (roles/iam.serviceAccountUser)
+* Storage Object Viewer (roles/storage.objectViewer)
+* Project IAM Admin (roles/resourcemanager.projectIamAdmin)
+
+**Limitations**
+
+* Remote functions take about 90 seconds to become available when you first create them.
+* Trivial changes in the notebook, such as inserting a new cell or renaming a variable,
+  might cause the remote function to be re-created, even if these changes are unrelated
+  to the remote function code.
+* BigQuery DataFrames does not differentiate any personal data that you include in the
+  remote function code; the code is serialized as an opaque box and deployed as a
+  Cloud Functions function.
+* The Cloud Functions (2nd gen) functions, BigQuery connections, and BigQuery remote
+  functions created by BigQuery DataFrames persist in Google Cloud. If you don't want to
+  keep these resources, you must delete them separately using an appropriate Cloud Functions
+  or BigQuery interface.
+* A project can have up to 1000 Cloud Functions (2nd gen) functions at a time. See Cloud
+  Functions quotas for all the limits.
+
+
+Quotas and limits
+-----------------
+
+`BigQuery quotas <https://cloud.google.com/bigquery/quotas>`_ apply,
+including hardware, software, and network components.
+
+
+Session termination
+-------------------
+
+Each BigQuery DataFrames DataFrame or Series object is tied to a BigQuery
+DataFrames session, which is in turn based on a BigQuery session. BigQuery
+sessions
+`auto-terminate <https://cloud.google.com/bigquery/docs/sessions-terminating#auto-terminate_a_session>`_;
+when this happens, you can't use previously
+created DataFrame or Series objects and must re-create them using a new
+BigQuery DataFrames session. You can do this by running
+``bigframes.pandas.reset_session()`` and then re-running the BigQuery
+DataFrames expressions.
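
A minimal recovery sketch using only the call named above (the table ID is a
placeholder):

    import bigframes.pandas as bpd

    # After the underlying BigQuery session auto-terminates, old DataFrame
    # objects are unusable; start a new session and re-run the expressions.
    bpd.reset_session()
    df = bpd.read_gbq("project.dataset.table")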
+
+
+Data processing location
+------------------------
+
+BigQuery DataFrames is designed for scale, which it achieves by keeping data
+and processing on the BigQuery service. However, you can bring data into the
+memory of your client machine by calling ``.execute()`` on a DataFrame or Series
+object. If you choose to do this, the memory limitation of your client machine
+applies.
213+
214+
License
215+
-------
216+
217+
BigQuery DataFrames is distributed with the `Apache-2.0 license
218+
<https://github.com/googleapis/python-bigquery-dataframes/blob/main/LICENSE>`_.
219+
220+
It also contains code derived from the following third-party packages:
221+
222+
* `Ibis <https://ibis-project.org/>`_
223+
* `pandas <https://pandas.pydata.org/>`_
224+
* `Python <https://www.python.org/>`_
225+
* `scikit-learn <https://scikit-learn.org/>`_
226+
* `XGBoost <https://xgboost.readthedocs.io/en/stable/>`_
227+
228+
For details, see the `third_party
229+
<https://github.com/googleapis/python-bigquery-dataframes/tree/main/third_party/bigframes_vendored>`_
230+
directory.
231+
232+
233+
Contact Us
234+
----------
235+
236+
For further help and provide feedback, you can email us at `[email protected] <https://mail.google.com/mail/?view=cm&fs=1&tf=1&[email protected]>`_.

bigframes/_config/__init__.py

Lines changed: 11 additions & 0 deletions

@@ -19,6 +19,7 @@

 import bigframes._config.bigquery_options as bigquery_options
 import bigframes._config.display_options as display_options
+import bigframes._config.sampling_options as sampling_options


 class Options:
@@ -27,6 +28,7 @@ class Options:
     def __init__(self):
         self._bigquery_options = bigquery_options.BigQueryOptions()
         self._display_options = display_options.DisplayOptions()
+        self._sampling_options = sampling_options.SamplingOptions()

     @property
     def bigquery(self) -> bigquery_options.BigQueryOptions:
@@ -38,6 +40,15 @@ def display(self) -> display_options.DisplayOptions:
         """Options controlling object representation."""
         return self._display_options

+    @property
+    def sampling(self) -> sampling_options.SamplingOptions:
+        """Options controlling downsampling when downloading data
+        to memory. The data will be downloaded into memory explicitly
+        (e.g., to_pandas, to_numpy, values) or implicitly (e.g.,
+        matplotlib plotting). This option can be overridden by
+        parameters in specific functions."""
+        return self._sampling_options
+

 options = Options()
 """Global options for default session."""

bigframes/_config/bigquery_options.py

Lines changed: 13 additions & 8 deletions

@@ -21,11 +21,14 @@
 import google.api_core.exceptions
 import google.auth.credentials

-SESSION_STARTED_MESSAGE = "Cannot change '{attribute}' once a session has started."
+SESSION_STARTED_MESSAGE = (
+    "Cannot change '{attribute}' once a session has started. "
+    "Call bigframes.pandas.reset_session() first, if you are using the bigframes.pandas API."
+)


 class BigQueryOptions:
-    """Encapsulates configuration for working with an Session."""
+    """Encapsulates configuration for working with a session."""

     def __init__(
         self,
@@ -55,7 +58,7 @@ def credentials(self, value: Optional[google.auth.credentials.Credentials]):

     @property
     def location(self) -> Optional[str]:
-        """Default location for jobs / datasets / tables.
+        """Default location for jobs, datasets, and tables.

         See: https://cloud.google.com/bigquery/docs/locations
         """
@@ -69,7 +72,7 @@ def location(self, value: Optional[str]):

     @property
     def project(self) -> Optional[str]:
-        """Google Cloud project ID to use for billing and default data project."""
+        """Google Cloud project ID to use for billing and as the default project."""
         return self._project

     @project.setter
@@ -80,10 +83,12 @@ def project(self, value: Optional[str]):

     @property
     def remote_udf_connection(self) -> Optional[str]:
-        """Name of the BigQuery connection for the purpose of remote UDFs.
+        """Name of the BigQuery connection to use for remote functions.

-        It should be either pre created in `location`, or the user should have
-        privilege to create one.
+        You should either have the connection already created in the
+        ``location`` you have chosen, or you should have the Project IAM
+        Admin role to enable the service to create the connection for you
+        if you need it.
         """
         return self._remote_udf_connection

@@ -97,7 +102,7 @@ def remote_udf_connection(self, value: Optional[str]):

     @property
     def use_regional_endpoints(self) -> bool:
-        """In preview. Flag to connect to regional API endpoints.
+        """Flag to connect to regional API endpoints.

         Requires ``location`` to also be set. For example, set
         ``location='asia-northeast1'`` and ``use_regional_endpoints=True`` to
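
Taken together, these options are assigned on the global options object before
the first query starts a session; afterwards, per ``SESSION_STARTED_MESSAGE``
above, assignment raises until you reset the session. A minimal sketch (the
project ID, region, connection name, and table are placeholders):

    import bigframes.pandas as bpd

    # Configure before any query runs; once a session has started, these
    # setters raise with the SESSION_STARTED_MESSAGE seen in the diff.
    bpd.options.bigquery.project = "my-project"
    bpd.options.bigquery.location = "asia-northeast1"
    bpd.options.bigquery.use_regional_endpoints = True      # requires location
    bpd.options.bigquery.remote_udf_connection = "my-conn"  # for remote functions

    df = bpd.read_gbq("my-project.dataset.table")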
