Commit af4ab8f
DOC: Add code samples to introduction and refactor howto guides. (#287)
* DOC: Add code samples to introduction and refactor howto guides. This
  change adds a proper introduction article to introduce `read_gbq` and
  `to_gbq` with code samples. To reduce the amount of duplicated content,
  code samples have been extracted into sample files. To give a better
  next-steps experience, the content in the authentication, reading, and
  writing guides has been rearranged to have the most relevant next steps
  (after the introduction) as the first section.
* Move private_key_contents fixture to conftest.
* Autouse credentials, so that samples tests are skipped when credentials
  are not available.
* Run system tests on Circle CI by checking for a file at
  `ci/service_account.json`.
1 parent 7e78b99 · commit af4ab8f

21 files changed: +421 −213 lines

conftest.py
(new file: 73 additions, 0 deletions)

@@ -0,0 +1,73 @@

"""Shared pytest fixtures for system tests."""

import os
import os.path
import uuid

import google.oauth2.service_account
import pytest


@pytest.fixture(scope="session")
def project_id():
    return os.environ.get("GBQ_PROJECT_ID") or os.environ.get(
        "GOOGLE_CLOUD_PROJECT"
    )  # noqa


@pytest.fixture(scope="session")
def private_key_path():
    path = os.path.join(
        "ci", "service_account.json"
    )  # Written by the 'ci/config_auth.sh' script.
    if "GBQ_GOOGLE_APPLICATION_CREDENTIALS" in os.environ:
        path = os.environ["GBQ_GOOGLE_APPLICATION_CREDENTIALS"]
    elif "GOOGLE_APPLICATION_CREDENTIALS" in os.environ:
        path = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]

    if not os.path.isfile(path):
        pytest.skip(
            "Cannot run integration tests when there is "
            "no file at the private key json file path"
        )
        return None

    return path


@pytest.fixture(scope="session")
def private_key_contents(private_key_path):
    if private_key_path is None:
        return None

    with open(private_key_path) as f:
        return f.read()


@pytest.fixture(scope="module")
def bigquery_client(project_id, private_key_path):
    from google.cloud import bigquery

    return bigquery.Client.from_service_account_json(
        private_key_path, project=project_id
    )


@pytest.fixture()
def random_dataset_id(bigquery_client):
    import google.api_core.exceptions

    dataset_id = "".join(["pandas_gbq_", str(uuid.uuid4()).replace("-", "_")])
    dataset_ref = bigquery_client.dataset(dataset_id)
    yield dataset_id
    try:
        bigquery_client.delete_dataset(dataset_ref, delete_contents=True)
    except google.api_core.exceptions.NotFound:
        pass  # Not all tests actually create a dataset
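The `random_dataset_id` fixture above derives a unique, BigQuery-safe dataset
name from a UUID. That naming step can be exercised on its own with no
BigQuery client (a minimal sketch; the `make_dataset_id` helper name is
illustrative, not part of the conftest):

```python
import uuid


def make_dataset_id(prefix="pandas_gbq_"):
    """Build a collision-resistant dataset ID, as the fixture does.

    BigQuery dataset IDs may contain only letters, digits, and
    underscores, so the hyphens produced by uuid4() are replaced.
    """
    return "".join([prefix, str(uuid.uuid4()).replace("-", "_")])


dataset_id = make_dataset_id()
print(dataset_id)  # e.g. pandas_gbq_1b9d6bcd_bbfd_4b2d_9b5d_ab8dfbbd4bed
```

Because each test run gets a fresh name, the teardown in the fixture can
delete the dataset without racing concurrent test sessions.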

docs/source/changelog.rst
(10 additions, 0 deletions)

@@ -11,6 +11,12 @@ Changelog
   ``max_results`` to 0 to ignore query outputs, such as for DML or DDL
   queries. (:issue:`102`)
 
+Documentation
+~~~~~~~~~~~~~
+
+- Add code samples to introduction and refactor howto guides. (:issue:`239`)
+
 .. _changelog-0.11.0:
 
 0.11.0 / 2019-07-29
@@ -44,6 +50,10 @@ Internal changes
 0.10.0 / 2019-04-05
 -------------------
 
+- **Breaking Change:** Default SQL dialect is now ``standard``. Use
+  :attr:`pandas_gbq.context.dialect` to override the default value.
+  (:issue:`195`, :issue:`245`)
+
 Documentation
 ~~~~~~~~~~~~~
docs/source/howto/authentication.rst
(58 additions, 53 deletions)

@@ -1,11 +1,68 @@
 Authentication
 ==============
 
+Before you begin, you must create a Google Cloud Platform project. Use the
+`BigQuery sandbox <https://cloud.google.com/bigquery/docs/sandbox>`__ to try
+the service for free.
+
 pandas-gbq `authenticates with the Google BigQuery service
-<https://cloud.google.com/bigquery/docs/authentication/>`_ via OAuth 2.0.
+<https://cloud.google.com/bigquery/docs/authentication/>`_ via OAuth 2.0. Use
+the ``credentials`` argument to explicitly pass in Google
+:class:`~google.auth.credentials.Credentials`.
 
 .. _authentication:
 
+Default Authentication Methods
+------------------------------
+
+If the ``credentials`` parameter is not set, pandas-gbq tries the following
+authentication methods:
+
+1. In-memory, cached credentials at ``pandas_gbq.context.credentials``. See
+   :attr:`pandas_gbq.Context.credentials` for details.
+
+   .. code:: python
+
+      import pandas_gbq
+
+      credentials = ...  # From google-auth or pydata-google-auth library.
+
+      # Update the in-memory credentials cache (added in pandas-gbq 0.7.0).
+      pandas_gbq.context.credentials = credentials
+      pandas_gbq.context.project = "your-project-id"
+
+      # The credentials and project_id arguments can be omitted.
+      df = pandas_gbq.read_gbq("SELECT my_col FROM `my_dataset.my_table`")
+
+2. Application Default Credentials via the :func:`google.auth.default`
+   function.
+
+   .. note::
+
+      If pandas-gbq can obtain default credentials but those credentials
+      cannot be used to query BigQuery, pandas-gbq will also try obtaining
+      user account credentials.
+
+      A common problem with default credentials when running on Google
+      Compute Engine is that the VM does not have sufficient scopes to query
+      BigQuery.
+
+3. User account credentials.
+
+   pandas-gbq loads cached credentials from a hidden user folder on the
+   operating system.
+
+   Windows
+       ``%APPDATA%\pandas_gbq\bigquery_credentials.dat``
+
+   Linux/Mac/Unix
+       ``~/.config/pandas_gbq/bigquery_credentials.dat``
+
+   If pandas-gbq does not find cached credentials, it prompts you to open a
+   web browser, where you can grant pandas-gbq permissions to access your
+   cloud resources. These credentials are only used locally. See the
+   :doc:`privacy policy <privacy>` for details.
 
 Authenticating with a Service Account
 --------------------------------------
@@ -131,55 +188,3 @@ credentials are not found.
 Additional information on the user credentials authentication mechanism
 can be found in the `Google Cloud authentication guide
 <https://cloud.google.com/docs/authentication/end-user>`__.
-
-Default Authentication Methods
-------------------------------
-
-If the ``credentials`` parameter (or the deprecated ``private_key``
-parameter) is ``None``, pandas-gbq tries the following authentication
-methods:
-
-1. In-memory, cached credentials at ``pandas_gbq.context.credentials``. See
-   :attr:`pandas_gbq.Context.credentials` for details.
-
-   .. code:: python
-
-      import pandas_gbq
-
-      credentials = ...  # From google-auth or pydata-google-auth library.
-
-      # Update the in-memory credentials cache (added in pandas-gbq 0.7.0).
-      pandas_gbq.context.credentials = credentials
-      pandas_gbq.context.project = "your-project-id"
-
-      # The credentials and project_id arguments can be omitted.
-      df = pandas_gbq.read_gbq("SELECT my_col FROM `my_dataset.my_table`")
-
-2. Application Default Credentials via the :func:`google.auth.default`
-   function.
-
-   .. note::
-
-      If pandas-gbq can obtain default credentials but those credentials
-      cannot be used to query BigQuery, pandas-gbq will also try obtaining
-      user account credentials.
-
-      A common problem with default credentials when running on Google
-      Compute Engine is that the VM does not have sufficient scopes to query
-      BigQuery.
-
-3. User account credentials.
-
-   pandas-gbq loads cached credentials from a hidden user folder on the
-   operating system.
-
-   Windows
-       ``%APPDATA%\pandas_gbq\bigquery_credentials.dat``
-
-   Linux/Mac/Unix
-       ``~/.config/pandas_gbq/bigquery_credentials.dat``
-
-   If pandas-gbq does not find cached credentials, it opens a browser window
-   asking for you to authenticate to your BigQuery account using the product
-   name ``pandas GBQ``.
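The three-step fallback in the relocated section amounts to taking the first
credential source that yields something usable. A minimal sketch of that
ordering (illustrative only; the real lookup is implemented inside pandas-gbq
and pydata-google-auth, and `resolve_credentials` is a hypothetical helper):

```python
def resolve_credentials(cached=None, application_default=None, user=None):
    """Return the first available credential source, mirroring the
    documented order: context cache, then Application Default
    Credentials, then cached or interactive user credentials."""
    for source in (cached, application_default, user):
        if source is not None:
            return source
    raise RuntimeError("No credentials found; a browser prompt would open here.")


# The in-memory context cache wins whenever it is populated.
print(resolve_credentials(cached="context-credentials", user="user-credentials"))
# prints "context-credentials"
```

This ordering is why setting ``pandas_gbq.context.credentials`` once per
session avoids repeated authentication round trips.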

docs/source/index.rst
(1 addition, 1 deletion)

@@ -13,7 +13,7 @@
 with a shape and data types derived from the source table. Additionally,
 DataFrames can be inserted into new BigQuery tables or appended to existing
 tables.
 
-.. warning::
+.. note::
 
    To use this module, you will need a valid BigQuery account. Use the
    `BigQuery sandbox <https://cloud.google.com/bigquery/docs/sandbox>`__ to

docs/source/install.rst
(4 additions, 3 deletions)

@@ -40,15 +40,16 @@ This module requires following additional dependencies:
 - `pydata-google-auth <https://github.com/pydata/pydata-google-auth>`__: Helpers for authentication to Google's API
 - `google-auth <https://github.com/GoogleCloudPlatform/google-auth-library-python>`__: authentication and authorization for Google's API
 - `google-auth-oauthlib <https://github.com/GoogleCloudPlatform/google-auth-library-python-oauthlib>`__: integration with `oauthlib <https://github.com/idan/oauthlib>`__ for end-user authentication
-- `google-cloud-bigquery <http://github.com/GoogleCloudPlatform/google-cloud-python>`__: Google Cloud client library for BigQuery
+- `google-cloud-bigquery <https://googleapis.dev/python/bigquery/latest/index.html>`__: Google Cloud client library for BigQuery
+- `google-cloud-bigquery-storage <https://googleapis.dev/python/bigquerystorage/latest/index.html>`__: Google Cloud client library for BigQuery Storage API
 
 .. note::
 
-   The dependency on `google-cloud-bigquery <http://github.com/GoogleCloudPlatform/google-cloud-python>`__ is new in version 0.3.0 of ``pandas-gbq``.
+   The dependency on `google-cloud-bigquery <https://googleapis.dev/python/bigquery/latest/index.html>`__ is new in version 0.3.0 of ``pandas-gbq``.
    Versions less than 0.3.0 required the following dependencies:
 
 - `httplib2 <https://github.com/httplib2/httplib2>`__: HTTP client (no longer required)
-- `google-api-python-client <http://github.com/google/google-api-python-client>`__: Google's API client (no longer required, replaced by `google-cloud-bigquery <http://github.com/GoogleCloudPlatform/google-cloud-python>`__)
+- `google-api-python-client <http://github.com/google/google-api-python-client>`__: Google's API client (no longer required, replaced by `google-cloud-bigquery <https://googleapis.dev/python/bigquery/latest/index.html>`__)
 - `google-auth <https://github.com/GoogleCloudPlatform/google-auth-library-python>`__: authentication and authorization for Google's API
 - `google-auth-oauthlib <https://github.com/GoogleCloudPlatform/google-auth-library-python-oauthlib>`__: integration with `oauthlib <https://github.com/idan/oauthlib>`__ for end-user authentication
 - `google-auth-httplib2 <https://github.com/GoogleCloudPlatform/google-auth-library-python-httplib2>`__: adapter to use ``httplib2`` HTTP client with ``google-auth`` (no longer required)

docs/source/intro.rst
(51 additions, 8 deletions)

@@ -1,16 +1,43 @@
 Introduction
 ============
 
-Supported Data Types
-++++++++++++++++++++
+The pandas-gbq package reads data from `Google BigQuery
+<https://cloud.google.com/bigquery/docs/>`__ to a :class:`pandas.DataFrame`
+object and also writes :class:`pandas.DataFrame` objects to BigQuery tables.
 
-Pandas supports all these `BigQuery data types <https://cloud.google.com/bigquery/data-types>`__:
-``STRING``, ``INTEGER`` (64bit), ``FLOAT`` (64 bit), ``BOOLEAN`` and
-``TIMESTAMP`` (microsecond precision). Data types ``BYTES`` and ``RECORD``
-are not supported.
+Authenticating to BigQuery
+--------------------------
 
-Logging
-+++++++
+Before you begin, you must create a Google Cloud Platform project. Use the
+`BigQuery sandbox <https://cloud.google.com/bigquery/docs/sandbox>`__ to try
+the service for free.
+
+If you do not provide any credentials, this module attempts to load
+credentials from the environment. If no credentials are found, pandas-gbq
+prompts you to open a web browser, where you can grant it permissions to
+access your cloud resources. These credentials are only used locally. See the
+:doc:`privacy policy <privacy>` for details.
+
+Learn about authentication methods in the :doc:`authentication guide
+<howto/authentication>`.
+
+Reading data from BigQuery
+--------------------------
+
+Use the :func:`pandas_gbq.read_gbq` function to run a BigQuery query and
+download the results as a :class:`pandas.DataFrame` object.
+
+.. literalinclude:: samples/read_gbq_simple.py
+   :language: python
+   :dedent: 4
+   :start-after: [START bigquery_pandas_gbq_read_gbq_simple]
+   :end-before: [END bigquery_pandas_gbq_read_gbq_simple]
+
+By default, queries use standard SQL syntax. Visit the :doc:`reading tables
+guide <reading>` to learn about the available options.
+
+Adjusting log verbosity
+^^^^^^^^^^^^^^^^^^^^^^^
 
 Because some requests take some time, this library will log its progress of
 longer queries. IPython & Jupyter by default attach a handler to the logger.
@@ -23,3 +50,19 @@ more verbose logs, you can do something like:
    logger = logging.getLogger('pandas_gbq')
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
+
+Writing data to BigQuery
+------------------------
+
+Use the :func:`pandas_gbq.to_gbq` function to write a
+:class:`pandas.DataFrame` object to a BigQuery table.
+
+.. literalinclude:: samples/to_gbq_simple.py
+   :language: python
+   :dedent: 4
+   :start-after: [START bigquery_pandas_gbq_to_gbq_simple]
+   :end-before: [END bigquery_pandas_gbq_to_gbq_simple]
+
+The destination table and destination dataset will automatically be created.
+By default, writes to BigQuery fail if the table already exists. Visit the
+:doc:`writing tables guide <writing>` to learn about the available options.
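The ``to_gbq`` behavior described in the new intro — create missing tables,
fail on an existing one unless told otherwise — follows its ``if_exists``
parameter, which accepts ``'fail'`` (the default), ``'replace'``, or
``'append'``. A small sketch of those semantics (illustrative only;
``resolve_write_disposition`` is a hypothetical helper, not pandas-gbq code):

```python
def resolve_write_disposition(table_exists, if_exists="fail"):
    """Mirror the documented to_gbq semantics for an existing table."""
    if not table_exists:
        return "create"  # Destination dataset and table are created.
    if if_exists == "fail":
        raise ValueError(
            "Table already exists; pass if_exists='replace' or 'append'."
        )
    if if_exists in ("replace", "append"):
        return if_exists
    raise ValueError("if_exists must be 'fail', 'replace', or 'append'.")


print(resolve_write_disposition(table_exists=False))  # prints "create"
```

Spelling the dispatch out this way makes the default explicit: a second
write to the same table raises unless the caller opts into replacing or
appending.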
