Commit 20bd962

Merge pull request #122 from rstudio/feat-backend-gcs
feat(gcs): initial google cloud storage support
2 parents 3d221bf + a99d53f commit 20bd962

12 files changed: 80 additions, 14 deletions
.env.dev

Lines changed: 5 additions & 0 deletions
@@ -16,6 +16,11 @@ AWS_REGION=us-east-1
 AZURE_STORAGE_ACCOUNT_NAME=cipins
 AZURE_STORAGE_ACCOUNT_KEY=

+# GCS backend ----
+# Note that this backend uses gcsfs's
+# default auth setting, which requires authenticating
+# via the gcloud cli.
+
 # Rstudio Connect license ----
 RSC_LICENSE=


.github/workflows/ci.yml

Lines changed: 12 additions & 1 deletion
@@ -48,15 +48,26 @@ jobs:
           fi

           python -m pip install -e .[test]
+
+      - name: Set up Cloud SDK
+        uses: google-github-actions/setup-gcloud@v0
+        with:
+          project_id: siuba-tests
+          service_account_key: ${{ secrets.GCP_SA_KEY }}
+          export_default_credentials: true
+
       - name: Run tests
         run: |
-          pytest pins -m 'not fs_rsc and not skip_on_github' $PYTEST_OPTS
+          pytest pins -m 'not fs_rsc and not skip_on_github' --workers 4 --tests-per-worker 1 $PYTEST_OPTS
         env:
           AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
           AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
           AWS_REGION: "us-east-1"
           PYTEST_OPTS: ${{ matrix.pytest_opts }}
           REQUIREMENTS: ${{ matrix.requirements }}
+          # fixes error on macosx virtual machine with pytest-parallel
+          # https://github.com/browsertron/pytest-parallel/issues/93
+          no_proxy: "*"

   test-rsconnect:
     name: "Test RSConnect"

Makefile

Lines changed: 7 additions & 2 deletions
@@ -30,8 +30,13 @@ README.md: README.Rmd
 	--TagRemovePreprocessor.remove_input_tags='hide-cell' \
 	--output $@

-test:
-	pytest
+test: test-most test-rsc
+
+test-most:
+	pytest pins -m "not fs_rsc" --workers 4 --tests-per-worker 1
+
+test-rsc:
+	pytest pins -m "fs_rsc"

 docs-build:
 	jb build --builder html docs

README.md

Lines changed: 3 additions & 3 deletions
@@ -5,8 +5,8 @@
 The pins package publishes data, models, and other Python objects, making it
 easy to share them across projects and with your colleagues. You can pin
 objects to a variety of pin *boards*, including folders (to share on a
-networked drive or with services like DropBox), RStudio Connect, and Amazon
-S3.
+networked drive or with services like DropBox), RStudio Connect, Amazon S3,
+and Google Cloud Storage.
 Pins can be automatically versioned, making it straightforward to track changes,
 re-run analyses on historical data, and undo mistakes.

@@ -103,7 +103,7 @@ You can easily control who gets to access the data using the RStudio
 Connect permissions pane.

 The pins package also includes boards that allow you to share data on
-services like Amazon’s S3 (`board_s3()`), with plans to support other backends--
+services like Amazon’s S3 (`board_s3()`) and Google Cloud Storage (`board_gcs()`), with plans to support other backends--
 such as Azure's blob storage.

 ## Development

docs/api/constructors.rst

Lines changed: 1 addition & 0 deletions
@@ -10,5 +10,6 @@ Board Constructors
    ~board_local
    ~board_temp
    ~board_s3
+   ~board_gcs
    ~board_rsconnect
    ~board

docs/getting_started.Rmd

Lines changed: 2 additions & 1 deletion
@@ -21,7 +21,8 @@ Getting Started
 ===============

 The pins package helps you publish data sets, models, and other Python objects, making it easy to share them across projects and with your colleagues.
-You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with DropBox), RStudio connect, Amazon S3, and more.
+You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with DropBox), RStudio connect, Amazon S3,
+Google Cloud Storage, and more.
 This vignette will introduce you to the basics of pins.

 ```{python}

pins/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -19,5 +19,6 @@
     board_urls,
     board_rsconnect,
     board_s3,
+    board_gcs,
     board,
 )

pins/boards.py

Lines changed: 5 additions & 2 deletions
@@ -277,7 +277,8 @@ def pin_write(
         # move pin to destination ----
         # create pin version folder
         dst_pin_path = self.construct_path([pin_name])
-        dst_version_path = self.path_to_deploy_version(name, meta.version.version)
+        dst_version = meta.version.version
+        dst_version_path = self.path_to_deploy_version(name, dst_version)

         if not self.fs.exists(dst_pin_path):
             # equivalent to mkdirp, want to fail quietly in case of race conditions
@@ -297,7 +298,9 @@ def pin_write(
                 "but that directory already exists."
             )

-        inform(_log, f"Writing to pin {repr(pin_name)}")
+        inform(
+            _log, f"Writing pin:\nName: {repr(pin_name)}\nVersion: {dst_version}"
+        )

         res = self.fs.put(tmp_dir, dst_version_path, recursive=True)

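The new log message spans three lines and reports the version alongside the pin name. A minimal sketch of the text it produces, written as a standalone function; `format_write_message` is a hypothetical name, and the pin name and version string are made up for illustration rather than taken from pins:

```python
def format_write_message(pin_name, version):
    """Build the multi-line log text in the same shape as the new inform() call."""
    # repr() (via !r) quotes the name, which makes stray whitespace visible.
    return f"Writing pin:\nName: {pin_name!r}\nVersion: {version}"


# Hypothetical pin name and version string, for illustration only.
print(format_write_message("mtcars", "20220101T000000Z-abcde"))
```

Reporting the version makes it easier to match a log entry to the version folder that was written.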

pins/constructors.py

Lines changed: 31 additions & 3 deletions
@@ -115,12 +115,13 @@ def board(
     ----
     Many fsspec implementations of filesystems cache the searching of files, which may
     cause you to not see pins saved by other people. Disable this on these file systems
-    with `storage_options = {"cache_timeout": 0}`.
+    with `storage_options = {"listings_expiry_time": 0}` on s3, or `{"cache_timeout": 0}`
+    on google cloud storage.

     """

     if storage_options is None:
-        storage_options = {"listings_expiry_time": 0}
+        storage_options = {}

     # TODO: at this point should just manually construct the rsc board directly
     # from board_rsconnect...
@@ -381,4 +382,31 @@ def board_s3(path, versioned=True, cache=DEFAULT, allow_pickle_read=None):

     """
     # TODO: user should be able to specify storage options here?
-    return board("s3", path, versioned, cache, allow_pickle_read)
+
+    opts = {"listings_expiry_time": 0}
+    return board("s3", path, versioned, cache, allow_pickle_read, storage_options=opts)
+
+
+def board_gcs(path, versioned=True, cache=DEFAULT, allow_pickle_read=None):
+    """Create a board to read and write pins from a Google Cloud Storage bucket folder.
+
+    Parameters
+    ----------
+    path:
+        Path of form <bucket_name>/<optional>/<subdirectory>.
+    **kwargs:
+        Passed to the pins.board function.
+
+    Note
+    ----
+    The gcs board uses the fsspec library (gcsfs) to handle interacting with
+    google cloud storage. Currently, only its default mode of authentication
+    is supported.
+
+    See https://gcsfs.readthedocs.io/en/latest/#credentials
+    """
+
+    # GCSFS uses a different name for listings_expiry_time, and then
+    # fixes it under the hood
+    opts = {"cache_timeout": 0}
+    return board("gcs", path, versioned, cache, allow_pickle_read, storage_options=opts)
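Both constructors hard-code a different option name for the same goal, turning off cached directory listings. A minimal sketch of how those defaults could live in one place while still letting a caller override them, which is what the TODO in `board_s3` asks about. `NO_LISTING_CACHE` and `default_storage_options` are hypothetical names for illustration and are not part of pins; only the two option names come from this commit:

```python
# Hypothetical lookup table: fsspec protocol -> option that disables
# listing caching. The option names are the ones used in this commit.
NO_LISTING_CACHE = {
    "s3": {"listings_expiry_time": 0},
    "gcs": {"cache_timeout": 0},
}


def default_storage_options(protocol, user_options=None):
    """Return storage options with listing caching off unless the caller overrides it."""
    # Copy so the shared table is never mutated by callers.
    opts = dict(NO_LISTING_CACHE.get(protocol, {}))
    # Caller-supplied options win over the defaults.
    opts.update(user_options or {})
    return opts


print(default_storage_options("gcs"))
print(default_storage_options("s3", {"anon": True}))
```

Under this assumption the result would be passed as `storage_options=` to `board(...)`, the same way the hard-coded `opts` dictionaries are passed in the diff.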

pins/tests/conftest.py

Lines changed: 2 additions & 1 deletion
@@ -14,12 +14,13 @@


 # Based on https://github.com/machow/siuba/blob/main/siuba/tests/helpers.py
-BACKEND_MARKS = ["fs_s3", "fs_file", "fs_rsc"]
+BACKEND_MARKS = ["fs_s3", "fs_file", "fs_gcs", "fs_rsc"]

 # parameters that can be used more than once per session
 params_safe = [
     pytest.param(lambda: BoardBuilder("file"), id="file", marks=m.fs_file),
     pytest.param(lambda: BoardBuilder("s3"), id="s3", marks=m.fs_s3),
+    pytest.param(lambda: BoardBuilder("gcs"), id="gcs", marks=m.fs_gcs),
 ]

 # rsc should only be used once, because users are created at docker setup time
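Each `params_safe` entry repeats the backend name three times: in the builder argument, the `id`, and the marker. A minimal sketch of deriving the id and marker name from a single backend name so they cannot drift apart. It is kept free of pytest so it runs anywhere; `SAFE_BACKENDS` and `backend_params` are hypothetical names, not part of the test suite:

```python
# Hypothetical single source of truth for the reusable backends.
SAFE_BACKENDS = ["file", "s3", "gcs"]


def backend_params(backends):
    """Return (test id, marker name) pairs, one per backend name."""
    return [(name, f"fs_{name}") for name in backends]


params = backend_params(SAFE_BACKENDS)

# Guard against two backends sharing a test id.
ids = [test_id for test_id, _ in params]
assert len(ids) == len(set(ids)), "test ids must be unique"

print(params)
```

In a real conftest, each pair would feed the `id=` and `marks=` arguments of `pytest.param`.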
