4 changes: 4 additions & 0 deletions changelog_entry.yaml
@@ -0,0 +1,4 @@
+- bump: patch
+  changes:
+    added:
+    - Support for base64-encoded Google service account keys.
17 changes: 16 additions & 1 deletion policyengine/utils/data/simplified_google_storage_client.py
@@ -2,7 +2,11 @@
 from policyengine_core.data.dataset import atomic_write
 import logging
 from google.cloud.storage import Client, Blob
+from google.oauth2 import service_account
 from typing import Iterable, Optional
+import os
+import json
+import base64
 
 logger = logging.getLogger(__name__)

@@ -16,7 +20,18 @@ class SimplifiedGoogleStorageClient:
     """
 
     def __init__(self):
Contributor

question, blocking: Have you looked into Google Secret Manager at all?

One option I've found: instead of creating individual service accounts for customers, we could generate our own API keys, store them in Google Secret Manager, and validate incoming keys by comparing SHA-256 hashes. This seems closer to what you're trying to do. It would also give us more granular control over which datasets a key unlocks, since we can store richer metadata with each key (imagine we one day create another type of limited-access dataset that we don't want these customers to access).

I have a Claude chat here describing Google Secret Manager, as well as other options that may be less desirable (e.g., JWT tokens). The most relevant portions are toward the end. Curious to hear your thoughts.
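
To make the hash-based validation idea concrete, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and in the proposal above the stored digest would live in Secret Manager rather than in a local variable; nothing here is an existing PolicyEngine API.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> str:
    """Create a random, URL-safe API key to hand to a customer (hypothetical helper)."""
    return secrets.token_urlsafe(32)


def hash_api_key(api_key: str) -> str:
    """Return the hex SHA-256 digest of the key; only this digest would be stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def is_valid_key(presented_key: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_digest)


key = generate_api_key()
stored = hash_api_key(key)  # assumption: this digest would be kept in the secret store
```

Storing only the digest means a leak of the stored value does not directly reveal a usable key, though some service still has to run the comparison.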

Collaborator Author

Wouldn't we have to build and host a separate service to handle the authentication, though?

Collaborator

I think I need to understand our use case better here. I don't really want to manage service accounts for external users or force the user to authenticate using a specific mechanism.

Is there a reason we can't have them provide a Gmail account or service account that they own and manage, and then grant it permission to access the bucket?

Collaborator Author

The reason is that we want to keep the process of running code that downloads the microdata simple. A single microdata access token in an environment variable is as simple as it gets.
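
For illustration, a minimal sketch of the token round trip: base64-encoding a service account key's JSON into a single string, then decoding it back with explicit error handling. The helper names and the placeholder key contents are hypothetical; a real token would contain the full service account JSON, and the decoded dict is what would be passed on to the credentials constructor.

```python
import base64
import json


def encode_service_account_key(key_info: dict) -> str:
    """Serialize the key JSON and base64-encode it into one env-var-friendly string."""
    return base64.b64encode(json.dumps(key_info).encode("utf-8")).decode("ascii")


def decode_research_token(token: str) -> dict:
    """Decode a base64-encoded JSON token, raising ValueError if it is malformed."""
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
        info = json.loads(decoded)
    except ValueError as exc:
        # binascii.Error, UnicodeDecodeError and JSONDecodeError all subclass ValueError
        raise ValueError("token is not valid base64-encoded JSON") from exc
    if not isinstance(info, dict):
        raise ValueError("token must decode to a JSON object")
    return info


# Placeholder contents only; not a real service account key.
example_info = {"type": "service_account", "project_id": "example-project"}
example_token = encode_service_account_key(example_info)
```

Failing with a clear error on a malformed token may be friendlier than letting a raw decoding exception surface from the client constructor.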

-        self.client = Client()
+        credentials = None
+        if os.getenv("POLICYENGINE_RESEARCH_TOKEN"):
+            # This will have b64-encoded JSON credentials in it
+            token = os.getenv("POLICYENGINE_RESEARCH_TOKEN")
+            decoded_token = base64.b64decode(token).decode("utf-8")
+            json_token = json.loads(decoded_token)
+            credentials = (
+                service_account.Credentials.from_service_account_info(
+                    json_token
+                )
+            )
+        self.client = Client(credentials=credentials)
 
     def get_versioned_blob(
         self, bucket_name: str, key: str, version: Optional[str] = None
9 changes: 7 additions & 2 deletions pyproject.toml
@@ -11,7 +11,7 @@ authors = [
     {name = "PolicyEngine", email = "[email protected]"},
 ]
 license = {file = "LICENSE"}
-requires-python = ">=3.6"
+requires-python = ">=3.10"
 dependencies = [
     "policyengine_core>=3.10",
     "policyengine-uk",
@@ -20,7 +20,12 @@ dependencies = [
     "google-cloud-storage (>=3.1.0,<4.0.0)",
     "microdf_python",
     "getpass4",
-    "pydantic"
+    "pydantic",
+    "google-auth>=2.40.3",
+    "google-auth-oauthlib>=1.2.2",
+    "google-auth-httplib2>=0.2.0",
+    "google-api-python-client>=2.172.0",
+    "click>=8.2.1",
 ]
 
 [project.optional-dependencies]