Skip to content

Commit 4325171

Browse files
drernieclaude
andauthored
feat(docker): read package manifests from s3 (#259)
* feat(docker): read package manifests from s3 * fix(docker): add type assertions for manifest data Fix Pyright type error in _load_manifest_data() by adding explicit type casting for jsonlines.Reader return values. Also run Black formatter to fix code formatting issues. - Add cast() import from typing - Cast manifest_meta to Dict - Cast entries to List[Dict] - Run Black formatter on Python files Fixes CI type checking failures in PR #259. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: update CHANGELOG for v0.8.7 Document all changes in v0.8.7: Added: - Browse buttons for linked packages - Read-only package manifest browsing from S3 Changed: - PackageFileFetcher reads manifests from S3 (not quilt3.Package.browse()) - Added role_arn and region parameters for cross-account access Fixed: - Docker filesystem writes eliminated (closes #258) - Pyright type error with explicit type casting - Python code formatting (Black) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(deps): migrate to uv dependency-groups and refactor Makefile - Migrate from [project.optional-dependencies] to [dependency-groups] for PEP 735 compliance and proper uv group support - Add UVDEV variable to Makefile for improved readability - Update install target to use --group dev --group docs - Update lint and test-unit targets to use $(UVDEV) - Fix pyright not being available in uv run context This resolves the "Failed to spawn: pyright" error by ensuring dev dependencies (pytest, black, isort, pyright) are properly configured as a dependency group rather than optional dependencies. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * chore(deps): remove unused docs dependency group Remove sphinx and sphinx-rtd-theme from dependency-groups as the project uses markdown documentation instead of Sphinx-generated docs. Updates: - Remove docs group from pyproject.toml - Update Makefile install target to only sync dev group - Update uv.lock to reflect removed dependencies 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Update uv.lock Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
1 parent b8c4aab commit 4325171

File tree

7 files changed

+189
-272
lines changed

7 files changed

+189
-272
lines changed

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,17 +8,23 @@ All notable changes to this project will be documented in this file.
88
### Added
99

1010
- Browse buttons for linked packages in Benchling Canvas
11+
- Read-only package manifest browsing directly from S3 (no local filesystem writes)
1112

1213
### Changed
1314

1415
- **BREAKING**: Removed `--config` alias, use only `--profile` for all commands
16+
- `PackageFileFetcher` now reads manifests directly from S3 instead of using `quilt3.Package.browse()`
17+
- Added `role_arn` and `region` parameters to `PackageFileFetcher` for cross-account access
1518

1619
### Fixed
1720

1821
- **CRITICAL**: `--profile` flag now works correctly for all commands
1922
- Configuration validation allows additional properties for backward compatibility
2023
- Status command works with any profile containing stackArn (not just integrated stacks)
2124
- Python package license corrected to Apache-2.0
25+
- Docker filesystem writes eliminated by reading package manifests directly from S3 (closes #258)
26+
- Pyright type error in `_load_manifest_data()` with explicit type casting
27+
- Python code formatting issues (Black formatter)
2228

2329
## [0.8.3] - 2025-11-18
2430

docker/Makefile

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,9 @@ PORT_NATIVE := 5001
2323
PORT_DOCKER_DEV := 5002
2424
PORT_DOCKER_PROD := 5003
2525

26+
# UV command with dev group (for linting and testing)
27+
UVDEV = uv run --group dev
28+
2629
.PHONY: help build test clean install lint ngrok kill \
2730
check-xdg check-ngrok run-native run-native-verbose run-dev run run-native-ngrok run-ecr \
2831
test test-unit test-integration test-dev test-docker-dev test-docker-prod test-deployed-dev test-deployed-dev-direct test-deployed-prod test-ecr test-native test-benchling test-query \
@@ -117,7 +120,7 @@ check-ngrok:
117120

118121
# Install dependencies
119122
install:
120-
uv sync --all-extras
123+
uv sync --group dev
121124

122125
# Test Benchling OAuth credentials from AWS Secrets Manager
123126
test-benchling: check-xdg
@@ -212,7 +215,7 @@ test-all: lint test-unit test-dev test-integration
212215

213216
# Run unit tests with pytest
214217
test-unit:
215-
uv run pytest -v
218+
$(UVDEV) pytest -v
216219

217220
# Run integration tests with execution monitoring (requires real Benchling credentials)
218221
test-integration: test-benchling
@@ -371,9 +374,9 @@ test-deployed-prod: check-xdg
371374

372375
# Auto-fix code formatting (black + isort) and type check (pyright)
373376
lint:
374-
uv run black src/ tests/ scripts/
375-
uv run isort src/ tests/ scripts/
376-
uv run pyright src/ tests/ scripts/
377+
$(UVDEV) black src/ tests/ scripts/
378+
$(UVDEV) isort src/ tests/ scripts/
379+
$(UVDEV) pyright src/ tests/ scripts/
377380

378381
# Deployment
379382

docker/pyproject.toml

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,7 @@ dependencies = [
3636
"click>=8.1.0",
3737
]
3838

39-
[project.optional-dependencies]
39+
[dependency-groups]
4040
dev = [
4141
"pytest==9.0.1",
4242
"pytest-mock==3.15.1",
@@ -47,10 +47,6 @@ dev = [
4747
"isort==7.0.0",
4848
"pyright==1.1.407",
4949
]
50-
docs = [
51-
"sphinx==8.2.3",
52-
"sphinx-rtd-theme==3.0.2",
53-
]
5450

5551
[project.urls]
5652
Homepage = "https://github.com/quiltdata/benchling-webhook"

docker/src/canvas.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,8 @@ def __init__(
7676
self._package_file_fetcher = package_file_fetcher or PackageFileFetcher(
7777
catalog_url=config.quilt_catalog,
7878
bucket=config.s3_bucket_name,
79+
role_arn=config.quilt_write_role_arn,
80+
region=config.aws_region,
7981
)
8082

8183
@property

docker/src/package_files.py

Lines changed: 102 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -16,11 +16,18 @@
1616
- Package creation (see entry_packager.py)
1717
"""
1818

19-
from typing import List, Optional
19+
import io
20+
import json
21+
import os
22+
from typing import Dict, List, Optional, Tuple, cast
2023

21-
import quilt3
24+
import jsonlines
2225
import structlog
26+
from quilt3.backends import get_package_registry
27+
from quilt3.packages import ManifestJSONDecoder
28+
from quilt3.util import PhysicalKey
2329

30+
from .auth.role_manager import RoleManager
2431
from .packages import Package
2532

2633
logger = structlog.get_logger(__name__)
@@ -82,25 +89,100 @@ class PackageFileFetcher:
8289
"""Fetches file lists and metadata from Quilt packages.
8390
8491
Responsibilities:
85-
- Browse package contents via quilt3.Package.browse()
92+
- Browse package manifests directly from S3 (no local writes)
8693
- List all files in a package with metadata
8794
- Fetch package-level metadata
88-
- Create PackageFile instances from browse results
95+
- Create PackageFile instances from manifest entries
8996
9097
Designed to be reusable and cacheable to avoid repeated API calls.
9198
Does not modify packages, only reads their contents.
9299
"""
93100

94-
def __init__(self, catalog_url: str, bucket: str):
101+
def __init__(
102+
self,
103+
catalog_url: str,
104+
bucket: str,
105+
role_arn: Optional[str] = None,
106+
region: Optional[str] = None,
107+
role_manager: Optional[RoleManager] = None,
108+
):
95109
"""Initialize fetcher.
96110
97111
Args:
98112
catalog_url: Quilt catalog URL (e.g., "nightly.quilttest.com")
99113
bucket: S3 bucket name
114+
role_arn: Optional IAM role ARN for cross-account access
115+
region: Optional AWS region (defaults to AWS_REGION env or us-east-1)
116+
role_manager: Optional RoleManager instance (used in tests)
100117
"""
101118
self.catalog_url = catalog_url
102119
self.bucket = bucket
103120
self.logger = structlog.get_logger(__name__)
121+
self.role_manager = role_manager or RoleManager(
122+
role_arn=role_arn,
123+
region=region or os.getenv("AWS_REGION", "us-east-1"),
124+
)
125+
126+
def _get_registry(self):
127+
"""Create an S3-backed package registry."""
128+
return get_package_registry(f"s3://{self.bucket}")
129+
130+
def _fetch_physical_key_bytes(self, physical_key: PhysicalKey) -> bytes:
131+
"""Read bytes from a PhysicalKey without touching the local filesystem."""
132+
if physical_key.is_local():
133+
with open(physical_key.path, "rb") as file:
134+
return file.read()
135+
136+
s3_client = self.role_manager.get_s3_client()
137+
params = {"Bucket": physical_key.bucket, "Key": physical_key.path}
138+
if physical_key.version_id:
139+
params["VersionId"] = physical_key.version_id
140+
response = s3_client.get_object(**params)
141+
return response["Body"].read()
142+
143+
def _load_manifest_data(self, package_name: str) -> Tuple[Dict, List[Dict]]:
144+
"""Load manifest metadata and entries for the latest package version."""
145+
registry = self._get_registry()
146+
147+
top_hash = self._fetch_physical_key_bytes(registry.pointer_latest_pk(package_name)).decode("utf-8").strip()
148+
manifest_bytes = self._fetch_physical_key_bytes(registry.manifest_pk(package_name, top_hash))
149+
150+
manifest_stream = io.StringIO(manifest_bytes.decode("utf-8"))
151+
reader = jsonlines.Reader(manifest_stream, loads=ManifestJSONDecoder().decode)
152+
153+
manifest_meta = cast(Dict, reader.read())
154+
entries = cast(List[Dict], list(reader))
155+
156+
return manifest_meta, entries
157+
158+
@staticmethod
159+
def _is_valid_file_entry(entry: Dict) -> bool:
160+
"""Return True if manifest entry represents a real file."""
161+
logical_key = entry.get("logical_key")
162+
physical_keys = entry.get("physical_keys") or []
163+
164+
if not logical_key or logical_key.startswith(".quilt/"):
165+
return False
166+
167+
return bool(physical_keys)
168+
169+
def _parse_entry_json(self, entries: List[Dict]) -> Optional[dict]:
170+
"""Load entry.json contents if present."""
171+
entry_json = next((entry for entry in entries if entry.get("logical_key") == "entry.json"), None)
172+
if not entry_json:
173+
return None
174+
175+
physical_keys = entry_json.get("physical_keys") or []
176+
if not physical_keys:
177+
return None
178+
179+
try:
180+
physical_key = PhysicalKey.from_url(physical_keys[0])
181+
data = self._fetch_physical_key_bytes(physical_key)
182+
return json.loads(data.decode("utf-8"))
183+
except Exception as exc: # pragma: no cover - defensive logging
184+
self.logger.warning("Failed to load entry.json", error=str(exc))
185+
return None
104186

105187
def get_package_files(
106188
self,
@@ -124,18 +206,18 @@ def get_package_files(
124206
self.logger.info("Fetching package files", package_name=package_name)
125207

126208
try:
127-
# Browse package
128-
pkg = quilt3.Package.browse(package_name, registry=f"s3://{self.bucket}")
129-
130-
# Collect all files (not directories)
131-
files = []
132-
for logical_key, entry in pkg.walk():
133-
# Skip metadata files (internal to Quilt)
134-
if logical_key.startswith(".quilt/"):
209+
_, entries = self._load_manifest_data(package_name)
210+
211+
files: List[PackageFile] = []
212+
for entry in entries:
213+
if not self._is_valid_file_entry(entry):
135214
continue
136215

137-
# Get file size
138-
size = entry.size if hasattr(entry, "size") else 0
216+
logical_key = entry["logical_key"]
217+
try:
218+
size = int(entry.get("size", 0) or 0)
219+
except (TypeError, ValueError):
220+
size = 0
139221

140222
files.append(
141223
PackageFile(
@@ -147,7 +229,6 @@ def get_package_files(
147229
)
148230
)
149231

150-
# Respect max_files limit
151232
if max_files and len(files) >= max_files:
152233
break
153234

@@ -183,16 +264,14 @@ def get_package_metadata(self, package_name: str) -> dict:
183264
self.logger.info("Fetching package metadata", package_name=package_name)
184265

185266
try:
186-
pkg = quilt3.Package.browse(package_name, registry=f"s3://{self.bucket}")
267+
manifest_meta, entries = self._load_manifest_data(package_name)
187268

188-
# Try to read entry.json if it exists
189-
if "entry.json" in pkg:
190-
metadata = pkg["entry.json"]() # Fetch and deserialize
269+
entry_metadata = self._parse_entry_json(entries)
270+
if entry_metadata is not None:
191271
self.logger.info("Fetched entry.json metadata", package_name=package_name)
192-
return metadata
272+
return entry_metadata
193273

194-
# Otherwise return package-level metadata
195-
metadata = pkg.meta or {}
274+
metadata = manifest_meta.get("user_meta") or {}
196275
self.logger.info("Fetched package metadata", package_name=package_name)
197276
return metadata
198277

0 commit comments

Comments
 (0)