
Commit c4b1693

Kezzsim and genematx authored
🪣 Support for OBject Storage (#1021)

* 🎈 *writes data to get teh party started*
* 🌫️ *anxiously adds more cloud providers*
* Resolve mypy errors
* 👁️ Resolve minio https error preventing us from writing `zarr.json`
* 🚮 Experiment with writing (sloppy) data
* 🪲 DEBUG: problems with `write`
* 🕶️ Review : Add missing prefix

Co-authored-by: Eugene <ymatviych@bnl.gov>

* Write regex helper function
* 🧽 refactor to clean up repeated code
* Add Blobs to writing tests
* Rewrite `get_storage` to be a router for buckets
* refactor ObjectStorage
* 🐋 Add minio container to CI for testing
* 🧪 Make `TILED_TEST_BUCKET` env var for advanced testing
* More refactoring of Storage
* FIX: look up registered storages instead of recreating them
* Simplify test config
* TST: fix test_writing + more refactoring
* MNT: add minio dependency for server
* ENH: generalize asset deletion

---------

Co-authored-by: Eugene <ymatviych@bnl.gov>
1 parent b01df99 commit c4b1693

13 files changed: +457 -56 lines changed

.github/workflows/ci.yml

Lines changed: 7 additions & 0 deletions
@@ -62,6 +62,12 @@ jobs:
         shell: bash -l {0}
         run: source continuous_integration/scripts/start_redis.sh
 
+      - name: Start Minio service in container.
+        #TODO: This product is leaving open-source container distribution
+        # Find a new image or product to use
+        # https://github.com/minio/minio/issues/21647#issuecomment-3418675115
+        shell: bash -l {0}
+        run: source continuous_integration/scripts/start_minio.sh
 
       - name: Ensure example data is migrated to current catalog database schema.
         # The example data is expected to be kept up to date to the latest Tiled
@@ -84,6 +90,7 @@ jobs:
           # Provide test suite with PostgreSQL and Redis databases to use.
           TILED_TEST_POSTGRESQL_URI: postgresql://postgres:secret@localhost:5432
           TILED_TEST_REDIS: redis://localhost:6379
+          TILED_TEST_BUCKET: http://minioadmin:minioadmin@localhost:9000/buck
           # TODO Reinstate after finding a new image to use
           # https://github.com/bluesky/tiled/issues/1109
           # # Opt in to LDAPAuthenticator tests.
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+version: "3.2"
+services:
+  minio:
+    image: minio/minio:latest
+    ports:
+      - 9000:9000
+      - 9001:9001
+    volumes:
+      - minio-data:/data
+    environment:
+      MINIO_ROOT_USER: "minioadmin"
+      MINIO_ROOT_PASSWORD: "minioadmin"
+    command: server /data --console-address :9001
+    restart: unless-stopped
+  create-bucket:
+    image: minio/mc:latest
+    environment:
+      MC_HOST_minio: http://minioadmin:minioadmin@minio:9000
+    entrypoint:
+      - sh
+      - -c
+      - |
+        until mc ls minio > /dev/null 2>&1; do
+          sleep 0.5
+        done
+
+        mc mb --ignore-existing minio/buck
+volumes:
+  minio-data:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#!/bin/bash
+set -e
+
+# Start MinIO server in docker container
+docker pull minio/minio:latest
+docker compose -f continuous_integration/docker-configs/minio-docker-compose.yml up -d
+docker ps
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+authentication:
+  allow_anonymous_access: false
+trees:
+  - path: /
+    tree: catalog
+    args:
+      uri: "sqlite:///storage/catalog.db"
+      writable_storage:
+        - provider: s3
+          uri: "http://localhost:9000"
+          config:
+            access_key_id: "minioadmin"
+            secret_access_key: "minioadmin"
+            bucket: "buck"
+            virtual_hosted_style_request: False
+            client_options: {"allow_http": True}
+      init_if_not_exists: true
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+version: "3.2"
+services:
+  minio:
+    image: minio/minio:latest
+    ports:
+      - 9000:9000
+      - 9001:9001
+    volumes:
+      - minio-data:/data
+    environment:
+      MINIO_ROOT_USER: "minioadmin"
+      MINIO_ROOT_PASSWORD: "minioadmin"
+    command: server /data --console-address :9001
+    restart: unless-stopped
+  create-bucket:
+    image: minio/mc:latest
+    environment:
+      MC_HOST_minio: http://minioadmin:minioadmin@minio:9000
+    entrypoint:
+      - sh
+      - -c
+      - |
+        until mc ls minio > /dev/null 2>&1; do
+          sleep 0.5
+        done
+
+        mc mb --ignore-existing minio/buck
+volumes:
+  minio-data:
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Create a local bucket for testing access to BLOBS
+
+In this example there exists:
+- A `docker-compose.yml` file capable of instantiating and running a [Minio](https://min.io/) container.
+- A configuration yaml file `bucket_storage.yml` which contains information tiled needs to authenticate with the bucket storage system and write / read Binary Large Objects (BLOBS) through the Zaar adapter.
+
+## How to run this example:
+1. In one terminal window, navigate to the directory where the `docker-compose.yml` and `bucket_storage.yml` are.
+2. Run `docker compose up` with adequate permissions.
+3. Open another terminal window in the same location and run `tiled serve config bucket_storage.yml --api-key secret`
+4. You will need to create a `storage` directory in `/example_configs/bucket_storage` for the sqlite database.
+5. Create an `ipython` session and run the following commands to write array data as a BLOB in a bucket:
+```python
+from tiled.client import from_uri
+c = from_uri('http://localhost:8000', api_key='secret')
+c.write_array([1,2,3])
+```
+6. You will be able to see the written data in the bucket if you log in to the minio container, exposed on your machine at `http://localhost:9001/login`. </br> Use testing credentials `minioadmin` for both fields.
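Editorial note, not part of the commit: step 4 of the README above asks for a `storage` directory next to `bucket_storage.yml`, and the server in step 3 also needs the MinIO API port from the compose file (9000) to be reachable. A minimal preflight sketch using only the standard library might look like the following. The helper names `ensure_storage_dir` and `port_is_open` are hypothetical and do not exist in tiled.

```python
import socket
from pathlib import Path


def ensure_storage_dir(base: Path) -> Path:
    """Create `<base>/storage` if it is missing and return its path.

    Hypothetical helper: mirrors README step 4, where the SQLite catalog
    is configured as sqlite:///storage/catalog.db.
    """
    storage = base / "storage"
    storage.mkdir(parents=True, exist_ok=True)
    return storage


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use, assuming the working directory holds bucket_storage.yml:
#   ensure_storage_dir(Path.cwd())
#   port_is_open("localhost", 9000)  # MinIO API port from docker-compose.yml
```

Running the directory check before `tiled serve config` avoids depending on the order in which the README lists steps 3 and 4.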

pixi.toml

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ flake8 = "*"
 ipython = "*"
 ldap3 = "*"
 matplotlib = "*"
+minio = "*"
 mistune = "*"
 myst-parser = "*"
 numpydoc = "*"

pyproject.toml

Lines changed: 5 additions & 0 deletions
@@ -80,10 +80,12 @@ all = [
     "jinja2",
     "jmespath",
     "lz4",
+    "minio",
     "ndindex",
     "numcodecs",
     "numpy",
     "numba >=0.59.0", # indirect, pinned to assist uv solve
+    "obstore",
     "openpyxl",
     "packaging",
     "pandas",
@@ -156,6 +158,7 @@ dev = [
     "ldap3",
     "locust",
     "matplotlib",
+    "minio",
     "mistune",
     "myst-parser",
     "numpydoc",
@@ -242,10 +245,12 @@ server = [
     "jinja2",
     "jmespath",
     "lz4",
+    "minio",
     "ndindex",
     "numba >=0.59.0", # indirect, pinned to assist uv solve
     "numcodecs",
     "numpy",
+    "obstore",
     "openpyxl",
     "packaging",
     "pandas",

tiled/_tests/conftest.py

Lines changed: 35 additions & 0 deletions
@@ -4,6 +4,7 @@
 import tempfile
 from pathlib import Path
 from typing import Any
+from urllib.parse import urlparse
 
 import asyncpg
 import pytest
@@ -313,6 +314,40 @@ def redis_uri():
         raise pytest.skip("No TILED_TEST_REDIS configured")
 
 
+@pytest.fixture
+def minio_uri():
+    if uri := os.getenv("TILED_TEST_BUCKET"):
+        from minio import Minio
+        from minio.deleteobjects import DeleteObject
+
+        # For convenience, we split the bucket from a string
+        url = urlparse(uri)
+        bucket = url.path.lstrip("/")
+        uri = url._replace(netloc="{}:{}".format(url.hostname, url.port), path="")
+
+        client = Minio(
+            uri.geturl(),
+            access_key=url.username,
+            secret_key=url.password,
+            secure=False,
+        )
+
+        # Reset the state of the bucket after each test.
+        if client.bucket_exists(bucket):
+            delete_object_list = map(
+                lambda x: DeleteObject(x.object_name),
+                client.list_objects(bucket, recursive=True),
+            )
+            errors = client.remove_objects(bucket, delete_object_list)
+            for error in errors:
+                print("error occurred when deleting object", error)
+        else:
+            client.make_bucket(bucket)
+
+    else:
+        raise pytest.skip("No TILED_TEST_BUCKET configured")
+
+
 @pytest.fixture(scope="function")
 def tiled_websocket_context(tmpdir, redis_uri):
     """Fixture that provides a Tiled context with websocket support."""
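Editorial note, not part of the commit: the `minio_uri` fixture above packs endpoint, credentials, and bucket name into the single `TILED_TEST_BUCKET` value (`scheme://user:password@host:port/bucket`). The standalone sketch below shows one way to split such a value with the standard library only. `BucketTarget` and `split_bucket_uri` are hypothetical names used for illustration and are not defined in tiled.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class BucketTarget(NamedTuple):
    """Hypothetical container for the parts of a bucket URI."""

    endpoint: str  # "host:port", without scheme or credentials
    username: Optional[str]
    password: Optional[str]
    bucket: str


def split_bucket_uri(uri: str) -> BucketTarget:
    """Split scheme://user:password@host:port/bucket into its parts."""
    url = urlparse(uri)
    if url.hostname is None:
        raise ValueError(f"no host in bucket URI: {uri!r}")
    # Rebuild the endpoint from hostname and port so credentials are dropped.
    endpoint = url.hostname if url.port is None else f"{url.hostname}:{url.port}"
    return BucketTarget(endpoint, url.username, url.password, url.path.lstrip("/"))


# Same value as the TILED_TEST_BUCKET entry added to ci.yml in this commit.
target = split_bucket_uri("http://minioadmin:minioadmin@localhost:9000/buck")
```

Keeping the endpoint as a bare `host:port` string matches how the `tmp_minio_bucket` fixture in `test_writing.py` passes `urlparse(clean_uri).netloc` to the MinIO client.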

tiled/_tests/test_writing.py

Lines changed: 54 additions & 8 deletions
@@ -5,9 +5,11 @@
 """
 
 import base64
+import os
 import threading
 import uuid
 from datetime import datetime
+from urllib.parse import urljoin, urlparse
 
 import awkward
 import dask.dataframe
@@ -17,6 +19,8 @@
 import pyarrow
 import pytest
 import sparse
+from minio import Minio
+from minio.error import S3Error
 from pandas.testing import assert_frame_equal
 from starlette.status import (
     HTTP_404_NOT_FOUND,
@@ -37,7 +41,7 @@
 from ..structures.data_source import DataSource
 from ..structures.sparse import COOStructure
 from ..structures.table import TableStructure
-from ..utils import APACHE_ARROW_FILE_MIME_TYPE, patch_mimetypes
+from ..utils import APACHE_ARROW_FILE_MIME_TYPE, patch_mimetypes, sanitize_uri
 from ..validation_registration import ValidationRegistry
 from .utils import fail_with_status_code
 
@@ -46,13 +50,55 @@
 
 
 @pytest.fixture
-def tree(tmpdir):
-    return in_memory(
-        writable_storage=[
-            f"file://localhost{str(tmpdir / 'data')}",
-            f"duckdb:///{tmpdir / 'data.duckdb'}",
-        ]
-    )
+def tmp_minio_bucket():
+    """Create a temporary MinIO bucket and clean it up after tests."""
+    if uri := os.getenv("TILED_TEST_BUCKET"):
+        clean_uri, username, password = sanitize_uri(uri)
+        minio_client = Minio(
+            urlparse(clean_uri).netloc, # e.g. only "localhost:9000"
+            access_key=username or "minioadmin",
+            secret_key=password or "minioadmin",
+            secure=False,
+        )
+
+        bucket_name = f"test-{uuid.uuid4().hex}"
+        minio_client.make_bucket(bucket_name)
+
+        try:
+            yield urljoin(uri, "/" + bucket_name) # full URI with credentials
+        finally:
+            # Cleanup: remove all objects and delete the bucket
+            try:
+                objects = minio_client.list_objects(bucket_name, recursive=True)
+                for obj in objects:
+                    minio_client.remove_object(bucket_name, obj.object_name)
+                minio_client.remove_bucket(bucket_name)
+            except S3Error as e:
+                print(f"Warning: failed to delete test bucket {bucket_name}: {e}")
+
+    else:
+        yield None
+
+
+@pytest.fixture
+def tree(tmpdir, tmp_minio_bucket):
+    writable_storage = [f"duckdb:///{tmpdir / 'data.duckdb'}"]
+
+    if tmp_minio_bucket:
+        writable_storage.append(
+            {
+                "provider": "s3",
+                "uri": tmp_minio_bucket,
+                "config": {
+                    "virtual_hosted_style_request": False,
+                    "client_options": {"allow_http": True},
+                },
+            }
+        )
+
+    writable_storage.append(f"file://localhost{str(tmpdir / 'data')}")
+
+    return in_memory(writable_storage=writable_storage)
 
 
 def test_write_array_full(tree):
0 commit comments

Comments
ย (0)