Commit 40f8e82

Merge branch 'main' into chore/numpy-docstrings
2 parents 0a55373 + 0341579 commit 40f8e82

11 files changed, with 1371 additions and 10 deletions

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.14.3
     hooks:
-      - id: ruff
+      - id: ruff-check
         types_or: [python, jupyter]
         args: ["--fix", "--show-fixes"]
       - id: ruff-format

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ The format is based on Keep a Changelog, and the project follows Semantic Versio
 ### Changed

 - Standardised NumPy docstrings across all public functions and classes in `export.py`, `inspect.py`, `writers.py`, and the `tools/` and `providers/` layers. No behaviour changes.
+### Added
+
+- `load_export(path)` reader API that loads any export produced by `export_batch(...)` — both combined (single file) and per-item (directory) layouts — and returns a structured `ExportResult`. Failed points are NaN-filled rather than dropped, partial model runs are surfaced via `status="partial"`, and `ExportResult.embedding(model)` provides a typed shortcut to the embedding array.

 ### Fixed

docs/api.md

Lines changed: 2 additions & 1 deletion
@@ -7,7 +7,7 @@ If you want installation and first-run examples, start with [Quickstart](quickst

 ## Core Entry Points

-Most users only need four public entry points: `get_embedding(...)`, `get_embeddings_batch(...)`, `export_batch(...)`, and `inspect_provider_patch(...)`.
+Most users only need five public entry points: `get_embedding(...)`, `get_embeddings_batch(...)`, `export_batch(...)`, `load_export(...)`, and `inspect_provider_patch(...)`.

 ---

@@ -18,6 +18,7 @@ Most users only need four public entry points: `get_embedding(...)`, `get_embedd
 | understand spatial/temporal/output specs | [API: Specs and Data Structures](api_specs.md) |
 | get one embedding or batch embeddings | [API: Embedding](api_embedding.md) |
 | build export pipelines and datasets | [API: Export](api_export.md) |
+| read back a saved export | [API: Load](api_load.md) |
 | inspect raw provider patches before inference | [API: Inspect](api_inspect.md) |

 ---

docs/api_export.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 This page covers dataset export APIs.

-Related pages: [API: Specs and Data Structures](api_specs.md), [API: Embedding](api_embedding.md), and [API: Inspect](api_inspect.md).
+Related pages: [API: Specs and Data Structures](api_specs.md), [API: Embedding](api_embedding.md), [API: Load](api_load.md), and [API: Inspect](api_inspect.md).

 ---

docs/api_load.md

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
# API: Load

This page covers the export reader API for loading files produced by [`export_batch`](api_export.md).

Related pages: [API: Specs and Data Structures](api_specs.md), [API: Embedding](api_embedding.md), and [API: Export](api_export.md).

---

## load_export (primary / recommended) { #load_export }

### Signature

```python
load_export(
    path: Union[str, os.PathLike],
) -> ExportResult
```

Use `load_export(...)` to read any export produced by [`export_batch`](api_export.md) — both **combined** (single file) and **per-item** (directory) layouts are supported. The layout is detected automatically.

### Mental Model

`load_export(...)` answers one question: *where is the export?*

- Pass a **file** (`.npz`, `.nc`, or `.json`) to load a **combined** export.
- Pass a **directory** to load a **per-item** export.

Everything else — layout detection, key parsing, NaN-fill for partial failures — is handled automatically.
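The file-versus-directory rule above can be sketched in a few lines. This is only an illustration of the rule as documented here, not the code in `rs_embed.load`: `detect_layout` and `COMBINED_SUFFIXES` are hypothetical names, and the real reader may apply further checks before accepting a path.

```python
from pathlib import Path

# Hypothetical helper (not part of rs-embed): mirrors the documented rule that
# a supported file means a combined export and a directory means per-item.
COMBINED_SUFFIXES = {".npz", ".nc", ".json"}


def detect_layout(path):
    """Return "combined" for a supported file, "per_item" for a directory."""
    p = Path(path)
    if not p.exists():
        # Matches the documented FileNotFoundError case.
        raise FileNotFoundError(f"export path does not exist: {p}")
    if p.is_dir():
        return "per_item"
    if p.suffix.lower() in COMBINED_SUFFIXES:
        return "combined"
    # Matches the documented ValueError case for unrecognised paths.
    raise ValueError(f"not a recognised export path: {p}")
```

A check like this only inspects the path. Whether the contents really are an rs-embed export would still have to be validated when the file is opened.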

### Default Pattern

```python
from rs_embed import load_export

# Combined export (single file)
result = load_export("exports/run.npz")

# Per-item export (directory of p00000.npz, p00001.npz, ...)
result = load_export("exports/per_item_run/")
```

---

## Parameters

| Parameter | Meaning                                                                           |
| --------- | --------------------------------------------------------------------------------- |
| `path`    | Path to a `.npz`/`.nc`/`.json` file (combined) or a directory (per-item export). |

### Raises

| Exception           | When                                                          |
| ------------------- | ------------------------------------------------------------- |
| `FileNotFoundError` | Path does not exist.                                          |
| `ValueError`        | Path exists but cannot be interpreted as an rs-embed export.  |
| `ImportError`       | NetCDF export requested but `xarray` is not installed.        |

---

## Return Value: ExportResult { #ExportResult }

`load_export(...)` always returns an `ExportResult`.

```python
@dataclass
class ExportResult:
    layout: str                     # "combined" or "per_item"
    spatials: list[dict]            # one dict per spatial point
    temporal: dict | None           # temporal spec used at export time
    n_items: int                    # number of spatial points
    status: str                     # "ok" | "partial" | "failed"
    models: dict[str, ModelResult]  # keyed by model name
    manifest: dict                  # raw manifest for advanced use
```

### Convenience Methods

```python
result.embedding("remoteclip")   # → np.ndarray, shape (n_items, dim)
result.ok_models                 # → list[str] — models with status "ok"
result.failed_models             # → list[str] — models with status "failed"
```

`embedding(model)` raises `KeyError` if the model was not part of the export and `ValueError` if the model failed for every point.
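To make the two error cases concrete, here is a minimal stand-in for the accessors described above. It is a sketch under assumptions, not the library's implementation: `ModelResultSketch` and `ExportResultSketch` are hypothetical classes, plain lists stand in for `np.ndarray`, and only the fields needed for the example are kept.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelResultSketch:
    # Hypothetical, trimmed-down stand-in for ModelResult.
    name: str
    status: str                        # "ok" | "partial" | "failed"
    embeddings: Optional[list] = None  # nested lists stand in for np.ndarray


@dataclass
class ExportResultSketch:
    # Hypothetical, trimmed-down stand-in for ExportResult.
    models: dict = field(default_factory=dict)

    def embedding(self, model):
        mr = self.models[model]  # KeyError if the model was not exported
        if mr.embeddings is None:
            # A fully failed model carries no array at all.
            raise ValueError(f"model {model!r} failed for every point")
        return mr.embeddings

    @property
    def ok_models(self):
        return [n for n, m in self.models.items() if m.status == "ok"]

    @property
    def failed_models(self):
        return [n for n, m in self.models.items() if m.status == "failed"]
```

In this sketch a model with status `"partial"` appears in neither list, which follows from the comments in the listing above; callers who care about partial runs would read `models[name].status` directly.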

---

## ModelResult { #ModelResult }

Each entry in `result.models` is a `ModelResult`:

```python
@dataclass
class ModelResult:
    name: str                      # canonical model identifier
    status: str                    # "ok" | "partial" | "failed"
    embeddings: np.ndarray | None  # (n_items, dim) or (n_items, C, H, W)
    inputs: np.ndarray | None      # (n_items, C, H, W) — None if not saved
    meta: list[dict]               # per-point embedding metadata
    error: str | None              # error string for fully-failed models
```

**Status values:**

| Status      | Meaning                                                               |
| ----------- | --------------------------------------------------------------------- |
| `"ok"`      | All points succeeded.                                                 |
| `"partial"` | Some points succeeded; failed points are NaN-filled in `embeddings`. |
| `"failed"`  | All points failed; `embeddings` is `None`.                            |
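The three status values can be read as a simple rule over per-point outcomes. The sketch below illustrates that rule with plain Python lists. `stack_with_nan_fill` is a hypothetical helper, and the real reader works on NumPy arrays and may derive the status differently.

```python
import math


def stack_with_nan_fill(per_point, dim):
    """Combine per-point vectors into rows, NaN-filling failed points.

    per_point: list whose entries are a vector (list of floats) or None
    for a point that failed. Returns (status, rows).
    """
    n_ok = sum(v is not None for v in per_point)
    if n_ok == 0:
        # Every point failed: no array is produced at all.
        return "failed", None
    # Failed points keep their row position but hold NaN in every column.
    rows = [list(v) if v is not None else [math.nan] * dim for v in per_point]
    status = "ok" if n_ok == len(per_point) else "partial"
    return status, rows
```

Keeping failed points as NaN rows rather than dropping them means row `i` still corresponds to `spatials[i]`, so embeddings from different models stay aligned.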

---

## Common Patterns

### Load and inspect a combined export

```python
from rs_embed import load_export

result = load_export("exports/combined_run.npz")

print(result.n_items)        # number of spatial points
print(result.ok_models)      # models that succeeded
print(result.temporal)       # {'start': '2022-06-01', 'end': '2022-09-01'}

emb = result.embedding("remoteclip")   # shape (n_items, dim)
```

### Access inputs when save_inputs=True

```python
result = load_export("exports/combined_run.npz")
mr = result.models["prithvi"]
if mr.inputs is not None:
    print(mr.inputs.shape)   # (n_items, C, H, W)
```

### Load a per-item export directory

```python
result = load_export("exports/per_item_run/")
print(result.layout)    # "per_item"
print(result.n_items)   # number of files found

emb = result.embedding("remoteclip")   # (n_items, dim) — stacked from per-file arrays
```

### Handle partial failures

```python
result = load_export("exports/combined_run.npz")

if result.failed_models:
    print("Failed:", result.failed_models)

for name in result.ok_models:
    emb = result.embedding(name)
    print(f"{name}: {emb.shape}")
```
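The loop above only visits models whose status is `"ok"`. For a `"partial"` model, the NaN-filled rows usually need to be excluded before downstream use. A minimal sketch of that filtering follows, using nested lists in place of a NumPy array. `valid_rows` is a hypothetical helper and not part of rs-embed.

```python
import math


def valid_rows(embeddings):
    """Return (indices, rows) for points whose vector contains no NaN."""
    keep = [
        i for i, row in enumerate(embeddings)
        if not any(math.isnan(x) for x in row)
    ]
    # Returning the indices lets callers select the matching spatial points.
    return keep, [embeddings[i] for i in keep]
```

With a 2-D NumPy array `emb`, the equivalent mask is `~np.isnan(emb).any(axis=1)`.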

### Load via the manifest JSON

Pass the `.json` manifest path if that is what you have — `load_export` finds the paired array file automatically:

```python
result = load_export("exports/combined_run.json")
```
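One plausible way to find the paired array file is to look for a sibling with the same stem. The sketch below assumes exactly that naming convention (`combined_run.json` next to `combined_run.npz` or `combined_run.nc`). `paired_array_path` is a hypothetical helper, and the actual lookup in `load_export` may rely on information stored in the manifest instead.

```python
from pathlib import Path


def paired_array_path(manifest_path, suffixes=(".npz", ".nc")):
    """Return the array file that sits next to a manifest JSON.

    Assumption: the array file shares the manifest's stem and differs
    only in its suffix.
    """
    p = Path(manifest_path)
    for suffix in suffixes:
        candidate = p.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no array file found next to {p}")
```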

---

## Relationship to export_batch

`load_export` is the read-side counterpart to `export_batch`. Every file produced by `export_batch` can be read back with `load_export` without manually parsing NPZ keys or manifest JSON.

```python
from rs_embed import export_batch, load_export, ExportTarget, ExportConfig, PointBuffer, TemporalSpec

# Write
export_batch(
    spatials=[PointBuffer(121.5, 31.2, 2048)],
    temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
    models=["remoteclip"],
    target=ExportTarget.combined("exports/run"),
    config=ExportConfig(save_inputs=True),
)

# Read back
result = load_export("exports/run.npz")
emb = result.embedding("remoteclip")   # shape (1, dim)
```

!!! tip "Simple rule"
    Pass a file path for combined exports, a directory path for per-item exports.
    Everything else is automatic.

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@ nav:
 - Specs & Data Structures: api_specs.md
 - Embedding API: api_embedding.md
 - Export API: api_export.md
+- Load API: api_load.md
 - Inspect API: api_inspect.md
 - Extending:
 - Overview: extending.md

src/rs_embed/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -34,6 +34,7 @@
 )
 from .export import export_npz
 from .inspect import inspect_gee_patch, inspect_provider_patch
+from .load import ExportResult, ModelResult, load_export
 from .model import Model
 from .pipelines.exporter import BatchExporter

@@ -64,6 +65,10 @@
     # Export API
     "export_batch",
     "export_npz",
+    # Load API
+    "load_export",
+    "ExportResult",
+    "ModelResult",
     # Inspection
     "inspect_provider_patch",
     # Backward-compatible alias for inspect_provider_patch

src/rs_embed/api.py

Lines changed: 0 additions & 6 deletions
@@ -70,12 +70,6 @@
 from .tools.model_defaults import (
     resolve_sensor_for_model as _resolve_sensor_for_model,
 )
-from .tools.normalization import (
-    # Re-exported so `from rs_embed.api import ...` in tests/downstream still works.
-    _default_provider_backend_for_api,  # noqa: F401
-    _probe_model_describe,  # noqa: F401
-    _resolve_embedding_api_backend,  # noqa: F401
-)
 from .tools.normalization import (
     normalize_backend_name as _normalize_backend_name,
 )
