| 1 | +## [TASK] Implement deterministic HOT/COOL storage resolver for `project_slug` (URI-based) |
| 2 | + |
| 3 | +### Context |
| 4 | + |
| 5 | +PRD – Storage path resolution |
| 6 | + |
| 7 | +`euphrosyne-tools-api` must deterministically compute **where project data lives** for two logical storage roles: |
| 8 | + |
| 9 | +* **HOT** — workspace / active project data |
| 10 | +* **COOL** — immutable cooled project data |
| 11 | + |
| 12 | +The resolver must be **role-based**, not “Azure Files vs Azure Blob”-based, because HOT and COOL may both live on blob containers in the future. |
| 13 | + |
| 14 | +This resolver is **foundational**: all later operations (AzCopy, listing, restore) must rely on it as the **single source of truth** for project storage locations. |
| 15 | + |
| 16 | +--- |
| 17 | + |
| 18 | +## Goal |
| 19 | + |
| 20 | +Implement a deterministic resolver that returns a canonical `DataLocation` for: |
| 21 | + |
| 22 | +* HOT project data |
| 23 | +* COOL project data (when cooling is enabled) |
| 24 | + |
| 25 | +using only: |
| 26 | + |
| 27 | +* `project_slug` |
| 28 | +* environment configuration |
| 29 | + |
| 30 | +No network calls. No Azure SDK usage. |
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +## Data model: `DataLocation` |
| 35 | + |
| 36 | +Implement (or add) a minimal immutable dataclass: |
| 37 | + |
| 38 | +```python |
| 39 | +@dataclass(frozen=True) |
| 40 | +class DataLocation: |
| 41 | + role: StorageRole # HOT | COOL |
| 42 | + backend: StorageBackend # AZURE_FILESHARE | AZURE_BLOB |
| 43 | + project_slug: str |
| 44 | + uri: str # canonical URI for project root |
| 45 | +``` |
| 46 | + |
| 47 | +Notes: |
| 48 | + |
| 49 | +* Use `uri` (lowercase) for Python style. |
| 50 | +* `uri` must point to the **project root folder/prefix** (not to a file, not to a run subfolder). |
| 51 | +* `StorageBackend` enum values must be: |
| 52 | + |
| 53 | + * `AZURE_FILESHARE` |
| 54 | + * `AZURE_BLOB` |
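The dataclass above refers to two enums that the task names but does not spell out. A minimal sketch of how they might look follows. The lowercase string values are an assumption, chosen to mirror the `DATA_BACKEND` settings, and the `StorageRole` values are likewise illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class StorageRole(str, Enum):
    # String values are an assumption; the task only names the members.
    HOT = "hot"
    COOL = "cool"


class StorageBackend(str, Enum):
    # Values assumed to mirror the DATA_BACKEND / DATA_BACKEND_COOL settings.
    AZURE_FILESHARE = "azure_fileshare"
    AZURE_BLOB = "azure_blob"


@dataclass(frozen=True)
class DataLocation:
    role: StorageRole
    backend: StorageBackend
    project_slug: str
    uri: str  # canonical URI for the project root, no trailing slash
```

With `frozen=True`, assigning to a field after construction raises `dataclasses.FrozenInstanceError`, and two instances with equal fields compare equal, which is convenient for the determinism tests.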
| 55 | + |
| 56 | +--- |
| 57 | + |
| 58 | +## Environment configuration |
| 59 | + |
| 60 | +### Backend selection (per role) |
| 61 | + |
| 62 | +* `DATA_BACKEND=azure_fileshare|azure_blob` |
| 63 | + |
| 64 | + * Used for **HOT** data |
| 65 | + * Required |
| 66 | + |
| 67 | +* `DATA_BACKEND_COOL=azure_fileshare|azure_blob` |
| 68 | + |
| 69 | + * Used for **COOL** data |
| 70 | + * **Optional** |
| 71 | + * If **absent**, cooling is considered **disabled** and COOL resolution must not be allowed |
| 72 | + |
| 73 | +If `DATA_BACKEND_COOL` is set, **all required COOL-specific configuration must be present**; otherwise startup or resolution must fail with a clear configuration error. |
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +### Backend-specific configuration |
| 78 | + |
| 79 | +#### Azure Fileshare |
| 80 | + |
| 81 | +* `AZURE_STORAGE_FILESHARE` |
| 82 | + |
| 83 | + * Fileshare name for HOT data (existing config; keep as-is) |
| 84 | + |
| 85 | +* `AZURE_STORAGE_FILESHARE_COOL` |
| 86 | + |
| 87 | + * Fileshare name for COOL data (required if `DATA_BACKEND_COOL=azure_fileshare`) |
| 88 | + |
| 89 | +#### Azure Blob |
| 90 | + |
| 91 | +* `AZURE_STORAGE_DATA_CONTAINER` |
| 92 | + |
| 93 | + * Blob container for HOT data (existing config; keep as-is) |
| 94 | + |
| 95 | +* `AZURE_STORAGE_DATA_CONTAINER_COOL` |
| 96 | + |
| 97 | + * Blob container for COOL data (required if `DATA_BACKEND_COOL=azure_blob`) |
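The configuration rules above could be read roughly as follows. This is a sketch only: `StorageConfigError`, `read_role_config`, and the `TARGET_VARS` table are hypothetical names, the environment is passed in as a mapping to keep the function testable, and treating an empty value the same as an absent one is an assumption.

```python
from typing import Mapping, Optional, Tuple

VALID_BACKENDS = ("azure_fileshare", "azure_blob")

# Variable that names the share or container, per (backend, role).
TARGET_VARS = {
    ("azure_fileshare", "hot"): "AZURE_STORAGE_FILESHARE",
    ("azure_fileshare", "cool"): "AZURE_STORAGE_FILESHARE_COOL",
    ("azure_blob", "hot"): "AZURE_STORAGE_DATA_CONTAINER",
    ("azure_blob", "cool"): "AZURE_STORAGE_DATA_CONTAINER_COOL",
}


class StorageConfigError(RuntimeError):
    """Hypothetical error type for missing or invalid storage configuration."""


def _require(env: Mapping[str, str], name: str) -> str:
    value = env.get(name, "").strip()
    if not value:
        raise StorageConfigError(f"{name} is required but not set")
    return value


def read_role_config(env: Mapping[str, str], role: str) -> Optional[Tuple[str, str]]:
    """Return (backend, target) for "hot" or "cool"; None when cooling is disabled."""
    backend_var = "DATA_BACKEND" if role == "hot" else "DATA_BACKEND_COOL"
    backend = env.get(backend_var, "").strip()
    if not backend:
        if role == "cool":
            return None  # DATA_BACKEND_COOL absent: cooling is disabled
        raise StorageConfigError("DATA_BACKEND is required but not set")
    if backend not in VALID_BACKENDS:
        raise StorageConfigError(f"{backend_var}={backend!r} is not a supported backend")
    # Once a backend is selected, its share/container setting becomes mandatory.
    return backend, _require(env, TARGET_VARS[(backend, role)])
```

Calling `read_role_config(os.environ, "cool")` at startup would surface a half-configured COOL role early instead of at first resolution.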
| 98 | + |
| 99 | +--- |
| 100 | + |
| 101 | +### Project prefix configuration |
| 102 | + |
| 103 | +Project prefix must be **backend-agnostic**. |
| 104 | + |
| 105 | +* `DATA_PROJECTS_LOCATION_PREFIX` |
| 106 | + |
| 107 | + * Base prefix for HOT data |
| 108 | + |
| 109 | +* `DATA_PROJECTS_LOCATION_PREFIX_COOL` |
| 110 | + |
| 111 | + * Base prefix for COOL data |
| 112 | + |
| 113 | +**Backward compatibility**: |
| 114 | +* None. Replace `AZURE_STORAGE_PROJECTS_LOCATION_PREFIX` with `DATA_PROJECTS_LOCATION_PREFIX` and keep no fallback to the old name. |
| 115 | + |
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## Resolver behavior |
| 120 | + |
| 121 | +### Resolver API |
| 122 | + |
| 123 | +Expose role-based resolver functions (names indicative): |
| 124 | + |
| 125 | +* `resolve_hot_location(project_slug: str) -> DataLocation` |
| 126 | +* `resolve_cool_location(project_slug: str) -> DataLocation` |
| 127 | + |
| 128 | +Optionally expose: |
| 129 | + |
| 130 | +* `resolve_location(role: StorageRole, project_slug: str) -> DataLocation` |
| 131 | + |
| 132 | +The resolver must: |
| 133 | + |
| 134 | +* validate `project_slug` |
| 135 | +* determine backend from env configuration |
| 136 | +* build a **canonical URI** |
| 137 | +* return `DataLocation(role, backend, project_slug, uri)` |
| 138 | + |
| 139 | +If `resolve_cool_location` is called while `DATA_BACKEND_COOL` is **unset**, raise a clear error indicating that cooling is disabled. |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +## Canonical URI formats |
| 144 | + |
| 145 | +Canonical URIs must **not** end with a trailing slash. |
| 146 | + |
| 147 | +### Azure Fileshare |
| 148 | + |
| 149 | +``` |
| 150 | +https://{account}.file.core.windows.net/{share}/{prefix}/{project_slug} |
| 151 | +``` |
| 152 | + |
| 153 | +### Azure Blob |
| 154 | + |
| 155 | +``` |
| 156 | +https://{account}.blob.core.windows.net/{container}/{prefix}/{project_slug} |
| 157 | +``` |
| 158 | + |
| 159 | +### Prefix normalization rules |
| 160 | + |
| 161 | +* Strip leading/trailing `/` from prefixes before joining |
| 162 | +* Avoid double slashes in the resulting path |
| 163 | +* An empty prefix must not leave an empty path segment: the path becomes `{share or container}/{project_slug}` |
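The URI formats and normalization rules above might be implemented along these lines. The function names are hypothetical, and the storage account name is taken as a plain argument because the task does not say which setting provides it. Collapsing empty segments inside the prefix goes slightly beyond "strip leading/trailing `/`" and is a choice made here to guarantee no double slashes.

```python
def normalize_prefix(prefix: str) -> str:
    """Drop leading/trailing slashes and any empty segments in between."""
    return "/".join(segment for segment in prefix.split("/") if segment)


def build_project_uri(
    backend: str, account: str, target: str, prefix: str, project_slug: str
) -> str:
    """Build the canonical project-root URI, without a trailing slash.

    backend is "azure_fileshare" or "azure_blob"; target is the share or
    container name. project_slug is assumed to be validated already.
    """
    service = {"azure_fileshare": "file", "azure_blob": "blob"}[backend]
    parts = [target, normalize_prefix(prefix), project_slug]
    # Skipping empty parts is what keeps an empty prefix from producing "//".
    path = "/".join(part for part in parts if part)
    return f"https://{account}.{service}.core.windows.net/{path}"
```

For example, `build_project_uri("azure_blob", "acct", "cool", "", "my-project")` yields `https://acct.blob.core.windows.net/cool/my-project`, and a prefix of `/projects/` on a fileshare yields `https://acct.file.core.windows.net/share/projects/my-project`.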
| 164 | + |
| 165 | +--- |
| 166 | + |
| 167 | +## Validation requirements |
| 168 | + |
| 169 | +Reject invalid `project_slug` values: |
| 170 | + |
| 171 | +* empty string |
| 172 | +* leading or trailing whitespace |
| 173 | +* contains `/` or `\` |
| 174 | +* contains `..` |
| 175 | +* contains `//` |
| 176 | + |
| 177 | +Raise a clear, FastAPI-compatible error (HTTP 400–class). |
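The rejection rules above translate fairly directly into code. In this sketch `InvalidProjectSlugError` is a hypothetical exception that carries a `status_code` so an API layer can map it to an HTTP 400 response. It deliberately avoids importing FastAPI to stay self-contained.

```python
class InvalidProjectSlugError(ValueError):
    """Hypothetical error; status_code lets an API layer answer with HTTP 400."""

    status_code = 400


def validate_project_slug(project_slug: str) -> str:
    """Return the slug unchanged if it is acceptable, otherwise raise."""
    if not project_slug:
        raise InvalidProjectSlugError("project_slug must not be empty")
    if project_slug != project_slug.strip():
        raise InvalidProjectSlugError(
            "project_slug must not have leading or trailing whitespace"
        )
    # Rejecting any "/" also covers the "//" case.
    if "/" in project_slug or "\\" in project_slug:
        raise InvalidProjectSlugError("project_slug must not contain '/' or '\\'")
    if ".." in project_slug:
        raise InvalidProjectSlugError("project_slug must not contain '..'")
    return project_slug
```

In the service itself, one option would be to raise `fastapi.HTTPException(status_code=400, detail=...)` directly, or to register an exception handler that converts this error type. Which of the two fits the codebase better is left open here.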
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## Determinism & stability requirements |
| 182 | + |
| 183 | +* Same inputs + same env config → **exact same `uri` string** |
| 184 | +* Must not depend on: |
| 185 | + |
| 186 | + * timestamps |
| 187 | + * randomness |
| 188 | + * operation_id |
| 189 | + * mutable metadata |
| 190 | +* All project storage URIs must be produced via this resolver; no ad-hoc concatenation elsewhere in the codebase. |
| 191 | + |
| 192 | +--- |
| 193 | + |
| 194 | +## Unit tests |
| 195 | + |
| 196 | +Add unit tests covering: |
| 197 | + |
| 198 | +1. **Determinism** |
| 199 | + |
| 200 | + * same slug + same config → identical `DataLocation` (including `uri`) |
| 201 | + |
| 202 | +2. **Golden snapshots** |
| 203 | + |
| 204 | + * exact URI assertion for: |
| 205 | + |
| 206 | + * HOT (fileshare) |
| 207 | + * COOL (blob) |
| 208 | + |
| 209 | +3. **Prefix joining** |
| 210 | + |
| 211 | + * empty prefix |
| 212 | + * non-empty prefix (no double slashes) |
| 213 | + |
| 214 | +4. **Validation** |
| 215 | + |
| 216 | + * invalid slugs rejected: |
| 217 | + |
| 218 | + * `""` |
| 219 | + * `"../x"` |
| 220 | + * `"a/b"` |
| 221 | + * `"a\\b"` |
| 222 | + * `"a..b"` |
| 223 | + * `" a "` |
| 224 | + |
| 225 | +5. **Role backend selection** |
| 226 | + |
| 227 | + * HOT backend driven by `DATA_BACKEND` |
| 228 | + * COOL backend driven by `DATA_BACKEND_COOL` |
| 229 | + * COOL resolution fails when `DATA_BACKEND_COOL` is unset |
| 230 | + |
| 231 | +--- |
| 232 | + |
| 233 | +## Acceptance criteria |
| 234 | + |
| 235 | +* `DataLocation` dataclass exists with: |
| 236 | + |
| 237 | + * `role`, `backend`, `project_slug`, `uri` |
| 238 | +* HOT and COOL resolver functions exist and are the **single source of truth** for project storage URIs |
| 239 | +* HOT and COOL backends are configurable independently |
| 240 | +* Cooling is **disabled** whenever `DATA_BACKEND_COOL` is absent, which is the default |
| 241 | +* URIs are canonical, stable, and validated |
| 242 | +* Unit tests validate mapping, validation, and backend selection |
| 243 | + |
| 244 | +--- |
| 245 | + |
| 246 | +## Notes |
| 247 | + |
| 248 | +* This task is **resolution only**: |
| 249 | + |
| 250 | + * no Azure API calls |
| 251 | + * no AzCopy |
| 252 | + * no authentication or SAS logic |
| 253 | +* Keep implementation minimal, explicit, and well-documented. |
| 254 | +* This resolver defines a long-lived contract; correctness and stability matter more than flexibility. |
| 255 | + |