Commit 3b2f08e

Merge pull request #685 from betagouv/epic/m1-lifecycle-fondation
Epic: M1 lifecycle foundation
2 parents c050208 + 557899a commit 3b2f08e

17 files changed

+1414
-33
lines changed

.env.example

Lines changed: 6 additions & 2 deletions
@@ -7,9 +7,13 @@ AZURE_RESOURCE_PREFIX=euphrosyne-01-vm-
 AZURE_TEMPLATE_SPECS_NAME=vmSpec
 AZURE_STORAGE_ACCOUNT=
 AZURE_STORAGE_FILESHARE=
-AZURE_STORAGE_PROJECTS_LOCATION_PREFIX=
-PROJECT_STORAGE_BACKEND=azure_fileshare
 AZURE_STORAGE_DATA_CONTAINER=
+AZURE_STORAGE_FILESHARE_COOL=
+AZURE_STORAGE_DATA_CONTAINER_COOL=
+DATA_BACKEND=azure_fileshare
+DATA_BACKEND_COOL=
+DATA_PROJECTS_LOCATION_PREFIX=
+DATA_PROJECTS_LOCATION_PREFIX_COOL=
 AZURE_IMAGE_GALLERY=
 AZURE_IMAGE_DEFINITION=
 CORS_ALLOWED_ORIGIN=http://localhost:8000 http://localhost:8001

EPIC.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
## Context
PRD: Hot → Cool Project Data Management (Immutable Cool)

This epic implements the **project-level** data lifecycle described in the PRD.
The lifecycle applies to the **entire project data folder**, including runs,
documents, and any other project-scoped files.

---

## Goal
Implement **project-level** lifecycle state machine, automation, immutability
enforcement, and UI visibility.

The lifecycle controls when a project’s data is stored in:
- Azure Files (HOT workspace), or
- Azure Blob Storage (COOL, immutable).

---

## Success criteria

- Projects automatically become eligible for cooling based on activity:
  - Initial eligibility: `project.created + 6 months`
  - Updated eligibility each time a new run is planned:
    `run.end_date + 6 months`
- Entire project data (runs + documents) is cooled as a single unit.
- Restore works on demand and returns the project to HOT.
- Writes are blocked when the project is in `COOL` or `COOLING`:
  - document uploads/edits/deletes
  - run outputs
  - any write under the project folder
- **New runs cannot be created** when the project is `COOL` or `COOLING`.
- Admins can see:
  - lifecycle state
  - cooling eligibility date
  - last lifecycle operation and error (if any).
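
The eligibility rule in the success criteria can be sketched as a small pure function. This is an illustration only: `add_months` and `cooling_eligible_on` are hypothetical names, and clamping the day to the end of the target month (for example 31 August + 6 months) is an assumed policy that the epic does not specify.

```python
import calendar
from datetime import date
from typing import Optional

COOLING_DELAY_MONTHS = 6  # from the success criteria


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length (assumed policy)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def cooling_eligible_on(project_created: date, latest_run_end: Optional[date] = None) -> date:
    """Return the date on which the project becomes eligible for cooling.

    Without any planned run the project creation date is the reference;
    once a run is planned, its end date replaces it.
    """
    base = latest_run_end if latest_run_end is not None else project_created
    return add_months(base, COOLING_DELAY_MONTHS)


print(cooling_eligible_on(date(2024, 1, 15)))                     # 2024-07-15
print(cooling_eligible_on(date(2024, 1, 15), date(2024, 8, 31)))  # 2025-02-28
```

Keeping the computation free of I/O makes it straightforward to re-run whenever a run is planned.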

---

## Non-goals (explicit for this epic)

- No cold/archive tier
- No partial (per-run) cooling
- No deletion of hot data after cooling
- No detection of concurrent readers (existing VM mounts tolerated)

---

## Notes

- Lifecycle state is tracked **at the project level**, not run level.
- Euphrosyne is the source of truth for lifecycle state and eligibility.
- Physical data movement is handled by euphrosyne-tools-api and is out of scope
  for this epic.
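
Because lifecycle state is tracked per project, the write and run-creation blocks listed in the success criteria can reduce to one project-level check. The sketch below is a minimal illustration: `LifecycleState`, `ProjectReadOnlyError` and `ensure_writable` are hypothetical names, and only the states named in this epic are modelled.

```python
from enum import Enum


class LifecycleState(Enum):
    # Only the states named in this epic; the PRD may define more.
    HOT = "HOT"
    COOLING = "COOLING"
    COOL = "COOL"


class ProjectReadOnlyError(Exception):
    """Hypothetical error raised when a write targets a cooling or cooled project."""


READ_ONLY_STATES = frozenset({LifecycleState.COOLING, LifecycleState.COOL})


def ensure_writable(project_slug: str, state: LifecycleState) -> None:
    """Raise if document writes, run outputs or run creation must be blocked."""
    if state in READ_ONLY_STATES:
        raise ProjectReadOnlyError(
            f"project {project_slug!r} is {state.value}: writes and new runs are blocked"
        )
```

Calling one guard from every write path (document upload, run output, run creation) keeps the blocking rule in a single place.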

README.md

Lines changed: 14 additions & 8 deletions
@@ -12,7 +12,8 @@ This project uses [FastAPI](https://fastapi.tiangolo.com/).

 | Variable name | Description | Required |
 | --- | --- | --- |
-| PROJECT_STORAGE_BACKEND | Optional. Storage backend for project data. Values: `azure_fileshare` (default) or `azure_blob`. |
+| DATA_BACKEND | Required. Storage backend for HOT project data. Values: `azure_fileshare` or `azure_blob`. |
+| DATA_BACKEND_COOL | Optional. Storage backend for COOL project data. Values: `azure_fileshare` or `azure_blob`. If absent, cooling is disabled. |
 | AZURE_SUBSCRIPTION_ID | Azure subscription ID. |
 | AZURE_CLIENT_ID | Azure application ID (see section _Generating keys to authenticate with Azure_). |
 | AZURE_CLIENT_SECRET | Azure application secret (see section _Generating keys to authenticate with Azure_). |
@@ -21,9 +22,12 @@ This project uses [FastAPI](https://fastapi.tiangolo.com/).
 | AZURE_TEMPLATE_SPECS_NAME | Name of the _Template specs_ used to deploy virtual machines. See the `euphrosyne-tools-infra` project. |
 | AZURE_RESOURCE_PREFIX | Optional. Prefix used to avoid name collisions when creating resources on Azure. Must match the one in the Terraform configuration (`euphrosyne-tools-infra` project). |
 | AZURE_STORAGE_ACCOUNT | Name of the Azure _Storage account_. |
-| AZURE_STORAGE_FILESHARE | Name of the _Fileshare_ containing the data files on the Azure _Storage account_. Required if `PROJECT_STORAGE_BACKEND=azure_fileshare` (default value). |
-| AZURE_STORAGE_PROJECTS_LOCATION_PREFIX | Optional. Prefix to use when the folder containing the data files on the Azure _Fileshare_ is not at the root. |
-| AZURE_STORAGE_DATA_CONTAINER | Name of the Blob container used for project data (required if `PROJECT_STORAGE_BACKEND=azure_blob`). |
+| AZURE_STORAGE_FILESHARE | Name of the _Fileshare_ containing the data files on the Azure _Storage account_. Required if `DATA_BACKEND=azure_fileshare`. |
+| AZURE_STORAGE_FILESHARE_COOL | Name of the _Fileshare_ containing COOL project data. Required if `DATA_BACKEND_COOL=azure_fileshare`. |
+| DATA_PROJECTS_LOCATION_PREFIX | Optional. Prefix of the projects base path for HOT. |
+| DATA_PROJECTS_LOCATION_PREFIX_COOL | Optional. Prefix of the projects base path for COOL. |
+| AZURE_STORAGE_DATA_CONTAINER | Name of the Blob container used for project data (required if `DATA_BACKEND=azure_blob`). |
+| AZURE_STORAGE_DATA_CONTAINER_COOL | Name of the Blob container used for COOL project data (required if `DATA_BACKEND_COOL=azure_blob`). |
 | AZURE_IMAGE_GALLERY | Name of the _Azure compute gallery_ that stores the various images |
 | AZURE_IMAGE_DEFINITION | Name of the _VM image definition_, the preconfigured image for Euphrosyne VMs |
 | CORS_ALLOWED_ORIGIN | Frontend origins allowed to use the API. Separate origins with spaces. |
@@ -37,12 +41,14 @@ This project uses [FastAPI](https://fastapi.tiangolo.com/).

 ## Project data storage

-Project data can be stored either in an Azure Fileshare or in a Blob container.
+The HOT backend is set by `DATA_BACKEND`. The COOL backend is set by `DATA_BACKEND_COOL` (if absent, cooling is disabled).

-- **Fileshare (default)**: `PROJECT_STORAGE_BACKEND=azure_fileshare` and `AZURE_STORAGE_FILESHARE` must be set.
-- **Blob**: set `PROJECT_STORAGE_BACKEND=azure_blob` and provide `AZURE_STORAGE_DATA_CONTAINER`.
+- **HOT Fileshare**: `DATA_BACKEND=azure_fileshare` and `AZURE_STORAGE_FILESHARE` must be set.
+- **HOT Blob**: `DATA_BACKEND=azure_blob` and `AZURE_STORAGE_DATA_CONTAINER` must be set.
+- **COOL Fileshare**: `DATA_BACKEND_COOL=azure_fileshare` and `AZURE_STORAGE_FILESHARE_COOL` must be set.
+- **COOL Blob**: `DATA_BACKEND_COOL=azure_blob` and `AZURE_STORAGE_DATA_CONTAINER_COOL` must be set.

-The `AZURE_STORAGE_PROJECTS_LOCATION_PREFIX` prefix still applies (base path of projects) for both backends.
+The `DATA_PROJECTS_LOCATION_PREFIX` (HOT) and `DATA_PROJECTS_LOCATION_PREFIX_COOL` (COOL) prefixes apply to the base path of projects.

 ## Configuring CORS (Blob / Fileshare)


TASK.md

Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@
## [TASK] Implement deterministic HOT/COOL storage resolver for `project_slug` (URI-based)

### Context

PRD – Storage path resolution

`euphrosyne-tools-api` must deterministically compute **where project data lives** for two logical storage roles:

* **HOT**: workspace / active project data
* **COOL**: immutable cooled project data

The resolver must be **role-based**, not “Azure Files vs Azure Blob”-based, because HOT and COOL may both live on blob containers in the future.

This resolver is **foundational**: all later operations (AzCopy, listing, restore) must rely on it as the **single source of truth** for project storage locations.

---

## Goal

Implement a deterministic resolver that returns a canonical `DataLocation` for:

* HOT project data
* COOL project data (when cooling is enabled)

using only:

* `project_slug`
* environment configuration

No network calls. No Azure SDK usage.

---

## Data model: `DataLocation`

Implement (or add) a minimal immutable dataclass:

```python
from dataclasses import dataclass

# StorageRole (HOT | COOL) and StorageBackend are enums defined elsewhere in the module.


@dataclass(frozen=True)
class DataLocation:
    role: StorageRole  # HOT | COOL
    backend: StorageBackend  # AZURE_FILESHARE | AZURE_BLOB
    project_slug: str
    uri: str  # canonical URI for project root
```

Notes:

* Use `uri` (lowercase) for Python style.
* `uri` must point to the **project root folder/prefix** (not to a file, not to a run subfolder).
* `StorageBackend` enum values must be:

  * `AZURE_FILESHARE`
  * `AZURE_BLOB`
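
A self-contained sketch of the data model, including the two enums, may help show why a frozen dataclass fits here: instances compare by value and reject mutation. The enum string values (chosen to mirror the `DATA_BACKEND` settings) and the account and share names in the example URI are assumptions made for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class StorageRole(Enum):
    HOT = "hot"
    COOL = "cool"


class StorageBackend(Enum):
    # String values mirror the DATA_BACKEND / DATA_BACKEND_COOL settings (assumed mapping).
    AZURE_FILESHARE = "azure_fileshare"
    AZURE_BLOB = "azure_blob"


@dataclass(frozen=True)
class DataLocation:
    role: StorageRole
    backend: StorageBackend
    project_slug: str
    uri: str  # canonical URI of the project root, no trailing slash


# Placeholder account and share names, for illustration only.
EXAMPLE_URI = "https://account.file.core.windows.net/share/my-project"
loc = DataLocation(StorageRole.HOT, StorageBackend.AZURE_FILESHARE, "my-project", EXAMPLE_URI)
same = DataLocation(StorageRole.HOT, StorageBackend.AZURE_FILESHARE, "my-project", EXAMPLE_URI)

assert loc == same  # value equality, useful for determinism tests

try:
    loc.uri = "https://elsewhere.example/other"  # type: ignore[misc]
except FrozenInstanceError:
    pass  # frozen dataclasses reject attribute assignment
```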

---

## Environment configuration

### Backend selection (per role)

* `DATA_BACKEND=azure_fileshare|azure_blob`

  * Used for **HOT** data
  * Required

* `DATA_BACKEND_COOL=azure_fileshare|azure_blob`

  * Used for **COOL** data
  * **Optional**
  * If **absent**, cooling is considered **disabled** and COOL resolution must not be allowed

If `DATA_BACKEND_COOL` is set, **all required COOL-specific configuration must be present**; otherwise startup or resolution must fail with a clear configuration error.
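
One way to enforce the rule above at startup is sketched below. It reads from a plain mapping so it can be tested without touching the process environment. `StorageConfigError` and `read_cool_backend` are hypothetical names, and treating an empty value (as in `.env.example`) the same as an absent one is an assumption.

```python
from typing import Mapping, Optional

# COOL-specific variable required for each backend, as listed in this task.
COOL_REQUIRED_VAR = {
    "azure_fileshare": "AZURE_STORAGE_FILESHARE_COOL",
    "azure_blob": "AZURE_STORAGE_DATA_CONTAINER_COOL",
}


class StorageConfigError(Exception):
    """Hypothetical configuration error type."""


def read_cool_backend(env: Mapping[str, str]) -> Optional[str]:
    """Return the COOL backend name, or None when cooling is disabled."""
    backend = env.get("DATA_BACKEND_COOL", "").strip()
    if not backend:
        return None  # absent (or empty, by assumption) means cooling is disabled
    if backend not in COOL_REQUIRED_VAR:
        raise StorageConfigError(f"DATA_BACKEND_COOL has unsupported value {backend!r}")
    required = COOL_REQUIRED_VAR[backend]
    if not env.get(required, "").strip():
        raise StorageConfigError(f"DATA_BACKEND_COOL={backend} requires {required} to be set")
    return backend
```

In application code the mapping would typically be `os.environ`; passing it explicitly keeps the function deterministic.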

---

### Backend-specific configuration

#### Azure Fileshare

* `AZURE_STORAGE_FILESHARE`

  * Fileshare name for HOT data (existing config; keep as-is)

* `AZURE_STORAGE_FILESHARE_COOL`

  * Fileshare name for COOL data (required if `DATA_BACKEND_COOL=azure_fileshare`)

#### Azure Blob

* `AZURE_STORAGE_DATA_CONTAINER`

  * Blob container for HOT data (existing config; keep as-is)

* `AZURE_STORAGE_DATA_CONTAINER_COOL`

  * Blob container for COOL data (required if `DATA_BACKEND_COOL=azure_blob`)

---

### Project prefix configuration

Project prefix must be **backend-agnostic**.

* `DATA_PROJECTS_LOCATION_PREFIX`

  * Base prefix for HOT data

* `DATA_PROJECTS_LOCATION_PREFIX_COOL`

  * Base prefix for COOL data

**Backward compatibility**: none. `AZURE_STORAGE_PROJECTS_LOCATION_PREFIX` is replaced by `DATA_PROJECTS_LOCATION_PREFIX`.

---

## Resolver behavior

### Resolver API

Expose role-based resolver functions (names indicative):

* `resolve_hot_location(project_slug: str) -> DataLocation`
* `resolve_cool_location(project_slug: str) -> DataLocation`

Optionally expose:

* `resolve_location(role: StorageRole, project_slug: str) -> DataLocation`

The resolver must:

* validate `project_slug`
* determine backend from env configuration
* build a **canonical URI**
* return `DataLocation(role, backend, project_slug, uri)`

If `resolve_cool_location` is called while `DATA_BACKEND_COOL` is **unset**, raise a clear error indicating that cooling is disabled.
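
The "determine backend from env configuration" step, together with the cooling-disabled error, could look like the sketch below. It covers only that step (no slug validation, no URI building). `CoolingDisabledError` and `backend_for_role` are hypothetical names, and the enum values are assumed.

```python
from enum import Enum
from typing import Mapping


class StorageRole(Enum):
    HOT = "hot"
    COOL = "cool"


class CoolingDisabledError(Exception):
    """Hypothetical error raised when COOL is requested but not configured."""


# Environment variable that selects the backend for each role.
BACKEND_VAR = {
    StorageRole.HOT: "DATA_BACKEND",
    StorageRole.COOL: "DATA_BACKEND_COOL",
}


def backend_for_role(role: StorageRole, env: Mapping[str, str]) -> str:
    """Return the configured backend name for a role, using configuration only."""
    value = env.get(BACKEND_VAR[role], "").strip()
    if value:
        return value
    if role is StorageRole.COOL:
        raise CoolingDisabledError("cooling is disabled: DATA_BACKEND_COOL is not set")
    raise ValueError("DATA_BACKEND is required")
```

Keying the lookup on the role rather than on the storage technology is what lets HOT and COOL both point at blob containers later without changing callers.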

---

## Canonical URI formats

Use **no trailing slash** policy.

### Azure Fileshare

```
https://{account}.file.core.windows.net/{share}/{prefix}/{project_slug}
```

### Azure Blob

```
https://{account}.blob.core.windows.net/{container}/{prefix}/{project_slug}
```

### Prefix normalization rules

* Strip leading/trailing `/` from prefixes before joining
* Avoid double slashes in the resulting path
* Empty prefix must be handled cleanly
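
The normalization rules can be expressed as a single join over path segments. In this sketch `build_project_uri` is a hypothetical helper; it assumes the slug has already been validated and that the share or container name is non-empty. Collapsing double slashes inside the prefix (rather than rejecting them) is also an assumption.

```python
def build_project_uri(host: str, root: str, prefix: str, project_slug: str) -> str:
    """Join share/container, optional prefix and slug into a canonical URI.

    host is e.g. "{account}.file.core.windows.net"; root is the share or container name.
    """
    segments = [root]
    # Strip slashes at both ends, then drop empty pieces so "a//b" cannot yield "//".
    segments += [part for part in prefix.strip("/").split("/") if part]
    segments.append(project_slug)
    # Joining non-empty segments never produces a trailing slash.
    return "https://" + host + "/" + "/".join(segments)


print(build_project_uri("acct.file.core.windows.net", "share", "/projects/", "my-project"))
# https://acct.file.core.windows.net/share/projects/my-project
print(build_project_uri("acct.blob.core.windows.net", "container", "", "my-project"))
# https://acct.blob.core.windows.net/container/my-project
```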

---

## Validation requirements

Reject invalid `project_slug` values:

* empty string
* leading or trailing whitespace
* contains `/` or `\`
* contains `..`
* contains `//`

Raise a clear, FastAPI-compatible error (HTTP 400–class).
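
A minimal validator for these rules might look as follows. `InvalidProjectSlugError` is a hypothetical exception that carries `status_code = 400`, so an application-level exception handler could map it to an HTTP 400 response; raising FastAPI's `HTTPException(status_code=400, ...)` directly would be another option. The `//` case needs no separate check because any `/` is already rejected.

```python
class InvalidProjectSlugError(ValueError):
    """Hypothetical error; status_code lets an exception handler map it to HTTP 400."""

    status_code = 400


def validate_project_slug(project_slug: str) -> str:
    """Return the slug unchanged if it is valid, otherwise raise InvalidProjectSlugError."""
    if project_slug == "":
        raise InvalidProjectSlugError("project_slug must not be empty")
    if project_slug != project_slug.strip():
        raise InvalidProjectSlugError(
            "project_slug must not have leading or trailing whitespace"
        )
    # Rejecting "/" also covers "//"; ".." blocks path traversal.
    for forbidden in ("/", "\\", ".."):
        if forbidden in project_slug:
            raise InvalidProjectSlugError(f"project_slug must not contain {forbidden!r}")
    return project_slug
```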

---

## Determinism & stability requirements

* Same inputs + same env config → **exact same `uri` string**
* Must not depend on:

  * timestamps
  * randomness
  * operation_id
  * mutable metadata
* All project storage URIs must be produced via this resolver; no ad-hoc concatenation elsewhere in the codebase.

---

## Unit tests

Add unit tests covering:

1. **Determinism**

   * same slug + same config → identical `DataLocation` (including `uri`)

2. **Golden snapshots**

   * exact URI assertion for:

     * HOT (fileshare)
     * COOL (blob)

3. **Prefix joining**

   * empty prefix
   * non-empty prefix (no double slashes)

4. **Validation**

   * invalid slugs rejected:

     * `""`
     * `"../x"`
     * `"a/b"`
     * `"a\\b"`
     * `"a..b"`
     * `" a "`

5. **Role backend selection**

   * HOT backend driven by `DATA_BACKEND`
   * COOL backend driven by `DATA_BACKEND_COOL`
   * COOL resolution fails when `DATA_BACKEND_COOL` is unset

---

## Acceptance criteria

* `DataLocation` dataclass exists with:

  * `role`, `backend`, `project_slug`, `uri`
* HOT and COOL resolver functions exist and are the **single source of truth** for project storage URIs
* HOT and COOL backends are configurable independently
* Cooling is **disabled by default** when `DATA_BACKEND_COOL` is absent
* URIs are canonical, stable, and validated
* Unit tests validate mapping, validation, and backend selection

---

## Notes

* This task is **resolution only**:

  * no Azure API calls
  * no AzCopy
  * no authentication or SAS logic
* Keep implementation minimal, explicit, and well-documented.
* This resolver defines a long-lived contract; correctness and stability matter more than flexibility.
