Commit 7c683cc

committed
add EPIC.md
1 parent 3b2f08e commit 7c683cc

1 file changed

+213
-35
lines changed

EPIC.md

Lines changed: 213 additions & 35 deletions
@@ -1,54 +1,232 @@
1+
# [EPIC] Project data mover (Files ↔ Blob Cool) with AzCopy
2+
13
## Context
2-
PRD: Hot → Cool Project Data Management (Immutable Cool)
34

4-
This epic implements the **project-level** data lifecycle described in the PRD.
5-
The lifecycle applies to the **entire project data folder**, including runs,
6-
documents, and any other project-scoped files.
5+
This EPIC is part of the **Hot → Cool Project Data Management (Immutable Cool)** initiative.
6+
7+
After delivering the **lifecycle state machine, immutability rules, eligibility logic, and admin visibility** (EPIC 1), the system now needs a **reliable, auditable, and verifiable mechanism** to physically move project data between storage tiers.
8+
9+
Project data transitions must be:
10+
11+
* **explicitly triggered**
12+
* **long-running**
13+
* **idempotent**
14+
* **verifiable**
15+
* **fully observable**
16+
17+
Actual data movement is delegated to **euphrosyne-tools-api**, using **AzCopy** for storage-native, high-throughput transfers.
718

819
---
920

1021
## Goal
11-
Implement **project-level** lifecycle state machine, automation, immutability
12-
enforcement, and UI visibility.
1322

14-
The lifecycle controls when a project’s data is stored in:
15-
- Azure Files (HOT workspace), or
16-
- Azure Blob Storage (COOL, immutable).
23+
Implement **project-level COOL and RESTORE operations** that:
24+
25+
* move **entire project data** between:
26+
27+
* Azure Files (HOT)
28+
* Azure Blob Storage (Cool tier)
29+
* are executed via **AzCopy**
30+
* are tracked as **long-running lifecycle operations**
31+
* are verified using **AzCopy transfer summaries**
32+
* drive project lifecycle state transitions in Euphrosyne
33+
34+
This EPIC is the first one that **moves bytes**, not just states.
1735

1836
---
1937

20-
## Success criteria
38+
## Scope
39+
40+
### In scope
41+
42+
* Project-level data copy:
43+
44+
* Files → Blob Cool (COOL)
45+
* Blob Cool → Files (RESTORE)
46+
* Long-running operation tracking
47+
* Operation polling and status reconciliation
48+
* Verification based on:
49+
50+
* expected file count
51+
* expected byte size
52+
* Automatic triggering for eligible projects
53+
* Manual triggering for restore
54+
* Full auditability
55+
56+
### Out of scope (explicit)
57+
58+
* Deletion of HOT data after cooling
59+
* Cold / Archive tier
60+
* Partial project tiering
61+
* Delta sync or incremental copy
62+
* Deduplication
63+
* Concurrent HOT + COOL writes
64+
65+
---
66+
67+
## High-level design
68+
69+
### Source of truth
70+
71+
* **Euphrosyne DB** is the authoritative source for:
72+
73+
* lifecycle state
74+
* storage class
75+
* eligibility
76+
* expected bytes / files
77+
* **tools-api** is the executor and reporter of physical copy operations
78+
79+
State transitions **never happen based on AzCopy alone**;
80+
they only occur after **verified success** is reported back.
81+
82+
---
83+
84+
### Lifecycle operations
85+
86+
Each COOL or RESTORE is modeled as a **single lifecycle operation** with:
87+
88+
* a unique `operation_id`
89+
* a fixed direction (`COOL` or `RESTORE`)
90+
* immutable expectations (bytes/files)
91+
* monotonic status progression:
92+
93+
```
94+
PENDING → RUNNING → SUCCEEDED | FAILED
95+
```
96+
97+
Operations are:
98+
99+
* idempotent per `(project_id, operation_id)`
100+
* never reused across retries
101+
* fully auditable
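
A minimal sketch of how such an operation could be modeled, assuming Python; the names `LifecycleOperation`, `Expectations`, and `advance` are hypothetical and do not come from the Euphrosyne codebase. The sketch keeps the expectations frozen, allows only the forward transitions shown above, treats a repeated status report as a no-op, and gives every retry a fresh `operation_id`. Whether `PENDING` may move directly to `FAILED` is not specified here, so the sketch does not allow it.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class OperationStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Forward-only transitions; SUCCEEDED and FAILED are terminal.
_ALLOWED = {
    OperationStatus.PENDING: {OperationStatus.RUNNING},
    OperationStatus.RUNNING: {OperationStatus.SUCCEEDED, OperationStatus.FAILED},
    OperationStatus.SUCCEEDED: set(),
    OperationStatus.FAILED: set(),
}


@dataclass(frozen=True)
class Expectations:
    """Expected totals, fixed when the operation is created."""
    expected_bytes: int
    expected_files: int


@dataclass
class LifecycleOperation:
    project_id: int
    direction: str  # "COOL" or "RESTORE"
    expectations: Expectations
    # A new id per operation: retries create a new operation, never reuse one.
    operation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: OperationStatus = OperationStatus.PENDING

    def advance(self, new_status: OperationStatus) -> None:
        """Move to new_status; re-reporting the current status is a no-op."""
        if new_status is self.status:
            return
        if new_status not in _ALLOWED[self.status]:
            raise ValueError(
                f"illegal transition {self.status.value} -> {new_status.value}"
            )
        self.status = new_status
```

Because `advance` ignores a repeated report of the current status, a poller that delivers the same update twice leaves the operation unchanged.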
102+
103+
---
104+
105+
### Storage movement model
106+
107+
**COOL**
108+
109+
```
110+
Azure Files (project folder)
111+
↓
112+
Azure Blob Storage (Cool tier, project prefix)
113+
```
114+
115+
**RESTORE**
116+
117+
```
118+
Azure Blob Storage (Cool tier)
119+
↓
120+
Azure Files (project folder)
121+
```
21122

22-
- Projects automatically become eligible for cooling based on activity:
23-
- Initial eligibility: `project.created + 6 months`
24-
- Updated eligibility each time a new run is planned:
25-
`run.end_date + 6 months`
26-
- Entire project data (runs + documents) is cooled as a single unit.
27-
- Restore works on demand and returns the project to HOT.
28-
- Writes are blocked when the project is in `COOL` or `COOLING`:
29-
- document uploads/edits/deletes
30-
- run outputs
31-
- any write under the project folder
32-
- **New runs cannot be created** when the project is `COOL` or `COOLING`.
33-
- Admins can see:
34-
- lifecycle state
35-
- cooling eligibility date
36-
- last lifecycle operation and error (if any).
123+
Characteristics:
124+
125+
* Entire project moves as a single unit
126+
* Directory structure is preserved
127+
* COOL storage is treated as **immutable**
128+
* HOT storage is treated as an **ephemeral workspace**
37129

38130
---
39131

40-
## Non-goals (explicit for this epic)
132+
## AzCopy integration model
133+
134+
### Role of AzCopy
135+
136+
AzCopy is used as the **only mechanism** for data transfer:
137+
138+
* high throughput
139+
* resumable
140+
* storage-native verification
141+
142+
tools-api is responsible for:
41143

42-
- No cold/archive tier
43-
- No partial (per-run) cooling
44-
- No deletion of hot data after cooling
45-
- No detection of concurrent readers (existing VM mounts tolerated)
144+
* starting AzCopy jobs
145+
* polling job progress
146+
* parsing AzCopy summaries
147+
* translating AzCopy outcomes into lifecycle operation status
46148

47149
---
48150

49-
## Notes
151+
### Verification contract
152+
153+
A lifecycle operation is considered **successful** if and only if:
154+
155+
* AzCopy job completes successfully
156+
* `files_copied == expected_files`
157+
* `bytes_copied == expected_bytes`
158+
159+
Any mismatch or execution error results in **FAILED**.
160+
161+
There is no partial success.
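
The contract above can be sketched as a single pure function. This is an illustration only: `TransferSummary` is a hypothetical, already-normalized view of what tools-api extracts from an AzCopy summary, not AzCopy's actual output format, and `verify` is an assumed name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TransferSummary:
    """Hypothetical normalized summary reported by tools-api."""
    job_succeeded: bool
    files_copied: int
    bytes_copied: int


def verify(
    summary: TransferSummary, expected_files: int, expected_bytes: int
) -> Tuple[str, Optional[str]]:
    """Return ("SUCCEEDED", None) or ("FAILED", reason); nothing in between."""
    if not summary.job_succeeded:
        return "FAILED", "AzCopy job did not complete successfully"
    if summary.files_copied != expected_files:
        return "FAILED", (
            f"file count mismatch: expected {expected_files}, "
            f"got {summary.files_copied}"
        )
    if summary.bytes_copied != expected_bytes:
        return "FAILED", (
            f"byte count mismatch: expected {expected_bytes}, "
            f"got {summary.bytes_copied}"
        )
    return "SUCCEEDED", None
```

Returning a reason alongside the status keeps the mismatch details available for the error record without introducing a third outcome.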
162+
163+
---
164+
165+
## Execution flow (nominal)
166+
167+
### COOL (automatic)
168+
169+
1. Project becomes eligible for cooling
170+
2. Euphrosyne starts a COOL operation
171+
3. Project enters `COOLING`
172+
4. tools-api launches AzCopy (Files → Blob Cool)
173+
5. AzCopy completes
174+
6. tools-api reports verified success + stats
175+
7. Euphrosyne marks project `COOL`
176+
177+
---
178+
179+
### RESTORE (manual)
180+
181+
1. User or admin triggers restore
182+
2. Project enters `RESTORING`
183+
3. tools-api launches AzCopy (Blob Cool → Files)
184+
4. AzCopy completes
185+
5. tools-api reports verified success + stats
186+
6. Euphrosyne marks project `HOT`
187+
188+
---
189+
190+
## Failure model
191+
192+
* Any failure during copy or verification:
193+
194+
* lifecycle operation → `FAILED`
195+
* project lifecycle → `ERROR`
196+
* Errors are:
197+
198+
* persisted
199+
* visible to admins
200+
* retryable via a new operation
201+
202+
Project state never flips on partial or unverified success.
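
A minimal sketch of how the nominal flows and the failure model combine into one project-state rule, assuming Python; `ProjectState`, `OperationStatus`, and `next_project_state` are hypothetical names. The state a project returns to when an operation is retried from `ERROR` is not specified in this EPIC, so the sketch leaves it out.

```python
from enum import Enum


class ProjectState(Enum):
    HOT = "HOT"
    COOLING = "COOLING"
    COOL = "COOL"
    RESTORING = "RESTORING"
    ERROR = "ERROR"


class OperationStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Where a project lands once its in-flight operation is verified successful.
_ON_SUCCESS = {
    ProjectState.COOLING: ProjectState.COOL,
    ProjectState.RESTORING: ProjectState.HOT,
}


def next_project_state(
    current: ProjectState, op_status: OperationStatus
) -> ProjectState:
    """Map a reported operation status onto the project lifecycle state."""
    if current not in _ON_SUCCESS:
        raise ValueError(f"no operation in flight for state {current.value}")
    if op_status is OperationStatus.SUCCEEDED:
        return _ON_SUCCESS[current]
    if op_status is OperationStatus.FAILED:
        return ProjectState.ERROR
    # PENDING or RUNNING: the project stays in its transitional state.
    return current
```

Only a terminal operation status changes the project state, so an unfinished or unverified copy can never move a project to `COOL` or `HOT`.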
203+
204+
---
205+
206+
## Observability & auditability
207+
208+
For each operation, the system records:
209+
210+
* type (COOL / RESTORE)
211+
* timestamps (start / finish)
212+
* status
213+
* expected vs actual bytes/files
214+
* error details (if any)
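
As an illustration, the recorded fields could be carried in a structure like the one below. This is a sketch under assumptions: `OperationAuditRecord` and its field names are hypothetical, and the real persistence layer (for example a database model) is not described in this EPIC.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OperationAuditRecord:
    operation_id: str
    project_id: int
    type: str  # "COOL" or "RESTORE"
    status: str
    started_at: datetime
    expected_bytes: int
    expected_files: int
    finished_at: Optional[datetime] = None
    actual_bytes: Optional[int] = None
    actual_files: Optional[int] = None
    error_details: Optional[str] = None

    def to_json(self) -> str:
        """Serialize the record, rendering timestamps as ISO 8601 strings."""
        payload = asdict(self)
        for key in ("started_at", "finished_at"):
            if payload[key] is not None:
                payload[key] = payload[key].isoformat()
        return json.dumps(payload, sort_keys=True)
```

Keeping expected and actual totals side by side in one record lets an admin see at a glance why a verification failed.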
215+
216+
Admins can:
217+
218+
* inspect operation history
219+
* understand failures
220+
* retry safely
221+
222+
---
223+
224+
## Success criteria
225+
226+
This EPIC is considered complete when:
50227

51-
- Lifecycle state is tracked **at the project level**, not run level.
52-
- Euphrosyne is the source of truth for lifecycle state and eligibility.
53-
- Physical data movement is handled by euphrosyne-tools-api and is out of scope
54-
for this epic.
228+
* COOL and RESTORE operations move full project data via AzCopy
229+
* Operations are fully tracked and observable
230+
* Verification gates lifecycle state transitions
231+
* Automatic cooling works end-to-end
232+
* Restore reliably returns projects to HOT
