Commit 5ad6f4c

add explanatory commentary to the lifecycle readme
1 parent 6b73771 commit 5ad6f4c

1 file changed: +72 -3 lines changed

sde_collections/models/README_LIFECYCLE.md

Lines changed: 72 additions & 3 deletions
@@ -56,17 +56,84 @@ Migration converts DumpUrls to DeltaUrls, preserving all fields and applying pat
 4. Apply all patterns to new Deltas
 5. Clear DumpUrls
 
+## Migration Process (Dump → Delta)
+
+### Overview
+Migration converts DumpUrls to DeltaUrls, preserving all fields and applying patterns. This process happens when:
+- New content is scraped
+- Content is reindexed
+- Collection is being prepared for curation
+### Steps
+1. Clear existing DeltaUrls
+2. Process each DumpUrl:
+   - If matching CuratedUrl exists: Create Delta with all fields
+   - If no matching CuratedUrl: Create Delta as new URL
+3. Process missing CuratedUrls:
+   - Create deletion Deltas for any not in Dump
+4. Apply all patterns to new Deltas
+5. Clear DumpUrls
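
As a reading aid for the steps added in this hunk, the five migration steps can be sketched in plain Python. This is a minimal in-memory sketch, not the repository's implementation: the `Url` dataclass, the `migrate` function, the dict-based storage, and the callable-based patterns (including `pdf_is_data`) are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Url:
    """Hypothetical stand-in for a DumpUrl, DeltaUrl, or CuratedUrl row."""
    url: str
    scraped_title: str = ""
    document_type: Optional[str] = None
    to_delete: bool = False


def migrate(
    dump: Dict[str, Url],
    curated: Dict[str, Url],
    patterns: Iterable[Callable[[Url], None]] = (),
) -> Dict[str, Url]:
    """Build a fresh Delta set from the Dump set (hypothetical helper)."""
    delta: Dict[str, Url] = {}  # Step 1: start with no existing Deltas

    # Step 2: each DumpUrl becomes a Delta that carries all of its fields
    for key, row in dump.items():
        delta[key] = Url(row.url, row.scraped_title, row.document_type)

    # Step 3: CuratedUrls missing from the Dump become deletion Deltas
    for key, row in curated.items():
        if key not in dump:
            delta[key] = Url(row.url, row.scraped_title, row.document_type, to_delete=True)

    # Step 4: apply every pattern to the new Deltas
    pattern_list = list(patterns)
    for row in delta.values():
        for apply_pattern in pattern_list:
            apply_pattern(row)

    dump.clear()  # Step 5: clear the Dump set
    return delta


def pdf_is_data(row: Url) -> None:
    """Toy pattern: mark PDF URLs as data (assumed rule, for illustration)."""
    if row.url.endswith(".pdf"):
        row.document_type = "DATA"


dump = {"example.com/data/file.pdf": Url("example.com/data/file.pdf", "Data File")}
curated = {"example.com/old": Url("example.com/old", "Old Page")}
delta = migrate(dump, curated, patterns=[pdf_is_data])
```

In this sketch the scraped PDF ends up as a Delta with its document type set by the pattern, while the curated URL that no longer appears in the Dump becomes a Delta flagged with `to_delete=True`.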
+
 ### Examples
 
-#### Example 1: Migration with Pattern Application
+#### Example 1: Basic Migration
+If there are no patterns or existing CuratedUrls, the DeltaUrl will be created from the DumpUrl.
+```python
+# Starting State
+dump_url = DumpUrl(
+    url="example.com/doc",
+    scraped_title="Original Title",
+    document_type=DocumentTypes.DOCUMENTATION
+)
+
+# After Migration
+delta_url = DeltaUrl(
+    url="example.com/doc",
+    scraped_title="Original Title",
+    document_type=DocumentTypes.DOCUMENTATION,
+    to_delete=False
+)
+```
+
+#### Example 2: Migration with Existing Curated
+If a CuratedUrl exists for the URL and the DumpUrl has changes, a DeltaUrl is created with the new values. The CuratedUrl stays unchanged until promotion.
+```python
+# Starting State
+dump_url = DumpUrl(
+    url="example.com/doc",
+    scraped_title="New Title",
+    document_type=DocumentTypes.DOCUMENTATION
+)
+
+curated_url = CuratedUrl(
+    url="example.com/doc",
+    scraped_title="Old Title",
+    document_type=DocumentTypes.DOCUMENTATION
+)
+
+# After Migration
+delta_url = DeltaUrl(
+    url="example.com/doc",
+    scraped_title="New Title",  # Different from curated
+    document_type=DocumentTypes.DOCUMENTATION,
+    to_delete=False
+)
+
+curated_url = CuratedUrl(
+    url="example.com/doc",
+    scraped_title="Old Title",
+    document_type=DocumentTypes.DOCUMENTATION
+)
+```
+
+#### Example 3: Migration with Pattern Application
+If a pattern exists that modifies the document type of a DumpUrl, that pattern will be applied and the DeltaUrl will reflect the pattern's changes.
 ```python
 # Starting State
 dump_url = DumpUrl(
     url="example.com/data/file.pdf",
     scraped_title="Data File",
     document_type=None
 )
-
 document_type_pattern = DocumentTypePattern(
     match_pattern="*.pdf",
     document_type=DocumentTypes.DATA
@@ -97,6 +164,7 @@ Promotion moves DeltaUrls to CuratedUrls, carrying forward all changes including
 ### Examples
 
 #### Example 1: Basic Promotion
+If there are no CuratedUrls for the URL, the DeltaUrl will be promoted to a new CuratedUrl.
 ```python
 # Starting State
 delta_url = DeltaUrl(
@@ -115,6 +183,7 @@ curated_url = CuratedUrl(
 ```
 
 #### Example 2: Promotion with NULL Override
+Note that the None value in the DeltaUrl is preserved in the CuratedUrl.
 ```python
 # Starting State
 delta_url = DeltaUrl(
@@ -139,6 +208,7 @@ curated_url = CuratedUrl(
 ```
 
 #### Example 3: Deletion During Promotion
+If there is no DumpUrl for an existing CuratedUrl, the URL has been removed from the collection. A DeltaUrl with `to_delete=True` is created during migration, and on promotion the CuratedUrl is deleted.
 ```python
 # Starting State
 delta_url = DeltaUrl(
@@ -159,7 +229,6 @@ curated_url = CuratedUrl(
 
 ## Important Notes
 
-
 ### Field Handling
 - ALL fields are copied during migration and promotion
 - NULL values in DeltaUrls are treated as explicit values
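
The field-handling notes above, together with the deletion case, can be illustrated with a small promotion sketch. It is a hedged, in-memory illustration only: `Url`, `promote`, and the dict-based storage are hypothetical names, and clearing the Delta set at the end is an assumption based on the hunk context saying that promotion "moves" DeltaUrls to CuratedUrls.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Url:
    """Hypothetical stand-in for a DeltaUrl or CuratedUrl row."""
    url: str
    scraped_title: str = ""
    document_type: Optional[str] = None
    to_delete: bool = False


def promote(delta: Dict[str, Url], curated: Dict[str, Url]) -> Dict[str, Url]:
    """Carry every Delta over to the Curated set (hypothetical helper)."""
    for key, row in delta.items():
        if row.to_delete:
            # A deletion Delta removes the matching CuratedUrl, if any
            curated.pop(key, None)
        else:
            # ALL fields are copied; None is an explicit value and is not skipped
            curated[key] = Url(row.url, row.scraped_title, row.document_type)
    delta.clear()  # assumption: Deltas do not persist after promotion
    return curated


delta = {
    "example.com/doc": Url("example.com/doc", "Title", None),
    "example.com/gone": Url("example.com/gone", "Gone", "DATA", to_delete=True),
}
curated = {
    "example.com/doc": Url("example.com/doc", "Title", "DOCUMENTATION"),
    "example.com/gone": Url("example.com/gone", "Gone", "DATA"),
}
curated = promote(delta, curated)
```

Here the `None` document type on the Delta replaces the previously curated `"DOCUMENTATION"` value rather than being ignored, and the URL flagged for deletion no longer appears in the Curated set.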
