sde_collections/models/README_LIFECYCLE.md (72 additions, 3 deletions)
@@ -56,17 +56,84 @@ Migration converts DumpUrls to DeltaUrls, preserving all fields and applying pat
56
56
4. Apply all patterns to new Deltas
57
57
5. Clear DumpUrls
58
58
59
+
## Migration Process (Dump → Delta)
60
+
61
+
### Overview
62
+
Migration converts DumpUrls to DeltaUrls, preserving all fields and applying patterns. This process happens when:
63
+
- New content is scraped
64
+
- Content is reindexed
65
+
- Collection is being prepared for curation
66
+
### Steps
67
+
1. Clear existing DeltaUrls
68
+
2. Process each DumpUrl:
69
+
- If matching CuratedUrl exists: Create Delta with all fields
70
+
- If no matching CuratedUrl: Create Delta as new URL
71
+
3. Process missing CuratedUrls:
72
+
- Create deletion Deltas for any not in Dump
73
+
4. Apply all patterns to new Deltas
74
+
5. Clear DumpUrls
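
The five steps above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the `Url` dataclass stands in for the real `DumpUrl`, `DeltaUrl`, and `CuratedUrl` models, `migrate` is a hypothetical function name, and patterns are assumed to be simple `(glob, document_type)` pairs matched with `fnmatch`.

```python
from dataclasses import dataclass, replace
from fnmatch import fnmatch
from typing import Optional


@dataclass
class Url:
    """Simplified stand-in for DumpUrl / DeltaUrl / CuratedUrl (assumption)."""
    url: str
    scraped_title: str = ""
    document_type: Optional[str] = None
    to_delete: bool = False


def migrate(dump_urls, curated_urls, patterns):
    """Return (delta_urls, remaining_dump_urls) for one migration run.

    `patterns` is a hypothetical list of (glob, document_type) pairs.
    """
    # Step 1: clear existing DeltaUrls by starting from an empty list.
    deltas = []
    dump_keys = {dump.url for dump in dump_urls}

    # Step 2: every DumpUrl becomes a Delta carrying all of its fields.
    # In this sketch the result is the same whether or not a matching
    # CuratedUrl exists.
    for dump in dump_urls:
        deltas.append(replace(dump, to_delete=False))

    # Step 3: CuratedUrls missing from the dump become deletion Deltas.
    for curated in curated_urls:
        if curated.url not in dump_keys:
            deltas.append(replace(curated, to_delete=True))

    # Step 4: apply all patterns to the new Deltas.
    for delta in deltas:
        for glob, document_type in patterns:
            if fnmatch(delta.url, glob):
                delta.document_type = document_type

    # Step 5: clear DumpUrls by returning an empty list.
    return deltas, []
```

Calling `migrate` with a dump that lacks one curated URL yields an ordinary Delta for each dumped URL plus one Delta with `to_delete=True` for the missing one.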
75
+
59
76
### Examples
60
77
61
-
#### Example 1: Migration with Pattern Application
78
+
#### Example 1: Basic Migration
79
+
If there are no patterns or existing CuratedUrls, the DeltaUrl will be created from the DumpUrl.
80
+
```python
81
+
# Starting State
82
+
dump_url = DumpUrl(
83
+
url="example.com/doc",
84
+
scraped_title="Original Title",
85
+
document_type=DocumentTypes.DOCUMENTATION
86
+
)
87
+
88
+
# After Migration
89
+
delta_url = DeltaUrl(
90
+
url="example.com/doc",
91
+
scraped_title="Original Title",
92
+
document_type=DocumentTypes.DOCUMENTATION,
93
+
to_delete=False
94
+
)
95
+
```
96
+
97
+
#### Example 2: Migration with Existing Curated
98
+
If a CuratedUrl exists for the URL and the DumpUrl has changes, a DeltaUrl is created with the DumpUrl's values. The CuratedUrl itself is left unchanged by migration.
99
+
```python
100
+
# Starting State
101
+
dump_url = DumpUrl(
102
+
url="example.com/doc",
103
+
scraped_title="New Title",
104
+
document_type=DocumentTypes.DOCUMENTATION
105
+
)
106
+
107
+
curated_url = CuratedUrl(
108
+
url="example.com/doc",
109
+
scraped_title="Old Title",
110
+
document_type=DocumentTypes.DOCUMENTATION
111
+
)
112
+
113
+
# After Migration
114
+
delta_url = DeltaUrl(
115
+
url="example.com/doc",
116
+
scraped_title="New Title", # Different from curated
117
+
document_type=DocumentTypes.DOCUMENTATION,
118
+
to_delete=False
119
+
)
120
+
121
+
curated_url = CuratedUrl(
122
+
url="example.com/doc",
123
+
scraped_title="Old Title",
124
+
document_type=DocumentTypes.DOCUMENTATION
125
+
)
126
+
```
127
+
128
+
#### Example 3: Migration with Pattern Application
129
+
If a pattern that sets the document type matches a DumpUrl, the pattern is applied during migration and the resulting DeltaUrl reflects the pattern's changes.
62
130
```python
63
131
# Starting State
64
132
dump_url = DumpUrl(
65
133
url="example.com/data/file.pdf",
66
134
scraped_title="Data File",
67
135
document_type=None
68
136
)
69
-
70
137
document_type_pattern = DocumentTypePattern(
71
138
match_pattern="*.pdf",
72
139
document_type=DocumentTypes.DATA
@@ -97,6 +164,7 @@ Promotion moves DeltaUrls to CuratedUrls, carrying forward all changes including
97
164
### Examples
98
165
99
166
#### Example 1: Basic Promotion
167
+
If there are no CuratedUrls for the URL, the DeltaUrl will be promoted to a new CuratedUrl.
100
168
```python
101
169
# Starting State
102
170
delta_url = DeltaUrl(
@@ -115,6 +183,7 @@ curated_url = CuratedUrl(
115
183
```
116
184
117
185
#### Example 2: Promotion with NULL Override
186
+
Note that a None value in the DeltaUrl is preserved in the CuratedUrl: it is treated as an explicit value and overrides whatever the CuratedUrl held before.
118
187
```python
119
188
# Starting State
120
189
delta_url = DeltaUrl(
@@ -139,6 +208,7 @@ curated_url = CuratedUrl(
139
208
```
140
209
141
210
#### Example 3: Deletion During Promotion
211
+
If there is no DumpUrl for an existing CuratedUrl, the URL has been removed from the collection. A DeltaUrl with `to_delete=True` will be created, and on promotion the CuratedUrl will be deleted.
142
212
```python
143
213
# Starting State
144
214
delta_url = DeltaUrl(
@@ -159,7 +229,6 @@ curated_url = CuratedUrl(
159
229
160
230
## Important Notes
161
231
162
-
163
232
### Field Handling
164
233
- ALL fields are copied during migration and promotion
165
234
- NULL values in DeltaUrls are treated as explicit values
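
The field-handling rules above can be illustrated with a small sketch. It is not the project's code: `DeltaRecord`, `CuratedRecord`, and `promote` are hypothetical names, and the curated store is assumed to be a plain dict keyed by URL. The point is that promotion copies every field without a "skip if None" check, and that a Delta with `to_delete=True` removes the curated entry.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DeltaRecord:
    """Hypothetical stand-in for DeltaUrl."""
    url: str
    scraped_title: Optional[str] = None
    document_type: Optional[str] = None
    to_delete: bool = False


@dataclass
class CuratedRecord:
    """Hypothetical stand-in for CuratedUrl."""
    url: str
    scraped_title: Optional[str] = None
    document_type: Optional[str] = None


def promote(delta: DeltaRecord, curated: Dict[str, CuratedRecord]) -> None:
    """Apply one Delta to the curated store in place."""
    if delta.to_delete:
        # Deletion Deltas remove the curated entry if it exists.
        curated.pop(delta.url, None)
        return
    # Copy ALL fields. None is an explicit value, so it overwrites
    # whatever the curated entry held before.
    curated[delta.url] = CuratedRecord(
        url=delta.url,
        scraped_title=delta.scraped_title,
        document_type=delta.document_type,
    )
```

Promoting a Delta whose `document_type` is `None` over a curated entry that had a document type leaves the curated entry with `None`, which is the NULL-override behavior described above.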