All fields are transferred between states, including:
16
+
All fields transfer between states, including:
17
17
- URL
18
18
- Scraped Title
19
19
- Generated Title
@@ -23,6 +23,21 @@ All fields are transferred between states, including:
23
23
- Scraped Text
24
24
- Any additional metadata
25
25
26
+
## Pattern Application
27
+
28
+
### When Patterns Are Applied
29
+
Patterns are applied in two scenarios:
30
+
1. During migration from Dump to Delta
31
+
2. When a pattern is created or updated
32
+
33
+
Patterns are NOT applied during promotion. The effects of patterns (modified titles, document types, etc.) are carried through to CuratedUrls during promotion, but the patterns themselves don't reapply.
34
+
35
+
### Pattern Effects
36
+
- Patterns modify DeltaUrls when the patterns are created or updated, or when DeltaUrls are created through migration
37
+
- Pattern-modified fields (titles, document types, etc.) become part of the DeltaUrl's data
38
+
- These modifications persist through promotion to CuratedUrls
39
+
- Pattern relationships (which patterns affect which URLs) are maintained for tracking purposes
40
+
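The two application scenarios described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `TitlePattern`, `migrate`, `save_pattern`, and the `applied_patterns` tracking list are hypothetical names, and glob-style URL matching is an assumption.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class DeltaUrl:
    url: str
    scraped_title: str
    generated_title: str = ""
    # Names of patterns that touched this URL, kept only for tracking (assumed field).
    applied_patterns: list = field(default_factory=list)


@dataclass
class TitlePattern:
    # Hypothetical pattern type: sets generated_title on matching URLs.
    name: str
    match_pattern: str
    title: str

    def apply(self, delta_urls):
        for delta in delta_urls:
            if fnmatchcase(delta.url, self.match_pattern):
                # The modified field becomes part of the DeltaUrl's own data.
                delta.generated_title = self.title
                if self.name not in delta.applied_patterns:
                    delta.applied_patterns.append(self.name)


def migrate(dump_rows, patterns):
    """Scenario 1: build DeltaUrls from dump rows, then apply every pattern."""
    deltas = [DeltaUrl(**row) for row in dump_rows]
    for pattern in patterns:
        pattern.apply(deltas)
    return deltas


def save_pattern(pattern, patterns, deltas):
    """Scenario 2: creating or updating a pattern applies it to existing DeltaUrls."""
    patterns[:] = [p for p in patterns if p.name != pattern.name] + [pattern]
    pattern.apply(deltas)
```

Note that nothing in this sketch runs at promotion time; promotion would only copy the already-modified field values.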
26
41
## Migration Process (Dump → Delta)
27
42
28
43
### Overview
@@ -43,49 +58,7 @@ Migration converts DumpUrls to DeltaUrls, preserving all fields and applying pat
43
58
44
59
### Examples
45
60
46
-
#### Example 1: Basic Migration
47
-
```python
48
-
# Starting State
49
-
dump_url = DumpUrl(
50
-
url="example.com/doc",
51
-
scraped_title="Original Title",
52
-
document_type=DocumentTypes.DOCUMENTATION
53
-
)
54
-
55
-
# After Migration
56
-
delta_url = DeltaUrl(
57
-
url="example.com/doc",
58
-
scraped_title="Original Title",
59
-
document_type=DocumentTypes.DOCUMENTATION,
60
-
to_delete=False
61
-
)
62
-
```
63
-
64
-
#### Example 2: Migration with Existing Curated
65
-
```python
66
-
# Starting State
67
-
dump_url = DumpUrl(
68
-
url="example.com/doc",
69
-
scraped_title="New Title",
70
-
document_type=DocumentTypes.DOCUMENTATION
71
-
)
72
-
73
-
curated_url = CuratedUrl(
74
-
url="example.com/doc",
75
-
scraped_title="Old Title",
76
-
document_type=DocumentTypes.DOCUMENTATION
77
-
)
78
-
79
-
# After Migration
80
-
delta_url = DeltaUrl(
81
-
url="example.com/doc",
82
-
scraped_title="New Title", # Different from curated
83
-
document_type=DocumentTypes.DOCUMENTATION,
84
-
to_delete=False
85
-
)
86
-
```
87
-
88
-
#### Example 3: Migration with Pattern Application
61
+
#### Example 1: Migration with Pattern Application
89
62
```python
90
63
# Starting State
91
64
dump_url = DumpUrl(
@@ -111,15 +84,15 @@ delta_url = DeltaUrl(
111
84
## Promotion Process (Delta → Curated)
112
85
113
86
### Overview
114
-
Promotion moves DeltaUrls to CuratedUrls, applying all changes including explicit NULL values. This occurs when:
115
-
- A curator marks a collection as Curated.
87
+
Promotion moves DeltaUrls to CuratedUrls, carrying forward all changes including pattern-applied modifications. This occurs when:
88
+
- A curator marks a collection as Curated
116
89
117
90
### Steps
118
91
1. Process each DeltaUrl:
119
92
- If marked for deletion: Remove matching CuratedUrl
120
93
- Otherwise: Update/create CuratedUrl with ALL fields
121
94
2. Clear all DeltaUrls
122
-
3. Refresh pattern relationships
95
+
3. Update pattern relationship tracking
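As a rough sketch of the steps above, assuming in-memory dataclasses and a dictionary of CuratedUrls keyed by URL (the real models and storage are not shown in this document, and `promote` is a hypothetical name):

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DeltaUrl:
    url: str
    scraped_title: Optional[str] = None
    generated_title: Optional[str] = None
    to_delete: bool = False


@dataclass
class CuratedUrl:
    url: str
    scraped_title: Optional[str] = None
    generated_title: Optional[str] = None


def promote(delta_urls, curated_by_url):
    """Carry every DeltaUrl into the curated set, then clear the deltas."""
    # Step 1: process each DeltaUrl.
    for delta in delta_urls:
        if delta.to_delete:
            # Marked for deletion: remove the matching CuratedUrl, if any.
            curated_by_url.pop(delta.url, None)
        else:
            # Copy ALL fields, including None, which overwrites old values.
            fields = asdict(delta)
            fields.pop("to_delete")
            curated_by_url[delta.url] = CuratedUrl(**fields)
    # Step 2: clear all DeltaUrls.
    delta_urls.clear()
    # Step 3 (updating pattern relationship tracking) is omitted here;
    # no pattern is re-applied at this point.
```

Because the whole record is replaced, a `None` on the DeltaUrl ends up as `None` on the CuratedUrl instead of leaving the previous value in place.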
123
96
124
97
### Examples
125
98
@@ -186,18 +159,13 @@ curated_url = CuratedUrl(
186
159
187
160
## Important Notes
188
161
162
+
189
163
### Field Handling
190
164
- ALL fields are copied during migration and promotion
191
165
- NULL values in DeltaUrls are treated as explicit values
192
166
- Pattern-set values take precedence over original values
193
167
194
-
### Pattern Application
195
-
- Patterns are applied after migration
196
-
- Pattern effects persist through promotion
197
-
- Multiple patterns can affect the same URL
198
-
199
-
### Data Integrity
200
-
- Migrations preserve all field values
201
-
- Promotions apply all changes
202
-
- Deletion flags are honored during promotion
203
-
- Pattern relationships are maintained
168
+
### Pattern Behavior
169
+
- Patterns only apply during migration or when patterns themselves are created/updated
170
+
- Pattern effects are preserved during promotion as regular field values
171
+
- Patterns are NOT re-applied during promotion. This means you can't add a DeltaUrl outside of the migration process and expect patterns to apply. In this case, you would need to either add it as a DumpUrl and migrate it correctly, or add it as a DeltaUrl and manually apply the pattern.
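The last point can be illustrated with a small sketch of the manual option. The `TitlePattern` class, its `apply` method, and glob-style matching are assumptions made for illustration, not the project's actual API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class DeltaUrl:
    url: str
    generated_title: str = ""


@dataclass
class TitlePattern:
    # Hypothetical pattern: sets generated_title on URLs matching a glob.
    match_pattern: str
    title: str

    def apply(self, delta):
        if fnmatchcase(delta.url, self.match_pattern):
            delta.generated_title = self.title


pattern = TitlePattern("example.com/docs/*", "Docs Page")

# Created directly, outside migration: no pattern has touched it yet.
manual_delta = DeltaUrl(url="example.com/docs/new")
untouched_title = manual_delta.generated_title

# Apply the pattern by hand so its effect is stored on the DeltaUrl
# before promotion copies the fields to a CuratedUrl.
pattern.apply(manual_delta)
```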