Skip to content

Commit aef7cdb

Browse files
authored
Merge pull request #1146 from NASA-IMPACT/1142-test-field-modified-unapply-logic
refactor readme for unapply logic and refactor unapply to account for overlapping patterns
2 parents a0a80e9 + 85a65d0 commit aef7cdb

File tree

5 files changed

+726
-63
lines changed

5 files changed

+726
-63
lines changed

sde_collections/models/README_UNAPPLY_LOGIC.md

Lines changed: 77 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -14,56 +14,105 @@
1414
- Delta URL exists with pattern effect
1515
- Pattern is removed
1616
```
17-
Curated: None
18-
Delta: division=BIOLOGY (from pattern)
19-
[Pattern removed]
20-
Result: Delta remains with division=None
17+
Curated: None exists
18+
Delta: url=new.com, division=None
19+
```
20+
`[Pattern: division=BIOLOGY], created`
21+
```
22+
Curated: None exists
23+
Delta: url=new.com, division=BIOLOGY
24+
```
25+
`[Pattern: division=BIOLOGY], deleted`
26+
```
27+
Curated: None exists
28+
Delta: url=new.com, division=None
2129
```
2230

23-
### Case 2: Delta and Curated Exist
31+
### Case 2: Delta Created to Apply Pattern
2432
**Scenario:**
25-
- Both curated and delta URLs exist
33+
- A Curated with no division already exists
34+
- A pattern is created
35+
- A delta is created to to apply a pattern
2636
- Pattern is removed
37+
- Delta should be deleted
38+
```
39+
Curated: division=None
2740
```
28-
Curated: division=GENERAL
41+
`[Pattern: division=BIOLOGY], created`
42+
```
43+
Curated: division=None
2944
Delta: division=BIOLOGY (from pattern)
30-
[Pattern removed]
31-
Result: Delta reverts to curated value (division=GENERAL)
32-
If delta now matches curated exactly, delta is deleted
45+
```
46+
`[Pattern: division=BIOLOGY], deleted`
47+
```
48+
Curated: division=None
3349
```
3450

35-
### Case 3: Curated Only
36-
**Scenario:**
37-
- Only curated URL exists
51+
### Case 3: Pre-existing Delta
52+
- A Curated with no division already exists
53+
- A Delta with an updated scraped_title exists
54+
- A pattern is created to set division
55+
- A delta is created to apply a pattern
3856
- Pattern is removed
57+
- Delta should be maintained because of scraped_title
58+
59+
```
60+
Curated: division=None
61+
Delta: scraped_title="Modified", division=None
62+
```
63+
`[Pattern: division=BIOLOGY], created`
64+
```
65+
Curated: division=None
66+
Delta: scraped_title="Modified", division=BIOLOGY (from pattern)
67+
```
68+
`[Pattern: division=BIOLOGY], deleted`
3969
```
40-
Curated: division=GENERAL
41-
Delta: None
42-
[Pattern removed]
43-
Result: New delta created with division=None
70+
Curated: division=None
71+
Delta: scraped_title="Modified", division=None
4472
```
4573

4674
### Case 4: Multiple Pattern Effects
4775
**Scenario:**
4876
- Delta has changes from multiple patterns
4977
- One pattern is removed
5078
```
51-
Curated: division=GENERAL, doc_type=DOCUMENTATION
5279
Delta: division=BIOLOGY, doc_type=DATA (from two patterns)
53-
[Division pattern removed]
54-
Result: Delta remains with division=GENERAL, doc_type=DATA preserved
80+
Pattern: division=BIOLOGY
81+
Pattern: doc_type=DATA
82+
```
83+
`[Pattern: division=BIOLOGY], deleted`
84+
```
85+
Delta: division=None, doc_type=DATA
86+
Pattern: doc_type=DATA
5587
```
5688

57-
### Case 5: Pattern Removal with Manual Changes
58-
**Scenario:**
59-
- Delta has both pattern effect and manual changes
60-
- Pattern is removed
89+
### Case 5: Overlapping Patterns, Specific Deleted
6190
```
62-
Curated: division=GENERAL, title="Original"
63-
Delta: division=BIOLOGY, title="Modified" (pattern + manual)
64-
[Pattern removed]
65-
Result: Delta remains with division=GENERAL, title="Modified" preserved
91+
Delta: division=ASTROPHYSICS (because of specific pattern)
92+
Specific Pattern: division=ASTROPHYSICS
93+
General Pattern: division=BIOLOGY
6694
```
95+
`[Specific Pattern: division=ASTROPHYSICS], deleted`
96+
97+
```
98+
Delta: division=BIOLOGY (because of general pattern)
99+
General Pattern: division=BIOLOGY
100+
```
101+
102+
103+
### Case 6: Overlapping Patterns, General Deleted
104+
```
105+
Delta: division=ASTROPHYSICS (because of specific pattern)
106+
Specific Pattern: division=ASTROPHYSICS
107+
General Pattern: division=BIOLOGY
108+
```
109+
`[General Pattern: division=BIOLOGY], deleted`
110+
111+
```
112+
Delta: division=ASTROPHYSICS (because of specific pattern)
113+
Specific Pattern: division=ASTROPHYSICS
114+
```
115+
67116

68117
## Implementation Steps
69118

sde_collections/models/collection.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -211,18 +211,18 @@ def promote_to_curated(self):
211211
continue
212212

213213
delta_value = getattr(delta, field_name)
214-
if delta_value not in [None, ""] and getattr(curated, field_name) != delta_value:
214+
if getattr(curated, field_name) != delta_value:
215215
updated_fields[field_name] = delta_value
216216

217217
if updated_fields:
218218
CuratedUrl.objects.filter(pk=curated.pk).update(**updated_fields)
219219
else:
220-
# If no matching CuratedUrl, create a new one using all non-null and non-empty fields
220+
# Previously, we excluded fields with values of None and ""
221+
# however, such null values are considered meaningful and should be copied over
221222
new_data = {
222223
field.name: getattr(delta, field.name)
223224
for field in delta._meta.fields
224-
if field.name not in ["to_delete", "collection", "id"]
225-
and getattr(delta, field.name) not in [None, ""]
225+
if field.name not in ["to_delete", "collection", "id"] and getattr(delta, field.name)
226226
}
227227
CuratedUrl.objects.create(collection=self, **new_data)
228228

sde_collections/models/delta_patterns.py

Lines changed: 112 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -333,16 +333,28 @@ def unapply(self) -> None:
333333
affected_deltas = self.delta_urls.all()
334334
affected_curated = self.curated_urls.all()
335335

336+
# Get all other patterns of same type for this collection
337+
pattern_class = self.__class__
338+
other_patterns = pattern_class.objects.filter(collection=self.collection).exclude(id=self.id)
339+
336340
# Process each affected delta URL
337341
for delta in affected_deltas:
338342
curated = CuratedUrl.objects.filter(collection=self.collection, url=delta.url).first()
339343

340-
if not curated:
341-
# Scenario 1: Delta only - new URL
342-
setattr(delta, field, None)
344+
# Find next most specific matching pattern if any
345+
matching_patterns = [p for p in other_patterns if re.search(p.get_regex_pattern(), delta.url)]
346+
347+
next_pattern = None
348+
if matching_patterns:
349+
# Sort by number of URLs matched (ascending) to find most specific
350+
next_pattern = min(matching_patterns, key=lambda p: p.get_url_match_count())
351+
352+
if next_pattern:
353+
# Apply next most specific pattern's value
354+
setattr(delta, field, next_pattern.get_new_value())
343355
delta.save()
344-
else:
345-
# Scenario 2: Both exist
356+
elif curated:
357+
# No other patterns match, revert to curated value
346358
setattr(delta, field, getattr(curated, field))
347359
delta.save()
348360

@@ -354,17 +366,36 @@ def unapply(self) -> None:
354366
)
355367
if fields_match:
356368
delta.delete()
369+
else:
370+
# No curated URL or other patterns, set to None
371+
setattr(delta, field, None)
372+
delta.save()
357373

358374
# Handle curated URLs that don't have deltas
359375
for curated in affected_curated:
360376
if not DeltaUrl.objects.filter(url=curated.url).exists():
361-
# Scenario 3: Curated only
362-
# Copy all fields from curated except the one we're nulling
363-
fields = {
364-
f.name: getattr(curated, f.name) for f in curated._meta.fields if f.name not in ["id", "collection"]
365-
}
366-
fields[field] = None # Set the pattern's field to None
367-
delta = DeltaUrl.objects.create(collection=self.collection, **fields)
377+
# Find any matching patterns
378+
matching_patterns = [p for p in other_patterns if re.search(p.get_regex_pattern(), curated.url)]
379+
380+
if matching_patterns:
381+
# Apply most specific pattern's value
382+
next_pattern = min(matching_patterns, key=lambda p: p.get_url_match_count())
383+
fields = {
384+
f.name: getattr(curated, f.name)
385+
for f in curated._meta.fields
386+
if f.name not in ["id", "collection"]
387+
}
388+
fields[field] = next_pattern.get_new_value()
389+
DeltaUrl.objects.create(collection=self.collection, **fields)
390+
else:
391+
# No other patterns, create delta with None
392+
fields = {
393+
f.name: getattr(curated, f.name)
394+
for f in curated._meta.fields
395+
if f.name not in ["id", "collection"]
396+
}
397+
fields[field] = None
398+
DeltaUrl.objects.create(collection=self.collection, **fields)
368399

369400
# Clear pattern relationships
370401
self.delta_urls.clear()
@@ -523,11 +554,12 @@ def apply(self) -> None:
523554

524555
def unapply(self) -> None:
525556
"""
526-
Remove title modifications:
527-
1. Create Delta URLs for affected Curated URLs to explicitly clear titles
528-
2. Remove generated titles from affected Delta URLs
529-
3. Clean up Delta URLs that become identical to their Curated URL
530-
4. Clear resolution tracking
557+
Remove title modifications, maintaining pattern precedence:
558+
1. Find any remaining patterns that match each URL
559+
2. Apply most specific matching pattern's title if one exists
560+
3. Otherwise revert to curated title or clear title
561+
4. Update title resolution tracking
562+
5. Clean up redundant deltas
531563
"""
532564
DeltaUrl = apps.get_model("sde_collections", "DeltaUrl")
533565
CuratedUrl = apps.get_model("sde_collections", "CuratedUrl")
@@ -538,16 +570,36 @@ def unapply(self) -> None:
538570
affected_deltas = self.delta_urls.all()
539571
affected_curated = self.curated_urls.all()
540572

573+
# Get all other title patterns for this collection
574+
other_patterns = DeltaTitlePattern.objects.filter(collection=self.collection).exclude(id=self.id)
575+
541576
# Process each affected delta URL
542577
for delta in affected_deltas:
543578
curated = CuratedUrl.objects.filter(collection=self.collection, url=delta.url).first()
544579

545-
if not curated:
546-
# Scenario 1: Delta only - clear generated title
547-
delta.generated_title = ""
548-
delta.save()
549-
else:
550-
# Scenario 2: Both exist - revert to curated title
580+
# Find next most specific matching pattern if any
581+
matching_patterns = [p for p in other_patterns if re.search(p.get_regex_pattern(), delta.url)]
582+
583+
next_pattern = None
584+
if matching_patterns:
585+
# Sort by number of URLs matched (ascending) to find most specific
586+
next_pattern = min(matching_patterns, key=lambda p: p.get_url_match_count())
587+
588+
if next_pattern:
589+
# Apply next most specific pattern's title
590+
new_title, error = next_pattern.generate_title_for_url(delta)
591+
if error:
592+
DeltaResolvedTitleError.objects.update_or_create(
593+
delta_url=delta, defaults={"title_pattern": next_pattern, "error_string": error}
594+
)
595+
else:
596+
delta.generated_title = new_title
597+
delta.save()
598+
DeltaResolvedTitle.objects.update_or_create(
599+
delta_url=delta, defaults={"title_pattern": next_pattern, "resolved_title": new_title}
600+
)
601+
elif curated:
602+
# No other patterns match, revert to curated title
551603
delta.generated_title = curated.generated_title
552604
delta.save()
553605

@@ -559,18 +611,47 @@ def unapply(self) -> None:
559611
)
560612
if fields_match:
561613
delta.delete()
614+
else:
615+
# No curated URL or other patterns, clear title
616+
delta.generated_title = ""
617+
delta.save()
562618

563619
# Handle curated URLs that don't have deltas
564620
for curated in affected_curated:
565621
if not DeltaUrl.objects.filter(url=curated.url).exists():
566-
# Scenario 3: Curated only - create delta with cleared title
567-
fields = {
568-
f.name: getattr(curated, f.name) for f in curated._meta.fields if f.name not in ["id", "collection"]
569-
}
570-
fields["generated_title"] = ""
571-
DeltaUrl.objects.create(collection=self.collection, **fields)
572-
573-
# Clear resolution tracking
622+
# Find any matching patterns
623+
matching_patterns = [p for p in other_patterns if re.search(p.get_regex_pattern(), curated.url)]
624+
625+
if matching_patterns:
626+
# Apply most specific pattern's title
627+
next_pattern = min(matching_patterns, key=lambda p: p.get_url_match_count())
628+
629+
# Copy all fields from curated
630+
fields = {
631+
f.name: getattr(curated, f.name)
632+
for f in curated._meta.fields
633+
if f.name not in ["id", "collection"]
634+
}
635+
636+
# Generate and apply new title
637+
new_title, error = next_pattern.generate_title_for_url(curated)
638+
if not error:
639+
fields["generated_title"] = new_title
640+
delta = DeltaUrl.objects.create(collection=self.collection, **fields)
641+
DeltaResolvedTitle.objects.create(
642+
title_pattern=next_pattern, delta_url=delta, resolved_title=new_title
643+
)
644+
else:
645+
# No other patterns, create delta with cleared title
646+
fields = {
647+
f.name: getattr(curated, f.name)
648+
for f in curated._meta.fields
649+
if f.name not in ["id", "collection"]
650+
}
651+
fields["generated_title"] = ""
652+
DeltaUrl.objects.create(collection=self.collection, **fields)
653+
654+
# Clear resolution tracking for this pattern
574655
DeltaResolvedTitle.objects.filter(title_pattern=self).delete()
575656
DeltaResolvedTitleError.objects.filter(title_pattern=self).delete()
576657

0 commit comments

Comments
 (0)