Commit 5f21ace

write a draft testing guide
1 parent 56c61f6 commit 5f21ace

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# COSMOS Curation System Testing Guide

## Resources
Fourteen collections have been reindexed on dev. Their statuses can be changed to `REINDEXING_FINISHED` to test URL importing. The collections and their counts are listed in [this spreadsheet](https://docs.google.com/spreadsheets/d/1mJFqZXdIyAN8LTuVQMLRDuNgzm7GIMlYb_cPUtLKSCM/edit?gid=1316450061#gid=1316450061).

## Test Flow 1: Basic URL Collection Lifecycle

### Objective
Verify the complete lifecycle of a URL collection from initial creation through curation to production.

### Prerequisites
- Access to dev environment
- Test collection created
- Sample URLs ready for testing

### Test Cases

#### 1.1 Collection Status Progression
1. Create a new collection in `RESEARCH_IN_PROGRESS` status
2. Verify that the initial scraper and indexer configs are created when the collection is moved to `READY_FOR_ENGINEERING`
3. Progress through `ENGINEERING_IN_PROGRESS` to `INDEXING_FINISHED_ON_DEV`
4. Confirm that the full text fetch triggers automatically
5. Verify that the status updates to `READY_FOR_CURATION`
6. Check plugin config creation
7. Move through `CURATION_IN_PROGRESS` to `CURATED`
8. Verify promotion of DeltaUrls to CuratedUrls
9. Test quality check status changes (`QUALITY_CHECK_PERFECT`, `QUALITY_CHECK_MINOR`)
10. Confirm that the collection appears in the public query after the PR is merged

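When recording the statuses observed during a run of the steps above, a small helper can flag any step where the status failed to move forward. The following is a minimal sketch: the order of `LIFECYCLE` is copied from the steps above, while the helper name `out_of_order` is hypothetical and not part of the system. The real workflow may define further statuses or allow other transitions.

```python
# Status order taken from the steps above; the real workflow may define
# additional statuses that this sketch does not know about.
LIFECYCLE = [
    "RESEARCH_IN_PROGRESS",
    "READY_FOR_ENGINEERING",
    "ENGINEERING_IN_PROGRESS",
    "INDEXING_FINISHED_ON_DEV",
    "READY_FOR_CURATION",
    "CURATION_IN_PROGRESS",
    "CURATED",
]


def out_of_order(observed):
    """Return (previous, current) pairs where the status did not move forward."""
    # list.index raises ValueError for a status that is not in LIFECYCLE.
    positions = [LIFECYCLE.index(status) for status in observed]
    return [
        (observed[i - 1], observed[i])
        for i in range(1, len(observed))
        if positions[i] <= positions[i - 1]
    ]
```

An empty result means every recorded status came later in the list than the one before it. Skipped statuses are not reported by this sketch.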
#### 1.2 Data State Transitions
1. Verify DumpUrls are created during indexing
2. Test migration from DumpUrls to DeltaUrls
3. Confirm field preservation during transitions
4. Check promotion from DeltaUrls to CuratedUrls
5. Verify all metadata transfers correctly

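Field preservation can be checked by comparing a snapshot of each record before and after a transition. The sketch below assumes records are available as plain dictionaries. The helper `changed_fields` and the field names `scraped_title` and `document_type` are illustrative assumptions, not the system's actual schema.

```python
def changed_fields(before, after, fields):
    """Map each field whose value differs to its (before, after) pair."""
    return {
        name: (before.get(name), after.get(name))
        for name in fields
        if before.get(name) != after.get(name)
    }


# Hypothetical snapshots of one URL before and after a DumpUrl -> DeltaUrl migration.
dump_row = {"url": "https://example.com/a", "scraped_title": "Page A", "document_type": None}
delta_row = {"url": "https://example.com/a", "scraped_title": "Page A", "document_type": None}

# An empty dict means every listed field survived the migration unchanged.
lost = changed_fields(dump_row, delta_row, ["url", "scraped_title", "document_type"])
```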
Expected Results:
- Each status transition triggers appropriate automated actions
- Data integrity maintained through all transitions
- Correct config generation at each stage
- Proper public visibility after final approval

## Test Flow 2: Pattern System Functionality

### Objective
Test the creation, application, and interaction of different pattern types.

### Prerequisites
- Collection with sample URLs
- Mix of different URL types and structures

### Test Cases

#### 2.1 Include/Exclude Patterns
1. Create an exclude pattern for a specific directory
   ```python
   exclude_pattern = "https://example.com/internal/*"
   ```
2. Create an include pattern for a specific file within the excluded directory
   ```python
   include_pattern = "https://example.com/internal/public-doc.html"
   ```
3. Verify that the include pattern overrides the exclude pattern
4. Test wildcard pattern matching
5. Check pattern precedence rules

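The expected outcome of the include/exclude steps above can be written down as a small reference check. This is a sketch only: Python's `fnmatch.fnmatchcase` stands in for the system's own wildcard matcher, which may behave differently, and `is_excluded` is a hypothetical helper that encodes the "include overrides exclude" rule from step 3.

```python
from fnmatch import fnmatchcase

EXCLUDE_PATTERNS = ["https://example.com/internal/*"]
INCLUDE_PATTERNS = ["https://example.com/internal/public-doc.html"]


def is_excluded(url):
    """Assumed rule: an include match always wins over an exclude match."""
    if any(fnmatchcase(url, pattern) for pattern in INCLUDE_PATTERNS):
        return False
    return any(fnmatchcase(url, pattern) for pattern in EXCLUDE_PATTERNS)
```

Under these assumptions, `https://example.com/internal/secret.html` is excluded, while `https://example.com/internal/public-doc.html` stays included.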
#### 2.2 Modification Patterns
1. Create overlapping title patterns:
   ```python
   # Notation: match pattern → title assigned to matching URLs
   pattern1 = "*/docs/* → title='Documentation'"
   pattern2 = "*/docs/api/* → title='API Reference'"
   ```
2. Create division patterns with different specificity
3. Test document type patterns with wildcards
4. Verify "smallest set priority" resolution
5. Check pattern application during migrations

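For step 4, it can help to compute the expected winner by hand before comparing with the system. The sketch below assumes that "smallest set priority" means the matching pattern that covers the fewest URLs wins. That reading, the sample `URLS`, and the helper `resolve_title` are assumptions, and `fnmatchcase` again stands in for the real matcher.

```python
from fnmatch import fnmatchcase

URLS = [
    "https://example.com/docs/intro",
    "https://example.com/docs/api/auth",
    "https://example.com/docs/api/users",
    "https://example.com/blog/post",
]
TITLE_PATTERNS = {"*/docs/*": "Documentation", "*/docs/api/*": "API Reference"}


def resolve_title(url):
    """Pick the title from the matching pattern that covers the fewest URLs."""
    matching = [p for p in TITLE_PATTERNS if fnmatchcase(url, p)]
    if not matching:
        return None

    def match_count(pattern):
        # Number of URLs in the sample collection that the pattern matches.
        return sum(fnmatchcase(u, pattern) for u in URLS)

    return TITLE_PATTERNS[min(matching, key=match_count)]
```

Here `*/docs/api/*` matches two URLs and `*/docs/*` matches three, so URLs under `/docs/api/` resolve to "API Reference". When two patterns match the same number of URLs, `min` simply returns the first one; how the system breaks such ties is not specified in this guide.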
#### 2.3 Pattern Removal Scenarios
1. Test removing a pattern that affects only DeltaUrls
2. Remove a pattern that affects CuratedUrls
3. Verify handling of multiple pattern effects
4. Test manual change preservation
5. Check cleanup procedures

Expected Results:
- Pattern precedence rules correctly applied
- Proper handling of overlapping patterns
- Manual changes preserved during pattern operations
- Correct reversal of pattern effects on removal

## Test Flow 3: Reindexing Workflow

### Objective
Verify the reindexing process and status management.

### Prerequisites
- Existing collection in production
- Access to both dev and prod environments

### Test Cases

#### 3.1 Reindexing Status Progression
1. Change status from `REINDEXING_NOT_NEEDED` to `REINDEXING_NEEDED_ON_DEV`
2. Complete reindexing and update to `REINDEXING_FINISHED_ON_DEV`
3. Verify automatic full text fetch
4. Confirm status update to `REINDEXING_READY_FOR_CURATION`
5. Progress through `REINDEXING_CURATED`
6. Final update to `REINDEXING_INDEXED_ON_PROD`

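The reindexing sequence above can be captured as a lookup table, which makes it easy to decide whether a given status change is the next expected step or a jump. This is a sketch: the order comes from the steps above, but whether the system actually rejects other transitions is an assumption to verify, and `is_single_step` is a hypothetical helper.

```python
# Sequence as listed in the steps above; treat it as an assumption about
# the expected order, not as the system's actual validation rules.
REINDEXING_SEQUENCE = [
    "REINDEXING_NOT_NEEDED",
    "REINDEXING_NEEDED_ON_DEV",
    "REINDEXING_FINISHED_ON_DEV",
    "REINDEXING_READY_FOR_CURATION",
    "REINDEXING_CURATED",
    "REINDEXING_INDEXED_ON_PROD",
]
# Map each status to the one listed immediately after it.
NEXT_STATUS = dict(zip(REINDEXING_SEQUENCE, REINDEXING_SEQUENCE[1:]))


def is_single_step(current, target):
    """True only when target is the status listed immediately after current."""
    return NEXT_STATUS.get(current) == target
```

Pairs for which `is_single_step` returns `False` are candidates for the invalid-progression tests.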
#### 3.2 Data Handling During Reindex
1. Verify existing DumpUrls are cleared
2. Check new full text data processing
3. Test DumpUrl to DeltaUrl migration
4. Verify pattern reapplication
5. Confirm CuratedUrl updates

Expected Results:
- Proper status progression through reindexing
- Data integrity maintained
- Patterns correctly reapplied
- Existing customizations preserved

## Edge Cases and Stress Testing

### URL Pattern Edge Cases
1. Test URLs with/without trailing slashes
2. Verify handling of overlapping wildcards
3. Check pattern resolution with equal URL count matches
4. Test maximum pattern chain depth
5. Verify handling of malformed URLs

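The trailing-slash case is easy to get wrong, so it is worth listing the variants explicitly. The sketch below uses `fnmatchcase` as a stand-in for the system's matcher (an assumption) to show how a directory wildcard treats each variant. The URLs and the pattern are illustrative.

```python
from fnmatch import fnmatchcase

PATTERN = "https://example.com/docs/*"
CASES = [
    "https://example.com/docs",       # no trailing slash
    "https://example.com/docs/",      # trailing slash only
    "https://example.com/docs/page",  # page under the directory
]

# Map each URL variant to whether the stand-in matcher accepts it.
results = {url: fnmatchcase(url, PATTERN) for url in CASES}
```

With this stand-in, the slash-less form does not match while the other two do. Record what the actual system does for the same three variants and decide whether that is the intended behavior.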
### Status Transition Edge Cases
1. Test interrupted transitions
2. Verify handling of failed automated actions
3. Check concurrent status updates
4. Test invalid status progressions
5. Verify recovery procedures

### Data Volume Testing
1. Test with a large number of URLs (>100k)
2. Check pattern application performance
3. Verify migration speed with large datasets
4. Test memory usage during bulk operations
5. Check system response under heavy concurrent access

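For volume tests, synthetic URLs are often enough to get a first timing baseline. The following sketch generates 100,000 URLs and times one wildcard pattern over them. `make_urls` and `time_pattern` are hypothetical helpers, the URL layout is invented, and `fnmatchcase` only approximates the system's matcher, so the timing says nothing about database or migration cost.

```python
import time
from fnmatch import fnmatchcase


def make_urls(count):
    """Generate synthetic URLs; every tenth one falls under /internal/."""
    return [
        f"https://example.com/{'internal' if i % 10 == 0 else 'docs'}/page-{i}.html"
        for i in range(count)
    ]


def time_pattern(urls, pattern):
    """Return (number of matches, elapsed seconds) for one pattern over all URLs."""
    start = time.perf_counter()
    matches = sum(fnmatchcase(url, pattern) for url in urls)
    return matches, time.perf_counter() - start


matches, seconds = time_pattern(make_urls(100_000), "https://example.com/internal/*")
```

The match count doubles as a sanity check on the generated data: one URL in ten should fall under `/internal/`.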
## Common Issues to Watch For

1. Pattern Precedence
   - Multiple patterns affecting same URL
   - Include/exclude pattern conflicts
   - Resolution of equal-specificity patterns

2. Data Integrity
   - Field preservation during transitions
   - Manual change retention
   - Pattern effect tracking

3. Performance
   - Large collection handling
   - Multiple pattern application
   - Status transition timing

4. Status Management
   - Automated trigger reliability
   - Status update race conditions
   - Recovery from failed transitions
