| 1 | +# COSMOS Curation System Testing Guide |
| 2 | + |
| 3 | +## Resources |
| 4 | +There are 14 collections that have been reindexed on dev and can have their statuses changed to `REINDEXING_FINISHED` to test URL importing. The collections and their counts can be seen [here](https://docs.google.com/spreadsheets/d/1mJFqZXdIyAN8LTuVQMLRDuNgzm7GIMlYb_cPUtLKSCM/edit?gid=1316450061#gid=1316450061). |
| 5 | + |
| 6 | +## Test Flow 1: Basic URL Collection Lifecycle |
| 7 | + |
| 8 | +### Objective |
| 9 | +Verify the complete lifecycle of a URL collection from initial creation through curation to production. |
| 10 | + |
| 11 | +### Prerequisites |
| 12 | +- Access to dev environment |
| 13 | +- Test collection created |
| 14 | +- Sample URLs ready for testing |
| 15 | + |
| 16 | +### Test Cases |
| 17 | + |
| 18 | +#### 1.1 Collection Status Progression |
| 19 | +1. Create new collection in `RESEARCH_IN_PROGRESS` status |
| 20 | +2. Verify that the initial scraper and indexer configs are created when the collection moves to `READY_FOR_ENGINEERING` |
| 21 | +3. Progress through `ENGINEERING_IN_PROGRESS` to `INDEXING_FINISHED_ON_DEV` |
| 22 | +4. Confirm full text fetch triggers automatically |
| 23 | +5. Verify status updates to `READY_FOR_CURATION` |
| 24 | +6. Check plugin config creation |
| 25 | +7. Move through `CURATION_IN_PROGRESS` to `CURATED` |
| 26 | +8. Verify DeltaUrls promotion to CuratedUrls |
| 27 | +9. Test quality check status changes (`QUALITY_CHECK_PERFECT` and `QUALITY_CHECK_MINOR`) |
| 28 | +10. Confirm collection appears in public query after PR merge |
| 29 | + |
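The status order in 1.1 can be checked against a recorded status history. The sketch below is a minimal illustration, not the system's own implementation: it lists only the statuses named in the steps above, the helper `follows_expected_order` is hypothetical, and how a status history is obtained from the dev environment is left out. Extra intermediate statuses in the history are tolerated.

```python
# Statuses named in section 1.1, in the order the steps visit them.
# The real workflow may define additional statuses between these.
EXPECTED_ORDER = [
    "RESEARCH_IN_PROGRESS",
    "READY_FOR_ENGINEERING",
    "ENGINEERING_IN_PROGRESS",
    "INDEXING_FINISHED_ON_DEV",
    "READY_FOR_CURATION",
    "CURATION_IN_PROGRESS",
    "CURATED",
]


def follows_expected_order(history, expected=EXPECTED_ORDER):
    """Return True if every expected status appears in `history`, in order.

    `status in remaining` consumes the iterator up to the first match, so
    each expected status must be found after the previous one.
    """
    remaining = iter(history)
    return all(status in remaining for status in expected)


# Hypothetical recorded history for one test collection.
observed = list(EXPECTED_ORDER)
swapped = ["READY_FOR_CURATION", "INDEXING_FINISHED_ON_DEV"]
print(follows_expected_order(observed))
print(follows_expected_order(swapped, expected=EXPECTED_ORDER[3:5]))
```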
| 30 | +#### 1.2 Data State Transitions |
| 31 | +1. Verify DumpUrls are created during indexing |
| 32 | +2. Test migration from DumpUrls to DeltaUrls |
| 33 | +3. Confirm field preservation during transitions |
| 34 | +4. Check promotion from DeltaUrls to CuratedUrls |
| 35 | +5. Verify all metadata transfers correctly |
| 36 | + |
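For the field-preservation checks in 1.2, one option is to compare a record before and after a transition, field by field. This is a sketch under assumptions: the records are plain dictionaries, and the field names and values are hypothetical placeholders rather than the actual DumpUrl or DeltaUrl model fields.

```python
def changed_fields(before, after, fields):
    """Return {field: (old, new)} for every tracked field whose value differs."""
    return {
        name: (before.get(name), after.get(name))
        for name in fields
        if before.get(name) != after.get(name)
    }


# Hypothetical field names and values, for illustration only.
TRACKED_FIELDS = ["url", "scraped_title", "document_type"]
dump_url = {"url": "https://example.com/docs/a.html", "scraped_title": "A", "document_type": 1}
delta_url = {"url": "https://example.com/docs/a.html", "scraped_title": "A", "document_type": 2}

diff = changed_fields(dump_url, delta_url, TRACKED_FIELDS)
print(diff)  # {'document_type': (1, 2)}
```

An empty result means every tracked field survived the transition unchanged.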
| 37 | +Expected Results: |
| 38 | +- Each status transition triggers appropriate automated actions |
| 39 | +- Data integrity maintained through all transitions |
| 40 | +- Correct config generation at each stage |
| 41 | +- Proper public visibility after final approval |
| 42 | + |
| 43 | +## Test Flow 2: Pattern System Functionality |
| 44 | + |
| 45 | +### Objective |
| 46 | +Test the creation, application, and interaction of different pattern types. |
| 47 | + |
| 48 | +### Prerequisites |
| 49 | +- Collection with sample URLs |
| 50 | +- Mix of different URL types and structures |
| 51 | + |
| 52 | +### Test Cases |
| 53 | + |
| 54 | +#### 2.1 Include/Exclude Patterns |
| 55 | +1. Create exclude pattern for specific directory |
| 56 | + ```python |
| 57 | + pattern = "https://example.com/internal/*" |
| 58 | + ``` |
| 59 | +2. Create include pattern for specific file within excluded directory |
| 60 | + ```python |
| 61 | + pattern = "https://example.com/internal/public-doc.html" |
| 62 | + ``` |
| 63 | +3. Verify include pattern overrides exclude pattern |
| 64 | +4. Test wildcard pattern matching |
| 65 | +5. Check pattern precedence rules |
| 66 | + |
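The include-over-exclude rule from steps 1 to 3 can be sketched as follows. This assumes shell-style wildcard matching via Python's `fnmatch` module, which may differ from how the curation system matches patterns; the function `is_included` is a hypothetical helper for building expected results.

```python
from fnmatch import fnmatchcase

EXCLUDE_PATTERNS = ["https://example.com/internal/*"]
INCLUDE_PATTERNS = ["https://example.com/internal/public-doc.html"]


def is_included(url, excludes=EXCLUDE_PATTERNS, includes=INCLUDE_PATTERNS):
    """Return True if the URL should remain in the collection.

    An include match wins over any exclude match; a URL that matches
    neither kind of pattern is kept by default.
    """
    if any(fnmatchcase(url, pattern) for pattern in includes):
        return True
    if any(fnmatchcase(url, pattern) for pattern in excludes):
        return False
    return True


for url in [
    "https://example.com/internal/public-doc.html",  # include overrides exclude
    "https://example.com/internal/notes.html",       # excluded
    "https://example.com/about.html",                # untouched by patterns
]:
    print(url, is_included(url))
```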
| 67 | +#### 2.2 Modification Patterns |
| 68 | +1. Create overlapping title patterns: |
| 69 | + ```python |
| 70 | + pattern1 = "*/docs/* → title='Documentation'" |
| 71 | + pattern2 = "*/docs/api/* → title='API Reference'" |
| 72 | + ``` |
| 73 | +2. Create division patterns with different specificity |
| 74 | +3. Test document type patterns with wildcards |
| 75 | +4. Verify "smallest set priority" resolution |
| 76 | +5. Check pattern application during migrations |
| 77 | + |
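One reading of "smallest set priority" is that when several patterns match a URL, the pattern matching the fewest URLs in the collection wins. The sketch below computes expected titles under that assumption, using `fnmatch` wildcards as a stand-in for the system's matcher. `resolve_titles` is a hypothetical helper, and ties are broken by dictionary order here, which the real system may handle differently.

```python
from fnmatch import fnmatchcase


def resolve_titles(urls, patterns):
    """Map each URL to the title of its most specific matching pattern.

    `patterns` maps a wildcard pattern to a title. Specificity is measured
    by how many URLs a pattern matches: fewer matches means higher priority.
    """
    match_sets = {p: {u for u in urls if fnmatchcase(u, p)} for p in patterns}
    resolved = {}
    for url in urls:
        candidates = [p for p, matched in match_sets.items() if url in matched]
        if candidates:
            winner = min(candidates, key=lambda p: len(match_sets[p]))
            resolved[url] = patterns[winner]
    return resolved


# Sample URLs are illustrative only.
urls = [
    "https://example.com/docs/intro.html",
    "https://example.com/docs/api/auth.html",
    "https://example.com/about.html",
]
patterns = {"*/docs/*": "Documentation", "*/docs/api/*": "API Reference"}
titles = resolve_titles(urls, patterns)
print(titles)
```

Here `*/docs/*` matches two URLs and `*/docs/api/*` matches one, so the API page takes the title from the narrower pattern.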
| 78 | +#### 2.3 Pattern Removal Scenarios |
| 79 | +1. Test removing a pattern that affects only DeltaUrls |
| 80 | +2. Test removing a pattern that affects CuratedUrls |
| 81 | +3. Verify handling of multiple pattern effects |
| 82 | +4. Test manual change preservation |
| 83 | +5. Check cleanup procedures |
| 84 | + |
| 85 | +Expected Results: |
| 86 | +- Pattern precedence rules correctly applied |
| 87 | +- Proper handling of overlapping patterns |
| 88 | +- Manual changes preserved during pattern operations |
| 89 | +- Correct reversal of pattern effects on removal |
| 90 | + |
| 91 | +## Test Flow 3: Reindexing Workflow |
| 92 | + |
| 93 | +### Objective |
| 94 | +Verify the reindexing process and status management. |
| 95 | + |
| 96 | +### Prerequisites |
| 97 | +- Existing collection in production |
| 98 | +- Access to both dev and prod environments |
| 99 | + |
| 100 | +### Test Cases |
| 101 | + |
| 102 | +#### 3.1 Reindexing Status Progression |
| 103 | +1. Change status from `REINDEXING_NOT_NEEDED` to `REINDEXING_NEEDED_ON_DEV` |
| 104 | +2. Complete reindexing and update to `REINDEXING_FINISHED_ON_DEV` |
| 105 | +3. Verify automatic full text fetch |
| 106 | +4. Confirm status update to `REINDEXING_READY_FOR_CURATION` |
| 107 | +5. Progress through `REINDEXING_CURATED` |
| 108 | +6. Final update to `REINDEXING_INDEXED_ON_PROD` |
| 109 | + |
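The reindexing progression in 3.1 can be expressed as a map from each status to the next one, which also helps when testing invalid progressions. This is a sketch, assuming each status listed above transitions directly to the next; the real workflow may include intermediate statuses, and `first_invalid_step` is a hypothetical helper.

```python
# Next expected status for each reindexing status named in section 3.1.
# Assumption: transitions are direct; add entries if the system has more steps.
REINDEXING_NEXT = {
    "REINDEXING_NOT_NEEDED": "REINDEXING_NEEDED_ON_DEV",
    "REINDEXING_NEEDED_ON_DEV": "REINDEXING_FINISHED_ON_DEV",
    "REINDEXING_FINISHED_ON_DEV": "REINDEXING_READY_FOR_CURATION",
    "REINDEXING_READY_FOR_CURATION": "REINDEXING_CURATED",
    "REINDEXING_CURATED": "REINDEXING_INDEXED_ON_PROD",
}


def first_invalid_step(history):
    """Return the first (current, next) pair that breaks the chain, or None."""
    for current, nxt in zip(history, history[1:]):
        if REINDEXING_NEXT.get(current) != nxt:
            return (current, nxt)
    return None


full_run = ["REINDEXING_NOT_NEEDED"] + list(REINDEXING_NEXT.values())
skipped = ["REINDEXING_NEEDED_ON_DEV", "REINDEXING_READY_FOR_CURATION"]
print(first_invalid_step(full_run))
print(first_invalid_step(skipped))
```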
| 110 | +#### 3.2 Data Handling During Reindex |
| 111 | +1. Verify existing DumpUrls are cleared |
| 112 | +2. Check new full text data processing |
| 113 | +3. Test DumpUrl to DeltaUrl migration |
| 114 | +4. Verify pattern reapplication |
| 115 | +5. Confirm CuratedUrl updates |
| 116 | + |
| 117 | +Expected Results: |
| 118 | +- Proper status progression through reindexing |
| 119 | +- Data integrity maintained |
| 120 | +- Patterns correctly reapplied |
| 121 | +- Existing customizations preserved |
| 122 | + |
| 123 | +## Edge Cases and Stress Testing |
| 124 | + |
| 125 | +### URL Pattern Edge Cases |
| 126 | +1. Test URLs with/without trailing slashes |
| 127 | +2. Verify handling of overlapping wildcards |
| 128 | +3. Check pattern resolution when two patterns match the same number of URLs |
| 129 | +4. Test maximum pattern chain depth |
| 130 | +5. Verify handling of malformed URLs |
| 131 | + |
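For the trailing-slash case, it can help to generate both forms of each test URL so every pattern is exercised against the pair. The helper below is a hypothetical convenience built on the standard library's `urllib.parse`; it only prepares test inputs and makes no claim about how the system normalizes URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_variants(url):
    """Return (without_slash, with_slash) forms of the URL's path.

    Query strings and fragments are left untouched.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    without_slash = urlunsplit(parts._replace(path=path))
    with_slash = urlunsplit(parts._replace(path=path + "/"))
    return without_slash, with_slash


print(trailing_slash_variants("https://example.com/docs/"))
print(trailing_slash_variants("https://example.com/docs?x=1"))
```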
| 132 | +### Status Transition Edge Cases |
| 133 | +1. Test interrupted transitions |
| 134 | +2. Verify handling of failed automated actions |
| 135 | +3. Check concurrent status updates |
| 136 | +4. Test invalid status progressions |
| 137 | +5. Verify recovery procedures |
| 138 | + |
| 139 | +### Data Volume Testing |
| 140 | +1. Test with a large number of URLs (more than 100,000) |
| 141 | +2. Check pattern application performance |
| 142 | +3. Verify migration speed with large datasets |
| 143 | +4. Test memory usage during bulk operations |
| 144 | +5. Check system response under heavy concurrent access |
| 145 | + |
| 146 | +## Common Issues to Watch For |
| 147 | + |
| 148 | +1. Pattern Precedence |
| 149 | + - Multiple patterns affecting same URL |
| 150 | + - Include/exclude pattern conflicts |
| 151 | + - Resolution of equal-specificity patterns |
| 152 | + |
| 153 | +2. Data Integrity |
| 154 | + - Field preservation during transitions |
| 155 | + - Manual change retention |
| 156 | + - Pattern effect tracking |
| 157 | + |
| 158 | +3. Performance |
| 159 | + - Large collection handling |
| 160 | + - Multiple pattern application |
| 161 | + - Status transition timing |
| 162 | + |
| 163 | +4. Status Management |
| 164 | + - Automated trigger reliability |
| 165 | + - Status update race conditions |
| 166 | + - Recovery from failed transitions |