# Iteration 003: Download/Import System Separation

**Status**: In Review
**Started**: 2026-01-29
**Goal**: Separate downloading from importing into distinct subsystems with explicit state machines, error classification, and immutable audit trails

---

## Problem Statement

The download system combined downloading and importing in a single monolithic flow, with several issues:

1. **Combined concerns** - `download_job` tracked both download states (created, downloading) AND import states (importing, import_failed, import_complete) in one entity
2. **Single file assumption** - Schema assumed one file per download, but season packs have many episodes
3. **No retry granularity** - If 1 of 10 episodes failed import, the entire job was marked failed
4. **No reimport capability** - Couldn't retry a failed import without re-downloading
5. **Implicit state transitions** - No validation that state changes were valid (e.g., "created" → "completed" directly)
6. **No error classification** - All errors treated equally; couldn't distinguish permanent failures (invalid magnet) from transient ones (network timeout)
7. **No audit trail** - No visibility into what happened during the download/import lifecycle

---

## Design Decisions

### 1. Separate Download Jobs from Import Tasks

**Before:**
```
download_job
├── id, status (8 states including import states)
├── download fields (protocol, link, external_id, progress)
├── import fields (dest_path, import_method)
└── media references (media_item_id, season_id, episode_id)
```

**After:**
```
download_job (6 states, download only)
├── id, status (created, enqueued, downloading, completed, failed, cancelled)
├── download fields (protocol, link, external_id, progress, save_path, content_path)
├── retry logic (attempt_count, next_run_at, last_error, error_category)
└── media references

import_task (5 states, per-file import)
├── id, status (pending, in_progress, completed, failed, cancelled)
├── download_job_id (nullable FK)
├── previous_task_id (reimport chain)
├── source_path, dest_path
├── retry logic (attempt_count, max_attempts, next_run_at, last_error, error_category)
└── media references
```

### 2. One Import Task Per File

Season packs now spawn multiple `import_task` rows:
- 10-episode pack → 10 import tasks
- Each tracks independently: 8 succeed, 2 fail → `partial_failure` status
- Individual reimport without re-downloading

### 3. Explicit State Machines

State transitions are validated before execution:

```go
var DownloadJobTransitions = map[Status][]Status{
    "created":     {"enqueued", "failed", "cancelled"},
    "enqueued":    {"downloading", "failed", "cancelled"},
    "downloading": {"completed", "failed", "cancelled"},
    "completed":   {}, // terminal
    "failed":      {}, // terminal
    "cancelled":   {}, // terminal
}

var ImportTaskTransitions = map[Status][]Status{
    "pending":     {"in_progress", "cancelled"},
    "in_progress": {"completed", "failed"},
    "completed":   {}, // terminal - reimport creates new task
    "failed":      {}, // terminal - reimport creates new task
    "cancelled":   {}, // terminal
}
```
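A validation helper built on these maps might look like the sketch below. The `Status` type and map shape follow the snippet above; the `CanTransition`/`Transition` names are illustrative, not the actual API:

```go
package main

import (
	"fmt"
	"slices"
)

type Status string

var DownloadJobTransitions = map[Status][]Status{
	"created":     {"enqueued", "failed", "cancelled"},
	"enqueued":    {"downloading", "failed", "cancelled"},
	"downloading": {"completed", "failed", "cancelled"},
	"completed":   {}, // terminal
	"failed":      {}, // terminal
	"cancelled":   {}, // terminal
}

// CanTransition reports whether moving from one status to another is allowed.
func CanTransition(table map[Status][]Status, from, to Status) bool {
	return slices.Contains(table[from], to)
}

// Transition validates the move before it is persisted; callers keep the
// current status and surface the error when the move is invalid.
func Transition(table map[Status][]Status, from, to Status) (Status, error) {
	if !CanTransition(table, from, to) {
		return from, fmt.Errorf("invalid transition %s -> %s", from, to)
	}
	return to, nil
}

func main() {
	fmt.Println(CanTransition(DownloadJobTransitions, "created", "enqueued"))  // true
	fmt.Println(CanTransition(DownloadJobTransitions, "created", "completed")) // false
}
```

Because terminal states map to empty slices, any attempt to leave `completed`, `failed`, or `cancelled` fails the check without special-casing.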

### 4. Error Categories

Errors are classified for intelligent retry behavior:

```go
type Category string
const (
    Transient Category = "transient" // retry: network timeout, 5xx
    Permanent Category = "permanent" // fail fast: 401/403, invalid magnet
)
```

By default, unknown errors are classified as transient so they remain retryable.
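A classifier along these lines might map HTTP status codes as follows. The `CategorizeHTTP` function and its exact mapping are a sketch, not the real wrapper's logic:

```go
package main

import (
	"fmt"
	"net/http"
)

type Category string

const (
	Transient Category = "transient" // retry: network timeout, 5xx
	Permanent Category = "permanent" // fail fast: 401/403, invalid magnet
)

// CategorizeHTTP maps an HTTP status code to an error category.
// Auth failures are permanent; everything else, including unknown
// codes, defaults to transient so it remains retryable.
func CategorizeHTTP(code int) Category {
	switch code {
	case http.StatusUnauthorized, http.StatusForbidden:
		return Permanent
	default:
		return Transient
	}
}

func main() {
	fmt.Println(CategorizeHTTP(403)) // permanent
	fmt.Println(CategorizeHTTP(503)) // transient
}
```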

### 5. Event Tables for Audit Trail

Every state change and error is logged:

```sql
download_job_event (created, status_changed, error, retry_scheduled)
import_task_event (created, status_changed, error, retry_scheduled, reimport_requested)
```

A combined timeline query aggregates both tables into a unified history view.

### 6. Computed Import Status

`GetDownloadJobWithImportSummary` returns an aggregate status:
- `download_pending` - download not yet completed
- `awaiting_import` - completed, no import tasks yet
- `importing` - tasks in progress
- `fully_imported` - all tasks completed
- `partial_failure` - some completed, some failed
- `import_failed` - all tasks failed
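The aggregate status could be derived from per-status task counts roughly as follows. The `ImportSummary` struct and field names are illustrative, not the actual query output:

```go
package main

import "fmt"

// ImportSummary holds per-status counts of a job's import tasks
// (field names are hypothetical, for illustration only).
type ImportSummary struct {
	Total, Pending, InProgress, Completed, Failed int
	DownloadDone                                  bool
}

// ImportStatus derives the aggregate status listed above; case order
// matters, so partial_failure is the fallthrough once all tasks are done.
func ImportStatus(s ImportSummary) string {
	switch {
	case !s.DownloadDone:
		return "download_pending"
	case s.Total == 0:
		return "awaiting_import"
	case s.Pending > 0 || s.InProgress > 0:
		return "importing"
	case s.Completed == s.Total:
		return "fully_imported"
	case s.Failed == s.Total:
		return "import_failed"
	default:
		return "partial_failure"
	}
}

func main() {
	// The 8-succeed, 2-fail season pack from the design notes:
	fmt.Println(ImportStatus(ImportSummary{Total: 10, Completed: 8, Failed: 2, DownloadDone: true})) // partial_failure
}
```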

### 7. Cancel Cascade

Cancelling a download job automatically cancels all pending import tasks for that job.

### 8. Reimport Chain

Import tasks have a `previous_task_id` FK:
- Original import: `previous_task_id = NULL`
- Reimport: `previous_task_id = <original task ID>`
- A recursive CTE retrieves the full chain history

Reimport handling: if the destination exists and this is a reimport, the old file is removed first.

---

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ internal/jobs/download/worker.go                            │
│                                                             │
│ Polls download clients, manages download_job lifecycle      │
│ Config: 3s interval, 20 claim limit, 10 max attempts        │
│                                                             │
│ created → enqueue to client → enqueued                      │
│ enqueued/downloading → poll status → downloading/completed  │
│ completed → spawn import_task(s)                            │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ internal/jobs/import/worker.go                              │
│                                                             │
│ Processes import tasks: hardlink/copy to library            │
│ Config: 2s interval, 10 claim limit, 5 max attempts         │
│                                                             │
│ pending → claim (atomic) → in_progress                      │
│ validate source → compute dest → hardlink/copy → completed  │
└─────────────────────────────────────────────────────────────┘
```

---

## Schema Changes

### New Tables

| Table | Purpose |
|-------|---------|
| `import_task` | Per-file import tracking with reimport chain |
| `download_job_event` | Download job audit log |
| `import_task_event` | Import task audit log |

### Modified Tables

| Table | Change |
|-------|--------|
| `download_job` | Simplified to 6 states, added error_category, removed import fields |
| `media_file_import` | Added `import_task_id` FK |

### Deleted Tables/Columns

| Change | Reason |
|--------|--------|
| `download_job_media_file` | Replaced by `import_task` |
| `download_job.import_*` columns | Moved to `import_task` |

---

## New API Endpoints

```
# Import Tasks
GET  /v1/import-tasks               - List import tasks (paginated, filterable by status)
GET  /v1/import-tasks/counts        - Get counts by status
GET  /v1/import-tasks/:id           - Get import task with details
GET  /v1/import-tasks/:id/timeline  - Get event log
GET  /v1/import-tasks/:id/history   - Get reimport chain
POST /v1/import-tasks/:id/reimport  - Create new task for reimport
POST /v1/import-tasks/:id/cancel    - Cancel pending task

# Download Jobs (enhanced)
GET  /v1/download-jobs/:id               - Now returns import summary
GET  /v1/download-jobs/:id/timeline      - Combined download + import events
GET  /v1/download-jobs/:id/import-tasks  - List import tasks for job
```

---

## Files Created

| File | Purpose |
|------|---------|
| `internal/errors/category.go` | Error categorization (transient/permanent) |
| `internal/jobs/state/machine.go` | State machine definitions and validation |
| `internal/jobs/download/worker.go` | Download worker (replaces old worker) |
| `internal/jobs/import/worker.go` | Import worker |
| `internal/service/import_tasks.go` | Import tasks service |
| `internal/http/handlers/import_tasks.go` | Import tasks HTTP handler |
| `internal/repo/import_tasks.go` | Import tasks repository |
| `internal/db/queries/import_tasks.sql` | Import task SQLC queries |
| `internal/db/queries/download_job_events.sql` | Download event SQLC queries |
| `internal/db/queries/import_task_events.sql` | Import event SQLC queries |

---

## Files Modified

| File | Change |
|------|--------|
| `internal/db/migrations/0008_download_job.up.sql` | Complete rewrite with new schema |
| `internal/db/queries/download_jobs.sql` | Simplified states, added timeline/summary queries |
| `internal/service/download_jobs.go` | Added cancel cascade, timeline, import summary |
| `internal/http/handlers/download_jobs.go` | Added timeline, import-tasks endpoints |
| `internal/http/http.go` | Register import tasks handler |
| `cmd/api/main.go` | Start both download and import workers |

---

## Files Deleted

| File | Reason |
|------|--------|
| `internal/jobs/downloadjobs/worker.go` | Replaced by separate download/import workers |

---

## Worker Configuration

| Worker | Poll Interval | Claim Limit | Max Attempts |
|--------|---------------|-------------|--------------|
| Download | 3s | 20 | 10 |
| Import | 2s | 10 | 5 |

Both use exponential backoff: 2^attempt seconds.

---

## Testing Checklist

- [ ] Nuke DB, verify migrations run clean
- [ ] `sqlc generate` succeeds
- [ ] `go build ./...` succeeds
- [ ] Search → enqueue download → verify `download_job` created with status `created`
- [ ] Watch status progress: `created` → `enqueued` → `downloading` → `completed`
- [ ] Verify `import_task`(s) spawned when download completes
- [ ] Verify import progresses: `pending` → `in_progress` → `completed`
- [ ] Check `GET /v1/download-jobs/:id` returns `import_status: fully_imported`
- [ ] Test reimport: `POST /v1/import-tasks/:id/reimport`
- [ ] Verify new task created with `previous_task_id` set
- [ ] Test timeline: `GET /v1/download-jobs/:id/timeline`
- [ ] Test cancel cascade: cancel download, verify pending imports cancelled
- [ ] Test series pack: download season, verify multiple import tasks created
- [ ] Simulate partial failure: verify `import_status: partial_failure`
- [ ] Test permanent error (stop qBittorrent): verify job fails after retries
- [ ] Test transient error: verify retry with exponential backoff

---

## Known Issues / TODOs

1. **qBittorrent error categorization not implemented** - Wrapper should mark 401/403 as permanent errors
2. **No "created" event logged for spawned import tasks** - Download worker creates tasks but doesn't log events
3. **ListDownloadJobs has no pagination** - Could be slow with many jobs

---

## Future Work (Out of Scope)

- Import task cleanup/retention policy (delete old completed tasks)
- Frontend UI for import task management
- Configurable worker limits via settings
- Metrics/monitoring for worker health
- Manual import (create import task without download job)