## Technical Reference

### Time Measurements
- All time values (the `*_time` columns) in `worker_analysis.csv` are measured in **seconds**.

### Data Quality Scoring Algorithm
Each job starts from a base score of 100, to which the following penalties and bonuses are applied:
- Invalid CSV: -30 points
- Unsorted data: -10 points
- Unsafe headers: -5 points per unsafe header (max -25)
- Failed normalization: -20 points
- Failed analysis: -25 points
- UTF-8 encoding: +5 points
- More than 1,000 records: +5 points
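
The rules above can be read as a small scoring function. The following is a minimal sketch, not the project's actual implementation: the `JobRecord` class, its field names, and `quality_score` are hypothetical, and the source does not say whether the final score is capped at 100 or floored at 0, so no clamping is applied here. It also does not say how a `sorted` value of "UNKNOWN" is scored, so the sketch takes a plain boolean.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical container for the per-job facts the scoring rules refer to."""
    valid_csv: bool
    is_sorted: bool
    unsafe_header_count: int
    normalized: bool
    analyzed: bool
    encoding: str
    records: int


def quality_score(job: JobRecord) -> int:
    """Apply the documented penalties and bonuses to a base score of 100."""
    score = 100
    if not job.valid_csv:
        score -= 30
    if not job.is_sorted:
        score -= 10
    # -5 per unsafe header, with the total header penalty limited to -25.
    score -= min(5 * job.unsafe_header_count, 25)
    if not job.normalized:
        score -= 20
    if not job.analyzed:
        score -= 25
    # Assumption: the encoding label is compared case-insensitively.
    if job.encoding.upper().replace("_", "-") == "UTF-8":
        score += 5
    if job.records > 1000:
        score += 5
    # Assumption: no clamping, so a clean job can exceed 100.
    return score
```

Under these assumptions, a clean UTF-8 file with more than 1,000 records scores 110, and a file that fails every check with five or more unsafe headers scores -10.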

### Performance Anomaly Detection
- Flags a job as anomalous when its processing time exceeds the mean plus 2 standard deviations
- Identifies jobs with processing times significantly above normal
- Requires a minimum of 3 successful jobs for analysis
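
As an illustration of the threshold rule, here is a minimal sketch. The function name `find_slow_jobs` and its parameters are hypothetical, and the source does not say whether the sample or population standard deviation is used or which timing column is compared; this sketch assumes the population form (`statistics.pstdev`) and a mapping of job ID to a single processing time taken from successful jobs only.

```python
from statistics import mean, pstdev
from typing import Dict, List


def find_slow_jobs(times: Dict[str, float], k: float = 2.0, min_jobs: int = 3) -> List[str]:
    """Return IDs of jobs whose time exceeds mean + k * standard deviation.

    `times` maps job_id -> processing time in seconds for successful jobs.
    """
    # Too few successful jobs: skip the analysis entirely.
    if len(times) < min_jobs:
        return []
    values = list(times.values())
    # Assumption: population standard deviation; the source does not specify.
    threshold = mean(values) + k * pstdev(values)
    return [job_id for job_id, t in times.items() if t > threshold]
```

Note that with very few jobs a single slow run inflates both the mean and the standard deviation, so it may not cross the threshold; the rule becomes more useful as the number of successful jobs grows.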

### CSV Output Schema

#### Primary Fields
| Column | Type | Description |
|--------|------|-------------|
| `timestamp` | String | Job start timestamp (YYYY-MM-DD HH:MM:SS) |
| `job_id` | String | UUID of the processing job |
| `file_name` | String | Name of the processed file |
| `status` | String | SUCCESS, ERROR, or INCOMPLETE |
| `qsv_version` | String | Version of QSV tool used |
| `file_format` | String | Detected file format (CSV, XLSX, etc.) |
| `encoding` | String | File character encoding |
| `normalized` | String | "Successful" or "Failed" |
| `valid_csv` | String | "TRUE" or "FALSE" |
| `sorted` | String | "TRUE", "FALSE", or "UNKNOWN" |
| `db_safe_headers` | String | Header safety status |
| `analysis` | String | "Successful" or "Failed" |
| `records` | Integer | Number of records detected |

#### Timing Fields (all in seconds)
| Column | Type | Description |
|--------|------|-------------|
| `total_time` | Float | Total processing time |
| `download_time` | Float | File download time |
| `analysis_time` | Float | Analysis phase time |
| `copying_time` | Float | Database copy time |
| `indexing_time` | Float | Index creation time |
| `formulae_time` | Float | Formula processing time |
| `metadata_time` | Float | Metadata update time |
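
To show how the schema might be consumed, the sketch below reads rows with Python's standard `csv` module and converts `records` and the timing columns to numeric types. The helper name `parse_rows` is hypothetical, and it assumes the file has a header row using the column names listed above and that a missing measurement appears as an empty cell; neither is confirmed by the source.

```python
import csv
from typing import Dict, Iterable, Iterator

# Timing columns from the schema above; all values are in seconds.
TIMING_COLUMNS = (
    "total_time", "download_time", "analysis_time", "copying_time",
    "indexing_time", "formulae_time", "metadata_time",
)


def parse_rows(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per CSV row with numeric columns converted.

    Assumption: absent or empty cells are treated as missing (None).
    """
    for row in csv.DictReader(lines):
        parsed: Dict[str, object] = dict(row)
        records = row.get("records")
        parsed["records"] = int(records) if records else None
        for col in TIMING_COLUMNS:
            value = row.get(col)
            parsed[col] = float(value) if value else None
        yield parsed
```

A typical call would pass an open file object, for example `with open("worker_analysis.csv", newline="") as f: rows = list(parse_rows(f))`.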