Commit 7b8bc6f

Merge pull request #247 from a5dur/main
Add Technical Reference readme file for test workflow
2 parents 421780e + 8a1e9a9 commit 7b8bc6f

1 file changed: +53 -0 lines changed

tests/README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
## Technical Reference

### Time Measurements
- All `time` values in `worker_analysis.csv` are measured in **seconds**

### Data Quality Scoring Algorithm
Base score: 100, with the following penalties and bonuses applied:
- Invalid CSV: -30 points
- Unsorted data: -10 points
- Unsafe headers: -5 points per unsafe header (max -25)
- Failed normalization: -20 points
- Failed analysis: -25 points
- UTF-8 encoding: +5 points
- More than 1000 records: +5 points

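The scoring rules above can be sketched in a few lines of Python. This is an illustration only, not the workflow's actual code: the function name `quality_score` and its parameters are hypothetical, the caller is assumed to have already decided how a `sorted` value of "UNKNOWN" maps to a boolean, and the sketch does not clamp the result because the rules do not say whether the real score is bounded.

```python
def quality_score(valid_csv: bool, is_sorted: bool, unsafe_headers: int,
                  normalized: bool, analyzed: bool,
                  encoding: str, records: int) -> int:
    """Hypothetical helper: apply the documented penalties and bonuses."""
    score = 100  # base score
    if not valid_csv:
        score -= 30
    if not is_sorted:
        score -= 10
    # -5 per unsafe header, capped at a total penalty of 25
    score -= min(5 * unsafe_headers, 25)
    if not normalized:
        score -= 20
    if not analyzed:
        score -= 25
    # Assumption: the encoding label is compared case-insensitively
    if encoding.strip().upper() in {"UTF-8", "UTF8"}:
        score += 5
    if records > 1000:
        score += 5
    return score


if __name__ == "__main__":
    # Unsorted file with two unsafe headers, UTF-8, 500 records
    print(quality_score(True, False, 2, True, True, "UTF-8", 500))  # 85
```

Under these rules a clean UTF-8 file with more than 1000 records would reach 110, so a real implementation may well cap the score at 100.
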
### Performance Anomaly Detection
- Uses a statistical threshold (mean + 2 standard deviations of processing time)
- Flags jobs whose processing time exceeds that threshold
- Requires a minimum of 3 successful jobs for analysis

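A minimal sketch of the threshold rule described above, assuming the sample standard deviation (`statistics.stdev`) and a mapping from job ID to processing time for successful jobs. The function name `find_anomalies` and its parameters are hypothetical, and the actual workflow may use the population standard deviation or a different input shape.

```python
from statistics import mean, stdev


def find_anomalies(times: dict[str, float], min_jobs: int = 3,
                   k: float = 2.0) -> list[str]:
    """Return job IDs whose time exceeds mean + k * standard deviation."""
    # Too few successful jobs: skip the analysis entirely
    if len(times) < min_jobs:
        return []
    values = list(times.values())
    threshold = mean(values) + k * stdev(values)
    return [job_id for job_id, t in times.items() if t > threshold]


if __name__ == "__main__":
    # Invented timings: nine jobs at 1.0 s and one at 10.0 s
    timings = {f"job{i}": 1.0 for i in range(9)}
    timings["slow"] = 10.0
    print(find_anomalies(timings))  # ['slow']
```
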
### CSV Output Schema

#### Primary Fields
| Column | Type | Description |
|--------|------|-------------|
| `timestamp` | String | Job start timestamp (YYYY-MM-DD HH:MM:SS) |
| `job_id` | String | UUID of the processing job |
| `file_name` | String | Name of the processed file |
| `status` | String | SUCCESS, ERROR, or INCOMPLETE |
| `qsv_version` | String | Version of QSV tool used |
| `file_format` | String | Detected file format (CSV, XLSX, etc.) |
| `encoding` | String | File character encoding |
| `normalized` | String | "Successful" or "Failed" |
| `valid_csv` | String | "TRUE" or "FALSE" |
| `sorted` | String | "TRUE", "FALSE", or "UNKNOWN" |
| `db_safe_headers` | String | Header safety status |
| `analysis` | String | "Successful" or "Failed" |
| `records` | Integer | Number of records detected |

#### Timing Fields (all in seconds)
| Column | Type | Description |
|--------|------|-------------|
| `total_time` | Float | Total processing time |
| `download_time` | Float | File download time |
| `analysis_time` | Float | Analysis phase time |
| `copying_time` | Float | Database copy time |
| `indexing_time` | Float | Index creation time |
| `formulae_time` | Float | Formula processing time |
| `metadata_time` | Float | Metadata update time |
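
As a rough illustration of how the schema above might be consumed, the sketch below reads rows with Python's standard `csv.DictReader` and converts `records` to an integer and the timing columns to floats. The helper names `parse_row` and `load_rows` are hypothetical, and treating an empty or missing cell as `None` is an assumption, since the schema does not say how missing values are written.

```python
import csv
import io

# Timing columns from the schema, all expressed in seconds
TIMING_FIELDS = [
    "total_time", "download_time", "analysis_time", "copying_time",
    "indexing_time", "formulae_time", "metadata_time",
]


def parse_row(row: dict[str, str]) -> dict:
    """Convert the numeric columns of one CSV row; keep the rest as strings."""
    parsed = dict(row)
    # Assumption: an empty or absent cell means "no value"
    parsed["records"] = int(row["records"]) if row.get("records") else None
    for name in TIMING_FIELDS:
        value = row.get(name, "")
        parsed[name] = float(value) if value else None
    return parsed


def load_rows(stream) -> list[dict]:
    """Read every row from an open text stream containing the CSV."""
    return [parse_row(row) for row in csv.DictReader(stream)]


if __name__ == "__main__":
    # Invented sample data; a real file would be opened with open(..., newline="")
    sample = io.StringIO(
        "timestamp,job_id,status,records,total_time\n"
        "2024-01-01 00:00:00,example-id,SUCCESS,42,1.5\n"
    )
    for row in load_rows(sample):
        print(row["job_id"], row["records"], row["total_time"])
```
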
