# High-level architecture

OpenDT runs two concurrent threads that communicate via Kafka:

## 1. Producer Loop (Workload Replay)

The producer streams workload traces to Kafka, simulating real-time datacenter activity.

### Data Sources
- **tasks.parquet** (7,850 tasks)
  - Columns: `id`, `submission_time`, `duration`, `cpu_count`, `cpu_capacity`, `mem_capacity`
  - Tasks have timestamps indicating when they were submitted

- **fragments.parquet** (2.3M fragments)
  - Columns: `id` (task_id), `duration`, `cpu_count`, `cpu_usage`
  - Fragments have NO submission times in the raw data
  - Each task is composed of ≥1 fragments for fine-grained resource modeling

The SURF workload trace captures approximately **7 days of wall-clock time**.

### Key Transformations

**Fragment Timestamp Synthesis:**
Fragments inherit their parent task's submission time, offset by the cumulative durations of the fragments up to and including their own. The synthesized timestamp is stored in the `submission_time` key of each fragment message:

```
Task starts at T
Fragment 1.submission_time = T + duration[0]
Fragment 2.submission_time = T + duration[0] + duration[1]
Fragment 3.submission_time = T + duration[0] + duration[1] + duration[2]
...
```

This creates a sequential execution timeline in which the fragments "chain" together.
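
A minimal sketch of this synthesis, assuming plain Python datetimes and a list of fragment durations (the helper name is illustrative, not OpenDT's actual API):

```python
from datetime import datetime, timedelta

def synthesize_fragment_times(task_start, fragment_durations_ms):
    """Offset each fragment by the cumulative duration of the
    fragments up to and including it, as in the scheme above."""
    times = []
    offset_ms = 0
    for duration_ms in fragment_durations_ms:
        offset_ms += duration_ms
        times.append(task_start + timedelta(milliseconds=offset_ms))
    return times

start = datetime(2022, 10, 6, 22, 0, 0)
# Three 30-second fragments chain at +30 s, +60 s, +90 s
times = synthesize_fragment_times(start, [30000, 30000, 30000])
```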

**Time-Scaled Replay:**
- Original trace: ~7 days of workload data
- Replay speed: 10x accelerated (`TIME_SCALE = 0.1`)
- Real-time gaps between events are preserved but compressed
- Example: 1-hour workload gap → 6-minute real wait
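
The pacing rule above can be expressed in a few lines (names are illustrative):

```python
TIME_SCALE = 0.1  # 10x accelerated replay

def scaled_wait_seconds(trace_gap_seconds, time_scale=TIME_SCALE):
    """Compress a gap between trace events into the real replay wait."""
    return trace_gap_seconds * time_scale

# A 1-hour gap in the trace becomes a 6-minute (360 s) real wait
wait = scaled_wait_seconds(3600)
```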

### Workload Semantics

**Task-Fragment Relationship:**
In the AtLarge traces, tasks are _always_ split into equal fragments. For example, a task with `duration=10000ms` may be split into 5 fragments of `duration=2000ms` each.

**Computing Clock Cycles:**
Fragments specify the total number of clock cycles needed, calculated as:
```
total_cycles = duration (ms) × cpu_usage (MHz) × 1,000
```
Since 1 MHz is 10⁶ cycles/second and the duration is in milliseconds (10⁻³ s), the net conversion factor is 1,000.
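
As a one-line helper, the conversion is:

```python
def total_cycles(duration_ms, cpu_usage_mhz):
    """duration (ms) x cpu_usage (MHz) x 1,000 = total clock cycles.
    The factor 1,000 is 10**6 (cycles/s per MHz) x 10**-3 (s per ms)."""
    return duration_ms * cpu_usage_mhz * 1_000

# 5000 ms at 1000 MHz -> 5 billion cycles
cycles = total_cycles(5000, 1000)
```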

**Simulation Behavior:**
The `duration` field represents the actual measured runtime of the original workload. When simulating on a different topology, however, the execution time varies with the CPU specifications.

**Example:**
A fragment with `cpu_count=1`, `cpu_usage=1000 MHz`, `duration=5000 ms`:
- Total cycles needed: `5000 × 1000 × 1,000 = 5,000,000,000 cycles` (5 billion)

When simulated on different hardware:
- **2 CPUs @ 1000 MHz each:** total throughput = 2000 MHz → `duration = 5,000,000,000 / (2000 × 1,000,000) = 2500 ms`
- **1 CPU @ 5000 MHz:** total throughput = 5000 MHz → `duration = 5,000,000,000 / (5000 × 1,000,000) = 1000 ms`

This is a simplification of reality, but it makes the simulation's behavior easier to reason about.
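
The rescaling in the example can be checked with a small helper (a sketch of the model described above, not OpenDT's code):

```python
def simulated_duration_ms(cycles, cpu_count, cpu_speed_mhz):
    """Required cycles divided by aggregate throughput (cycles per ms)."""
    cycles_per_ms = cpu_count * cpu_speed_mhz * 1_000
    return cycles / cycles_per_ms

# The two hardware configurations from the example above
dual_1000mhz = simulated_duration_ms(5_000_000_000, cpu_count=2, cpu_speed_mhz=1000)
single_5000mhz = simulated_duration_ms(5_000_000_000, cpu_count=1, cpu_speed_mhz=5000)
```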

### Output to Kafka

**Topic: "tasks"**
```json
{
  "key": {"id": "task_id"},
  "value": {
    "submission_time": "2022-10-06T22:00:00Z",
    "duration": 27935000,
    "cpu_count": 16,
    "cpu_capacity": 33600.0,
    "mem_capacity": 100000
  }
}
```

**Topic: "fragments"**
```json
{
  "key": {"id": "task_id"},
  "value": {
    "submission_time": "2022-10-06T22:00:30Z",
    "duration": 30000,
    "cpu_usage": 1953.0
  }
}
```

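A sketch of how a task row could be shaped into the "tasks" message (the row is a plain dict here for illustration; OpenDT reads it from tasks.parquet, and the actual Kafka client call is omitted):

```python
import json

def task_message(row):
    """Build the key/value structure for the "tasks" topic."""
    return {
        "key": {"id": row["id"]},
        "value": {
            "submission_time": row["submission_time"],
            "duration": row["duration"],
            "cpu_count": row["cpu_count"],
            "cpu_capacity": row["cpu_capacity"],
            "mem_capacity": row["mem_capacity"],
        },
    }

msg = task_message({
    "id": "task_id",
    "submission_time": "2022-10-06T22:00:00Z",
    "duration": 27935000,
    "cpu_count": 16,
    "cpu_capacity": 33600.0,
    "mem_capacity": 100000,
})
payload = json.dumps(msg["value"])  # serialized before publishing
```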

### Implementation
- Two parallel threads: one for tasks, one for fragments
- Both threads are synchronized via a barrier so they start simultaneously
- Pacing via `sleep()` between messages preserves the temporal structure
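
Put together, the thread structure might look like this (a simplified sketch; the real producer publishes to Kafka instead of appending to a list):

```python
import threading
import time

TIME_SCALE = 0.1
barrier = threading.Barrier(2)   # tasks and fragments threads start together
sent = []                        # stand-in for the Kafka producer

def replay(topic, events):
    """events: (timestamp_seconds, payload) pairs, sorted by time."""
    barrier.wait()               # block until both threads are ready
    prev_t = None
    for t, payload in events:
        if prev_t is not None:
            time.sleep((t - prev_t) * TIME_SCALE)  # compressed pacing
        prev_t = t
        sent.append((topic, payload))

threads = [
    threading.Thread(target=replay, args=("tasks", [(0.0, "task-1")])),
    threading.Thread(target=replay, args=("fragments", [(0.0, "frag-1")])),
]
for th in threads:
    th.start()
for th in threads:
    th.join()
```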

---

## 2. Consumer Loop (Digital Twin)

_[To be documented]_