|
| 1 | +# DFaaS Event Streaming: Missing Events in Dashboard |
| 2 | + |
| 3 | +## Goal |
| 4 | +Make k6 and runner events appear in the TUI dashboard Log Stream and Event Status |
| 5 | +without requiring manual env var setup. |
| 6 | + |
| 7 | +## Expected Behavior |
| 8 | +- LocalRunner emits `LB_EVENT` lines while the workload runs. |
| 9 | +- These events are ingested by the controller pipeline and shown in the dashboard. |
| 10 | +- k6 log lines are emitted as `LB_EVENT` (type=log) and appear in the dashboard. |
| 11 | +- The polling loop output does not drown out event logs. |
| 12 | + |
| 13 | +## Current Observations |
| 14 | +- Dashboard still does not show events (runner/k6), even after reducing polling noise. |
| 15 | +- Polling tasks are now summarized in-place as a single line ("Polling loop ..."). |
| 16 | +- k6 log stream works on the generator (k6.log exists and is updated). |
| 17 | + |
| 18 | +## Event Flow (How It Should Work) |
| 19 | +1) **LocalRunner (target)** runs `lb_runner.services.async_localrunner`. |
| 20 | +2) LocalRunner logs via `LBEventLogHandler` (default enabled), writing `LB_EVENT` lines to stdout. |
| 21 | +3) The async LocalRunner tees stdout to `lb_events.stream.log`. |
| 22 | +4) Ansible task `stream_events_step.yml` reads `lb_events.stream.log` and prints `LB_EVENT` lines to stdout. |
| 23 | +5) Controller pipeline parses those `LB_EVENT` lines and converts them to `RunEvent` objects. |
| 24 | +6) Dashboard receives `RunEvent` and displays a summary line in Log Stream. |
| 25 | + |
| 26 | +Additional k6 path: |
| 27 | +- `DfaasGenerator` calls `K6Runner`. |
| 28 | +- `K6Runner` streams k6 log and calls `_emit_k6_log_event`. |
| 29 | +- `_emit_k6_log_event` emits `LB_EVENT` log events using `StdoutEmitter`. |
| 30 | + |
| 31 | +## Key Files |
| 32 | +- `lb_controller/ansible/roles/workload_runner/tasks/run_single_rep.yml` |
| 33 | +- `lb_controller/ansible/roles/workload_runner/tasks/stream_events_step.yml` |
| 34 | +- `lb_runner/services/async_localrunner.py` |
| 35 | +- `lb_runner/engine/runner.py` (attaches `LBEventLogHandler`) |
| 36 | +- `lb_plugins/plugins/dfaas/generator.py` (emits k6 log events) |
| 37 | +- `lb_app/services/run_pipeline.py` (ingests LB_EVENT lines) |
| 38 | +- `lb_app/services/run_output.py` (formatter; skips raw LB_EVENT output) |
| 39 | +- `lb_ui/tui/system/components/dashboard.py` (Log Stream rendering) |
| 40 | + |
| 41 | +## Recent Behavior Changes (Relevant) |
| 42 | +- Event logging now defaults to ON. Only `LB_ENABLE_EVENT_LOGGING=0/false/no` |
| 43 | + disables it. |
| 44 | +- k6 log lines are now emitted as `LB_EVENT` by `DfaasGenerator`. |
| 45 | +- Polling loop logs are summarized into a single line in the dashboard. |
| 46 | + |
| 47 | +## Why Events Might Still Be Missing (Hypotheses) |
| 48 | +1) **Event ingestion never sees LB_EVENT lines** |
| 49 | + - `stream_events_step.yml` reads the wrong file or wrong host. |
| 50 | + - The stream file is empty or contains no `LB_EVENT` lines. |
| 51 | + - The stream file is written, but the polling task never prints them. |
| 52 | +2) **Event parsing is skipped** |
| 53 | + - Output line wrapping/escaping prevents `_extract_lb_event_data` from |
| 54 | + finding the token. |
| 55 | + - `parse_progress_line` rejects the payload (missing required fields). |
| 56 | +3) **Events are dropped by dedupe** |
| 57 | + - `_EventDedupe` uses `(host, workload, repetition, status, type, message)`. |
| 58 | + If the message is repeated exactly, events are dropped. |
| 59 | +4) **Dashboard refresh path not hit** |
| 60 | + - `make_output_tee` is not wired for the dashboard session in the current run |
| 61 | + mode, or the dashboard refresh is throttled. |
| 62 | +5) **k6 events are emitted but not associated with the current repetition** |
| 63 | + - Incorrect `repetition` or `total_repetitions` values in emitted events can |
| 64 | + cause events to be ignored or not reflected in the journal. |
| 65 | + |
| 66 | +## Diagnostics Checklist |
| 67 | +Run in order and capture evidence: |
| 68 | + |
| 69 | +1) Verify the stream file exists and contains `LB_EVENT` lines on the target: |
| 70 | + - On target: `tail -n 50 /tmp/lb_events.stream.log` |
| 71 | +2) Verify the polling task is printing those lines: |
| 72 | + - Add temporary `debug` in `stream_events_step.yml` to print how many |
| 73 | + LB_EVENT lines were read in each iteration (or print the last line offset). |
| 74 | +3) Verify the controller sees `LB_EVENT` in its raw output log: |
| 75 | + - Inspect the controller log file (if enabled) for `LB_EVENT` markers. |
| 76 | +4) Verify parsing works with real lines: |
| 77 | + - Copy a raw line from the run output and feed it to `_extract_lb_event_data` |
| 78 | + and `parse_progress_line`. |
| 79 | +5) Verify the dashboard is in the event pipeline path: |
| 80 | + - Ensure the run is using the TUI (not headless) and that |
| 81 | + `pipeline_output_callback` is used. |
| 82 | + |
| 83 | +## Observed Symptom Patterns to Capture |
| 84 | +- `LB_EVENT` lines present in file but not in dashboard. |
| 85 | +- `LB_EVENT` lines missing entirely in file. |
| 86 | +- `LB_EVENT` lines present but missing required fields or malformed JSON. |
| 87 | + |
| 88 | +## Expected Minimum Signal |
| 89 | +During a normal DFaaS run, you should see at least: |
| 90 | +- Runner events: "running" and a final "done/failed" |
| 91 | +- k6 event lines: `k6[config_id] log stream started`, some stdout lines, and `log stream stopped` |
| 92 | + |
| 93 | +If you do not see these, the issue is likely at steps 1-3 in the event flow. |
| 94 | + |
| 95 | +## Next Suggested Experiments |
| 96 | +1) Add a one-time sentinel `LB_EVENT` print in `stream_events_step.yml` |
| 97 | + (after reading the file) to verify the pipeline can display events. |
| 98 | +2) Force a synthetic event from `async_localrunner` right after startup |
| 99 | + to confirm ingestion and dashboard rendering. |
| 100 | +3) Log the parsed events in `make_progress_handler` before dedupe. |
| 101 | + |
| 102 | +## ROOT CAUSE IDENTIFIED (2026-01-02) |
| 103 | + |
| 104 | +**Bug Location:** `lb_app/services/run_events.py` - `JsonEventTailer._run()` |
| 105 | + |
| 106 | +**Issue:** On a text-mode file, Python raises `OSError: telling position disabled by next() call` |
| 107 | +when `fp.tell()` is called while iterating with `for line in fp` (observed on Python 3.13). |
| 108 | + |
| 109 | +**Original Code:** |
| 110 | +```python |
| 111 | +for line in fp: |
| 112 | +    self._pos = fp.tell()  # OSError: tell() is disabled during iteration |
| 113 | +``` |
| 114 | + |
| 115 | +**Fixed Code:** |
| 116 | +```python |
| 117 | +while True: |
| 118 | +    line = fp.readline() |
| 119 | +    if not line: |
| 120 | +        break |
| 121 | +    self._pos = fp.tell()  # Works correctly |
| 122 | +``` |
| 123 | + |
| 124 | +**Impact:** The `JsonEventTailer` was silently failing to read events from |
| 125 | +the callback plugin's JSONL output file (`lb_events.jsonl`), causing all |
| 126 | +events to be dropped before reaching the dashboard. |
| 127 | + |
| 128 | +**Fix Applied:** Changed from `for line in fp` iteration to explicit |
| 129 | +`fp.readline()` loop to allow `fp.tell()` to work correctly. |
| 130 | + |
| 131 | +## REFINEMENT: Polling Task Suppression (2026-01-03) |
| 132 | + |
| 133 | +**Issue:** After fixing the JsonEventTailer, events appeared in the dashboard |
| 134 | +but were drowned out by polling loop task timing lines (Poll LB_EVENT stream, |
| 135 | +Delay, Skip polling, etc.). |
| 136 | + |
| 137 | +**Cause:** `AnsibleOutputFormatter._should_suppress_task()` was not suppressing |
| 138 | +polling tasks when a dashboard log_sink was active. |
| 139 | + |
| 140 | +**Fix:** Modified `_should_suppress_task()` to always suppress polling loop |
| 141 | +tasks regardless of log_sink presence. Added `Initialize polling status` to |
| 142 | +the suppress list. |
| 143 | + |
| 144 | +## REFINEMENT: Skip Old Events (2026-01-03) |
| 145 | + |
| 146 | +**Issue:** Events from previous runs appeared at the start of the dashboard |
| 147 | +log because `JsonEventTailer` started reading from position 0 (beginning of |
| 148 | +the `lb_events.jsonl` file). |
| 149 | + |
| 150 | +**Fix:** Modified `JsonEventTailer.start()` to initialize `_pos` to the |
| 151 | +current file size, so only events written after the tailer starts are read. |
| 152 | + |
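The two `JsonEventTailer` changes above (the `readline()` loop and starting at the current file size) can be sketched together as follows. This is a minimal illustration, not the project's code: `MiniTailer`, `start()` and `poll()` are hypothetical names, and the real class presumably runs in a background thread. The sketch opens the file in binary mode on the assumption that byte offsets are wanted, so that the file size is a valid seek target.

```python
import json
import os


class MiniTailer:
    """Hypothetical stand-in for JsonEventTailer (illustration only)."""

    def __init__(self, path):
        self.path = path
        self._pos = 0

    def start(self):
        # Begin at the current end of file so events from earlier runs
        # are not replayed.
        self._pos = os.path.getsize(self.path) if os.path.exists(self.path) else 0

    def poll(self):
        """Return the JSON events appended since the previous call."""
        events = []
        # Binary mode: tell()/seek() work with plain byte offsets.
        with open(self.path, "rb") as fp:
            fp.seek(self._pos)
            while True:
                line = fp.readline()
                if not line or not line.endswith(b"\n"):
                    # End of file, or a partial line that is still being
                    # written; pick it up on the next poll.
                    break
                # Safe here because readline() is used, not file iteration.
                self._pos = fp.tell()
                try:
                    events.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # Skip malformed lines instead of stalling.
        return events
```

Calling `start()` once before the run and `poll()` on each refresh returns only events written after the tailer started. Recording the position only after a complete line means a half-written event is never lost or parsed early.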