
Commit 431e3d0

docs: add business context markdown cells after key notebook outputs
Insert 14 markdown cells interpreting MOS distributions, app-specific sensitivity, SHAP dependence plots, and QoE band analysis with SLA context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8baf38c commit 431e3d0

File tree

1 file changed: +85 -1 lines changed

notebooks/04_qoe_prediction.ipynb

Lines changed: 85 additions & 1 deletion
@@ -309,6 +309,12 @@
     "print(f\"Median MOS: {df[TARGET_COL].median():.2f}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "gd53b72mww8",
+   "source": "**Business Context:** The mean MOS of **3.87 +/- 0.70** places the average session in the \"Good\" category on the ITU-T scale, but with **25% of sessions below 3.45** (borderline \"Fair\"). This bottom quartile represents a significant opportunity: elevating these sessions to \"Good\" (>3.5) through targeted network optimization could measurably improve customer satisfaction and reduce churn. The standard deviation of 0.70 indicates substantial variability in user experience, suggesting that network conditions — not just application design — are a major driver of perceived quality.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
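The quartile claim in the new cell (25% of sessions below MOS 3.45) is a one-liner to reproduce. A minimal sketch on synthetic data, assuming a `mos` column standing in for the notebook's `TARGET_COL`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Synthetic stand-in for the notebook's dataframe: MOS clipped to the 1-5 scale
df = pd.DataFrame({"mos": np.clip(rng.normal(3.87, 0.70, 10_000), 1.0, 5.0)})

q1 = df["mos"].quantile(0.25)                 # bottom-quartile boundary
share_below_good = (df["mos"] < 3.5).mean()   # fraction below the "Good" threshold

print(f"Q1 = {q1:.2f}, share below 3.5 = {share_below_good:.1%}")
```

With a roughly normal MOS distribution, Q1 lands near mean − 0.67σ, which is why the notebook's 3.45 figure is consistent with a 3.87 ± 0.70 distribution.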
@@ -569,6 +575,12 @@
     " print(df[app_col].value_counts())"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "87qkuvr7rlu",
+   "source": "**Business Context:** Zero missing values and a reasonably balanced app distribution (browsing 30%, video 25%, social/VoIP/gaming ~15% each) confirm data quality for modeling. However, the **VoIP and gaming minorities** (~1,500 sessions each) are operationally significant because they represent the most latency-sensitive applications. These app types require stricter SLA thresholds than browsing or social media — a 50ms latency increase barely affects browsing but can make a VoIP call unintelligible. The model must maintain accuracy across all app types, not just the majority class.",
+   "metadata": {}
+  },
   {
    "cell_type": "markdown",
    "id": "4a3c64b9",
@@ -622,6 +634,12 @@
     "plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "mttpo756a4o",
+   "source": "**Business Context:** The boxplot reveals meaningful **MOS variation across application types**: gaming shows the lowest median MOS (~3.65) with the widest spread, indicating it is the most sensitive to network degradation. VoIP achieves the highest median (~3.99), likely because VoIP codecs are designed to be resilient within their operating range but fail sharply outside it. For network planning, this means **gaming and video streaming users should receive priority QoS treatment** during congestion events, as their perceived quality degrades faster than browsing or social media users under the same network conditions.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -687,6 +705,12 @@
     "plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "x1kgt6d3e3o",
+   "source": "**Business Context:** Two key non-linear patterns emerge: **throughput shows diminishing returns above ~40 Mbps** — increasing bandwidth beyond this point yields minimal MOS improvement, meaning over-provisioning bandwidth is wasteful. In contrast, **latency has a roughly linear negative impact** on MOS throughout its range — every millisecond matters. This asymmetry has direct implications for network investment: latency reduction (e.g., edge computing, backhaul optimization) delivers consistent QoE improvement, while throughput upgrades beyond the saturation point add little further value.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 9,
@@ -747,6 +771,12 @@
     "print(mos_corr.round(4))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "9cyvsia9fs9",
+   "source": "**Business Context:** The correlation analysis confirms that **latency (-0.71) has a stronger impact on MOS than throughput (+0.68)**, making latency the optimization priority. Congestion level (-0.50) also shows strong negative correlation, validating that network load management directly impacts user experience. The implication for network optimization strategy is clear: **latency reduction should be Priority #1** for QoE improvement, followed by congestion management, then throughput enhancement. This prioritization should guide both real-time traffic management policies and long-term infrastructure investment decisions.",
+   "metadata": {}
+  },
   {
    "cell_type": "markdown",
    "id": "d59e8bf3",
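The KPI-vs-MOS ranking this cell interprets comes down to sorting a correlation column by absolute value. A minimal sketch on synthetic data — the column names (`latency_ms`, `throughput_mbps`, `congestion_level`) and the linear data-generating process are illustrative assumptions, not the notebook's actual schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5_000
# Hypothetical KPIs with the signs the notebook reports: latency and
# congestion hurt MOS, throughput helps it
latency = rng.uniform(10, 200, n)
throughput = rng.uniform(1, 100, n)
congestion = rng.uniform(0, 1, n)
mos = np.clip(4.8 - 0.012 * latency + 0.01 * throughput - 0.8 * congestion
              + rng.normal(0, 0.3, n), 1, 5)

df = pd.DataFrame({"latency_ms": latency, "throughput_mbps": throughput,
                   "congestion_level": congestion, "mos": mos})

# Rank KPIs by strength of linear association with MOS
mos_corr = df.corr()["mos"].drop("mos").sort_values(key=abs, ascending=False)
print(mos_corr.round(2))
```

Sorting by `abs` matters: a -0.71 latency correlation outranks a +0.68 throughput correlation even though it is numerically smaller.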
@@ -1060,6 +1090,12 @@
     "df_features.head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "k2guzycoarj",
+   "source": "**Business Context:** The 16 engineered features capture heterogeneous QoE drivers that raw KPIs miss: **app_sensitivity_score** encodes the known QoE sensitivity profile per application type (gaming is more latency-sensitive than browsing); **service_degradation** combines multiple degradation signals into a composite score; **spectral_efficiency** and **bandwidth_utilization** capture how effectively the network converts radio resources into throughput. These domain-engineered features bridge the gap between what the network measures (KPIs) and what users perceive (quality), enabling the model to learn the non-linear mapping from network state to user experience.",
+   "metadata": {}
+  },
   {
    "cell_type": "markdown",
    "id": "af568c0c",
@@ -1152,6 +1188,12 @@
     "print(f\"Target stats (test): mean={y_test.mean():.3f}, std={y_test.std():.3f}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "llzadnywo5n",
+   "source": "**Business Context:** The 80/20 train/test split maintains consistent MOS distributions (mean ~3.87 in both sets), confirming no sampling bias. The random split is appropriate here because QoE sessions are independent — unlike time-series forecasting, there is no temporal leakage risk from individual session predictions. The similar standard deviations (0.709 train, 0.688 test) indicate that the model will encounter the full range of QoE conditions during evaluation, including the critical low-MOS sessions that drive SLA violations.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 14,
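The random 80/20 split this cell defends is simple to reproduce without any framework. A minimal NumPy-only sketch — the notebook itself presumably uses a library helper, so this is just the underlying idea:

```python
import numpy as np

def train_test_split_idx(n, test_frac=0.2, seed=42):
    """Shuffle row indices and split them into disjoint train/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)      # random order makes the split unbiased
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]  # train_idx, test_idx

train_idx, test_idx = train_test_split_idx(10_000)
print(len(train_idx), len(test_idx))  # 8000 2000
```

Because sessions are independent, a random split is safe; for time-ordered data one would split chronologically instead to avoid leakage.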
@@ -1211,6 +1253,12 @@
     "print(f\"Prediction range (test): [{y_pred_test.min():.3f}, {y_pred_test.max():.3f}]\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "8i8top11dvh",
+   "source": "**Business Context:** The prediction range **[1.62, 4.79]** is narrower than the actual range [1.0, 5.0], demonstrating a conservative **regression-to-mean** effect common in tree-based models. The model slightly underestimates extreme high MOS (missing the 4.79-5.0 band) and overestimates extreme low MOS (missing the 1.0-1.62 band). For SLA applications, this means adding a **0.5 MOS safety margin** when setting QoE thresholds — if the model predicts MOS 3.0, the actual experience could be as low as 2.5. This conservative bias is preferable to over-optimistic predictions that would mask service degradation.",
+   "metadata": {}
+  },
   {
    "cell_type": "markdown",
    "id": "14752591",
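The safety-margin rule described in this cell amounts to shifting the alert threshold by the model's uncertainty. A minimal sketch — the function name and default margin are illustrative, not from the notebook:

```python
def sla_alert(predicted_mos, sla_threshold=3.0, safety_margin=0.5):
    """Flag a session when the prediction, minus the model's typical
    error band, could fall below the SLA threshold."""
    return predicted_mos < sla_threshold + safety_margin

# A predicted MOS of 3.3 triggers an alert under a 3.0 SLA with 0.5 margin,
# because the true MOS could plausibly be as low as 2.8
print(sla_alert(3.3))  # True
print(sla_alert(3.8))  # False
```

The margin trades false alarms for early detection: with a 0.45 RMSE model, a 0.5 margin catches roughly one-standard-error underpredictions before they become SLA breaches.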
@@ -1266,6 +1314,12 @@
     "print(f\"{'R-squared':<12} {r2_train:>12.4f} {r2_test:>12.4f}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "j3gynw3a38o",
+   "source": "**Business Context:** **RMSE 0.45 and R-squared 0.58** mean the model explains 58% of MOS variance with an average prediction error of less than half a MOS point. The remaining **42% unexplained variance** is expected for subjective quality scores — it reflects unobserved factors like user expectations, device quality, ambient noise (for VoIP), and content complexity (for video) that network metrics alone cannot capture. For practical deployment, a 0.45 RMSE means the model can reliably distinguish between \"Good\" (3.0-4.0) and \"Excellent\" (4.0+) sessions, which is sufficient for QoE-based traffic steering and SLA monitoring.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 17,
@@ -1325,6 +1379,12 @@
     "print(f\"Residual stats: mean={residuals.mean():.4f}, std={residuals.std():.4f}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "fkusrgnhvje",
+   "source": "**Business Context:** The predicted vs. actual scatter shows the model tracks well along the diagonal for mid-range MOS (3.0-4.5) but **underestimates extreme low and high MOS** values — visible as the scatter widening at the edges. The residual plot confirms symmetric errors centered at zero (no systematic bias). For SLA threshold setting, this means using a **0.5 MOS safety margin**: if the model predicts a session at MOS 3.0, set the operational alert at MOS 3.5 to account for prediction uncertainty. This approach ensures SLA violations are caught before they materialize rather than detected after the fact.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 18,
@@ -1440,6 +1500,12 @@
     "plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "zb4hv8gvml",
+   "source": "**Business Context:** The SHAP beeswarm confirms **latency as the dominant QoE driver** — high latency values (red dots) consistently push predictions toward lower MOS. Crucially, **service_degradation** (an engineered feature) ranks higher than several raw KPIs, validating that domain-specific feature engineering captures QoE dynamics better than raw telemetry. The bidirectional spread of SHAP values for throughput confirms the saturation effect seen in EDA: high throughput helps QoE, but only up to a point. Network optimization should prioritize latency reduction first, then address service degradation patterns identified by the composite metric.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 21,
@@ -1491,6 +1557,12 @@
     "print(f\"Top 3 features by mean |SHAP|: {top_feature_names}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c12iigkbnja",
+   "source": "**Business Context:** The SHAP dependence plots reveal critical **non-linear relationships**: service_degradation shows a steep negative cliff around 0.3 — below this threshold, QoE is relatively stable, but above it, MOS drops sharply. Throughput exhibits clear **saturation above ~40 Mbps**, where additional bandwidth adds negligible QoE value. These non-linearities define actionable thresholds for network operations: maintain service_degradation below 0.3 and ensure minimum 40 Mbps throughput per session. Beyond these points, further investment in these specific KPIs yields diminishing returns, and resources should be redirected to other optimization targets.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 22,
@@ -1535,6 +1607,12 @@
     "plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "ltewwijbs8m",
+   "source": "**Business Context:** **Service degradation ranks #1** in the SHAP importance bar chart, ahead of raw network metrics like latency and throughput. This is a key validation of domain feature engineering: the composite degradation metric — which combines multiple KPI signals into a single \"how degraded is this session\" score — captures QoE dynamics better than any individual raw telemetry measurement. For network monitoring dashboards, this means displaying the service_degradation index alongside traditional KPIs gives operators a more accurate view of user-perceived quality than raw metrics alone.",
+   "metadata": {}
+  },
   {
    "cell_type": "markdown",
    "id": "dcefe2bb",
@@ -1660,6 +1738,12 @@
     " print(\"No application type column available for app-specific analysis.\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "o9xy9o9fhks",
+   "source": "**Business Context:** The QoE band analysis shows the **Excellent band (MOS >= 4.0, 46% of sessions) has the lowest MAE (0.34)**, meaning the model is most accurate for high-quality sessions. The **Poor band (MOS < 2.0) has the highest MAE (0.66)** but only 1% of sessions — prediction errors here matter less statistically but more operationally, as these sessions represent severe service failures. App-specific analysis reveals **gaming has the lowest mean MOS (3.65) and highest MAE (0.39)**, confirming it is both the most degraded and hardest-to-predict application type. Gaming sessions should receive lower SLA thresholds (e.g., MOS >= 3.5 instead of >= 4.0) and dedicated QoS policies to account for their heightened sensitivity to network conditions.",
+   "metadata": {}
+  },
   {
    "cell_type": "code",
    "execution_count": 25,
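The per-band MAE breakdown this cell interprets is a bin-then-group operation. A minimal sketch on synthetic data — the band edges mirror the Poor/Fair/Good/Excellent split described in the cell, but the exact bins are an assumption about the notebook's implementation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
actual = np.clip(rng.normal(3.87, 0.70, 5_000), 1, 5)
# Simulated predictions with ~0.45 error, matching the reported RMSE
predicted = np.clip(actual + rng.normal(0, 0.45, 5_000), 1, 5)

# Assumed QoE bands: Poor [1,2), Fair [2,3), Good [3,4), Excellent [4,5]
bands = pd.cut(actual, bins=[1.0, 2.0, 3.0, 4.0, 5.0], include_lowest=True,
               labels=["Poor", "Fair", "Good", "Excellent"])

mae_by_band = (pd.DataFrame({"band": bands, "abs_err": np.abs(predicted - actual)})
               .groupby("band", observed=False)["abs_err"].mean())
print(mae_by_band.round(2))
```

Grouping error by the *actual* band (rather than the predicted one) is the choice that surfaces regression-to-mean: extreme bands collect the largest errors, exactly the Poor-band pattern the cell describes.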
@@ -1784,4 +1868,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
