fix: invert speed percentile labels

bassosimone · bassosimone · commit 93280fffe869 · 2025-11-17T22:57:08.000+01:00
IQB needs to answer: "Can 95% of users perform this use case?"

This requires checking if 95% of users meet ALL requirements.

Example scenario
----------------

Gaming requirements: speed ≥ 10 Mbit/s AND latency ≤ 15 ms

Two networks, with raw percentile data (before this commit):

**Network Foo:**
- p95 speed = 30 Mbit/s
- p5 speed = 12 Mbit/s
- p95 latency = 7 ms
- p5 latency = 2 ms

**Network Bar:**
- p95 speed = 30 Mbit/s
- p5 speed = 8 Mbit/s
- p95 latency = 20 ms
- p5 latency = 5 ms

The percentile asymmetry problem
---------------------------------

Percentile definitions (by convention):
- pX = "X% of users have this value OR LESS"

For Foo latency (p95 = 7ms, threshold ≤ 15ms):
- 95% have ≤ 7ms
- Since 7ms < 15ms threshold, we KNOW 95% pass ✓ DEFINITIVE

For Bar latency (p95 = 20ms, threshold ≤ 15ms):
- 95% have ≤ 20ms
- But we DON'T know what % have ≤ 15ms
- Could be 94% at 1ms + 1% at 18ms → 94% pass
- Could be 10% at 1ms + 85% at 18ms → 10% pass
- p95 alone is AMBIGUOUS ✗

For Foo speed (p95 = 30 Mbit/s, threshold ≥ 10 Mbit/s):
- 95% have ≤ 30 Mbit/s (unhelpful direction!)
- But we DON'T know what % have ≥ 10 Mbit/s
- Could be 94% at 25 Mbit/s + 1% at 8 Mbit/s → 94% pass
- Could be 50% at 25 Mbit/s + 45% at 8 Mbit/s → 50% pass
- p95 alone is AMBIGUOUS ✗

For Foo speed (p5 = 12 Mbit/s, threshold ≥ 10 Mbit/s):
- 5% have ≤ 12 Mbit/s, therefore 95% have > 12 Mbit/s
- Since 12 Mbit/s > 10 Mbit/s threshold, we KNOW 95% pass ✓ DEFINITIVE

For Bar speed (p5 = 8 Mbit/s, threshold ≥ 10 Mbit/s):
- 95% have > 8 Mbit/s
- But we DON'T know what % have ≥ 10 Mbit/s
- p5 alone is AMBIGUOUS ✗

Pattern discovered
------------------

For "lower is better" metrics (latency, packet loss):
- pX gives DEFINITIVE answer when: pX ≤ threshold
- Useful percentile for IQB: p95

For "higher is better" metrics (speed):
- pX gives DEFINITIVE answer when: p(100-X) ≥ threshold
- Which is equivalent to: p5 ≥ threshold (for 95% coverage)
- Useful percentile for IQB: p5

The asymmetry in code (before this commit):

```Python
if latency_p95 <= threshold:
    latency_passes = True  # Definitive
else:
    latency_passes = None  # Ambiguous, need more data

if speed_p5 >= threshold:
    speed_passes = True  # Definitive
else:
    speed_passes = None  # Ambiguous, need more data
```

This is error-prone: easy to accidentally use speed_p95 instead of speed_p5.

Solution: Invert speed labels at data generation

In BigQuery queries (data/query_*.sql), swap speed percentile labels:
- OFFSET(5) raw → labeled as "download_p95" in JSON
- OFFSET(95) raw → labeled as "download_p5" in JSON

After inversion, "p95" uniformly means "performance that gives us a definitive answer when it meets the threshold":

```Python
if speed_p95 >= threshold:  # p95 is actually p5 raw (inverted)
    speed_passes = True
else:
    speed_passes = None

if latency_p95 <= threshold:  # p95 is actually p95 raw (not inverted)
    latency_passes = True
else:
    latency_passes = None
```

Why invert at data generation, not at runtime?

1. Availability guarantee: Never have to check "does p5 exist?" before attempting to answer the 95% coverage question
2. Single source of truth: Inversion logic lives in SQL queries only, not scattered across Python code
3. Self-documenting data: JSON files contain "quality percentiles" where pX always means "X% achieve this or better"
4. Simpler cache/calculator code: No metric-specific logic needed

Trade-off: Speed percentile labels no longer match raw statistical definitions (p95 label contains p5 raw value). This is extensively documented in SQL comments and cache.py docstrings.

See data/query_downloads.sql for detailed explanation and examples.

Based on a comment by https://github.com/sermpezis/ inside a notes document. Hopefully I interpreted it correctly, otherwise TIL.
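
For illustration only, here is a minimal sketch of the relabeling rule p(X)_quality = p(100-X)_raw and the resulting check, applied to the Foo and Bar numbers from the example scenario. The names `HIGHER_IS_BETTER`, `to_quality_percentiles`, and `passes` are hypothetical and are not taken from the IQB code base; in this commit the relabeling itself happens in the SQL queries, not in Python.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical set of metrics where a higher value is better.
HIGHER_IS_BETTER = {"download_throughput_mbps", "upload_throughput_mbps"}


def to_quality_percentiles(metric: str, raw: dict[str, float]) -> dict[str, float]:
    """Relabel raw percentiles so pX means 'X% of users achieve this or better'."""
    if metric not in HIGHER_IS_BETTER:
        return dict(raw)  # lower is better: raw labels already have that meaning
    # Higher is better: p(X)_quality = p(100-X)_raw
    return {f"p{100 - int(label[1:])}": value for label, value in raw.items()}


def passes(metric: str, quality_p95: float, threshold: float) -> Optional[bool]:
    """Return True when 95% of users definitely meet the threshold, else None."""
    if metric in HIGHER_IS_BETTER:
        definitive = quality_p95 >= threshold
    else:
        definitive = quality_p95 <= threshold
    return True if definitive else None  # None means ambiguous, need more data


# Raw percentiles from the example scenario (before this commit).
foo_speed = to_quality_percentiles("download_throughput_mbps", {"p5": 12.0, "p95": 30.0})
bar_speed = to_quality_percentiles("download_throughput_mbps", {"p5": 8.0, "p95": 30.0})
foo_latency = to_quality_percentiles("latency_ms", {"p5": 2.0, "p95": 7.0})
bar_latency = to_quality_percentiles("latency_ms", {"p5": 5.0, "p95": 20.0})

# Gaming requirements: speed >= 10 Mbit/s AND latency <= 15 ms.
foo_results = (
    passes("download_throughput_mbps", foo_speed["p95"], 10.0),
    passes("latency_ms", foo_latency["p95"], 15.0),
)
bar_results = (
    passes("download_throughput_mbps", bar_speed["p95"], 10.0),
    passes("latency_ms", bar_latency["p95"], 15.0),
)
print("Foo:", foo_results)
print("Bar:", bar_results)
```

With these inputs Foo gets a definitive pass on both requirements, while Bar is ambiguous on both, matching the walkthrough above. Note that the comparison direction (`>=` versus `<=`) still depends on the metric; only the choice of percentile label becomes uniform.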
diff --git a/data/br_2024_10.json b/data/br_2024_10.json
@@ -10,48 +10,48 @@
   },
   "metrics": {
     "download_throughput_mbps": {
-      "p1": 0.15979623373499155,
-      "p5": 0.9501991252036766,
-      "p10": 3.101174869710966,
-      "p25": 15.0340700432778,
-      "p50": 51.9831305263177,
-      "p75": 158.38962702858973,
-      "p90": 330.3352983503099,
-      "p95": 456.0950392154999,
-      "p99": 696.5613392781584
+      "p1": 696.8102585656382,
+      "p5": 456.24926276472667,
+      "p10": 329.99833241434123,
+      "p25": 158.08221295858434,
+      "p50": 52.08522919032269,
+      "p75": 15.052793601948656,
+      "p90": 3.1272008078708624,
+      "p95": 0.9523541336337032,
+      "p99": 0.16179491817039293
     },
     "upload_throughput_mbps": {
-      "p1": 0.042563080079753776,
-      "p5": 0.07560071683921148,
-      "p10": 0.08980854096320207,
-      "p25": 5.545812099052701,
-      "p50": 30.78175191467136,
-      "p75": 88.37694460346944,
-      "p90": 181.64033113619195,
-      "p95": 255.97876412741525,
-      "p99": 394.3416893812533
+      "p1": 393.6290249806801,
+      "p5": 256.00644187498716,
+      "p10": 181.60570721295196,
+      "p25": 88.42259005024358,
+      "p50": 30.73281812980941,
+      "p75": 5.55669981856058,
+      "p90": 0.08981257546856133,
+      "p95": 0.07559917542134865,
+      "p99": 0.043266831359173155
     },
     "latency_ms": {
-      "p1": 1.394,
-      "p5": 3.637,
-      "p10": 4.958,
-      "p25": 9.079,
-      "p50": 19.953,
-      "p75": 52.065,
-      "p90": 184.738,
-      "p95": 234.072,
-      "p99": 273.0
+      "p1": 1.39,
+      "p5": 3.643,
+      "p10": 4.953,
+      "p25": 9.073,
+      "p50": 19.957,
+      "p75": 52.024,
+      "p90": 184.68,
+      "p95": 234.185,
+      "p99": 273.544
     },
     "packet_loss": {
       "p1": 0.0,
       "p5": 0.0,
       "p10": 0.0,
-      "p25": 1.1042755272820004e-05,
-      "p50": 0.004822712745559209,
-      "p75": 0.05811090765473097,
-      "p90": 0.13649207990035975,
-      "p95": 0.1987869577393624,
-      "p99": 0.3652163739953438
+      "p25": 1.0923089245161829e-05,
+      "p50": 0.0048178544016059515,
+      "p75": 0.058124757325470816,
+      "p90": 0.13651986085946857,
+      "p95": 0.1985210573594862,
+      "p99": 0.3680648144679889
     }
   }
 }
diff --git a/data/de_2024_10.json b/data/de_2024_10.json
@@ -10,48 +10,48 @@
   },
   "metrics": {
     "download_throughput_mbps": {
-      "p1": 0.22367850581560372,
-      "p5": 1.262769802856182,
-      "p10": 3.4166592054870026,
-      "p25": 13.817824595534129,
-      "p50": 45.24430302103892,
-      "p75": 100.56946051210859,
-      "p90": 248.78115747983244,
-      "p95": 377.8657642766346,
-      "p99": 741.7983223940372
+      "p1": 741.3863770285967,
+      "p5": 377.9433173862602,
+      "p10": 248.65806704804896,
+      "p25": 100.59657604456656,
+      "p50": 45.262074301346765,
+      "p75": 13.80200458802345,
+      "p90": 3.432561194292282,
+      "p95": 1.2581497389555467,
+      "p99": 0.22552302324036846
     },
     "upload_throughput_mbps": {
-      "p1": 0.04798033204768874,
-      "p5": 0.07565187888251705,
-      "p10": 0.19852741925194242,
-      "p25": 3.5715003423978087,
-      "p50": 17.172955392453527,
-      "p75": 36.63458526768415,
-      "p90": 53.192909502396375,
-      "p95": 101.34444079000329,
-      "p99": 285.7324202068485
+      "p1": 285.715497004709,
+      "p5": 101.84982169389747,
+      "p10": 53.243619429234855,
+      "p25": 36.62105866176215,
+      "p50": 17.1805215736349,
+      "p75": 3.556625227971489,
+      "p90": 0.19786786217149757,
+      "p95": 0.07565274320492381,
+      "p99": 0.04880458855925971
     },
     "latency_ms": {
-      "p1": 0.438,
-      "p5": 3.433,
-      "p10": 6.787,
+      "p1": 0.448,
+      "p5": 3.481,
+      "p10": 6.78,
       "p25": 11.589,
       "p50": 17.712,
-      "p75": 26.382,
-      "p90": 38.489,
-      "p95": 57.061,
-      "p99": 305.85
+      "p75": 26.381,
+      "p90": 38.464,
+      "p95": 57.313,
+      "p99": 304.595
     },
     "packet_loss": {
       "p1": 0.0,
       "p5": 0.0,
       "p10": 0.0,
       "p25": 0.0,
-      "p50": 0.00034573047467282084,
-      "p75": 0.016581558328885995,
-      "p90": 0.07073353719313655,
-      "p95": 0.11517449630011735,
-      "p99": 0.2521127443846117
+      "p50": 0.0003440108877366934,
+      "p75": 0.016605967886425984,
+      "p90": 0.07071063900421407,
+      "p95": 0.11531058751234509,
+      "p99": 0.25114064520339185
     }
   }
 }
diff --git a/data/query_downloads.sql b/data/query_downloads.sql
@@ -1,15 +1,47 @@
 SELECT
     client.Geo.CountryCode as country_code,
     COUNT(*) as sample_count,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(1)] as download_p1,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(5)] as download_p5,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(10)] as download_p10,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(25)] as download_p25,
+
+    -- ============================================================================
+    -- PERCENTILE LABELING CONVENTION FOR IQB QUALITY ASSESSMENT
+    -- ============================================================================
+    -- IQB Policy: p95 means "95% of users achieve this performance or better"
+    --
+    -- This allows threshold comparison: "Can 95% of users perform this use case?"
+    --
+    -- For "lower is better" metrics (latency, packet loss):
+    --   - Raw p95 = high value = "95% of users have ≤ 95ms latency"
+    --   - This directly answers: "Can 95% of users achieve ≤ threshold?"
+    --   - Label: OFFSET(95) → latency_p95 (no inversion needed) ✓
+    --
+    -- For "higher is better" metrics (throughput):
+    --   - Raw p95 = high value = "95% of users have ≤ 625 Mbit/s speed"
+    --   - But we need: "95% of users have ≥ X Mbit/s speed"
+    --   - Solution: Use p5 raw = "95% of users have ≥ 2.76 Mbit/s"
+    --   - Mathematical inversion: p(X)_quality = p(100-X)_raw
+    --   - Label: OFFSET(5) → download_p95 (inverted!) ✓
+    --
+    -- Example:
+    --   Requirement: ≥ 30 Mbit/s download, ≤ 200ms latency for 95% of users
+    --   Network A: download_p95 = 10 Mbit/s, latency_p95 = 100ms → FAIL (10 < 30)
+    --   Network B: download_p95 = 33 Mbit/s, latency_p95 = 50ms  → PASS
+    -- ============================================================================
+
+    -- Download throughput (higher is better - INVERTED LABELS!)
+    -- ⚠️  OFFSET(99) = top speed = worst of top 1% → labeled as p1
+    -- ⚠️  OFFSET(5) = 5th percentile raw = 95% have MORE → labeled as p95
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(99)] as download_p1,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(95)] as download_p5,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(90)] as download_p10,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(75)] as download_p25,
     APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(50)] as download_p50,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(75)] as download_p75,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(90)] as download_p90,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(95)] as download_p95,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(99)] as download_p99,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(25)] as download_p75,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(10)] as download_p90,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(5)] as download_p95,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(1)] as download_p99,
+
+    -- Latency/MinRTT (lower is better - no inversion)
+    -- Raw percentiles directly represent "X% of users have ≤ this latency"
     APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(1)] as latency_p1,
     APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(5)] as latency_p5,
     APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(10)] as latency_p10,
@@ -19,6 +51,9 @@ SELECT
     APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(90)] as latency_p90,
     APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(95)] as latency_p95,
     APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(99)] as latency_p99,
+
+    -- Packet Loss Rate (lower is better - no inversion)
+    -- Raw percentiles directly represent "X% of users have ≤ this loss rate"
     APPROX_QUANTILES(a.LossRate, 100)[OFFSET(1)] as loss_p1,
     APPROX_QUANTILES(a.LossRate, 100)[OFFSET(5)] as loss_p5,
     APPROX_QUANTILES(a.LossRate, 100)[OFFSET(10)] as loss_p10,
diff --git a/data/query_uploads.sql b/data/query_uploads.sql
@@ -1,15 +1,32 @@
 SELECT
     client.Geo.CountryCode as country_code,
     COUNT(*) as sample_count,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(1)] as upload_p1,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(5)] as upload_p5,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(10)] as upload_p10,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(25)] as upload_p25,
+
+    -- ============================================================================
+    -- PERCENTILE LABELING CONVENTION FOR IQB QUALITY ASSESSMENT
+    -- ============================================================================
+    -- IQB Policy: p95 means "95% of users achieve this performance or better"
+    --
+    -- Upload throughput is "higher is better", so:
+    --   - Raw p95 = "95% of users have ≤ X Mbit/s"
+    --   - But we need: "95% of users have ≥ Y Mbit/s"
+    --   - Solution: Use p5 raw (inverted labels)
+    --
+    -- See query_downloads.sql for detailed explanation and examples.
+    -- ============================================================================
+
+    -- Upload throughput (higher is better - INVERTED LABELS!)
+    -- ⚠️  OFFSET(99) = top speed = worst of top 1% → labeled as p1
+    -- ⚠️  OFFSET(5) = 5th percentile raw = 95% have MORE → labeled as p95
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(99)] as upload_p1,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(95)] as upload_p5,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(90)] as upload_p10,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(75)] as upload_p25,
     APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(50)] as upload_p50,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(75)] as upload_p75,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(90)] as upload_p90,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(95)] as upload_p95,
-    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(99)] as upload_p99
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(25)] as upload_p75,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(10)] as upload_p90,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(5)] as upload_p95,
+    APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(1)] as upload_p99
 FROM
     `measurement-lab.ndt.unified_uploads`
 WHERE
diff --git a/data/us_2024_10.json b/data/us_2024_10.json
@@ -10,48 +10,48 @@
   },
   "metrics": {
     "download_throughput_mbps": {
-      "p1": 0.37354810526833476,
-      "p5": 2.7494108827310177,
-      "p10": 7.6575433038007406,
-      "p25": 29.94873577502137,
-      "p50": 96.36533017831101,
-      "p75": 268.1810327939917,
-      "p90": 474.1768162996085,
-      "p95": 625.4494125653449,
-      "p99": 893.2782851912168
+      "p1": 891.8792927991478,
+      "p5": 625.7114881036548,
+      "p10": 474.3016262507926,
+      "p25": 268.42685523419607,
+      "p50": 96.34596771774324,
+      "p75": 29.912972037815766,
+      "p90": 7.635142044576455,
+      "p95": 2.7455472138124843,
+      "p99": 0.38098156744742
     },
     "upload_throughput_mbps": {
-      "p1": 0.06279911698366483,
-      "p5": 0.15105079102447938,
-      "p10": 1.0130561597157441,
-      "p25": 8.030055616329323,
-      "p50": 20.95814566696693,
-      "p75": 65.73945359925672,
-      "p90": 223.9767416770114,
-      "p95": 370.4336035390081,
-      "p99": 813.7319533731953
+      "p1": 816.2955641497589,
+      "p5": 369.1587515169648,
+      "p10": 224.4207438060576,
+      "p25": 65.7807279310557,
+      "p50": 20.964743814950857,
+      "p75": 8.038127970369711,
+      "p90": 1.0064079692417476,
+      "p95": 0.1521468780337755,
+      "p99": 0.06268772417401364
     },
     "latency_ms": {
-      "p1": 0.16,
-      "p5": 0.808,
-      "p10": 2.886,
-      "p25": 7.778,
-      "p50": 16.124,
-      "p75": 30.0,
-      "p90": 51.303,
-      "p95": 80.55,
-      "p99": 251.545
+      "p1": 0.159,
+      "p5": 0.802,
+      "p10": 2.894,
+      "p25": 7.783,
+      "p50": 16.128,
+      "p75": 30.002,
+      "p90": 51.178,
+      "p95": 80.743,
+      "p99": 252.507
     },
     "packet_loss": {
       "p1": 0.0,
       "p5": 0.0,
       "p10": 0.0,
       "p25": 0.0,
-      "p50": 0.000516724336793541,
-      "p75": 0.019090240380880846,
-      "p90": 0.07332944466732425,
-      "p95": 0.12018590164702943,
-      "p99": 0.253111989432024
+      "p50": 0.0005198544633386031,
+      "p75": 0.01908806003333987,
+      "p90": 0.07333410102683499,
+      "p95": 0.1205355721342832,
+      "p99": 0.2548062180643292
     }
   }
 }
diff --git a/library/src/iqb/cache.py b/library/src/iqb/cache.py
@@ -70,6 +70,29 @@ def get_data(
         Raises:
             FileNotFoundError: If requested data is not available in cache.
             ValueError: If requested percentile is not available in cached data.
+
+        ⚠️  PERCENTILE INTERPRETATION (CRITICAL!)
+        =========================================
+        IQB Policy: pX means "X% of users achieve this performance or better"
+
+        This enables threshold comparison: "Can 95% of users meet requirement?"
+
+        For "lower is better" metrics (latency, packet loss):
+          - Raw p95 = "95% of users have ≤ 80ms latency"
+          - Directly usable: latency_p95 ≤ threshold? ✓
+          - No inversion needed
+
+        For "higher is better" metrics (throughput):
+          - Raw p95 = "95% of users have ≤ 625 Mbit/s speed" ✗
+          - We need: "95% of users have ≥ X Mbit/s"
+          - Solution: Use p5 raw = "95% have more than this"
+          - Mathematical inversion: p(X)_quality = p(100-X)_raw
+          - Example: OFFSET(5) raw → labeled as "download_p95" in JSON
+
+        This inversion happens in BigQuery (see data/query_*.sql),
+        so this cache code treats all percentiles uniformly.
+        When you request percentile=95, you ALWAYS get
+        "performance that 95% of users achieve or better".
         """
         # Design Note
         # -----------
diff --git a/library/tests/iqb/cache_test.py b/library/tests/iqb/cache_test.py
@@ -104,9 +104,9 @@ def test_get_data_with_different_percentile(self, data_dir):
         assert "m-lab" in data_p50
         data_p50 = data_p50["m-lab"]
 
-        # p95 should be higher than p50 for throughput metrics
-        assert data_p95["download_throughput_mbps"] > data_p50["download_throughput_mbps"]
-        assert data_p95["upload_throughput_mbps"] > data_p50["upload_throughput_mbps"]
+        # p95 should be lower than p50 for throughput metrics (inverted labels; lower is worse)
+        assert data_p95["download_throughput_mbps"] < data_p50["download_throughput_mbps"]
+        assert data_p95["upload_throughput_mbps"] < data_p50["upload_throughput_mbps"]
 
         # p95 should be higher than p50 for latency (higher is worse)
         assert data_p95["latency_ms"] > data_p50["latency_ms"]