Commit 70f1473

refactor: align percentile semantics across metrics
On page 19 of the [IQB report 2025](https://www.measurementlab.net/publications/IQB_report_2025.pdf) we read:

> IQB uses the 95th percentile of a given dataset to evaluate a given metric. In this context, the 95th percentile is the value below which 95% of the observed measurements fall, which effectively captures the upper bound of a typical user experience while excluding extreme outliers. For example, to assess whether a region meets the network tier's packet loss criteria for high-quality gaming, IQB calculates the 95th percentile of packet loss measurements collected from users in that region. The value is then compared to the predefined threshold.

The formula example (page 33) shows taking the 95p of metrics without any consideration of polarity. However, to produce consistent results, we need to take polarity into account when computing the 95p.

Let us illustrate the polarity issue with an example. Assume that the following holds for a specific ISP:

(i) the 95p of latency being 10 ms means that 95% of samples have 10 ms or less;
(ii) the 95p of download speed being 22 Mbit/s means that 95% of samples have 22 Mbit/s or less.

Let us also assume that we are evaluating online gaming, which typically needs:

(a) download speed >= 20 Mbit/s;
(b) latency <= 15 ms.

The latency percentile allows us to say that "most samples" (95%) in the given ISP show a latency (10 ms or better) lower than the required latency (15 ms). The speed percentile, by contrast, only allows us to say that "few samples" (5%) in the given ISP show a download speed (22 Mbit/s or better) greater than the required one (20 Mbit/s).

So, is online gaming possible with this ISP? The answer seems inconclusive because of the imbalance in the sample shares we are considering and the different polarity between latency (higher is worse) and speed (higher is better). On paper, a better solution is to say: "okay, for the speed, let us instead consider the 5p."

Now, let us assume:

(iii) the 5p of download speed being 21 Mbit/s means that 5% of samples have 21 Mbit/s or less.

Based on this, we can say that 95% of samples have 21 Mbit/s or more. Now it is possible to make a statement about download speed concerning "most samples", which allows us to conclude: "most samples indicate that users can play online with this ISP."

How do we translate this into actual code?

**Approach I**: modify `cache.py` so that, when we request `percentile=X`, we actually take the complementary percentile for latency and loss (or for download and upload speed). This change aligns the polarity and allows us to answer questions using uniform sample shares. It is basically equivalent to what we manually did above.

**Approach II**: swap the labels for latency and loss (or for download speed and upload speed) when querying BigQuery. This means `cache.py` uniformly accesses `percentile=X` with the understanding that two of the four labels are swapped.

The second approach is more robust because it guarantees that, if `percentile=X` is there, then the complementary percentile is there as well. In both cases, people reading the code need to be aware of the polarity anyway.

Based on a discussion with @sermpezis, I am going to swap the labels for latency and packet loss. Which pair we swap is mostly a matter of convention; what matters is that we align the polarity. With the chosen convention, the 95p is the cutoff below which 95% of users have worse performance, for the definition of "worse" implied by the metric (e.g., lower speed or higher latency). The converse also holds: 5% of users have better performance.

In conclusion, the implemented change aligns the sample shares so that the same percentile label picked up by `cache.py` allows us to make comparable better/worse statements across metrics.
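The label alignment described above can be sketched as a tiny helper. This is an illustration of Approach I (the function and set names are hypothetical; the actual change implements Approach II in the SQL queries instead):

```python
# For "lower is better" metrics, fetch the complementary percentile so that
# the same label always means "X% of users have worse performance".
LOWER_IS_BETTER = {"latency_ms", "packet_loss"}

def aligned_percentile(metric: str, percentile: int) -> int:
    """Map a requested percentile label to the raw percentile to fetch."""
    if metric in LOWER_IS_BETTER:
        return 100 - percentile  # e.g., label p95 -> raw p5
    return percentile

# Label p95 for latency resolves to raw p5 (best ~5% cutoff); throughput
# labels are left untouched.
assert aligned_percentile("latency_ms", 95) == 5
assert aligned_percentile("packet_loss", 5) == 95
assert aligned_percentile("download_throughput_mbps", 95) == 95
```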
1 parent 014e0a5 commit 70f1473

File tree: 8 files changed (+202 −128 lines)

data/br_2024_10.json (35 additions, 35 deletions)

```diff
@@ -10,48 +10,48 @@
   },
   "metrics": {
     "download_throughput_mbps": {
-      "p1": 0.15979623373499155,
-      "p5": 0.9501991252036766,
-      "p10": 3.101174869710966,
-      "p25": 15.0340700432778,
-      "p50": 51.9831305263177,
-      "p75": 158.38962702858973,
-      "p90": 330.3352983503099,
-      "p95": 456.0950392154999,
-      "p99": 696.5613392781584
+      "p1": 0.1607952665485899,
+      "p5": 0.9527639351896139,
+      "p10": 3.0995791948048126,
+      "p25": 15.027442481667213,
+      "p50": 52.05003588762424,
+      "p75": 157.9133050283035,
+      "p90": 330.56865817664186,
+      "p95": 456.85148065886335,
+      "p99": 699.4150527600466
     },
     "upload_throughput_mbps": {
-      "p1": 0.042563080079753776,
-      "p5": 0.07560071683921148,
-      "p10": 0.08980854096320207,
-      "p25": 5.545812099052701,
-      "p50": 30.78175191467136,
-      "p75": 88.37694460346944,
-      "p90": 181.64033113619195,
-      "p95": 255.97876412741525,
-      "p99": 394.3416893812533
+      "p1": 0.04271425174361129,
+      "p5": 0.07560308380999751,
+      "p10": 0.08981165041087474,
+      "p25": 5.53270523375538,
+      "p50": 30.774797230694343,
+      "p75": 88.4152374160525,
+      "p90": 181.77498611764298,
+      "p95": 255.68482970928804,
+      "p99": 393.05831520691316
     },
     "latency_ms": {
-      "p1": 1.394,
-      "p5": 3.637,
-      "p10": 4.958,
-      "p25": 9.079,
+      "p1": 274.874,
+      "p5": 234.07,
+      "p10": 184.463,
+      "p25": 52.107,
       "p50": 19.953,
-      "p75": 52.065,
-      "p90": 184.738,
-      "p95": 234.072,
-      "p99": 273.0
+      "p75": 9.072,
+      "p90": 4.958,
+      "p95": 3.64,
+      "p99": 1.394
     },
     "packet_loss": {
-      "p1": 0.0,
-      "p5": 0.0,
-      "p10": 0.0,
-      "p25": 1.1042755272820004e-05,
-      "p50": 0.004822712745559209,
-      "p75": 0.05811090765473097,
-      "p90": 0.13649207990035975,
-      "p95": 0.1987869577393624,
-      "p99": 0.3652163739953438
+      "p1": 0.3683998584485473,
+      "p5": 0.1989762612035137,
+      "p10": 0.13678163876238772,
+      "p25": 0.05818004802054541,
+      "p50": 0.004794188738463603,
+      "p75": 1.099611491732316e-05,
+      "p90": 0.0,
+      "p95": 0.0,
+      "p99": 0.0
     }
   }
 }
```

data/de_2024_10.json (36 additions, 36 deletions)

```diff
@@ -10,48 +10,48 @@
   },
   "metrics": {
    "download_throughput_mbps": {
-      "p1": 0.22367850581560372,
-      "p5": 1.262769802856182,
-      "p10": 3.4166592054870026,
-      "p25": 13.817824595534129,
-      "p50": 45.24430302103892,
-      "p75": 100.56946051210859,
-      "p90": 248.78115747983244,
-      "p95": 377.8657642766346,
-      "p99": 741.7983223940372
+      "p1": 0.2225810589129801,
+      "p5": 1.2528819154630235,
+      "p10": 3.4209413897406646,
+      "p25": 13.810432910425352,
+      "p50": 45.27405796172478,
+      "p75": 100.56551588414163,
+      "p90": 248.79192117938703,
+      "p95": 377.46883271114353,
+      "p99": 741.7160739730066
     },
     "upload_throughput_mbps": {
-      "p1": 0.04798033204768874,
-      "p5": 0.07565187888251705,
-      "p10": 0.19852741925194242,
-      "p25": 3.5715003423978087,
-      "p50": 17.172955392453527,
-      "p75": 36.63458526768415,
-      "p90": 53.192909502396375,
-      "p95": 101.34444079000329,
-      "p99": 285.7324202068485
+      "p1": 0.04888864801257374,
+      "p5": 0.07565371155122488,
+      "p10": 0.20144847741476402,
+      "p25": 3.571516158290839,
+      "p50": 17.180642660658165,
+      "p75": 36.60604286113131,
+      "p90": 53.23036640523673,
+      "p95": 101.60189503128285,
+      "p99": 285.6942608280348
     },
     "latency_ms": {
-      "p1": 0.438,
-      "p5": 3.433,
-      "p10": 6.787,
-      "p25": 11.589,
-      "p50": 17.712,
-      "p75": 26.382,
-      "p90": 38.489,
-      "p95": 57.061,
-      "p99": 305.85
+      "p1": 304.0,
+      "p5": 57.103,
+      "p10": 38.461,
+      "p25": 26.383,
+      "p50": 17.711,
+      "p75": 11.597,
+      "p90": 6.791,
+      "p95": 3.482,
+      "p99": 0.438
     },
     "packet_loss": {
-      "p1": 0.0,
-      "p5": 0.0,
-      "p10": 0.0,
-      "p25": 0.0,
-      "p50": 0.00034573047467282084,
-      "p75": 0.016581558328885995,
-      "p90": 0.07073353719313655,
-      "p95": 0.11517449630011735,
-      "p99": 0.2521127443846117
+      "p1": 0.2521662550269682,
+      "p5": 0.11532512559748044,
+      "p10": 0.07065396520089343,
+      "p25": 0.016604288718428745,
+      "p50": 0.00034525252746869345,
+      "p75": 0.0,
+      "p90": 0.0,
+      "p95": 0.0,
+      "p99": 0.0
     }
   }
 }
```

data/query_downloads.sql (51 additions, 16 deletions)

```diff
@@ -1,6 +1,33 @@
 SELECT
   client.Geo.CountryCode as country_code,
   COUNT(*) as sample_count,
+
+  -- ============================================================================
+  -- PERCENTILE LABELING CONVENTION FOR IQB QUALITY ASSESSMENT
+  -- ============================================================================
+  --
+  -- For "higher is better" metrics (throughput):
+  --   - Raw p95 = "95% of users have ≤ X Mbit/s"
+  --   - Label: OFFSET(95) → download_p95 (standard statistical definition)
+  --   - Interpretation: top ~5% of users have > p95 throughput
+  --
+  -- For "lower is better" metrics (latency, packet loss):
+  --   - Raw p95 = "95% of users have ≤ X ms latency" (worst-case typical)
+  --   - We want p95 to represent best-case typical (to match throughput semantics)
+  --   - Solution: Invert labels - use raw p5 labeled as p95
+  --   - Label: OFFSET(5) → latency_p95 (inverted!)
+  --   - Interpretation: top ~5% of users (best latency) have < p95
+  --
+  -- Result: Uniform comparison logic where p95 always means "typical best
+  -- performance" rather than "typical worst performance"
+  --
+  -- NOTE: This creates semantics where checking p95 thresholds asks
+  -- "Can the top ~5% of users perform this use case?" - empirical validation
+  -- against real data will determine if this interpretation is appropriate.
+  -- ============================================================================
+
+  -- Download throughput (higher is better - NO INVERSION)
+  -- Standard percentile labels matching statistical definition
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(1)] as download_p1,
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(5)] as download_p5,
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(10)] as download_p10,
@@ -10,24 +37,32 @@ SELECT
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(90)] as download_p90,
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(95)] as download_p95,
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(99)] as download_p99,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(1)] as latency_p1,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(5)] as latency_p5,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(10)] as latency_p10,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(25)] as latency_p25,
+
+  -- Latency/MinRTT (lower is better - INVERTED LABELS!)
+  -- ⚠️ OFFSET(99) = worst latency = top 1% worst users → labeled as p1
+  -- ⚠️ OFFSET(5) = 5th percentile = best ~5% of users → labeled as p95
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(99)] as latency_p1,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(95)] as latency_p5,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(90)] as latency_p10,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(75)] as latency_p25,
   APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(50)] as latency_p50,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(75)] as latency_p75,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(90)] as latency_p90,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(95)] as latency_p95,
-  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(99)] as latency_p99,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(1)] as loss_p1,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(5)] as loss_p5,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(10)] as loss_p10,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(25)] as loss_p25,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(25)] as latency_p75,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(10)] as latency_p90,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(5)] as latency_p95,
+  APPROX_QUANTILES(a.MinRTT, 100)[OFFSET(1)] as latency_p99,
+
+  -- Packet Loss Rate (lower is better - INVERTED LABELS!)
+  -- ⚠️ OFFSET(99) = worst loss = top 1% worst users → labeled as p1
+  -- ⚠️ OFFSET(5) = 5th percentile = best ~5% of users → labeled as p95
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(99)] as loss_p1,
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(95)] as loss_p5,
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(90)] as loss_p10,
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(75)] as loss_p25,
   APPROX_QUANTILES(a.LossRate, 100)[OFFSET(50)] as loss_p50,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(75)] as loss_p75,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(90)] as loss_p90,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(95)] as loss_p95,
-  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(99)] as loss_p99
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(25)] as loss_p75,
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(10)] as loss_p90,
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(5)] as loss_p95,
+  APPROX_QUANTILES(a.LossRate, 100)[OFFSET(1)] as loss_p99
 FROM
   `measurement-lab.ndt.unified_downloads`
 WHERE
```
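A quick sanity check of the labeling convention: after the swap, the labeled latency percentiles should be non-increasing as the label grows, while throughput percentiles stay non-decreasing. The sketch below uses a few values taken from `data/br_2024_10.json`:

```python
# Labeled percentiles as they appear in data/br_2024_10.json after the swap.
latency = {"p1": 274.874, "p5": 234.07, "p50": 19.953, "p95": 3.64, "p99": 1.394}
download = {"p1": 0.16, "p5": 0.95, "p50": 52.05, "p95": 456.85, "p99": 699.42}

def values_in_label_order(d):
    # Sort "pX" keys numerically (p1, p5, ..., p99) and return their values.
    return [d[k] for k in sorted(d, key=lambda k: int(k[1:]))]

lat = values_in_label_order(latency)
dl = values_in_label_order(download)
# Lower is better: labeled latency decreases as the label grows.
assert all(a >= b for a, b in zip(lat, lat[1:]))
# Higher is better: labeled download increases as the label grows.
assert all(a <= b for a, b in zip(dl, dl[1:]))
```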

data/query_uploads.sql (13 additions, 0 deletions)

```diff
@@ -1,6 +1,19 @@
 SELECT
   client.Geo.CountryCode as country_code,
   COUNT(*) as sample_count,
+
+  -- ============================================================================
+  -- PERCENTILE LABELING CONVENTION FOR IQB QUALITY ASSESSMENT
+  -- ============================================================================
+  --
+  -- Upload throughput is "higher is better", so we use standard percentile
+  -- labels (no inversion).
+  --
+  -- See query_downloads.sql for detailed explanation and rationale.
+  -- ============================================================================
+
+  -- Upload throughput (higher is better - NO INVERSION)
+  -- Standard percentile labels matching statistical definition
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(1)] as upload_p1,
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(5)] as upload_p5,
   APPROX_QUANTILES(a.MeanThroughputMbps, 100)[OFFSET(10)] as upload_p10,
```

data/us_2024_10.json (36 additions, 36 deletions)

```diff
@@ -10,48 +10,48 @@
   },
   "metrics": {
     "download_throughput_mbps": {
-      "p1": 0.37354810526833476,
-      "p5": 2.7494108827310177,
-      "p10": 7.6575433038007406,
-      "p25": 29.94873577502137,
-      "p50": 96.36533017831101,
-      "p75": 268.1810327939917,
-      "p90": 474.1768162996085,
-      "p95": 625.4494125653449,
-      "p99": 893.2782851912168
+      "p1": 0.3812327371493097,
+      "p5": 2.7476349376135296,
+      "p10": 7.638638876969803,
+      "p25": 29.89884264277766,
+      "p50": 96.2643838797045,
+      "p75": 268.4085597033553,
+      "p90": 474.14118932768235,
+      "p95": 626.3132682965055,
+      "p99": 893.0568504937463
     },
     "upload_throughput_mbps": {
-      "p1": 0.06279911698366483,
-      "p5": 0.15105079102447938,
-      "p10": 1.0130561597157441,
-      "p25": 8.030055616329323,
-      "p50": 20.95814566696693,
-      "p75": 65.73945359925672,
-      "p90": 223.9767416770114,
-      "p95": 370.4336035390081,
-      "p99": 813.7319533731953
+      "p1": 0.06257042412674083,
+      "p5": 0.15144845324010167,
+      "p10": 0.9992760254839029,
+      "p25": 8.03213984894271,
+      "p50": 20.98046809727222,
+      "p75": 65.669501568909,
+      "p90": 224.29692902729298,
+      "p95": 368.91185081459395,
+      "p99": 819.839280930373
     },
     "latency_ms": {
-      "p1": 0.16,
-      "p5": 0.808,
-      "p10": 2.886,
-      "p25": 7.778,
-      "p50": 16.124,
-      "p75": 30.0,
-      "p90": 51.303,
-      "p95": 80.55,
-      "p99": 251.545
+      "p1": 255.993,
+      "p5": 80.759,
+      "p10": 51.2,
+      "p25": 30.0,
+      "p50": 16.119,
+      "p75": 7.783,
+      "p90": 2.895,
+      "p95": 0.804,
+      "p99": 0.161
     },
     "packet_loss": {
-      "p1": 0.0,
-      "p5": 0.0,
-      "p10": 0.0,
-      "p25": 0.0,
-      "p50": 0.000516724336793541,
-      "p75": 0.019090240380880846,
-      "p90": 0.07332944466732425,
-      "p95": 0.12018590164702943,
-      "p99": 0.253111989432024
+      "p1": 0.2517569864889713,
+      "p5": 0.11998957375627733,
+      "p10": 0.07340665854872248,
+      "p25": 0.019064946948168876,
+      "p50": 0.0005185937475477769,
+      "p75": 0.0,
+      "p90": 0.0,
+      "p95": 0.0,
+      "p99": 0.0
     }
   }
 }
```

library/src/iqb/cache.py (24 additions, 0 deletions)

```diff
@@ -70,6 +70,30 @@ def get_data(
     Raises:
         FileNotFoundError: If requested data is not available in cache.
         ValueError: If requested percentile is not available in cached data.
+
+    ⚠️ PERCENTILE INTERPRETATION (CRITICAL!)
+    =========================================
+
+    For "higher is better" metrics (throughput):
+      - Raw p95 = "95% of users have ≤ 625 Mbit/s speed"
+      - Directly usable: download_p95 ≥ threshold?
+      - No inversion needed (standard statistical definition)
+
+    For "lower is better" metrics (latency, packet loss):
+      - Raw p95 = "95% of users have ≤ 80ms latency" (worst-case typical)
+      - We want p95 to represent best-case typical (to match throughput)
+      - Solution: Use p5 raw labeled as p95
+      - Mathematical inversion: p(X)_labeled = p(100-X)_raw
+      - Example: OFFSET(5) raw → labeled as "latency_p95" in JSON
+
+    This inversion happens in BigQuery (see data/query_*.sql),
+    so this cache code treats all percentiles uniformly.
+
+    When you request percentile=95, you get the 95th percentile value
+    that can be compared uniformly against thresholds.
+
+    NOTE: This creates semantics where p95 represents "typical best
+    performance" - empirical validation will determine if appropriate.
     """
     # Design Note
     # -----------
```
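With aligned labels, a threshold check reads the same for every metric: "is the labeled pX value at least as good as the requirement?" The sketch below illustrates this with the gaming thresholds from the commit message; the requirement table and function names are illustrative, not part of the library:

```python
# Hypothetical requirement table for online gaming (from the example above).
REQUIREMENTS = {
    "download_throughput_mbps": ("higher_is_better", 20.0),  # Mbit/s
    "latency_ms": ("lower_is_better", 15.0),                 # ms
}

def passes(metric: str, labeled_value: float) -> bool:
    """Check a labeled percentile value against a requirement, respecting polarity."""
    direction, threshold = REQUIREMENTS[metric]
    if direction == "higher_is_better":
        return labeled_value >= threshold
    return labeled_value <= threshold

# Labeled p5 values (5% of users worse, 95% better), e.g. as returned by
# cache.get_data with percentile=5 after the label swap:
assert passes("download_throughput_mbps", 21.0)  # 95% of users have >= 21 Mbit/s
assert passes("latency_ms", 10.0)                # 95% of users have <= 10 ms
assert not passes("latency_ms", 18.0)            # too slow for gaming
```

Because the polarity handling lives in one place, adding a new metric only requires declaring its direction, not touching the comparison logic.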
